next up previous contents
Next: Data Up: Prediction of Gene-encoding Previous: Gene Encoding Measures

Refinments to the Optimal Parse Method

What makes our implementation of Optimal Parse significantly more efficient than other methods of gene parsing is our use of ``feature matrix compression''. We realized that it is not necessary to build a matrix containing elements for every nucleotides of sequence S from . Instead, only the bonds between nucleotides which delimit the edges between gene and nongene regions are considered. The smaller matrix which is built is called a ``feature matrix'' because it contains only the features which represent edges. The criterion for an edge is that it either follows a stop codon or precedes a start codon. Assuming equal codon usage, these edges should occur about around 6/64 of the time (3 start codons, 3 stop codons). We have found empirically that the compressed feature matrices are about one tenth the size of the uncompressed matrices.

As a further improvement to this feature, we think it may be possible to compress the nongene and gene matrices seperately, where the compressed nongene matrix will contain only nongene/gene (start codon) edges, and the compressed gene matrix will contain only gene/nongene (stop codon) edges. The method of parsing a contig using these highly compressed maps has not been worked out to our satisfaction, however.



David Konerding
Sun May 21 12:19:38 PDT 1995