What makes our implementation of Optimal Parse significantly more
efficient than other methods of gene parsing is our use of ``feature
matrix compression''. We realized that it is not necessary to build a
matrix containing elements for every nucleotides of sequence S from
. Instead, only the bonds between nucleotides which
delimit the edges between gene and nongene regions are considered.
The smaller matrix which is built is called a ``feature matrix''
because it contains only the features which represent edges. The
criterion for an edge is that it either follows a stop codon or
precedes a start codon. Assuming equal codon usage, these edges
should occur about around 6/64 of the time (3 start codons, 3 stop
codons). We have found empirically that the compressed feature
matrices are about one tenth the size of the uncompressed matrices.
As a further improvement to this feature, we think it may be possible to compress the nongene and gene matrices seperately, where the compressed nongene matrix will contain only nongene/gene (start codon) edges, and the compressed gene matrix will contain only gene/nongene (stop codon) edges. The method of parsing a contig using these highly compressed maps has not been worked out to our satisfaction, however.