
Generating L Matrices

Fix a DNA sequence S of letters from the deoxyribonucleic alphabet {A, C, G, T}.

The letters of S are numbered consecutively along its length l. Since we are looking for genes, it will be convenient to use a notation of codons. The sequence S of length l can also be seen as a sequence of codons, each made up of three consecutive letters. Within a gene, we distinguish the start codon, the stop codon, and the codons between the start and the stop.
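A minimal sketch of the codon view of a sequence, in Python. The function name `split_into_codons`, the 0-based `frame` offset, and the use of ATG as start codon and TAA/TAG/TGA as stop codons (the standard genetic code) are assumptions made for illustration; they are not taken from the text above.

```python
# Assumed start/stop codons from the standard genetic code (illustrative only).
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_into_codons(seq, frame=0):
    """Split seq into consecutive three-letter codons, starting at offset frame.

    Trailing letters that do not fill a whole codon are dropped.
    """
    seq = seq.upper()
    return [seq[k:k + 3] for k in range(frame, len(seq) - 2, 3)]


codons = split_into_codons("ATGGCCTAA")
has_start = codons[0] == START_CODON   # first codon is a start codon
has_stop = codons[-1] in STOP_CODONS   # last codon is a stop codon
inner = codons[1:-1]                   # codons between start and stop
```

Here `codons` holds the start codon, one inner codon, and the stop codon of a toy sequence.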

For the sequence S, several upper-triangular T matrices are created, one for each statistic G in a collection of ``gene statistics'' and one for each statistic NG in a collection of ``nongene statistics''. For each i and j with i < j, the (i, j) entry of the T matrix for G is a negative real number (the logarithm of a probability) that we call the ``score'' for the gene statistic G over the region of the sequence S from i to j. If G is a good gene statistic, then the more the region of S from i to j looks like a gene, the larger this score will be. Statistics that measure similarity to nongenes are defined in an analogous way, and their scores are stored in the T matrices for the nongene statistics. Note that these, and other matrices, need only be stored as half-matrices for which j>i.
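The half-matrix storage can be sketched as follows. This is only an illustration: the statistic `region_log_prob` (log-probability of a region under independent base frequencies) is a hypothetical stand-in for a real gene or nongene statistic, and the 0-based, half-open indexing of regions is an assumption of this sketch, not the numbering used in the text.

```python
import math


def region_log_prob(seq, i, j, base_probs):
    """Hypothetical statistic: log-probability of seq[i:j] under
    independent per-base frequencies (not a statistic from the text)."""
    return sum(math.log(base_probs[b]) for b in seq[i:j])


def build_half_matrix(seq, score):
    """Return {(i, j): score(seq, i, j)} for every region with i < j.

    Only the upper half (j > i) is stored, as a dictionary keyed by (i, j).
    """
    n = len(seq)
    return {(i, j): score(seq, i, j)
            for i in range(n)
            for j in range(i + 1, n + 1)}


# Assumed uniform base frequencies, purely for the example.
UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
t_matrix = build_half_matrix(
    "ACGT", lambda s, i, j: region_log_prob(s, i, j, UNIFORM))
```

A sequence of length n yields n(n+1)/2 stored entries under this indexing, and every entry is negative because each region contains at least one letter.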

The set of T matrices is combined into two other half-matrices, the L matrices. The T matrices can be combined in many different ways; we have experimented with several methods.

The most straightforward method is to use a weighted linear combination of sensors, in which each L score is a weighted sum of the corresponding T scores:
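A weighted linear combination of this kind might be sketched as below. The function name `combine_linear`, the dictionary representation of half-matrices, and the toy scores and weights are all assumptions for illustration; how the weights are actually chosen is not specified here.

```python
def combine_linear(t_matrices, weights):
    """Weighted sum of half-matrices that share the same (i, j) keys.

    t_matrices: list of dicts {(i, j): score}, one per statistic.
    weights:    one weight per statistic, in the same order.
    """
    if len(t_matrices) != len(weights):
        raise ValueError("need exactly one weight per T matrix")
    keys = t_matrices[0].keys()
    return {ij: sum(w * t[ij] for w, t in zip(weights, t_matrices))
            for ij in keys}


# Two toy gene statistics over a sequence with three regions (made-up values).
t_gene_1 = {(0, 1): -1.0, (0, 2): -2.0, (1, 2): -0.5}
t_gene_2 = {(0, 1): -3.0, (0, 2): -1.0, (1, 2): -2.5}
l_gene = combine_linear([t_gene_1, t_gene_2], [0.5, 0.5])
```

The same function would be applied separately to the nongene statistics to obtain the second L matrix.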

 

 

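However, this method does not guarantee that the L scores are normalized probabilities. If this condition is necessary, one may use a combination called ``softmax'', in which each weighted combination is exponentiated and then divided by a normalizing sum of such exponentials.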
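One possible reading of a softmax combination is sketched below. It assumes that, for each region (i, j), the gene and nongene linear combinations are normalized against each other so that the two resulting values sum to one; that choice of normalizer, and the names `softmax_pair` and `combine_softmax`, are assumptions of this sketch rather than something stated in the text.

```python
import math


def softmax_pair(gene_score, nongene_score):
    """Two-way softmax; subtracting the maximum avoids overflow in exp."""
    m = max(gene_score, nongene_score)
    eg = math.exp(gene_score - m)
    en = math.exp(nongene_score - m)
    z = eg + en  # normalizing sum
    return eg / z, en / z


def combine_softmax(lin_gene, lin_nongene):
    """Apply softmax_pair entry by entry to two half-matrices that hold
    weighted linear combinations and share the same (i, j) keys."""
    l_gene, l_nongene = {}, {}
    for ij in lin_gene:
        l_gene[ij], l_nongene[ij] = softmax_pair(lin_gene[ij], lin_nongene[ij])
    return l_gene, l_nongene


# Made-up linear combinations for two regions.
lin_g = {(0, 1): -2.0, (0, 2): -1.0}
lin_ng = {(0, 1): -2.0, (0, 2): -3.0}
l_g, l_ng = combine_softmax(lin_g, lin_ng)
```

With this normalization, equal gene and nongene scores give one half each, and a higher gene score shifts probability toward the gene entry.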

 

The score of a parse is defined to be the sum of the L values, gene and nongene, for the regions of S that make up the parse. An example is provided in another section; the parse there has nongene regions and one gene-encoding region.

The score is the sum of the entries of the two L matrices corresponding to those regions:
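Scoring a parse can be sketched as a lookup and sum over its regions. The representation of a parse as a list of `(label, i, j)` tuples, the label strings, and the numeric values below are hypothetical and chosen only to make the example concrete.

```python
def parse_score(parse, l_gene, l_nongene):
    """Sum the L entries for the regions that make up a parse.

    parse: list of (label, i, j) tuples, label being "gene" or "nongene".
    l_gene, l_nongene: half-matrices stored as {(i, j): score}.
    """
    total = 0.0
    for label, i, j in parse:
        if label == "gene":
            total += l_gene[(i, j)]
        elif label == "nongene":
            total += l_nongene[(i, j)]
        else:
            raise ValueError(f"unknown region label: {label!r}")
    return total


# Toy parse: nongene, gene, nongene (made-up region boundaries and scores).
toy_l_gene = {(3, 9): -2.0}
toy_l_nongene = {(0, 3): -1.0, (9, 12): -1.5}
toy_parse = [("nongene", 0, 3), ("gene", 3, 9), ("nongene", 9, 12)]
score = parse_score(toy_parse, toy_l_gene, toy_l_nongene)
```

Each region contributes exactly one entry, taken from the matrix that matches its label.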

 






David Konerding
Sun May 21 12:19:38 PDT 1995