
Introduction

Every living organism contains nucleic acids, which are polymers built from nucleotides, each consisting of a sugar, a phosphate group, and a nitrogenous base. Information is encoded in stable polymeric strings of DNA much as it is encoded in strings of words in a language: recognizable sequences of letters have meaning when read by the appropriate mechanism.

DNA sequences may be determined by extracting the DNA from an organism's cells and performing chemical reactions on it. This procedure has been automated and is being used to sequence the genomes of many organisms. The genome is one of the main repositories of nucleic acid information in living organisms.

Sequencing the DNA of Escherichia coli, which has a circular genome, produces large quantities of anonymous DNA in contiguous stretches called contigs. Contig lengths range from fewer than 100 nucleotides to more than 20 kilobases. This DNA can be inspected visually for regions characteristic of genes, or analyzed with computational linguistic techniques that treat sequences as "texts" over the alphabet A, C, G, T. These techniques use statistical modelling to predict likely gene-encoding regions in long E. coli contigs, where visual inspection is very time-consuming.
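
To make the idea of treating a sequence as a "text" concrete, the following is a minimal Python sketch that counts the overlapping k-letter "words" in a DNA string over the alphabet A, C, G, T. It is only an illustration of the text analogy, not the statistical model used in this work; the function names kmer_counts and kmer_frequencies and the toy sequence are assumptions introduced here.

```python
from collections import Counter

# The four-letter DNA alphabet named in the text.
ALPHABET = "ACGT"


def kmer_counts(sequence, k):
    """Count every overlapping k-letter word in a DNA sequence.

    Hypothetical helper for illustration only.
    """
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        # Skip words containing letters outside A, C, G, T (e.g. N).
        if all(base in ALPHABET for base in word):
            counts[word] += 1
    return counts


def kmer_frequencies(sequence, k):
    """Convert word counts into relative frequencies (hypothetical helper)."""
    counts = kmer_counts(sequence, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


if __name__ == "__main__":
    toy = "ATGATGA"  # toy sequence, not real E. coli data
    print(kmer_counts(toy, 3))
```

Word frequencies of this kind are one simple input a statistical model could use to distinguish gene-encoding regions from other DNA; a real predictor would need considerably more, such as reading-frame information and training data from known genes.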





David Konerding
Sun May 21 12:19:38 PDT 1995