SCFG:
Papers on the RNA work performed here at the University of California, Santa Cruz are available in the UCSC Computational Biology group's RNA FTP directory.
Specific papers of interest
- Yasubumi Sakakibara, Michael Brown, Richard Hughey, I. Saira Mian, Kimmen Sjolander, Rebecca C. Underwood and David Haussler.
Stochastic Context-Free Grammars for tRNA Modeling.
Nucleic Acids Research, 22(23):5112-5120, 1994.
Abstract
Stochastic context-free grammars (SCFGs) are applied to the problems
of folding, aligning and modeling families of tRNA sequences.
SCFGs capture the sequences' common primary and secondary structure
and generalize the hidden Markov models (HMMs) used in related work
on protein and DNA.
Results show that after having been trained on as few as 20 tRNA sequences from
only two tRNA subfamilies (mitochondrial and cytoplasmic), the model can
discern general tRNA from similar-length RNA sequences of other kinds,
can find secondary structure of new tRNA sequences, and can
produce multiple alignments of large sets of tRNA sequences.
Our results suggest potential improvements
in the alignments of the D- and T-domains in some mitochondrial
tRNAs that cannot be fit into the canonical secondary structure.
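To make "folding with an SCFG" concrete, here is a minimal CYK (Viterbi) sketch that finds the most probable parse of a sequence and reads the base pairs off it. It assumes a hypothetical one-nonterminal toy grammar with pair, left-emission, bifurcation and end rules; the probabilities and the name `cyk_fold` are illustrative assumptions, not the trained tRNA model from the paper, whose grammar is far larger.

```python
import math

# Hypothetical toy SCFG with one nonterminal S; all probabilities are
# illustrative assumptions, not parameters from the paper.
#   S -> x S y   pair rule: emits base pair (x, y) around an inner S
#   S -> x S     left rule: emits one unpaired base
#   S -> S S     bifurcation
#   S -> ""      end rule
RULE = {"pair": 0.3, "left": 0.5, "bif": 0.1, "end": 0.1}
PAIR_EMIT = {("A", "U"): 0.25, ("U", "A"): 0.25,
             ("G", "C"): 0.2, ("C", "G"): 0.2, ("G", "U"): 0.1}
SINGLE_EMIT = 0.25  # uniform over A, C, G, U


def cyk_fold(seq):
    """Return (log-probability, dot-bracket string) of the most probable parse."""
    n = len(seq)
    best = [[-math.inf] * (n + 1) for _ in range(n + 1)]  # best[i][j] covers seq[i:j]
    back = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        best[i][i], back[i][i] = math.log(RULE["end"]), ("end",)
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # Left rule: emit seq[i] unpaired.
            score = best[i + 1][j] + math.log(RULE["left"] * SINGLE_EMIT)
            arg = ("left",)
            if length >= 2:
                # Pair rule: emit seq[i] paired with seq[j - 1].
                e = PAIR_EMIT.get((seq[i], seq[j - 1]), 0.0)
                if e > 0:
                    cand = best[i + 1][j - 1] + math.log(RULE["pair"] * e)
                    if cand > score:
                        score, arg = cand, ("pair",)
                # Bifurcation into non-empty seq[i:k] and seq[k:j]
                # (an empty half can never improve the maximum).
                for k in range(i + 1, j):
                    cand = best[i][k] + best[k][j] + math.log(RULE["bif"])
                    if cand > score:
                        score, arg = cand, ("bif", k)
            best[i][j], back[i][j] = score, arg
    # Trace back the winning rules to build a dot-bracket structure.
    struct = ["."] * n
    stack = [(0, n)]
    while stack:
        i, j = stack.pop()
        rule = back[i][j]
        if rule[0] == "left":
            stack.append((i + 1, j))
        elif rule[0] == "pair":
            struct[i], struct[j - 1] = "(", ")"
            stack.append((i + 1, j - 1))
        elif rule[0] == "bif":
            stack.extend([(i, rule[1]), (rule[1], j)])
    return best[0][n], "".join(struct)


print(cyk_fold("GGGAAACCC")[1])  # (((...)))
```

Replacing the max with a sum over rules gives the inside probability of the sequence, which is the kind of score used to tell family members from other sequences.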
- Leslie Grate.
Automatic RNA Secondary Structure Determination with Stochastic Context-Free Grammars.
The Third International Conference on Intelligent Systems for Molecular Biology (ISMB95), July 16-19, 1995.
Abstract
We have developed a method for predicting the secondary structure
of large RNA multiple alignments using only the information in the alignment.
It uses a series of progressively more sensitive searches of the data
in an iterative manner to discover regions of base pairing;
the first pass examines the entire multiple alignment.
The searching uses two methods to find base pairings:
mutual information is used to measure covariation between pairs of columns
in the multiple alignment, and a minimum length encoding method
is used to detect column pairs with high potential to base pair.
Dynamic programming is used to recover the optimal tree made up of the
best potential base pairs and to create a stochastic context-free grammar.
The information in the tree guides the next iteration of searching.
The method is similar to the traditional comparative sequence analysis technique.
The method correctly identifies most of the secondary structure in 16S and 23S rRNA.
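Two ingredients named in this abstract can be sketched in a few lines: mutual information between alignment columns, and a dynamic program that keeps the highest-scoring set of non-crossing column pairs. This is a minimal sketch, assuming a gap-free alignment of equal-length strings and a Nussinov-style DP; the names `column_mi`, `best_nested_pairs` and `predict_pairs` and the `threshold` and `min_loop` parameters are hypothetical. The minimum length encoding test, the SCFG construction and the iterative refinement from the paper are not reproduced.

```python
import math
from collections import Counter


def column_mi(alignment, i, j):
    """Mutual information (bits) between columns i and j of a gap-free alignment:
    M_ij = sum over (x, y) of f_ij(x, y) * log2(f_ij(x, y) / (f_i(x) * f_j(y)))."""
    n = len(alignment)
    fi = Counter(seq[i] for seq in alignment)
    fj = Counter(seq[j] for seq in alignment)
    fij = Counter((seq[i], seq[j]) for seq in alignment)
    return sum((c / n) * math.log2((c / n) / ((fi[x] / n) * (fj[y] / n)))
               for (x, y), c in fij.items())


def best_nested_pairs(w, min_loop=3):
    """Nussinov-style DP: choose non-crossing pairs (i, j) maximizing the sum of w[i][j].
    Pairs with j - i <= min_loop are never formed."""
    n = len(w)
    dp = [[0.0] * n for _ in range(n)]  # dp[i][j]: best total score within columns i..j
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            options = [dp[i + 1][j], dp[i][j - 1]]  # leave i or j unpaired
            if w[i][j] > 0:
                options.append(dp[i + 1][j - 1] + w[i][j])  # pair i with j
            options += [dp[i][k] + dp[k + 1][j] for k in range(i + 1, j)]  # split
            dp[i][j] = max(options)
    # Trace back one optimal set of pairs.
    pairs = []
    stack = [(0, n - 1)] if n else []
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif w[i][j] > 0 and dp[i][j] == dp[i + 1][j - 1] + w[i][j]:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            k = next(k for k in range(i + 1, j) if dp[i][j] == dp[i][k] + dp[k + 1][j])
            stack += [(i, k), (k + 1, j)]
    return sorted(pairs)


def predict_pairs(alignment, threshold=0.5, min_loop=3):
    """Score column pairs by MI, keep those >= threshold, then pick a nested set."""
    length = len(alignment[0])
    w = [[0.0] * length for _ in range(length)]
    for i in range(length):
        for j in range(i + min_loop + 1, length):
            mi = column_mi(alignment, i, j)
            if mi >= threshold:
                w[i][j] = mi
    return best_nested_pairs(w, min_loop)


aln = ["GCAAAAAGC", "CGAAAAACG", "AUAAAAAAU", "UAAAAAAUA"]
print(predict_pairs(aln))  # [(0, 8), (1, 7)]
```

In this toy alignment, columns 0/7 and 1/8 also score 2 bits, because MI measures correlation and does not check complementarity; those two pairs cross each other, so the nested DP keeps 0/8 and 1/7 instead.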
Other papers of interest
Stochastic Context-Free Grammars for Modeling RNA, in the Hawaii International Conference on System Sciences, 1994.
Recent Methods for RNA Modeling Using Stochastic Context-Free Grammars, in Proceedings of the Asilomar Conference on Combinatorial Pattern Matching, 1994.
The application of stochastic context-free grammars to folding, aligning and modeling homologous RNA sequences, UCSC-CRL-94-14, 1993. A large technical report covering stochastic context-free grammars.