Next: 2 Introduction
Up: Scoring Hidden Markov Models
Previous: Scoring Hidden Markov Models
Statistical sequence comparison techniques, such as hidden Markov
models and generalized profiles, calculate the probability that a
sequence was generated by a given model. Log-odds scoring is a means
of evaluating this probability by comparing it to a null hypothesis,
usually a simpler statistical model intended to represent the universe
of sequences as a whole, rather than the group of interest. Such
scoring leads to two immediate questions: what should the null model
be, and what log-odds score threshold should be deemed a match to
the model?
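As a rough illustration of log-odds scoring, the sketch below scores a sequence against a toy model and a null model. It is a deliberate simplification and not SAM's or HMMer's implementation: the model is an ungapped list of per-column character distributions (a real HMM would sum over all paths, including insertions and deletions), and the names log_odds_score, model_cols, and null_probs, the DNA-like alphabet, and the base-2 logarithm are all assumptions made for this example.

```python
import math

def log_odds_score(seq, model_cols, null_probs):
    """Return log2( P(seq | model) / P(seq | null) ) for a toy ungapped model.

    model_cols: one dict per model column mapping character -> probability
                (hypothetical stand-in for an HMM's match-state emissions).
    null_probs: dict mapping character -> probability under the null model.
    """
    if len(seq) != len(model_cols):
        raise ValueError("this toy model has no insert or delete states")
    score = 0.0
    for ch, col in zip(seq, model_cols):
        # Each position contributes the log of the ratio of the two
        # emission probabilities; the contributions add up.
        score += math.log2(col[ch] / null_probs[ch])
    return score

# Toy two-column model and a uniform null model (illustrative values only).
model_cols = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
null_probs = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

print(log_odds_score("AC", model_cols, null_probs))  # positive: model fits better
print(log_odds_score("GT", model_cols, null_probs))  # negative: null fits better
```

A positive score means the model explains the sequence better than the null model does. How large the score must be before the sequence counts as a match is the threshold question raised above.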
This paper experimentally analyses these two issues. Within the
context of the Sequence Alignment and Modeling software suite (SAM),
we consider a variety of null models and suitable thresholds.
Additionally, we consider HMMer's log-odds scoring and
SAM's original Z-scoring method. Among the null model choices, a
simple looping null model that emits characters according to the
geometric mean of the character probabilities in the columns
modeled by the HMM is the best, or among the best, in all four
discrimination experiments.
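One way to picture the geometric-mean null model is sketched below, under stated assumptions: the HMM's columns are given as dicts of strictly positive character probabilities, and the geometric means are renormalized so that they sum to one. The renormalization step and the name geometric_mean_null are assumptions of this example, not details confirmed by the text above.

```python
import math

def geometric_mean_null(model_cols):
    """Build a single emission distribution from the per-column distributions.

    For each character, take the geometric mean of its probability across
    all columns, then renormalize (assumed here) so the result sums to 1.
    All probabilities must be strictly positive.
    """
    n = len(model_cols)
    alphabet = list(model_cols[0].keys())
    # Geometric mean computed in log space: exp(mean(log p)).
    raw = {
        a: math.exp(sum(math.log(col[a]) for col in model_cols) / n)
        for a in alphabet
    }
    total = sum(raw.values())
    return {a: v / total for a, v in raw.items()}

# Two illustrative columns over a two-letter alphabet.
cols = [
    {"A": 0.8, "B": 0.2},
    {"A": 0.2, "B": 0.8},
]
null = geometric_mean_null(cols)
print(null)
```

In this toy case each character has a geometric mean of 0.4 before renormalization, so the two characters end up with equal probability. A looping null model would emit every character of a sequence from this one distribution, regardless of position.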
Information on obtaining the SAM program suite (free for academic
use), as well as a server interface, is available from
http://www.cse.ucsc.edu/research/compbio/sam.html. HMMer is
freely available from http://genome.wustl.edu/eddy/hmm.html.
rph@cse.ucsc.edu
SAM
sam-info@cse.ucsc.edu
UCSC Computational Biology Group