The family of checkpoint algorithms asymptotically provides for an
arbitrary integer L, a factor of cL slowdown (c<1)
in exchange for reducing memory use from O(mn) to
.A single-best-path variant of this family with
matches the
quadratic time and linear space of the divide-and-conquer algorithm.
For SAM running on a workstation, we decided that the L=2 algorithm,
which reduces space use by the square root of the sequence length,
would be sufficient. We describe this variant in more detail below,
and refer the reader to the original work for information on and
analysis of the complete family of algorithms [Grice et al., 1997].
Simply stated, the idea behind 2-level checkpoints is to segment the backwards computation. For each of the segments to be computed, a single checkpoint of all values along a row, column, or diagonal is saved during a global forward calculation. The backward computation now has two parts: recalculating and saving the forward values of that segment using the checkpoint for initial conditions, and then performing the backward dynamic programming on that segment. Thus, with each (i,j) index point in the dynamic programming matrix, there will be an associated global forward calculation, a local forward calculation, and a backward calculation.
While the 2-level checkpoint algorithm performs more computation than
the standard method (two forward calculations and one backward
calculation for each dynamic programming cell), it gains greatly in
memory performance. In the simplest case, it requires space for all
the checkpoints and for performing the complete forward-backward
calculation on one segment. The result is that the 2-level method
requires
space, where m is the model length and n
is the sequence length. For the example above, this will reduce space
use from 48
bytes to 1
bytes. If even
longer sequences and models are required, an additional level of
checkpointing can be added to the algorithm to reduce space use to
, or 300
bytes in the example.
The Viterbi algorithm, used to find the single best path through a dynamic programming matrix (that is, an alignment of a sequence to a model) is, as mentioned, simpler than the forward-backward algorithm. Rather than performing costly exponentiation and logarithm extraction operations on log-probabilities (in SAM, these are greatly sped using table lookup), addition and minimization can be used for the forward-going part of the Viterbi algorithm, while a simple tracing back of the single best path using saved selector bits from the minimizations is all that is required to establish the model to sequence correspondence.
When applying checkpointing to the Viterbi algorithm, diagonal
checkpoints can be used to greatly reduce the amount of recalculation,
making use of the same principle introduced in a parallelization of
the divide-and-conquer algorithm [Huang, 1989]. That is, if the
single best path has been traced back to a point (i',j') on the
i+j=k diagonal, and the next checkpoint is l diagonals away at
i+j=k-l, then only the triangular region bounded by (i'-l,j'),
(i',j'-l), and (i',j'), and containing l2/2 dynamic programming
cells needs to be recalculated to find the best path between the
i+j=k and i+j=k-l diagonals. For the Viterbi algorithm with row
checkpoints, a larger,
(worst case) strip must be
recalculated, depending on the model node back to which the path has
currently been traced. For the full forward-backward dynamic
programming an entire
strip of the matrix is always
required.
For these reasons, we used diagonal checkpoints rather than row or column checkpoints. To speed code development, we decided to use diagonal checkpoints for both the Viterbi algorithm and the forward-backward algorithm.
With hindsight, the added programming complexity and runtime overhead of boundary conditions with the diagonal checkpoints (especially for the local and semi-local algorithms discussed next), as well as the difficulty of maintaining the Viterbi and the forward-backward routines in parallel, made this a poor choice. Our forward-backward and Viterbi procedures would be significantly simpler had we implemented them with row checkpoints. This simplicity, combined with programmer and compiler optimizations, could reduce or eliminate the theoretical advantage of diagonal checkpoints in the case of Viterbi training.
A second goal of SAM's inner loop rewrite was to provide local and semi-local scoring, alignment, and training. This added even more complexity to the decision to use diagonal rather than row checkpoints. Fully local alignment allows the matching of a subregion of the sequence to a subregion of the model. Because HMMs are a probabilistic model, this is more complicated than, for example, the zero-thresholding used in the Smith and Waterman algorithm [Smith & Waterman, 1981].
For fully local alignment, a SAM model is flanked by two copies of the null model, in SAM called free-insertion modules, or FIMs [Hughey & Krogh, 1996,Barrett et al., 1997]. The null model is a simple probabilistic model, such as the background distribution of amino acids, of the universe of sequences not modeled by the HMM. The logarithm of the ratio of the probability of the sequence being generated by the model and by the null model, the log-odds score, indicates how much better (or worse) the structured HMM models the sequence then the simple unstructured null model [Altschul, 1991].
For SAM, a sequence can make use of fully local alignment as follows.
The initial segment of the sequence that does not correspond to any
part of the HMM will align to the initial FIM. The net effect on
the log-odds score of this is zero because the FIM is a copy of the
null model. From the FIM node (thought of as the first column of the
dynamic programming matrix if the model nodes are used to index the
columns), we allow a jump directly into the delete state of any
position within the model. Thus, in the calculation of the
ci,jD values, an additional term is added to the minimization:
, where
is the gap
cost of skipping over an initial segment of the model from the Start
node to the j-th delete node.
In fully local alignment, the correspondence between the sequence and the model can also end at any point within the sequence and the model. This corresponds to allowing jumps from the delete state of any position in the model to the final FIM. Much like the initial FIM, the final FIM will then match any remaining characters of the sequence with a net effect of zero on the final log-odds score. Thus, for the final FIM's delete state in node n (for simplicity, jumps are only allowed into and out of delete states), the term
![]()
Analogous changes are made to the forward-backward calculation.
Semi-local dynamic programming, which allows a complete sequence to match a subsection of the model, is simpler than fully local alignment. Jumps into the model are only allowed for the first character of the sequence while jumps out of the model are only allowed for the last character of the sequence. Thus, the special boundary conditions of fully local alignment only need to be evaluated for two rows of the dynamic programming matrix rather than all rows of the dynamic programming matrix.
Implementation of local and semi-local jumps in combination with the diagonal checkpoints proved time consuming. The jumps occur across values in a row of the dynamic programming matrix. Evaluation of the jumps would be simple and regular with row or column checkpoints, but required more complicated indexing with our chosen diagonal checkpoints.
![]() |
![]() |