next up previous
Next: 5 Results Up: Reduced space hidden Markov Previous: 3 System and methods

Subsections

4 Algorithm

The family of checkpoint algorithms asymptotically provides for an arbitrary integer L, a factor of cL slowdown (c<1) in exchange for reducing memory use from O(mn) to $O(m\sqrt[L]{n})$.A single-best-path variant of this family with $L=\log(n)$ matches the quadratic time and linear space of the divide-and-conquer algorithm. For SAM running on a workstation, we decided that the L=2 algorithm, which reduces space use by the square root of the sequence length, would be sufficient. We describe this variant in more detail below, and refer the reader to the original work for information on and analysis of the complete family of algorithms [Grice et al., 1997].

4.1 2-Level checkpoints

Simply stated, the idea behind 2-level checkpoints is to segment the backwards computation. For each of the segments to be computed, a single checkpoint of all values along a row, column, or diagonal is saved during a global forward calculation. The backward computation now has two parts: recalculating and saving the forward values of that segment using the checkpoint for initial conditions, and then performing the backward dynamic programming on that segment. Thus, with each (i,j) index point in the dynamic programming matrix, there will be an associated global forward calculation, a local forward calculation, and a backward calculation.

While the 2-level checkpoint algorithm performs more computation than the standard method (two forward calculations and one backward calculation for each dynamic programming cell), it gains greatly in memory performance. In the simplest case, it requires space for all the checkpoints and for performing the complete forward-backward calculation on one segment. The result is that the 2-level method requires $O(m\sqrt{n})$ space, where m is the model length and n is the sequence length. For the example above, this will reduce space use from 48$\times10^6$ bytes to 1$\times10^6$ bytes. If even longer sequences and models are required, an additional level of checkpointing can be added to the algorithm to reduce space use to $O(m\sqrt[3]{n})$, or 300$\times 10^3$ bytes in the example.

4.2 Viterbi checkpoints

The Viterbi algorithm, used to find the single best path through a dynamic programming matrix (that is, an alignment of a sequence to a model) is, as mentioned, simpler than the forward-backward algorithm. Rather than performing costly exponentiation and logarithm extraction operations on log-probabilities (in SAM, these are greatly sped using table lookup), addition and minimization can be used for the forward-going part of the Viterbi algorithm, while a simple tracing back of the single best path using saved selector bits from the minimizations is all that is required to establish the model to sequence correspondence.

When applying checkpointing to the Viterbi algorithm, diagonal checkpoints can be used to greatly reduce the amount of recalculation, making use of the same principle introduced in a parallelization of the divide-and-conquer algorithm [Huang, 1989]. That is, if the single best path has been traced back to a point (i',j') on the i+j=k diagonal, and the next checkpoint is l diagonals away at i+j=k-l, then only the triangular region bounded by (i'-l,j'), (i',j'-l), and (i',j'), and containing l2/2 dynamic programming cells needs to be recalculated to find the best path between the i+j=k and i+j=k-l diagonals. For the Viterbi algorithm with row checkpoints, a larger, $l\times m$ (worst case) strip must be recalculated, depending on the model node back to which the path has currently been traced. For the full forward-backward dynamic programming an entire $l\times m$ strip of the matrix is always required.

For these reasons, we used diagonal checkpoints rather than row or column checkpoints. To speed code development, we decided to use diagonal checkpoints for both the Viterbi algorithm and the forward-backward algorithm.

With hindsight, the added programming complexity and runtime overhead of boundary conditions with the diagonal checkpoints (especially for the local and semi-local algorithms discussed next), as well as the difficulty of maintaining the Viterbi and the forward-backward routines in parallel, made this a poor choice. Our forward-backward and Viterbi procedures would be significantly simpler had we implemented them with row checkpoints. This simplicity, combined with programmer and compiler optimizations, could reduce or eliminate the theoretical advantage of diagonal checkpoints in the case of Viterbi training.

4.3 Local and semi-local checkpointing

A second goal of SAM's inner loop rewrite was to provide local and semi-local scoring, alignment, and training. This added even more complexity to the decision to use diagonal rather than row checkpoints. Fully local alignment allows the matching of a subregion of the sequence to a subregion of the model. Because HMMs are a probabilistic model, this is more complicated than, for example, the zero-thresholding used in the Smith and Waterman algorithm [Smith & Waterman, 1981].

For fully local alignment, a SAM model is flanked by two copies of the null model, in SAM called free-insertion modules, or FIMs [Hughey & Krogh, 1996,Barrett et al., 1997]. The null model is a simple probabilistic model, such as the background distribution of amino acids, of the universe of sequences not modeled by the HMM. The logarithm of the ratio of the probability of the sequence being generated by the model and by the null model, the log-odds score, indicates how much better (or worse) the structured HMM models the sequence then the simple unstructured null model [Altschul, 1991].

For SAM, a sequence can make use of fully local alignment as follows. The initial segment of the sequence that does not correspond to any part of the HMM will align to the initial FIM. The net effect on the log-odds score of this is zero because the FIM is a copy of the null model. From the FIM node (thought of as the first column of the dynamic programming matrix if the model nodes are used to index the columns), we allow a jump directly into the delete state of any position within the model. Thus, in the calculation of the ci,jD values, an additional term is added to the minimization: $c_{i,0}^I+g_j^{S\rightarrow D}$, where $g_j^{S\rightarrow D}$ is the gap cost of skipping over an initial segment of the model from the Start node to the j-th delete node.

In fully local alignment, the correspondence between the sequence and the model can also end at any point within the sequence and the model. This corresponds to allowing jumps from the delete state of any position in the model to the final FIM. Much like the initial FIM, the final FIM will then match any remaining characters of the sequence with a net effect of zero on the final log-odds score. Thus, for the final FIM's delete state in node n (for simplicity, jumps are only allowed into and out of delete states), the term

\begin{displaymath}
\min_{0<k<n}(c^D_{i,k} + g_j^{D\rightarrow E})\end{displaymath}

is combined with the standard minimization for cDi,n. For SAM, the cost of jumping into the End node, $g_j^{D\rightarrow E}$, and jumping out of the Start node, $g_j^{S\rightarrow D}$, are global and default to zero cost, as with standard Smith and Waterman, primarily for compatibility with existing models. In the more general form, these costs would be position-dependent [Bucher & Bairoch, 1994], and could also be trained given sufficient data.

Analogous changes are made to the forward-backward calculation.

Semi-local dynamic programming, which allows a complete sequence to match a subsection of the model, is simpler than fully local alignment. Jumps into the model are only allowed for the first character of the sequence while jumps out of the model are only allowed for the last character of the sequence. Thus, the special boundary conditions of fully local alignment only need to be evaluated for two rows of the dynamic programming matrix rather than all rows of the dynamic programming matrix.

Implementation of local and semi-local jumps in combination with the diagonal checkpoints proved time consuming. The jumps occur across values in a row of the dynamic programming matrix. Evaluation of the jumps would be simple and regular with row or column checkpoints, but required more complicated indexing with our chosen diagonal checkpoints.


  
Figure 3: CPU time and wall clock time for performing four dynamic programming calculations with a 500-node model and several protein sequence lengths using the forward-backward (F-B) and Viterbi algorithms. As can be seen comparing the CPU and wall time graphs, above 8000 amino acids, the memory requirements cause the 96MByte workstation to page excessively when the standard algorithm is used.
\begin{figure}
\pscolumn{data/cpu500.eps}

\bigskip

\pscolumn{data/wall500.eps}\end{figure}


  
Figure: Wall clock time for performing four dynamic programming calculations with a 2000-node model and several RNA sequence lengths. Virtual memory problems occur at a similar product of the model length and sequence length, in comparison to Figure 3.
\begin{figure}
\begin{center}
\pscolumn{data/wall2000.eps}\end{center}\end{figure}


next up previous
Next: 5 Results Up: Reduced space hidden Markov Previous: 3 System and methods
SAM
sam-info@cse.ucsc.edu
UCSC Computational Biology Group