next up previous
Next: 6 Discussion Up: Reduced space hidden Markov Previous: 4 Algorithm

5 Results

After implementing and optimizing the checkpoint algorithm, we performed several experiments on the host workstation in which the dynamic programming calculation was allowed to use a variable amount of memory. Our initial thought was that we would see different levels of efficiency as the problem size grew first beyond the primary cache size and then, in the case of the original space-inefficient code, beyond the main memory size. This turned out not to be the case: the code is computation-bound because of the log-probability manipulation.

Because we did not see great variation in runtime for performing partial checkpointing (e.g., checkpointing only twice to reduce the dynamic programming table's memory requirements to mn/2 entries and only having to recalculate the half of the forward values), the code now always uses full 2-level checkpointing and $O(m\sqrt{n})$ space.

Performance is summarized in two sets of graphs. Figure 3 shows runtime versus sequence length for performing 4 dynamic programming calculations between a length 500 model and sequences of various lengths. The upper graph shows central processing unit (CPU) time, while the lower graph shows real, or wall-clock, time. We took data for forward-backward training using both full 2-level checkpointing (top curve) and no checkpointing (middle curve), as well as checkpointed and uncheckpointed implementations of the Viterbi best path algorithm. The checkpointing forward-backward code is about 10% slower than the uncheckpointed version. The Viterbi method is approximately seven times faster than forward-backward because of its simpler computation. The checkpointed and the uncheckpointed Viterbi algorithm perform similarly until paging begins. This could be due to the Viterbi forward calculation being relatively simple, reducing the penalty of recalculation when compared to the overhead of updating an HMM. Above a sequence length of 7000 residues, the 96Mbytes of main memory on the test machine are exhausted, and the wall clock time of the uncheckpointed code increases to a factor of 5 at 10000 residues.

Data for a 2000-node RNA model shows similar results (Figure 4), though in this case, because of the larger model, virtual memory degradation begins around a length 2000 sequence (4$\times10^6$ dynamic program cells or 48 Mbytes, consistent with the protein results). We were unable to obtain uncheckpointed results beyond sequences of length 2000 because of the excessive virtual memory use.

Since including the checkpoint method in SAM, the code has been used to model 31 sequences of $\sim 9500$ bases as part of an effort to determine how well HMMs can align RNA sequences (the folding of which is not modeled by a linear HMM), as well as to form a multiple alignment of 60 16s RNA sequences. Neither of these tasks could be completed with the previous version of SAM.


next up previous
Next: 6 Discussion Up: Reduced space hidden Markov Previous: 4 Algorithm
SAM
sam-info@cse.ucsc.edu
UCSC Computational Biology Group