After implementing and optimizing the checkpoint algorithm, we performed several experiments on the host workstation in which the dynamic programming calculation was allowed to use a variable amount of memory. Our initial thought was that we would see different levels of efficiency as the problem size grew first beyond the primary cache size and then, in the case of the original space-inefficient code, beyond the main memory size. This turned out not to be the case: the code is computation-bound because of the log-probability manipulation.
Because we did not see great variation in runtime for performing
partial checkpointing (e.g., checkpointing only twice to reduce the
dynamic programming table's memory requirements to mn/2 entries and
only having to recalculate the half of the forward values), the code
now always uses full 2-level checkpointing and
space.
Performance is summarized in two sets of graphs. Figure 3 shows runtime versus sequence length for performing 4 dynamic programming calculations between a length 500 model and sequences of various lengths. The upper graph shows central processing unit (CPU) time, while the lower graph shows real, or wall-clock, time. We took data for forward-backward training using both full 2-level checkpointing (top curve) and no checkpointing (middle curve), as well as checkpointed and uncheckpointed implementations of the Viterbi best path algorithm. The checkpointing forward-backward code is about 10% slower than the uncheckpointed version. The Viterbi method is approximately seven times faster than forward-backward because of its simpler computation. The checkpointed and the uncheckpointed Viterbi algorithm perform similarly until paging begins. This could be due to the Viterbi forward calculation being relatively simple, reducing the penalty of recalculation when compared to the overhead of updating an HMM. Above a sequence length of 7000 residues, the 96Mbytes of main memory on the test machine are exhausted, and the wall clock time of the uncheckpointed code increases to a factor of 5 at 10000 residues.
Data for a 2000-node RNA model shows similar results
(Figure 4), though in this case, because of the larger
model, virtual memory degradation begins around a length 2000 sequence
(4
dynamic program cells or 48 Mbytes, consistent with
the protein results). We were unable to obtain uncheckpointed results
beyond sequences of length 2000 because of the excessive virtual
memory use.
Since including the checkpoint method in SAM, the code has been used
to model 31 sequences of
bases as part of an effort to
determine how well HMMs can align RNA sequences (the folding of which
is not modeled by a linear HMM), as well as to form a multiple
alignment of 60 16s RNA sequences. Neither of these tasks could be
completed with the previous version of SAM.