The most interesting result of this work is that the checkpointing method, which performs significantly more computation than the simple method, only slightly slowed down the program for problems that fit in memory even though the vast majority of SAM's computation time is the dynamic programming calculation. This may be due to the greater cache efficiency of the checkpointing algorithm.
Computing the dynamic programming matrix along diagonals has several advantages. Such computation removes all data dependencies from the inner loop of the dynamic programming calculation. This frees the compiler to perform more extensive code reorganization and optimization within the inner loop. Diagonal computation is required when performing dynamic programming on a vector or fine-grain parallel processor [Huang, 1989,Hughey, 1996]. With hindsight, though, the added complexity of the boundary conditions when processing diagonals was not worth potential performance gain. To enhance maintainability, we plan to re-reimplement SAM's inner loop with row checkpoints, learning from the experiences of this work.
The 2-level checkpointing algorithm is not as memory efficient as the
divide-and-conquer approach, requiring
space rather
than O(n+m) space. It has three advantages that make it a strong
alternative to the divide-and-conquer approach. First, it can be used
with the forward-backward calculation. Second, coding may be somewhat
simpler than the divide-and-conquer, especially if row or column
checkpoints are used. Third, the constant overhead appears to be
less than that of the divide-and-conquer approach, perhaps because it
is not a fully recursive algorithm.