[Genome] Chained blastz alignments

Angie Hinrichs angie at soe.ucsc.edu
Wed Mar 28 12:54:21 PDT 2007


Hi Eric,

Good question -- the short answer is, axtChain can do a better job.  
And chainNet takes it a step further by ranking and filtering the 
chains, producing a single-coverage (on the reference sequence) 
alignment and highlighting putative rearrangements.

When aligning whole genomes, we chop up the sequences into chunks of 
10M - 30M bases overlapping by 10k, and align the chunks in separate 
blastz jobs on a compute cluster.  When working on chunked sequence, 
blastz has a limited scope in which it could chain local alignments 
together.

AxtChain works on the entire set of blastz alignments for each 
reference chromosome (or scaffold etc), so it can create chains longer 
than the alignment chunk size.  Also, I believe blastz can output 
alignments that have a gap only in one sequence or the other, not both 
simultaneously, while axtChain does support double-sided gaps which is 
the key to the creation of very long chains.  

And although I don't know the details of blastz's chaining algorithm 
(Webb Miller's group at PSU knows best), I don't think it's as 
sophisticated and parameterized as axtChain's.  AxtChain uses the same 
nucleotide scoring matrix as blastz, but also uses a stepwise linear 
gap penalty matrix that allows for less harsh penalties on longer 
gaps, and allows different penalties on a gap in both sequences 
relative to a gap in one or the other.  AxtChain begins with all local 
alignments for a given pair of sequences and strand; it breaks the 
local alignments down into gapless blocks, rescores them, orders them 
by position into a k-d tree (as suggested by Webb Miller), and then 
for each block recurses down the tree to find the best predecessor for 
that block so we end up with high-scoring chains.  The 
best-predecessor decision uses the fancy gap penalties and also looks 
for optimal crossover point if two blocks overlap so it can merge 
them into a higher-scoring block.  The upshot of all this is that we 
can find some very long chains that we believe represent shared common 
ancestry of sequences and can shed some light on rearrangement 
history.  

chainNet takes chains as input, sorts them by score, and then starting 
with the highest-scoring chain, it ranks the chains, filling in gaps 
in the higher-scoring chains with lower-scoring chains where possible.  
The resulting filtered, hierarchical view of the chains is where 
putative rearrangements (inversions, insertions, deletions, 
duplications in the reference sequence) really stand out.

Hope that helps,

Angie

On Wed, 28 Mar 2007, Eric Just wrote:

> Hello,
> 
> I have read some papers and some information on your site about generating
> whole genome alignments.  I am using the technique of running BLASTZ and
> then post-processing the data using axtChain then chainNet to generate the
> non-overlapping gapped global alignements.  My question is about the C
> argument to BLASTZ.  The BLASTZ output says:
> 
>         C(0) 0: no chaining; 1: just output chain; 2: chain and extend;  3:
> just output HSPs
> 
> So can the use of the C argument obviate the need for chaining using
> axtChain?  If not, what is the difference between chaining done by BLASTZ vs
> chaining doen by chainNet?
> 
> Thanks for your time!
> 
> Eric
> _______________________________________________
> Genome maillist  -  Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
> 

-- 
angie at soe.ucsc.edu
Software Developer, UCSC CBSE / Genome Bioinformatics Group


More information about the Genome mailing list