[Genome] Inconsistent .over.chain's ?

Xueya Zhou xueyazhou at gmail.com
Sun Dec 23 08:25:49 PST 2007


Dear UCSC Genome Browser Team,

I have a technical question about generating a .over.chain file from a
.all.chain file.

I downloaded the .all.chain from your public web site (e.g.
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/vsPanTro2/), then followed
the procedure detailed in the doBlastzChainNet.pl script in the Kent source
tree to extract a single-coverage .over.chain, as follows:

chainPreNet $tDb.$qDb.all.chain.gz $tDb.chrom.sizes $qDb.chrom.sizes stdout \
  | chainNet stdin -minSpace=1 $tDb.chrom.sizes $qDb.chrom.sizes stdout /dev/null \
  | netSyntenic stdin noClass.net

netChainSubset -verbose=0 noClass.net $all_chain stdout \
  | chainStitchId stdin stdout \
  | gzip -c > $tDb.$qDb.over.chain.gz

To my surprise, the generated .over.chain is somewhat different from the
liftOver file I downloaded. I compared the hg18.panTro2.over.chain that I
generated with the hg18ToPanTro2.over.chain from the downloads. The former
has about ten thousand fewer chains than the latter (compared using:
grep '^chain' *.over.chain | wc -l). In addition, a considerable portion of
the chains in the two files are not identical. I don't understand this
inconsistency, given that we use the same input data (.all.chain), the same
programs, and the same procedure. I have not yet looked closely at how the
two files differ. Could the aligned blocks be the same while the way they
are chained differs? I think that is unlikely if the algorithm is
deterministic. Or could it be caused by the order in which the chains are
fed into the programs? I would also like to know how this discrepancy might
affect my downstream analysis.
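
One way to check whether the aligned blocks agree while only the chaining
differs is sketched below. It is a minimal sketch, not part of the Kent
tools: it assumes the standard UCSC chain format (a "chain" header line
followed by "size dt dq" lines and a final "size" line), and the names
open_text, chain_blocks, and compare are hypothetical helpers introduced
only for illustration. Blocks are compared by exact coordinates, so a block
that was trimmed at a different position in one file is counted as
different.

```python
import gzip
from collections import namedtuple

# One ungapped aligned block; coordinates are taken as written in the file.
Block = namedtuple("Block", "t_name t_start q_name q_strand q_start size")


def open_text(path):
    """Open a plain or gzip-compressed chain file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def chain_blocks(lines):
    """Yield (chain_number, Block) for every ungapped block in a chain file."""
    chain_no = 0
    t_name = q_name = q_strand = None
    t_pos = q_pos = 0
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd
            #       qName qSize qStrand qStart qEnd id
            chain_no += 1
            t_name, t_pos = fields[2], int(fields[5])
            q_name, q_strand, q_pos = fields[7], fields[9], int(fields[10])
            continue
        size = int(fields[0])
        yield chain_no, Block(t_name, t_pos, q_name, q_strand, q_pos, size)
        t_pos += size
        q_pos += size
        if len(fields) == 3:
            # Gap to the next block: dt bases in target, dq bases in query.
            t_pos += int(fields[1])
            q_pos += int(fields[2])


def compare(lines_a, lines_b):
    """Summarize chain counts and block overlap between two chain files."""
    summary = {}
    block_sets = []
    for label, lines in (("a", lines_a), ("b", lines_b)):
        chains, blocks = set(), set()
        for chain_no, block in chain_blocks(lines):
            chains.add(chain_no)
            blocks.add(block)
        summary["chains_" + label] = len(chains)
        summary["blocks_" + label] = len(blocks)
        block_sets.append(blocks)
    summary["shared_blocks"] = len(block_sets[0] & block_sets[1])
    return summary
```

For the two files in question, this could be called as
compare(open_text("hg18.panTro2.over.chain"),
open_text("hg18ToPanTro2.over.chain")). If shared_blocks is close to both
block counts while the chain counts differ, the discrepancy is mostly in how
blocks are grouped into chains rather than in the alignments themselves.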

I am particularly concerned about this because I also used the same set of
tools to generate human-chimp reciprocal best alignment chains and nets,
which are not available on your public site. I would therefore appreciate
any expert advice on this issue.

Thank you very much and Merry Christmas!

Xueya

