[Genome] question on chain data format

Kayla Smith kayla at soe.ucsc.edu
Tue Aug 7 12:18:58 PDT 2007


Hello Zhaoshi,

1.  The chain files don't contain sequence data.  Here is where you can 
download sequence data: 
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips

You might also find the axt downloads useful (but note that these are 
for the nets and so will not include all chains):

http://hgdownload.cse.ucsc.edu/goldenPath/hg18/vsSelf/axtNet/
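
Since axt files store each alignment as a pair of gapped sequence lines, 
percent identity can be computed directly from them.  Here is a minimal 
sketch in Python.  The function name, the two short sequences, and the 
choice to leave gap columns out of the denominator are assumptions for 
illustration only; reading the sequence lines out of the axt file is 
not shown.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over aligned (non-gap) columns of two gapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are not counted
        aligned += 1
        if a == b:
            matches += 1
    return 100.0 * matches / aligned if aligned else 0.0


# Made-up sequences, not taken from any UCSC file.
identity = percent_identity("ACGT-ACGT", "ACGTTACCT")
print(identity)
```

If you want gaps to count against identity, divide by the full 
alignment length instead.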

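2.  dt and dq are gap sizes, not mismatch counts.  dt is the number of 
unaligned bases skipped in the reference between the end of this block 
and the start of the next aligning block; dq is the same for the query.  
Not everything aligns.  Here is an example (from the chain format help 
page):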

chain 4900 chrY 58368225 + 25985406 25985566 chr5 151006098 - 43549808 43549970 2
   16      0       2
   60      4       0
   10      0       4
   70

  size   dt       dq

This shows 4 ungapped alignment blocks, sizes 16, 60, 10, and 70, with
one 4bp gap in the reference (between the 2nd and 3rd blocks).  So the
total extent of the chain is 16+60+4+10+70 (160bp) in the reference,
which agrees with tEnd-tStart from the header line (25985566-25985406).
In the query the gaps are 2bp and 4bp, so the extent is 16+2+60+10+4+70
(162bp), which agrees with qEnd-qStart (43549970-43549808).
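
The same arithmetic can be checked with a short script.  This is only a 
sketch in Python, not a UCSC tool: the function name chain_extents is 
made up for illustration, the header is written on a single line as it 
would appear in the file, and the field positions follow the header 
layout described on the chain format help page.

```python
def chain_extents(chain_text):
    """Return (header fields, reference extent, query extent) for one chain."""
    rows = [ln.split() for ln in chain_text.strip().splitlines() if ln.strip()]
    header, blocks = rows[0], rows[1:]
    t_extent = 0
    q_extent = 0
    for fields in blocks:
        size = int(fields[0])
        # The last block line carries only a size, so it has no dt/dq.
        dt = int(fields[1]) if len(fields) == 3 else 0
        dq = int(fields[2]) if len(fields) == 3 else 0
        t_extent += size + dt
        q_extent += size + dq
    return header, t_extent, q_extent


EXAMPLE = """\
chain 4900 chrY 58368225 + 25985406 25985566 chr5 151006098 - 43549808 43549970 2
16 0 2
60 4 0
10 0 4
70
"""

header, t_extent, q_extent = chain_extents(EXAMPLE)
# Header layout: chain score tName tSize tStrand tStart tEnd
#                qName qSize qStrand qStart qEnd id
t_start, t_end = int(header[5]), int(header[6])
q_start, q_end = int(header[10]), int(header[11])

print(t_extent, t_end - t_start)  # reference: both values should agree
print(q_extent, q_end - q_start)  # query: both values should agree
```

Summing size+dt gives the reference extent and summing size+dq gives 
the query extent, so each pair of printed numbers should match.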

I hope this information is helpful to you.  Please don't hesitate to 
contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

zhaoshi wrote:
> Hi--
> 
> I was trying to get the pairwise sequence identity information from the 
> human self chain data.
> I downloaded the chain files and read 
> http://genome.ucsc.edu/goldenPath/help/chain.html and
> have some questions:
> 
> 1) It seems that the chain file does not contain sequence identity 
> information. Is there any other data that contains this
> information, or do I need to compute it myself from the chain data?
> 
> 2)  I read the chain format explanation, but I do not quite understand 
> what 'dt, dq' mean.
> It states in the explanation like:
> dt -- the difference between the end of this block and the beginning of 
> the next block (reference sequence)
> dq -- the difference between the end of this block and the beginning of 
> the next block (query sequence)
> What does this really mean? Does "difference" mean mismatch?
> 
> Thanks for your help.
> 
> Zhaoshi


