[Genome] how the blocks in the chain are organized?
Ann Zweig
ann at soe.ucsc.edu
Mon Jan 8 15:52:30 PST 2007
Hello Xianjun,
The layout of the chain and chainLink tables is not intuitive. I
will give you some details about their structure, and then you can let
us know if this does not answer your question.
These specific chains were first created for the zebrafish assembly
(danRer4) and then swapped to the human assembly (hg18). As you noted,
there are 148 rows with a chainId of 2107 in each of the two tables:
danRer4.chr20_chainHg18Link
hg18.chr14_chainDanRer4Link
Although you reported seeing only about 37 solid blocks, there are
actually many more (84 to be exact). You can see them when you make
your display quite wide and zoom in to a very detailed level. However,
this still does not explain why there are 148 rows in the tables. As
you noted, there are several rows that are "neighbored in coordinates".
Yes, you can combine those rows to make one block. In fact, when you
do that on the hg18.chr14_chainDanRer4Link table, the new table has 84
rows (equal to the number of solid blocks you see in the browser).
For example, from the original table:
mysql> select * from chr14_chainDanRer4Link where ChainId=2107 limit 6;
+------+-------+----------+----------+----------+---------+
| bin | tName | tStart | tEnd | qStart | chainId |
+------+-------+----------+----------+----------+---------+
| 1093 | chr14 | 66595113 | 66595258 | 62269888 | 2107 |
| 1093 | chr14 | 66625443 | 66625547 | 62270186 | 2107 |
| 1093 | chr14 | 66625547 | 66625588 | 62270303 | 2107 |
| 1093 | chr14 | 66625588 | 66625601 | 62270351 | 2107 |
| 1093 | chr14 | 66637317 | 66637387 | 62273299 | 2107 |
| 1093 | chr14 | 66646598 | 66646732 | 62273437 | 2107 |
+------+-------+----------+----------+----------+---------+
After you combine the neighboring rows, the original 6 rows result in 4
rows:
+------+-------+----------+----------+----------+---------+
| bin | tName | tStart | tEnd | qStart | chainId |
+------+-------+----------+----------+----------+---------+
| 1093 | chr14 | 66595113 | 66595258 | 62269888 | 2107 |
| 1093 | chr14 | 66625443 | 66625601 | 62270351 | 2107 |
| 1093 | chr14 | 66637317 | 66637387 | 62273299 | 2107 |
| 1093 | chr14 | 66646598 | 66646732 | 62273437 | 2107 |
+------+-------+----------+----------+----------+---------+
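If it helps, here is a rough Python sketch of that merge. It is only an
illustration, not how the browser draws the track: the function name
merge_target_adjacent is made up for this example, the rows are the six
shown above, and only the target (hg18) coordinates are merged.

```python
# Rows copied from the query output above: (tStart, tEnd, qStart).
rows = [
    (66595113, 66595258, 62269888),
    (66625443, 66625547, 62270186),
    (66625547, 66625588, 62270303),
    (66625588, 66625601, 62270351),
    (66637317, 66637387, 62273299),
    (66646598, 66646732, 62273437),
]

def merge_target_adjacent(rows):
    """Merge consecutive rows whose target intervals touch (tEnd == next tStart).

    Hypothetical helper for illustration; it returns (tStart, tEnd) pairs only.
    """
    merged = []
    for t_start, t_end, _q_start in sorted(rows):
        if merged and merged[-1][1] == t_start:
            # This row starts exactly where the previous block ends: extend it.
            merged[-1][1] = t_end
        else:
            # There is a gap in the target, so a new solid block begins here.
            merged.append([t_start, t_end])
    return [tuple(block) for block in merged]

solid_blocks = merge_target_adjacent(rows)
for t_start, t_end in solid_blocks:
    print(t_start, t_end)
```

Run on all 148 rows for chainId 2107 instead of these six, the same loop
should give the merged blocks for the whole chain.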
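A chain is, by definition, a sequence of gapless aligned blocks, where
no two blocks in the chain may overlap in either target or query
coordinates. Within a chain, target and query coordinates are
monotonically non-decreasing. If there is any break (even of one base)
in *either* assembly, there will be a new row in each of the two tables.
In the example above, the second, third and fourth rows are contiguous
on hg18 chr14 but not in the zebrafish sequence, which is why they are
stored as three separate rows.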
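To see where the breaks fall, here is a small Python sketch that
computes the gap on each side between consecutive rows. It rests on two
assumptions: each block is gapless, so its query length equals
tEnd - tStart, and qStart increases along the chain as in the rows shown
above. The function name gaps_between_rows is made up for this example.

```python
# Rows copied from the query output above: (tStart, tEnd, qStart).
rows = [
    (66595113, 66595258, 62269888),
    (66625443, 66625547, 62270186),
    (66625547, 66625588, 62270303),
    (66625588, 66625601, 62270351),
    (66637317, 66637387, 62273299),
    (66646598, 66646732, 62273437),
]

def gaps_between_rows(rows):
    """Return (target_gap, query_gap) for each pair of consecutive rows.

    Assumes gapless blocks, so the query end is qStart + (tEnd - tStart).
    """
    gaps = []
    for (t_start, t_end, q_start), (next_t, _next_t_end, next_q) in zip(rows, rows[1:]):
        q_end = q_start + (t_end - t_start)
        gaps.append((next_t - t_end, next_q - q_end))
    return gaps

gaps = gaps_between_rows(rows)
for target_gap, query_gap in gaps:
    print(target_gap, query_gap)
```

A target gap of 0 with a non-zero query gap marks rows that touch on the
human side but are split because of a break on the zebrafish side.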
Be sure to let us know if this is not enough information for you to get
started with these data.
Regards,
----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu
> Hi,
>
> I am using the chain and chainLink table to see how well one exon in
> Human is aligned with the sequence in zebrafish. For each chain in the
> chain table, there are many small blocks in the chainLink table. My
> question is: how is it decided to split the chain into so many small
> blocks? In the chain track of the browser, we could only see several blocks
> in each chain.
>
> For example, for human gene ENSG00000072415, I could see it's located in
> the chain of zebrafish, chr20:62269888-62413893, with chainID=2107 in
> chr14_chainDanRer4 table. In the full view of Zebrafish Chain Alignment
> for Human browser, I only could see about 37 solid blocks in the chain
> (see the chr14:66,569,841-66,995,158 region in Human browser). However,
> when I query the chr14_chainDanRer4Link table with chainID=2107, I could
> see 148 results for this chain. So, why not 37 results as I expected?
> How are these 148 blocks created? How can I get start:end for the expected
> 37 blocks? I could see that most of the 148 blocks are neighbored in
> coordinates, so, it will be 37 blocks if I merge the neighbored ones
> together, right?
>
> Thanks for answering.
>
> Xianjun