[Genome] data inconsistency?
Wei-Jen Chung
goldexp at gmail.com
Thu Jan 11 15:46:32 PST 2007
To Whom It May Concern,
I downloaded the "Multiple alignments of 14 insects with Drosophila
melanogaster" from your database.
I then checked the data with BLAT and found some inconsistencies between
the downloaded multiple alignments and the BLAT results. Below is an
example of a multiple sequence alignment downloaded from the UCSC database.
s dm2.chr2L 7568 61 + 22407834
TTCA-GCAACCGAGA---AGAGAACCCACGTTTGAA-------CAAGTATCGGCGTGTGGACAACAGCTATC
s droSim1.chrX 2281100 61 - 17042790
TTCA-GTAACCAAGA---AGAGAATCCTCGTCTAGT-------CAAGTACCGCCGTGTGGATAACAGCTTTC
i droSim1.chrX C 0 C 0
s droSec1.super_8 2368443 61 - 3762037
TTCA-GTAACCAAGA---AGAGAATCCTCGTCTAGT-------CAAGTACCGCCGTGTGGACAACAGCTTTC
i droSec1.super_8 C 0 C 0
s droYak2.chrX 4082557 61 - 21770863
TACA-GCAACCAAGA---AGAGAGCCCTCGTCTGAT-------CAAGTATCGGCGTGTGGACAACAGCTTTC
i droYak2.chrX C 0 C 0
Then I submitted the dm2 sequence:
"TTCA-GCAACCGAGA---AGAGAACCCACGTTTGAA-------CAAGTATCGGCGTGTGGACAACAGCTATC"
to BLAT, and I got the result (
http://genome.ucsc.edu/cgi-bin/hgc?hgsid=84168781&o=7568&t=7629&g=multiz15way&i=multiz15way&c=chr2L&l=7568&r=7629&db=dm2&pix=620
)
As you can see, the multiple sequence alignment is consistent with the one
above, but the start numbers are different.
In the downloaded alignment file, the start of the aligned sequence is
"2281100" in droSim1, "2368443" in droSec1, and "4082557" in droYak2. In the
BLAT result, however, the starts are "14761630" in droSim1, "1393534" in
droSec1, and "17688246" in droYak2. Could you explain why the two data sets
differ? And which one is correct and can be used for further analysis?
Thank you very much.
Best regards,
Wei-Jen Chung
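
A note on how the two sets of start numbers may relate, as a minimal sketch.
It assumes the MAF convention that a start on the "-" strand is counted from
the beginning of the reverse-complemented source sequence (0-based), and that
the browser page reports 1-based positions on the "+" strand. The function
name maf_to_plus_strand is hypothetical and chosen only for illustration; the
input numbers are taken from the "s" lines quoted above.

```python
def maf_to_plus_strand(start, size, strand, src_size):
    """Convert a MAF "s" line position to 0-based, half-open plus-strand
    coordinates (start, end).

    Assumption: for strand "-", `start` is relative to the reverse
    complement of the source sequence, as in the MAF format description.
    """
    if strand == "+":
        return start, start + size
    if strand == "-":
        plus_start = src_size - start - size
        return plus_start, plus_start + size
    raise ValueError("strand must be '+' or '-', got %r" % (strand,))


# (source, start, size, strand, srcSize) copied from the "s" lines above.
rows = [
    ("dm2.chr2L", 7568, 61, "+", 22407834),
    ("droSim1.chrX", 2281100, 61, "-", 17042790),
    ("droSec1.super_8", 2368443, 61, "-", 3762037),
    ("droYak2.chrX", 4082557, 61, "-", 21770863),
]

for name, start, size, strand, src_size in rows:
    plus_start, plus_end = maf_to_plus_strand(start, size, strand, src_size)
    # Add 1 to the 0-based start to get a 1-based start position.
    print(name, plus_start + 1, plus_end)
```

Under these assumptions, the 1-based plus-strand starts come out as 14761630
for droSim1, 1393534 for droSec1, and 17688246 for droYak2, which are the
numbers reported in the BLAT result, so the two data sets would describe the
same positions on opposite strands.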