[Genome] knownGene.txt question
Dmitriy Skvortsov
dmitriy.skvortsov at gmail.com
Mon Nov 19 15:59:46 PST 2007
Dear colleagues,
While working with the knownGene.txt file, which I downloaded from
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/
I found that in 17428 of 56722 cases the total length of the exons does
not match the length of the sequence in the file knownGeneMrna.txt, which
I downloaded from the same source.
Example
uc001aad.1
line 4 from knownGene.txt
uc001aad.1 chr1 - 4558 7231 4558 7173 8 4558,4832,5658,5769,6469,6720,6723,7095, 4692,4901,5767,5810,6628,6721,6918,7231,
Exon  Start  End   Length
1     4558   4692  134
2     4832   4901  69
3     5658   5767  109
4     5769   5810  41
5     6469   6628  159
6     6720   6721  1
7     6723   6918  195
8     7095   7231  136
total              844
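The exon arithmetic above can be reproduced with a few lines of Python. This is only a minimal sketch: it assumes the fields are whitespace-separated in the order shown in the example line (exon count in column 8, exon starts in column 9, exon ends in column 10), and the function name exon_lengths is my own, not part of any UCSC tool.

```python
def exon_lengths(known_gene_line):
    """Return the per-exon lengths (end - start) for one knownGene.txt line.

    Assumes the column order of the example line: name, chrom, strand,
    txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds.
    """
    fields = known_gene_line.split()
    exon_count = int(fields[7])
    # The start/end lists carry a trailing comma, so drop empty items.
    starts = [int(x) for x in fields[8].split(",") if x]
    ends = [int(x) for x in fields[9].split(",") if x]
    if not (len(starts) == len(ends) == exon_count):
        raise ValueError("exon count does not match the start/end lists")
    return [end - start for start, end in zip(starts, ends)]


line = ("uc001aad.1 chr1 - 4558 7231 4558 7173 8 "
        "4558,4832,5658,5769,6469,6720,6723,7095, "
        "4692,4901,5767,5810,6628,6721,6918,7231,")
lengths = exon_lengths(line)
print(lengths, sum(lengths))
# prints: [134, 69, 109, 41, 159, 1, 195, 136] 844
```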
mrna sequence from file knownGeneMrna.txt
atgctgggggcagagacagaggagaagctgtttgatgcccccttgtccatcagcaagagagagcagctggaacagcaggtcccagagaactacttctatgtgccagacctgggccaggtgcc
tgagattgatgttccatcctacctgcctgacctgcccggcattgccaacgacctcatgtacattgccgacctgggccccggcattgccccctctgcccctggcaccattccagaactgcccacctt
ccacactgaggtagccgagcctctcaaggcagacctacaagatggggtactaacaccacccccaccgcccccaccaccacccccagctcctgaggtgctggccagtgcacccccactccc
accctcaaccgcggcccctgtaggccaaggcgccaggcaggacgacagcagcagcagcacgtctccttcagtccagggagctcccagggaagtggtcgacccctccggtggctgggccact
ctgctagagtccatccgccaagctgggggcatcggcaaggccaagctgcgcagcatgaaggagcgaaagctggagaagaagaagcagaaggagcaggagcaagtgagagccacgagcca
aggtgggcacttgatgtcggatctcttcaacaagctggtcatgaggcgcaagggcatctctgggaaaggacctggggctggtgaggggcccggaggagcctttgcccgcgtgtcagactccatccc
tcctctgccgccaccgcagcagccacaggcagaggaggacgaggacgactgggaatcctag
length 795 nt
795 != 844
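The same check can be run over both files to collect every mismatching ID. The sketch below is an assumption-laden illustration, not the script I actually used: it assumes knownGeneMrna.txt has one record per line with the ID first and the sequence second, reuses the knownGene.txt column order from the example line, and the name find_mismatches and the toy records are hypothetical.

```python
def total_exon_length(fields):
    """Sum of (end - start) over all exons of one split knownGene.txt line."""
    starts = [int(x) for x in fields[8].split(",") if x]
    ends = [int(x) for x in fields[9].split(",") if x]
    return sum(end - start for start, end in zip(starts, ends))


def find_mismatches(known_gene_lines, mrna_lines):
    """Return (id, exon_total, mrna_length) for every ID whose totals differ.

    Assumption: each mRNA line is "<id> <sequence>" separated by whitespace.
    IDs without a sequence record are skipped rather than reported.
    """
    mrna_length = {}
    for line in mrna_lines:
        parts = line.split()
        if len(parts) >= 2:
            mrna_length[parts[0]] = len(parts[1])

    mismatches = []
    for line in known_gene_lines:
        fields = line.split()
        if len(fields) < 10 or fields[0] not in mrna_length:
            continue
        exon_total = total_exon_length(fields)
        if exon_total != mrna_length[fields[0]]:
            mismatches.append((fields[0], exon_total, mrna_length[fields[0]]))
    return mismatches


# Toy records (hypothetical, not taken from the real files).
toy_genes = [
    "toyA chr1 + 0 13 0 13 2 0,10, 5,13,",  # exons total 8
    "toyB chr1 + 0 4 0 4 1 0, 4,",          # exons total 4
]
toy_mrna = ["toyA\tacgtacgt", "toyB\tacg"]
result = find_mismatches(toy_genes, toy_mrna)
print(result)
# prints: [('toyB', 4, 3)]
```

With the real files, the two arguments could simply be the open file objects, since both are iterated line by line.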
I am attaching a list of all ~17000 UCSC IDs where the annotation does not match the sequence.
We recently reported a similar discrepancy in refSeq; I guess that the errors
in knownGene.txt and knownGeneMrna.txt have the same origin.
I hope this helps.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: knownGene_annotation.errors.txt
Url: http://www.soe.ucsc.edu/pipermail/genome/attachments/20071119/b9cdfd64/attachment-0001.txt