[Genome] knownGene.txt question

Dmitriy Skvortsov dmitriy.skvortsov at gmail.com
Mon Nov 19 15:59:46 PST 2007


Dear colleagues,
While working with the knownGene.txt file, which I downloaded from
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/,
I found that in 17,428 of 56,722 cases the total length of the exons
does not match the length of the corresponding sequence in
knownGeneMrna.txt, which I downloaded from the same source.
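The file-level comparison described above can be sketched roughly as follows. This is a minimal Python sketch, not necessarily how the counts above were produced. It assumes that each line of knownGeneMrna.txt is `name<TAB>sequence`, and that knownGene.txt uses UCSC's 0-based, half-open coordinates, so each exon's length is end - start. The function names and the script name `compare.py` are hypothetical.

```python
import sys
from typing import Iterable


def mrna_lengths(lines: Iterable[str]) -> dict[str, int]:
    """Map each knownGene ID to the length of its sequence.

    Assumption: each line of knownGeneMrna.txt is `name<TAB>sequence`.
    """
    lengths = {}
    for line in lines:
        if not line.strip():
            continue
        name, seq = line.rstrip("\n").split("\t")[:2]
        lengths[name] = len(seq)
    return lengths


def exon_total(fields: list[str]) -> int:
    """Sum of exon lengths; 0-based half-open coordinates, so length = end - start."""
    starts = [int(x) for x in fields[8].rstrip(",").split(",")]
    ends = [int(x) for x in fields[9].rstrip(",").split(",")]
    return sum(e - s for s, e in zip(starts, ends))


def find_mismatches(known_gene_lines: Iterable[str], mrna_lines: Iterable[str]):
    """Return (IDs compared, list of (ID, exon total, mRNA length) mismatches)."""
    mrna_len = mrna_lengths(mrna_lines)
    compared = 0
    mismatches = []
    for line in known_gene_lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        name = fields[0]
        if name not in mrna_len:
            continue  # no sequence for this ID, so it is not counted
        compared += 1
        total = exon_total(fields)
        if total != mrna_len[name]:
            mismatches.append((name, total, mrna_len[name]))
    return compared, mismatches


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical): python compare.py knownGene.txt knownGeneMrna.txt
    with open(sys.argv[1]) as kg, open(sys.argv[2]) as mrna:
        compared, bad = find_mismatches(kg, mrna)
    print(f"{len(bad)} of {compared} IDs: exon total != mRNA length")
    for name, exon_len, seq_len in bad:
        print(name, exon_len, seq_len, sep="\t")
```

Taking iterables of lines rather than paths keeps the comparison logic independent of file handling, so it can be checked on a few hand-made records.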


Example: uc001aad.1

Line 4 of knownGene.txt:
uc001aad.1	chr1	-	4558	7231	4558	7173	8	4558,4832,5658,5769,6469,6720,6723,7095,	4692,4901,5767,5810,6628,6721,6918,7231,


Exon   Start   End    Length (nt)
1      4558    4692   134
2      4832    4901   69
3      5658    5767   109
4      5769    5810   41
5      6469    6628   159
6      6720    6721   1
7      6723    6918   195
8      7095    7231   136
               Total  844
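The lengths in this table follow from UCSC's 0-based, half-open coordinate convention, in which an exon's length is simply end - start. The minimal Python sketch below reproduces the table from the knownGene.txt row above. The function name `exon_lengths` is hypothetical, and only the first ten tab-separated columns (name through exonEnds) are assumed to be present.

```python
def exon_lengths(row: str) -> list[int]:
    """Exon lengths for one knownGene.txt row (0-based, half-open coords)."""
    fields = row.rstrip("\n").split("\t")
    # Assumed column order: name, chrom, strand, txStart, txEnd,
    # cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, ...
    exon_count = int(fields[7])
    starts = [int(x) for x in fields[8].rstrip(",").split(",")]
    ends = [int(x) for x in fields[9].rstrip(",").split(",")]
    if not (len(starts) == len(ends) == exon_count):
        raise ValueError("exonCount does not match the coordinate lists")
    # Half-open intervals: length is end - start, with no +1.
    return [e - s for s, e in zip(starts, ends)]


row = ("uc001aad.1\tchr1\t-\t4558\t7231\t4558\t7173\t8\t"
       "4558,4832,5658,5769,6469,6720,6723,7095,\t"
       "4692,4901,5767,5810,6628,6721,6918,7231,")
lengths = exon_lengths(row)
print(lengths)       # [134, 69, 109, 41, 159, 1, 195, 136]
print(sum(lengths))  # 844
```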

mRNA sequence from knownGeneMrna.txt:

atgctgggggcagagacagaggagaagctgtttgatgcccccttgtccatcagcaagagagagcagctggaacagcaggtcccagagaactacttctatgtgccagacctgggccaggtgcc
tgagattgatgttccatcctacctgcctgacctgcccggcattgccaacgacctcatgtacattgccgacctgggccccggcattgccccctctgcccctggcaccattccagaactgcccacctt
ccacactgaggtagccgagcctctcaaggcagacctacaagatggggtactaacaccacccccaccgcccccaccaccacccccagctcctgaggtgctggccagtgcacccccactccc
accctcaaccgcggcccctgtaggccaaggcgccaggcaggacgacagcagcagcagcacgtctccttcagtccagggagctcccagggaagtggtcgacccctccggtggctgggccact
ctgctagagtccatccgccaagctgggggcatcggcaaggccaagctgcgcagcatgaaggagcgaaagctggagaagaagaagcagaaggagcaggagcaagtgagagccacgagcca
aggtgggcacttgatgtcggatctcttcaacaagctggtcatgaggcgcaagggcatctctgggaaaggacctggggctggtgaggggcccggaggagcctttgcccgcgtgtcagactccatccc
tcctctgccgccaccgcagcagccacaggcagaggaggacgaggacgactgggaatcctag

Length: 795 nt

795 nt (mRNA) != 844 nt (sum of exon lengths)
I am attaching a list of all ~17,000 UCSC IDs whose annotation does not
match the sequence.
We recently reported a similar discrepancy in RefSeq; I suspect that the
errors in knownGene.txt and knownGeneMrna.txt have the same origin.
Hope this helps.
-------------- next part --------------
Name: knownGene_annotation.errors.txt
Url: http://www.soe.ucsc.edu/pipermail/genome/attachments/20071119/b9cdfd64/attachment-0001.txt 

