[Genome] knownGene.txt question

Archana Thakkapallayil archanat at soe.ucsc.edu
Tue Nov 27 09:48:21 PST 2007


Hello Dmitriy,

Here is the response from Jim Kent on your question:

-----------

uc001aad is in a duplicon that appears to occur in several places 
in the genome. Since this gene is based on a RefSeq gene, we take the RefSeq 
mRNA (NM_182905) in preference to the genomic RNA as cut into exons. In 
general, I'm hoping that all of the cases you find where the total exon 
length and the mRNA length disagree are RefSeqs, and I think there should 
only be a few hundred of them.

-----------

I hope this is helpful.

Regards,

Archana
UCSC Genome Bioinformatics Group

Dmitriy Skvortsov wrote:
> Dear colleagues,
> While working with the knownGene.txt file, which I downloaded from
> http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/
> I found that in 17428 cases out of 56722 the total length of the exons does
> not match the length of the sequence in the file knownGeneMrna.txt, which
> I downloaded from the same source.
>
>
> Example
> uc001aad.1
>
> Line 4 from knownGene.txt:
> uc001aad.1	chr1	-	4558	7231	4558	7173	8	4558,4832,5658,5769,6469,6720,6723,7095,	4692,4901,5767,5810,6628,6721,6918,7231,
>
>
> Exon   Start   End    Length
> 1      4558    4692   134
> 2      4832    4901   69
> 3      5658    5767   109
> 4      5769    5810   41
> 5      6469    6628   159
> 6      6720    6721   1
> 7      6723    6918   195
> 8      7095    7231   136
>                Total  844
>
> mRNA sequence from the file knownGeneMrna.txt:
>
> atgctgggggcagagacagaggagaagctgtttgatgcccccttgtccatcagcaagagagagcagctggaacagcaggtcccagagaactacttctatgtgccagacctgggccaggtgcc
> tgagattgatgttccatcctacctgcctgacctgcccggcattgccaacgacctcatgtacattgccgacctgggccccggcattgccccctctgcccctggcaccattccagaactgcccacctt
> ccacactgaggtagccgagcctctcaaggcagacctacaagatggggtactaacaccacccccaccgcccccaccaccacccccagctcctgaggtgctggccagtgcacccccactccc
> accctcaaccgcggcccctgtaggccaaggcgccaggcaggacgacagcagcagcagcacgtctccttcagtccagggagctcccagggaagtggtcgacccctccggtggctgggccact
> ctgctagagtccatccgccaagctgggggcatcggcaaggccaagctgcgcagcatgaaggagcgaaagctggagaagaagaagcagaaggagcaggagcaagtgagagccacgagcca
> aggtgggcacttgatgtcggatctcttcaacaagctggtcatgaggcgcaagggcatctctgggaaaggacctggggctggtgaggggcccggaggagcctttgcccgcgtgtcagactccatccc
> tcctctgccgccaccgcagcagccacaggcagaggaggacgaggacgactgggaatcctag
>
> length  795 nt
>
> 795 != 844
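
The arithmetic in the example above can be reproduced with a short Python sketch. It assumes the column layout seen in the quoted knownGene.txt line (exon count in column 8, comma-terminated exon starts and ends in columns 9 and 10) and the usual UCSC convention of 0-based, half-open coordinates, so that each exon length is simply end minus start. The function name `exon_lengths` is illustrative only, and the mRNA length of 795 is taken from the message rather than recomputed.

```python
def exon_lengths(known_gene_line):
    """Return the per-exon lengths for one tab-separated knownGene.txt line."""
    fields = known_gene_line.rstrip("\n").split("\t")
    exon_count = int(fields[7])
    # The start and end lists carry a trailing comma, so drop empty items.
    starts = [int(x) for x in fields[8].split(",") if x]
    ends = [int(x) for x in fields[9].split(",") if x]
    if not (len(starts) == len(ends) == exon_count):
        raise ValueError("exon count does not match the start/end lists")
    # Assumes 0-based, half-open intervals: length = end - start.
    return [end - start for start, end in zip(starts, ends)]


# The uc001aad.1 line quoted in the message.
line = (
    "uc001aad.1\tchr1\t-\t4558\t7231\t4558\t7173\t8\t"
    "4558,4832,5658,5769,6469,6720,6723,7095,\t"
    "4692,4901,5767,5810,6628,6721,6918,7231,"
)

lengths = exon_lengths(line)
total_exon_length = sum(lengths)
mrna_length = 795  # length reported in the message for knownGeneMrna.txt

print(lengths)
print(total_exon_length, mrna_length, total_exon_length == mrna_length)
```

For this record the exon total is 844, which differs from the reported mRNA length by 49 nt.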
> I am attaching a list of the roughly 17000 UCIDs where the annotation does not match the sequence.
> We recently reported a similar discrepancy in refSeq; I guess that the errors
> in knownGene.txt and knownGeneMrna.txt have the same origin.
> I hope this helps.
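
A whole-file comparison of the kind described in this message could be sketched in Python as follows. This is not the script that produced the attached list; it is a minimal sketch assuming that knownGeneMrna.txt holds one tab-separated record per line with the ID in the first column and the sequence in the second, and that knownGene.txt uses the column layout of the example line with 0-based, half-open exon coordinates. The function names and the two toy records (`toyA`, `toyB`) are made up for illustration.

```python
def exon_total(fields):
    """Sum exon lengths from the split fields of one knownGene.txt line."""
    starts = [int(x) for x in fields[8].split(",") if x]
    ends = [int(x) for x in fields[9].split(",") if x]
    return sum(end - start for start, end in zip(starts, ends))


def find_length_mismatches(known_gene_lines, mrna_lines):
    """Return (id, exon_total, mrna_length) for every ID whose lengths differ."""
    # Assumed layout of knownGeneMrna.txt: ID <tab> sequence.
    mrna_length = {}
    for line in mrna_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            mrna_length[fields[0]] = len(fields[1])

    mismatches = []
    for line in known_gene_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # skip blank or truncated lines
        name = fields[0]
        if name in mrna_length:
            total = exon_total(fields)
            if total != mrna_length[name]:
                mismatches.append((name, total, mrna_length[name]))
    return mismatches


# Toy records, invented for illustration (not real UCSC data).
toy_genes = [
    "toyA\tchr1\t+\t0\t30\t0\t30\t2\t0,20,\t10,30,",
    "toyB\tchr1\t+\t0\t30\t0\t30\t2\t0,20,\t10,30,",
]
toy_mrnas = [
    "toyA\t" + "a" * 20,  # matches the exon total of 20
    "toyB\t" + "a" * 15,  # shorter than the exon total of 20
]

mismatches = find_length_mismatches(toy_genes, toy_mrnas)
print(mismatches)
```

Both arguments can be any iterables of lines, so open file handles for the two downloaded files could be passed in directly.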
>   
> ------------------------------------------------------------------------
>
> _______________________________________________
> Genome maillist  -  Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

