[Genome] question about knownGene

Rachel Harte hartera at soe.ucsc.edu
Tue Oct 17 09:51:50 PDT 2006


Dear Ying Sheng,

The annotation in the table is correct. For this known gene:

name       : NM_014620
chrom      : chr12
strand     : +
txStart    : 52696939
txEnd      : 52735630
cdsStart   : 52733973
cdsEnd     : 52735256
exonCount  : 4
exonStarts : 52696939,52733210,52733967,52734900,
exonEnds   : 52697465,52733327,52734412,52735630,
proteinID  : HXC4_HUMAN
alignID    : R5144

The transcription start site on chr12 is 52696940 (txStart + 1, because
start coordinates in the table are 0-based) and the coding region (CDS)
starts at 52733974 (cdsStart + 1). The UTR region of a gene can contain
introns in the genomic sequence. NM_014620 is a spliced mRNA transcript, so
all the introns have been removed from both the CDS and the UTR regions.
Therefore, when NM_014620 is aligned to the genome, the aligned exon blocks
from the UTR regions may be separated by intron sequence. The UTR refers to
the region of the mRNA that is not translated to protein.

Here is the way to interpret this table:

exonStarts shows the start of each aligned block and exonEnds shows the 
end of that aligned block with exonStarts[n] and exonEnds[n] being the 
start and end of the nth exon.

The transcription start site is 52696940 and this is also the start of the 
first exon.
transcription start = 52696940
CDS start = 52733974
exon 1 = 52696940 - 52697465
exon 2 = 52733211 - 52733327
exon 3 = 52733968 - 52734412
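
The exon list above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name parse_exons is made up for illustration, and it assumes the convention described above (0-based starts, ends used as-is).

```python
# Values copied from the knownGene row for NM_014620 (hg18).
exon_starts = "52696939,52733210,52733967,52734900,"
exon_ends = "52697465,52733327,52734412,52735630,"


def parse_exons(starts, ends):
    """Return 1-based, inclusive (start, end) pairs, one per exon.

    Hypothetical helper: both arguments are the comma-terminated
    strings stored in the exonStarts and exonEnds columns.
    """
    s = [int(x) for x in starts.split(",") if x]
    e = [int(x) for x in ends.split(",") if x]
    # Starts are 0-based in the table, so add 1; ends stay unchanged.
    return [(a + 1, b) for a, b in zip(s, e)]


exons = parse_exons(exon_starts, exon_ends)
for n, (start, end) in enumerate(exons, 1):
    print(f"exon {n} = {start} - {end}")
```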

Notice that the CDS start falls within exon 3. Therefore the 5' UTR is
exon 1, exon 2 and part of exon 3 (52733968 - 52733973).

Similarly, for the 3' UTR:
cdsEnd = 52735256
This falls within the last exon:
exon 4 = 52734901 - 52735630
So the 3' UTR will be from the base after the CDS end until the end 
of transcription (txEnd):
3' UTR = 52735257 - 52735630
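
The same UTR arithmetic can be sketched in Python. The helper utr_blocks is hypothetical and handles only the "+" strand case shown here; for a "-" strand gene the 5' and 3' roles would be swapped. It takes the raw table values and returns 1-based coordinates.

```python
def utr_blocks(exons, cds_start, cds_end):
    """Split plus-strand exons into 5' UTR and 3' UTR blocks.

    exons: (exonStart, exonEnd) pairs exactly as stored in the table
    (0-based start). cds_start and cds_end are the raw cdsStart and
    cdsEnd values. Returns two lists of 1-based, inclusive pairs.
    """
    five_prime, three_prime = [], []
    for start, end in exons:
        if start < cds_start:
            # Exon (or the part of it) that lies before the CDS start.
            five_prime.append((start + 1, min(end, cds_start)))
        if end > cds_end:
            # Exon (or the part of it) that lies after the CDS end.
            three_prime.append((max(start, cds_end) + 1, end))
    return five_prime, three_prime


# Raw values from the NM_014620 row.
exons = [
    (52696939, 52697465),
    (52733210, 52733327),
    (52733967, 52734412),
    (52734900, 52735630),
]
five_prime, three_prime = utr_blocks(exons, 52733973, 52735256)
print("5' UTR blocks:", five_prime)
print("3' UTR blocks:", three_prime)
```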

The region "chr12:52714001-52714096" therefore falls between exon 1 and
exon 2, so it lies in an intron within the 5' UTR region.
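
As a rough check of this conclusion, a region can be tested against the exon blocks directly. The function classify below is a hypothetical helper, not part of any UCSC tool; it assumes the region is given in 1-based browser coordinates and the exons are the raw table values.

```python
def classify(region_start, region_end, exons):
    """Label a 1-based, inclusive region as exon, intron or outside.

    exons: (exonStart, exonEnd) pairs as stored in the table
    (0-based start), sorted by position.
    """
    # Convert the region to the table's convention (0-based start).
    start, end = region_start - 1, region_end
    if any(start < ex_end and end > ex_start for ex_start, ex_end in exons):
        return "exon"
    if exons[0][0] <= start and end <= exons[-1][1]:
        return "intron"
    return "outside transcript"


# Raw values from the NM_014620 row.
exons = [
    (52696939, 52697465),
    (52733210, 52733327),
    (52733967, 52734412),
    (52734900, 52735630),
]
label = classify(52714001, 52714096, exons)
print(label)
```

Because the region also ends before cdsStart (52733973), the intron it falls in belongs to the 5' UTR part of the transcript.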

I hope that this helps you. Please let us know if you have further 
questions.

Rachel

On Tue, 17 Oct 2006, Ying Sheng wrote:

> Dear Sir/Madam,
>
> I found some differences between the gene annotation on the UCSC browser
> "Known gene" track and that in the knownGene table (the same version).
>
> For example:
>
> For human assembly hg18: gene "NM_014620", knownGene table shows
> annotation like this:
> NM_014620	chr12	+	52696939	52735630	52733973	52735256	4
> 52696939,52733210,52733967,52734900,
> 52697465,52733327,52734412,52735630,	HXC4_HUMAN	R5144
>
> the 5' UTR should be in the region chr12:52696939-52733973
> the 3' UTR should be in the region chr12:52735256-52735630
>
> According to this annotation, the region "chr12:52714001-52714096"
> should be annotated as UTR. However, when I checked on the browser, it
> is annotated as intron.
>
> There are also some other cases like this. Could you explain why this
> happens?
>
> thanks in advance,
>
> Ying Sheng
> _______________________________________________
> Genome maillist  -  Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>

-- 
Rachel Harte
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu
