[Genome] Calculation of Intron phase with refGene
Rachel Harte
hartera at soe.ucsc.edu
Tue Dec 19 14:03:51 PST 2006
Dear Philge,
In both cases below, the 0 means the same thing: it is the frame of the
first exon that contains part of the coding region. The difference is that
the two genes are on different strands. Each number (-1, 0, 1 or 2) in the
exonFrames column corresponds to one exon.
There is an answer to a previous mailing list question that explains this:
http://www.cse.ucsc.edu/pipermail/genome/2006-November/012218.html
I have also explained this in relation to your examples.
303 NM_005465 chr1 - 241733106 242073176
241735173 242073095 13
241733106,241742248,241775434,241782653,241793644,241802850,241843595,
241845020,241867535,241875817,241894696,241925515,242073049,
241735259,241742351,241775522,241782868,241793773,241802973,241843664,
241845086,241867667,241875962,241894808,241925641,242073176,
0 AKT3 cmpl cmpl 1,0,2,0,0,0,0,0,0,2,1,1,0,
There are 13 exons, so there are 13 numbers in exonFrames, one for each exon.
The exonFrames apply to the exons as listed in the table from left to
right. However, since this RefSeq aligns to the - strand, exon 13 in the
table is really what we think of as exon 1. For the same reason, the ends
are really the starts, so exon 1 starts at chr1:242073176 and ends at
chr1:242073049 (starts in the table are 0-based, so in 1-based coordinates
this is really 242073050). The CDS starts at chr1:242073095 (the cdsEnd
column, again because of the - strand), so the CDS starts in this first
exon. Therefore the exon frame for this exon is 0, which is the last number
in the exonFrames list. Moving to the second exon, this begins at
chr1:241925641 and ends at chr1:241925515 and its frame is 1, and so on.
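As a rough illustration of the reordering described above, the following
Python sketch lists the NM_005465 exons in transcript order together with
their frames. It is not part of any UCSC tool, and the function names
parse_list and exons_in_transcript_order are made up for this example. It
assumes the exonStarts, exonEnds and exonFrames strings are copied exactly
from the row above.

```python
# Illustrative sketch only; values are copied from the NM_005465 (AKT3) row.
strand = "-"
exon_starts = ("241733106,241742248,241775434,241782653,241793644,241802850,"
               "241843595,241845020,241867535,241875817,241894696,241925515,"
               "242073049,")
exon_ends = ("241735259,241742351,241775522,241782868,241793773,241802973,"
             "241843664,241845086,241867667,241875962,241894808,241925641,"
             "242073176,")
exon_frames = "1,0,2,0,0,0,0,0,0,2,1,1,0,"


def parse_list(field):
    """Split a comma-separated refGene list, ignoring the trailing comma."""
    return [int(item) for item in field.split(",") if item]


def exons_in_transcript_order(strand, starts, ends, frames):
    """Return (start, end, frame) tuples, reversed for the - strand."""
    exons = list(zip(parse_list(starts), parse_list(ends), parse_list(frames)))
    return exons[::-1] if strand == "-" else exons


ordered = exons_in_transcript_order(strand, exon_starts, exon_ends, exon_frames)
for number, (start, end, frame) in enumerate(ordered, 1):
    # start/end are still the table values (0-based start, lower coordinate
    # first); on the - strand the exon is transcribed from end down to start.
    print(number, start, end, frame)
```

The first line printed is exon 1 in transcript order (table start 242073049,
end 242073176, frame 0) and the second is table start 241925515, end
241925641, frame 1, which matches the description above.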
For your next example:
refGene Entry: 1766 NM_021948 chr1 + 154878363 154895942
154882470 154895550 14
154878363,154882462,154883216,154883923,154884398,154884983,154887871,
154888663,154892697,154893353,154894090,154894542,154894958,154895442,
154878691,154882561,154883591,154884098,154884526,154885277,154888105,
154889308,154892805,154893512,154894173,154894687,154895149,154895942,
0 BCAN cmpl cmpl -1,0,1,1,2,1,1,1,1,1,1,0,1,0,
This has 14 exons and it aligns to the + strand, so it is more
straightforward. The exonStarts, exonEnds and exonFrames can be read from
left to right. The first exon starts at chr1:154878363 and ends at
chr1:154878691, and the CDS starts at chr1:154882470. Therefore the first
exon is entirely UTR and so its frame is -1, as you see in the exonFrames
column. The second exon starts at chr1:154882462 and ends at chr1:154882561.
The CDS starts in this exon so its frame is 0, which is the second item in the
comma-separated list in the exonFrames column.
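The frames for this example can be reproduced from the coordinates alone.
The Python sketch below is an assumption about how the numbers relate, not
the code UCSC uses to fill the table. It walks the exons in transcript
order, gives -1 to exons with no coding bases, and otherwise takes the
number of coding bases seen so far modulo 3. compute_exon_frames is a
hypothetical helper name.

```python
def compute_exon_frames(strand, cds_start, cds_end, starts, ends):
    """Return one frame per exon, in table (left-to-right) order.

    Assumption: the frame of a coding exon is the count of coding bases in
    the preceding exons (in transcript order) modulo 3; exons without any
    coding bases get -1.
    """
    count = len(starts)
    frames = [-1] * count
    # Transcript order is left-to-right on +, right-to-left on -.
    order = range(count) if strand == "+" else range(count - 1, -1, -1)
    coding_so_far = 0
    for i in order:
        # Clip the exon to the CDS; coordinates are 0-based, half-open.
        low = max(starts[i], cds_start)
        high = min(ends[i], cds_end)
        if low < high:
            frames[i] = coding_so_far % 3
            coding_so_far += high - low
    return frames


# Values copied from the NM_021948 (BCAN) row.
bcan_starts = [154878363, 154882462, 154883216, 154883923, 154884398,
               154884983, 154887871, 154888663, 154892697, 154893353,
               154894090, 154894542, 154894958, 154895442]
bcan_ends = [154878691, 154882561, 154883591, 154884098, 154884526,
             154885277, 154888105, 154889308, 154892805, 154893512,
             154894173, 154894687, 154895149, 154895942]
bcan_table_frames = [-1, 0, 1, 1, 2, 1, 1, 1, 1, 1, 1, 0, 1, 0]

computed = compute_exon_frames("+", 154882470, 154895550,
                               bcan_starts, bcan_ends)
print(",".join(str(frame) for frame in computed))
assert computed == bcan_table_frames
```

For this entry the computed list equals the exonFrames column. Called with
strand "-" and the NM_005465 coordinates, the same function gives
1,0,2,0,0,0,0,0,0,2,1,1,0. Transcripts whose alignment contains frameshifts
may not follow this simple rule.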
I hope that this helps you. Please let us know if you have further
questions.
Rachel
Rachel Harte
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu
On Tue, 19 Dec 2006, Philge Philip wrote:
> Sir
> I am Philge Philip, a research assistant from India, doing a project
> related to sequence analysis. Recently, as part of my project, I
> calculated the intron phases of all human genes. I used refGene
> entries to confirm whether the intron phases I calculated are
> correct. I would like to know what the last '0' value represents in the
> intron phases in the refGene entry given below. The intron phases I got
> in my calculation are also given below each entry.
>
> refGene Entry :
> 303 NM_005465 chr1 - 241733106 242073176 241735173 242073095 13 241733106,241742248,241775434,241782653,241793644,241802850,241843595,241845020,241867535,241875817,241894696,241925515,242073049, 241735259,241742351,241775522,241782868,241793773,241802973,241843664,241845086,241867667,241875962,241894808,241925641,242073176, 0 AKT3 cmpl cmpl 1,0,2,0,0,0,0,0,0,2,1,1,0,
>
> Calculated intron phase for NM_005465 by myself: 1, 0, 2, 0, 0, 0, 0,
> 0, 0, 2, 1, 1,?
>
> I would also like to know what the second '0' value represents in the
> intron phases in the refGene entry given below.
>
> refGene Entry: 1766 NM_021948 chr1 + 154878363 154895942 154882470 154895550 14 154878363,154882462,154883216,154883923,154884398,154884983,154887871,154888663,154892697,154893353,154894090,154894542,154894958,154895442, 154878691,154882561,154883591,154884098,154884526,154885277,154888105,154889308,154892805,154893512,154894173,154894687,154895149,154895942, 0 BCAN cmpl cmpl -1,0,1,1,2,1,1,1,1,1,1,0,1,0,
>
> Calculated intron phase for NM_021948 by myself: -1, ?, 1, 1, 2, 1, 1,
> 1, 1, 1, 1, 0, 1, 0
>
> Please guide me to understand the intron phase representation in refGene.
>
> Thanks
> Philge Philip