[Genome] Finding alternative terminal exons .......
Ann Zweig
ann at soe.ucsc.edu
Fri Sep 7 13:27:48 PDT 2007
Hello Rileen,
Thanks for the compliments on the browser.
You are correct that you are not going to find what you want directly
from the altFinish items in the Alt Events track. However, you can use
that track as a starting point to get what you need. Below I outline
the steps you will need to take to find the ending exon for each transcript.
1. Make a custom track of the altFinish items from the Alt Events track.
Navigate to the Table Browser ('Tables' from the top blue navigation
bar) and create a custom track from the Alt Events track. Be sure to
configure the filter to include only altFinish items from the table:
"name does match altFinish"
Read more about creating custom tracks using the table browser here:
http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#CustomTrack
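If you have already downloaded the Alt Events items, the same filter can be applied offline. This is a minimal Python sketch that assumes the items were saved as tab-separated BED6 lines (chrom, chromStart, chromEnd, name, score, strand) with the event type in the name column. `filter_alt_finish` is a hypothetical helper name. The sketch uses exact string equality rather than the Table Browser's pattern matching.

```python
def filter_alt_finish(lines):
    """Keep BED6 lines whose name column (4th field) is exactly 'altFinish'.

    Assumes tab-separated BED6 input; this is an offline stand-in for the
    Table Browser filter, not the Table Browser's own code.
    """
    kept = []
    for line in lines:
        # Skip blank lines and header/comment lines ("track", "browser", "#").
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4 and fields[3] == "altFinish":
            kept.append(line.rstrip("\n"))
    return kept
```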
2. Intersect that custom track with the UCSC Known Gene track (to make a
new custom track).
Again, using the Table Browser, intersect the UCSC Genes track with your
custom track from step 1. Create a new custom track. This will be a
track containing all UCSC Genes that have overlap with altFinish items.
Read more about doing intersections with the Table Browser here:
http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#Intersection
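The Table Browser performs the intersection on the server. The following minimal sketch shows the kind of "any overlap" test such an intersection involves. It assumes both inputs use BED's 0-based, half-open coordinates. The `Interval` type and the function names are hypothetical, and the sketch compares whole intervals only, with no strand check and no minimum-overlap percentage.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive
    name: str


def overlaps(a, b):
    """True if two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def genes_overlapping(genes, events):
    """Return the genes that overlap at least one event (naive scan)."""
    return [g for g in genes if any(overlaps(g, e) for e in events)]
```

This loop costs O(n*m). For genome-scale inputs, a sorted sweep or a binned index would be much faster.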
3. Get the exons from the resulting track.
Choose the resulting custom track from step 2 in the Table Browser.
Choose BED as the output type. From the next page, choose only the
"Exons" button. Press 'get BED'. This will give you a list (in BED
format) of each exon in each UCSC Gene from the custom track in step 2.
4. Mine the output for the location of the "last" exon.
The BED output will have the following columns:
chromosome
chromStart
chromEnd
name
score
strand
Here's an example of the 9 exons from one gene in the track:
chr1 41748270 41749524 uc001cgz.1_exon_0_0_chr1_41748271_r 0 -
chr1 41751073 41752008 uc001cgz.1_exon_1_0_chr1_41751074_r 0 -
chr1 41756659 41756746 uc001cgz.1_exon_2_0_chr1_41756660_r 0 -
chr1 41762992 41763168 uc001cgz.1_exon_3_0_chr1_41762993_r 0 -
chr1 41813801 41813947 uc001cgz.1_exon_4_0_chr1_41813802_r 0 -
chr1 41817994 41823576 uc001cgz.1_exon_5_0_chr1_41817995_r 0 -
chr1 41867006 41867205 uc001cgz.1_exon_6_0_chr1_41867007_r 0 -
chr1 41939173 41939253 uc001cgz.1_exon_7_0_chr1_41939174_r 0 -
chr1 42156670 42156782 uc001cgz.1_exon_8_0_chr1_42156671_r 0 -
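To mine this output in a script, first split each line into its six columns. This is a minimal Python sketch; the `BedRecord` and `parse_bed6` names are hypothetical. It raises ValueError if a line has fewer than six columns.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    chrom_start: int  # 0-based, inclusive
    chrom_end: int    # exclusive
    name: str
    score: int
    strand: str       # "+" or "-"


def parse_bed6(line):
    """Parse the first six columns of one BED line."""
    # split() with no argument accepts both real tab separators and the
    # spaces used when the lines were pasted into this message.
    chrom, start, end, name, score, strand = line.split()[:6]
    return BedRecord(chrom, int(start), int(end), name, int(score), strand)
```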
Here's how to interpret the name field:
name: uc001cgz.1_exon_0_0_chr1_41748271_r
uc001cgz.1 gene name
exon part of gene (all exons in your case)
0 exon number
0 score
chr1 chromosome
41748271 chromStart
r strand (r = reverse, f = forward)
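Here is a minimal Python sketch that splits a name into the fields listed above. `parse_exon_name` is a hypothetical helper. It assumes the "_exon_" marker appears exactly as in the example. Because some chromosome names contain underscores, the sketch takes the start position and strand code from the right-hand end instead of splitting blindly on "_".

```python
STRAND_CODES = {"f": "+", "r": "-"}  # per the legend above


def parse_exon_name(name):
    """Split a Table Browser exon name such as
    'uc001cgz.1_exon_0_0_chr1_41748271_r' into its fields."""
    # The gene ID comes before the literal "_exon_" marker.
    gene, marker, rest = name.partition("_exon_")
    if not marker:
        raise ValueError(f"no '_exon_' marker in {name!r}")
    # Next come the exon number and the second number (described above as the score).
    number, score, remainder = rest.split("_", 2)
    # Peel the start and strand code off the right so that chromosome
    # names containing underscores stay intact.
    chrom, start, strand_code = remainder.rsplit("_", 2)
    return {
        "gene": gene,
        "number": int(number),
        "score": int(score),
        "chrom": chrom,
        "start": int(start),  # 1-based: one more than the BED chromStart
        "strand": STRAND_CODES[strand_code],
    }
```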
So, in the example above, there are 9 exons for gene uc001cgz.1.
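The exons are numbered in order of genomic position, and this gene is on
the reverse strand, so its transcription ends at the lowest coordinate:
you want exon 0. Conversely, if it were on the forward strand, you'd be
looking for the exon with the highest number.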
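The selection rule can be applied to every transcript at once. This is a minimal Python sketch that assumes, as in the example above, that exon numbers follow genomic order regardless of strand. The `Exon` type and `terminal_exons` function are hypothetical names.

```python
from typing import NamedTuple


class Exon(NamedTuple):
    gene: str
    number: int   # exon number from the name field
    strand: str   # "+" or "-"
    chrom: str
    start: int    # BED chromStart (0-based)
    end: int      # BED chromEnd (exclusive)


def terminal_exons(exons):
    """Map each transcript to its last exon: the highest-numbered exon on
    the forward strand, the lowest-numbered (exon 0) on the reverse strand."""
    best = {}
    for exon in exons:
        current = best.get(exon.gene)
        if current is None:
            best[exon.gene] = exon
        elif exon.strand == "+" and exon.number > current.number:
            best[exon.gene] = exon
        elif exon.strand == "-" and exon.number < current.number:
            best[exon.gene] = exon
    return best
```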
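So, for this transcript, the location of the ending exon is:
chr1:41748271-41749524
(BED chromStart values are 0-based, so add 1 to 41748270 when entering
the position in the browser.)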
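BED intervals use a 0-based start and an exclusive end, but positions typed into the browser are 1-based and inclusive. To convert a BED interval, add 1 to the start and leave the end unchanged. The converted start matches the start encoded in the exon name (41748271 in the example). This is a minimal sketch; `bed_to_position` is a hypothetical name.

```python
def bed_to_position(chrom, chrom_start, chrom_end):
    """Convert a 0-based, half-open BED interval into a 1-based,
    fully closed browser position string."""
    return f"{chrom}:{chrom_start + 1}-{chrom_end}"
```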
This should get you well on your way. Be sure to write back to the
list if you get stuck or need more direction.
Regards,
----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu
Rileen wrote:
> Hi,
> Thanks once again for the great resource you provide :-)
>
> I've played around with the "knownAlt" track a bit, and was wondering
> whether there's any simple way of deriving a list of alternative terminal
> exons (ATEs) from it, or any other table/track. By this I mean instances
> where the gene has transcripts ending in different exons, not merely
> different positions in the same exon.
>
> For all the other events in the knownAlt track, the two positions seem
> to provide useful information, but for the altFinish category, they always
> seem to differ by 1.
>
> Why is this so?
>
> I was hoping that, given the two positions, one could check whether they
> were in two different exons to derive the list of ATEs as a subset of
> altFinish, but that seems wrong.
>
> Looking forward to your reply,
> Yours,
> Rileen
>