[Genome] Finding alternative terminal exons .......

Ann Zweig ann at soe.ucsc.edu
Fri Sep 7 13:27:48 PDT 2007


Hello Rileen,

	Thanks for the compliments on the browser.

	You are correct that you are not going to find what you want directly 
from the altFinish items in the Alt Events track.  However, you can use 
that track as a starting point to get what you need.  Below I outline 
the steps you will need to take to find the ending exon for each transcript.

1. Make a custom track of the altFinish items from the Alt Events track.

Navigate to the Table Browser ('Tables' from the top blue navigation 
bar) and create a custom track from the Alt Events track.  Be sure to 
configure the filter to include only altFinish items from the table:
"name does match altFinish"

Read more about creating custom tracks using the table browser here:
http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#CustomTrack


2. Intersect that custom track with the UCSC Genes track (to make a 
new custom track).

Again, using the Table Browser, intersect the UCSC Genes track with your 
custom track from step 1.  Create a new custom track.  This will be a 
track containing all UCSC Genes that have overlap with altFinish items.

Read more about doing intersections with the Table Browser here:
http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#Intersection


3. Get the exons from the resulting track.
Choose the resulting custom track from step 2 in the Table Browser. 
Choose BED as the output type.  From the next page, choose only the 
"Exons" button.  Press 'get BED'.  This will give you a list (in BED 
format) of each exon in each UCSC Gene from the custom track in step 2.


4. Mine the output for the location of the "last" exon.
The BED output will have the following columns:
chromosome
chromStart
chromEnd
name
score
strand

Here's an example of the 9 exons from one gene in the track:

chr1	41748270	41749524	uc001cgz.1_exon_0_0_chr1_41748271_r	0	-
chr1	41751073	41752008	uc001cgz.1_exon_1_0_chr1_41751074_r	0	-
chr1	41756659	41756746	uc001cgz.1_exon_2_0_chr1_41756660_r	0	-
chr1	41762992	41763168	uc001cgz.1_exon_3_0_chr1_41762993_r	0	-
chr1	41813801	41813947	uc001cgz.1_exon_4_0_chr1_41813802_r	0	-
chr1	41817994	41823576	uc001cgz.1_exon_5_0_chr1_41817995_r	0	-
chr1	41867006	41867205	uc001cgz.1_exon_6_0_chr1_41867007_r	0	-
chr1	41939173	41939253	uc001cgz.1_exon_7_0_chr1_41939174_r	0	-
chr1	42156670	42156782	uc001cgz.1_exon_8_0_chr1_42156671_r	0	-

	Here's how to interpret the name field:

name: uc001cgz.1_exon_0_0_chr1_41748271_r

uc001cgz.1	gene name
exon		part of gene (always "exon" in your case)
0		exon number
0		score
chr1		chromosome
41748271	start position (1-based, i.e. BED chromStart + 1)
r		strand (r = reverse, f = forward)
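
	If you would rather split the name field with a script than by eye, 
here is a minimal Python sketch.  The function name parse_exon_name and 
the regular expression are my own illustration, not part of the Table 
Browser, and the pattern assumes every name follows the layout shown 
above.

```python
import re

# Pattern for names such as "uc001cgz.1_exon_0_0_chr1_41748271_r".
# Assumption: the name always has the seven-part layout described above.
EXON_NAME = re.compile(r"^(.+?)_exon_(\d+)_(\d+)_(.+)_(\d+)_([fr])$")


def parse_exon_name(name):
    """Split one exon name into the fields listed in the message."""
    match = EXON_NAME.match(name)
    if match is None:
        raise ValueError("unexpected exon name: " + name)
    gene, exon_number, score, chrom, start, strand = match.groups()
    return {
        "gene": gene,
        "exon_number": int(exon_number),
        "score": int(score),
        "chrom": chrom,            # may itself contain "_", e.g. chr1_random
        "start": int(start),       # 1-based start, as printed in the name
        "strand": strand,          # "r" = reverse, "f" = forward
    }


if __name__ == "__main__":
    print(parse_exon_name("uc001cgz.1_exon_0_0_chr1_41748271_r"))
```

A regular expression is used rather than a plain split on "_" because 
some chromosome names contain underscores themselves.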

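	So, in the example above, there are 9 exons for gene uc001cgz.1. 
The exons are numbered from 0 in order of increasing chromosome position, 
whatever the strand.  Because this gene is on the reverse strand, its 
last exon is the one with the lowest coordinates, so you want exon 0.  
Conversely, if it were on the forward strand, you'd be looking for the 
exon with the highest number.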

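	So, for this transcript, the location of the ending exon in BED 
coordinates (0-based start) is chr1:41748270-41749524, which corresponds 
to chr1:41748271-41749524 as a browser position.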
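
	To apply the same rule to the whole BED file from step 3, something 
like the following Python sketch may help.  The function name 
terminal_exons is hypothetical, and the code assumes that the transcript 
ID is everything before "_exon_" in the name column and that the strand 
column holds "+" or "-".  Rather than relying on exon numbers, it picks 
the exon with the lowest start on the reverse strand and the highest end 
on the forward strand, which selects the same exon in the example above.

```python
from collections import defaultdict


def terminal_exons(bed_lines):
    """Map each transcript ID to (chrom, chromStart, chromEnd, strand)
    of its last exon in the direction of transcription."""
    exons = defaultdict(list)
    for line in bed_lines:
        fields = line.split()
        # Skip blank lines, comments, and track/browser header lines.
        if len(fields) < 6 or fields[0] in ("track", "browser"):
            continue
        if fields[0].startswith("#"):
            continue
        chrom, start, end, name, _score, strand = fields[:6]
        transcript = name.split("_exon_")[0]
        exons[transcript].append((chrom, int(start), int(end), strand))

    result = {}
    for transcript, items in exons.items():
        strand = items[0][3]
        if strand == "-":
            # Reverse strand: the last exon has the lowest coordinates.
            result[transcript] = min(items, key=lambda e: e[1])
        else:
            # Forward strand: the last exon has the highest coordinates.
            result[transcript] = max(items, key=lambda e: e[2])
    return result


if __name__ == "__main__":
    example = [
        "chr1\t41748270\t41749524\tuc001cgz.1_exon_0_0_chr1_41748271_r\t0\t-",
        "chr1\t42156670\t42156782\tuc001cgz.1_exon_8_0_chr1_42156671_r\t0\t-",
    ]
    for transcript, exon in terminal_exons(example).items():
        print(transcript, exon)
```

To run it on your own output, pass an open file instead of the example 
list, e.g. terminal_exons(open("exons.bed")), where exons.bed is whatever 
name you saved the step 3 output under.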


	This should get you well on your way.  Be sure to write back to the 
list if you get stuck or need more direction.


Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu



Rileen wrote:
> Hi,
>       Thanks once again for the great resource you provide :-)
> 
> I've played around with the "knownAlt" track a bit, and was wondering
> whether there's any simple way of deriving a list of alternative terminal
> exons (ATEs) from it, or any other table/track. By this I mean instances
> where the gene has transcripts ending in different exons, not merely
> different positions in the same exon.
> 
> For all the other events in the knownAlt track, the two positions seem
> to provide useful information, but for the altFinish category, they always
> seem to differ by 1.
> 
> Why is this so?
> 
> I was hoping that given the two positions, one could check whether they
> were in two different exons to derive the list of ATEs as a subset of
> altFinish, but that seems wrong.
> 
> Looking forward to your reply,
>                                               Yours,
>                                                              Rileen
> 

