[Genome] Finding alternative terminal exons .......
Kayla Smith
kayla at soe.ucsc.edu
Mon Sep 10 17:26:36 PDT 2007
Rileen,
There are a couple of things you can do from here:
1. Notice that in the example output section of Ann's message below,
> chr1 41748270 41749524 uc001cgz.1_exon_0_0_chr1_41748271_r
That "uc001cgz.1" is an identifier from the knownGene table. You can
connect this name to a gene symbol by using the table hg18.kgXref.
However, this name is just a substring of the exon name in your
output, so you'd have to parse it out yourself. That is to say, the
information you need is there, but it's difficult to get at.
2. You can use the Galaxy tool to perform more advanced intersection
operations. Galaxy is a set of tools created and maintained at Penn State
University that works in concert with the UCSC Genome Browser. It is
located here:
http://main.g2.bx.psu.edu/
Galaxy is capable of intersecting two tables and keeping identifying
information from both. Use the "Get Data" link on the left hand side of
the page to upload data or to retrieve it from the Genome Browser. You
can also use the "send output to Galaxy" link on your existing custom
track in the Table Browser. The "Operate on Genomic Intervals" link
contains join and intersect tools, which you can use to join information
from both your exon custom track data and from the UCSC Genes (knownGene)
track, and then intersect that with the kgXref track, thereby keeping
information from all tracks in the intersection.
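If you would rather do the parsing from point 1 yourself, here is a
rough sketch in Python. It is only an illustration, not an official UCSC
tool. It assumes the exon names always follow the pattern shown in Ann's
example output, and that you have already built a lookup from knownGene
IDs to gene symbols out of hg18.kgXref. The function names and the
"GENE_A" symbol are placeholders made up for this example.

```python
import re
from collections import defaultdict

# Pattern for exon names such as "uc001cgz.1_exon_0_0_chr1_41748271_r".
# Assumption: every name in the BED output follows this layout.
EXON_NAME = re.compile(
    r"^(?P<tx>.+?)_exon_(?P<num>\d+)_\d+_(?P<chrom>.+)_(?P<start>\d+)_(?P<strand>[fr])$"
)


def parse_exon_name(name):
    """Split an exon name into (transcript ID, exon number, strand)."""
    m = EXON_NAME.match(name)
    if m is None:
        raise ValueError("unexpected exon name: %r" % name)
    return m.group("tx"), int(m.group("num")), m.group("strand")


def terminal_exons(bed_lines):
    """Return {transcript ID: (chrom, start, end)} for each last exon.

    Follows the rule from Ann's message: on the reverse strand the ending
    exon is exon 0; on the forward strand it is the highest-numbered exon.
    """
    best = {}
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank lines and track/browser header lines
        chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
        tx, num, strand = parse_exon_name(name)
        current = best.get(tx)
        if current is None:
            better = True
        elif strand == "r":
            better = num < current[0]
        else:
            better = num > current[0]
        if better:
            best[tx] = (num, chrom, start, end)
    return {tx: value[1:] for tx, value in best.items()}


def group_by_symbol(terminal, kg_to_symbol):
    """Collect the distinct terminal exons seen for each gene symbol.

    kg_to_symbol is a dict built by you from hg18.kgXref; transcripts
    missing from it are kept under their own ID.
    """
    by_gene = defaultdict(set)
    for tx, location in terminal.items():
        by_gene[kg_to_symbol.get(tx, tx)].add(location)
    return by_gene


# Two of the exon lines from Ann's example output.
example_bed = [
    "chr1 41748270 41749524 uc001cgz.1_exon_0_0_chr1_41748271_r 0 -",
    "chr1 42156670 42156782 uc001cgz.1_exon_8_0_chr1_42156671_r 0 -",
]
example_terminal = terminal_exons(example_bed)
# "GENE_A" is a placeholder, not the real symbol for uc001cgz.1.
example_genes = group_by_symbol(example_terminal, {"uc001cgz.1": "GENE_A"})
```

Genes that end up with more than one location in the grouped result are
candidates for alternative terminal exons. You would still need to check
that those locations do not overlap, since two transcripts can end at
different positions within the same exon.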
If you have trouble with any of the Galaxy tools, they have a helpdesk as
well:
galaxy-user at bx.psu.edu
I hope this information is helpful to you. Please don't hesitate to
contact us again if you require further assistance.
Kayla Smith
UCSC Genome Bioinformatics Group
On Sun, 9 Sep 2007, Rileen wrote:
> Hi,
> Thanks for that, I now have the list of all the exons in
> UCSC genes with altFinish entries.
>
> I just need to put this together with a table giving the info on gene
> names/symbols, so that I know all the different transcripts of a
> given gene, and can check for different terminal exons accordingly,
> i.e., something similar to the "refFlat" table for RefSeq data.
>
> Thanks once again,
> Yours,
> Rileen
>
> On 07/09/2007, Ann Zweig <ann at soe.ucsc.edu> wrote:
> > Hello Rileen,
> >
> > Thanks for the compliments on the browser.
> >
> > You are correct that you are not going to find what you want directly
> > from the altFinish items in the Alt Events track. However, you can use
> > that track as a starting point to get what you need. Below I outline
> > the steps you will need to take to find the ending exon for each transcript.
> >
> > 1. Make a custom track of the altFinish items from the Alt Events track.
> >
> > Navigate to the Table Browser ('Tables' from the top blue navigation
> > bar) and create a custom track from the Alt Events track. Be sure to
> > configure the filter to include only altFinish items from the table:
> > "name does match altFinish"
> >
> > Read more about creating custom tracks using the table browser here:
> > http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#CustomTrack
> >
> >
> > 2. Intersect that custom track with the UCSC Known Gene track (to make a
> > new CT).
> >
> > Again, using the Table Browser, intersect the UCSC Genes track with your
> > custom track from step 1. Create a new custom track. This will be a
> > track containing all UCSC Genes that have overlap with altFinish items.
> >
> > Read more about doing intersections with the Table Browser here:
> > http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#Intersection
> >
> >
> > 3. Get the exons from the resulting track.
> > Choose the resulting custom track from step 2 in the Table Browser.
> > Choose BED as the output type. From the next page, choose only the
> > "Exons" button. Press 'get BED'. This will give you a list (in BED
> > format) of each exon in each UCSC Gene from the custom track in step 2.
> >
> >
> > 4. Mine the output for the location of the "last" exon.
> > The BED output will have the following columns:
> > chromosome
> > chromStart
> > chromEnd
> > name
> > score
> > strand
> >
> > Here's an example of the 9 exons from one gene in the track:
> >
> > chr1 41748270 41749524 uc001cgz.1_exon_0_0_chr1_41748271_r 0 -
> > chr1 41751073 41752008 uc001cgz.1_exon_1_0_chr1_41751074_r 0 -
> > chr1 41756659 41756746 uc001cgz.1_exon_2_0_chr1_41756660_r 0 -
> > chr1 41762992 41763168 uc001cgz.1_exon_3_0_chr1_41762993_r 0 -
> > chr1 41813801 41813947 uc001cgz.1_exon_4_0_chr1_41813802_r 0 -
> > chr1 41817994 41823576 uc001cgz.1_exon_5_0_chr1_41817995_r 0 -
> > chr1 41867006 41867205 uc001cgz.1_exon_6_0_chr1_41867007_r 0 -
> > chr1 41939173 41939253 uc001cgz.1_exon_7_0_chr1_41939174_r 0 -
> > chr1 42156670 42156782 uc001cgz.1_exon_8_0_chr1_42156671_r 0 -
> >
> > Here's how to interpret the name field:
> >
> > name: uc001cgz.1_exon_0_0_chr1_41748271_r
> >
> > uc001cgz.1 gene name
> > exon part of gene (always "exon" in your case)
> > 0 exon number
> > 0 score
> > chr1 chromosome
> > 41748271 chromStart
> > r strand (r = reverse, f = forward)
> >
> > So, in the example above, there are 9 exons for gene uc001cgz.1.
> > Because it is on the reverse strand, you want exon 0. Conversely, if it
> > were on the forward strand, you'd be looking for the exon with the
> > highest number.
> >
> > So, for this transcript, the location of the ending exon is:
> > chr1:41748270-41749524
> >
> >
> > This should get you well on your way. Be sure to write back to the
> > list if you get stuck or need more direction.
> >
> >
> > Regards,
> >
> > ----------
> > Ann Zweig
> > UCSC Genome Bioinformatics Group
> > http://genome.ucsc.edu
> >
> >
> >
> > Rileen wrote:
> > > Hi,
> > > Thanks once again for the great resource you provide :-)
> > >
> > > I've played around with the "knownAlt" track a bit, and was wondering
> > > whether there's any simple way of deriving a list of alternative terminal
> > > exons (ATEs) from it, or any other table/track. By this I mean instances
> > > where the gene has transcripts ending in different exons, not merely
> > > different positions in the same exon.
> > >
> > > For all the other events in the knownAlt track, the two positions seem
> > > to provide useful information, but for the altFinish category, they always
> > > seem to differ by 1.
> > >
> > > Why is this so?
> > >
> > > I was hoping that, given the two positions, one could check whether
> > > they were in two different exons to derive the list of ATEs as a
> > > subset of altFinish, but that seems wrong.
> > >
> > > Looking forward to your reply,
> > > Yours,
> > > Rileen
> > >
> >
>
>
> --
> ******************************************************************
> "I know nothing, but i _know_ that."
>
> Rileen Sinha rileen at yahoo.com
> Personal Phone : (0049)3641412276 (cheaper to call)
> (0049)17624078373
> ******************************************************************
>
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>