[Genome] Understanding alternative splice data tables

Clancy, Kevin Kevin.Clancy at invitrogen.com
Mon Apr 21 16:26:56 PDT 2008


Brooke
Thank you very much for your reply. Both the paper and the previous
reply were most helpful. Looking at the paper I see a link to the Kent
lab source tree. Would it be possible to access this source tree purely
to audit the code for the altSplice and orthoSplice programs? If so, do
I need any specific account information?
Thanks 
kevin


Kevin Clancy
Phone:  x84401
Cell: (240) 417 8604

-----Original Message-----
From: Brooke Rhead [mailto:rhead at soe.ucsc.edu] 
Sent: Wednesday, April 16, 2008 4:31 PM
To: Clancy, Kevin
Cc: genome at soe.ucsc.edu
Subject: Re: [Genome] Understanding alternative splice data tables

Hello Kevin,

The developer here who created the Sib Alt-Splice track is unavailable 
at the moment, but maybe I can help you find the information you need. 
The data for the track were contributed by Christian Iseli 
(Christian.Iseli at licr.org); web site: 
http://www.isrec.isb-sib.ch/tromer/ .  I suggest contacting him for an 
explanation of the Sib Alt-Splice data.

Another resource that might be helpful in understanding our table format
is a very similar track called "Alt-Splicing" (available on the hg17, 
Human May 2004 browser).  Here is the paper linked to on the details 
page for that track:

Sugnet, C.W. et al., Transcriptome and genome conservation of 
alternative splicing events in humans and mice. Pacific Symposium on 
Biocomputing (PSB) 2004 Online Proceedings.
http://helix-web.stanford.edu/psb04/sugnet.pdf

There is also a bit of relevant information in this previously-answered 
mailing list question:

http://www.soe.ucsc.edu/pipermail/genome/2005-September/008530.html

I hope this information helps.  If I get a response from the developer 
here who helped create this track, I will send you more information.

--
Brooke Rhead
UCSC Genome Bioinformatics Group



Clancy, Kevin wrote:
> Hello Ann
> Thank you very much for your reply. I appreciate you pointing me to the
> Spliced ESTs track - that is very useful indeed. However I am still
> interested in understanding better the data in the Sib Alt-Splicing
> track as well. So I think I understand part of the Sib Alt-Splice table
> based upon the table schema descriptions. Could you please provide an
> explanation of the values for the vTypes digits in the table? They seem
> to have the values 0, 1, 2, and 3 but I don't know if there are more and
> what the values actually stand for.
> 
> Secondly, could you please provide an explanation of the organization of
> the evidence field data? I can see there is some form of {} organized
> data there but I don't understand it.
> 
> For the example you provided, the evidence field begins with:
> {11,{36,87,88,89,110,111,118,119,120,121,122,},}
> And continues on. What do these numbers refer to? At the moment I think
> this means for the first edge, there are 11 ESTs that support this start
> site and they are elements 36, 87, 88, etc. of the mRNA Refs field. Is the
> evidence for the alternative splicing contained in a third level of
> bracketing within the evidence field?
> 
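To make the question about the evidence field concrete, here is a minimal Python sketch that parses the brace-delimited string quoted above into nested lists. The function name parse_braces is made up for illustration. The reading of the result (a count followed by indices into the mRNA Refs field) is only the interpretation proposed in the question; it is not confirmed anywhere in this thread.

```python
def parse_braces(text):
    """Parse a string such as '{11,{36,87,},}' into nested Python lists.

    Trailing commas before a closing brace are tolerated, because the
    example in the thread is written that way.
    """
    pos = 0

    def parse_group():
        nonlocal pos
        assert text[pos] == "{", "a group must start with '{'"
        pos += 1
        items = []
        while text[pos] != "}":
            if text[pos] == "{":
                items.append(parse_group())   # nested group
            elif text[pos] == ",":
                pos += 1                      # separator, skip it
            else:
                end = pos
                while text[end] not in ",}":
                    end += 1
                items.append(int(text[pos:end]))
                pos = end
        pos += 1                              # consume the closing '}'
        return items

    return parse_group()


# The first group of the evidence field, copied from the message above.
first_edge = "{11,{36,87,88,89,110,111,118,119,120,121,122,},}"
count, indices = parse_braces(first_edge)
print(count, indices)

# Under the ASSUMED interpretation, the leading number should equal the
# length of the list that follows it.
print(count == len(indices))
```

For this one group the leading 11 does match the number of entries in the inner list, which is consistent with the proposed reading but does not prove it.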
> Thanks
> kevin
> 
> Kevin Clancy, PhD
> Senior Scientist, Informatic Sciences
> Invitrogen Corp
> Carlsbad CA 92008
> Phone: (240) 379 4401
> Cell: (240) 417 8604
> Email: kevin.clancy at invitrogen.com 
> 
> 
> -----Original Message-----
> From: Ann Zweig [mailto:ann at soe.ucsc.edu] 
> Sent: Saturday, April 12, 2008 4:09 PM
> To: Clancy, Kevin
> Cc: 'genome at soe.ucsc.edu'
> Subject: Re: [Genome] Understanding alternative splice data tables
> 
> Hello Kevin,
> 
> 	You can read about the details behind a track (description, methods,
> display, credits, references) by pressing on the 'mini-button' to the
> left of the actual track display, or by clicking on the hyperlinked
> track name in the track controls (below the display).  The track that I
> think will be the most helpful to you will be the "Spliced ESTs" track.
> 
> 	The answer to your question about the "sum of the block sizes
> adding up to the query EST size" is no.  The sum of the block sizes
> includes only the parts of the EST that actually align.  Take for
> example EST AA971065 on the hg18 assembly.  The first 7 bases of the
> EST do not align to the genome (qStart is 7).  That accounts for the
> 'mismatch': (300 + 45) + 7 = 352.
> 
> mysql> select * from chr21_intronEst where qName = 'AA971065'\G
> *************************** 1. row ***************************
>          bin: 660
>      matches: 281
>   misMatches: 0
>   repMatches: 64
>       nCount: 0
>   qNumInsert: 0
>  qBaseInsert: 0
>   tNumInsert: 1
>  tBaseInsert: 1784
>       strand: +
>        qName: AA971065
>        qSize: 352
>       qStart: 7
>         qEnd: 352
>        tName: chr21
>        tSize: 46944323
>       tStart: 9928611
>         tEnd: 9930740
>   blockCount: 2
>   blockSizes: 300,45,
>      qStarts: 7,307,
>      tStarts: 9928611,9930695,
> 
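A minimal Python sketch of the arithmetic for this row, using only the values shown in the record above. The helper names parse_list and summarize are made up for illustration and are not part of the kent source tree. The final identity (aligned + head + tail = qSize) is assumed to hold only because qBaseInsert is 0 here; rows with inserts in the query would need that term as well.

```python
# Values copied from the chr21_intronEst row for EST AA971065 (hg18).
row = {
    "matches": 281, "misMatches": 0, "repMatches": 64,
    "qBaseInsert": 0, "tBaseInsert": 1784,
    "qSize": 352, "qStart": 7, "qEnd": 352,
    "tStart": 9928611, "tEnd": 9930740,
    "blockSizes": "300,45,",
    "qStarts": "7,307,",
    "tStarts": "9928611,9930695,",
}


def parse_list(field):
    """Turn a comma-terminated list such as '300,45,' into [300, 45]."""
    return [int(x) for x in field.split(",") if x]


def summarize(row):
    sizes = parse_list(row["blockSizes"])
    t_starts = parse_list(row["tStarts"])
    # Genomic gap between the end of one block and the start of the next.
    gaps = [
        t_starts[i + 1] - (t_starts[i] + sizes[i])
        for i in range(len(sizes) - 1)
    ]
    return {
        "aligned": sum(sizes),                             # bases inside blocks
        "unaligned_head": row["qStart"],                   # bases before the first block
        "unaligned_tail": row["qSize"] - row["qEnd"],      # bases after the last block
        "target_gaps": gaps,
    }


summary = summarize(row)
print(summary)
```

For this row the aligned total is 345, the unaligned head is 7, the tail is 0, and the single genomic gap is 1784, which agrees with tBaseInsert. The aligned total also equals matches + misMatches + repMatches (281 + 0 + 64).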
> 	Note in the browser image, the blue vertical line at the beginning
> of this EST.  This line denotes an "insertion at the beginning or end
> of the query" (as noted on the track description page).
> 
> 	It definitely will take some work to extract what you want, but I
> think the data is there.  You might look at some of the programs in our
> source tree that make use of the txGraph format.  The Genome Browser and
> Blat software are free for academic, nonprofit, and personal use. A
> license is required for commercial use.
> 
> How to download the software:
> http://genome.cse.ucsc.edu/FAQ/FAQlicense#license3
> 
> You can obtain the source tree either via CVS:
> 	http://genome.ucsc.edu/admin/cvs.html
> or a zip file:
> 	http://hgdownload.cse.ucsc.edu/admin/jksrc.zip
> 
> Please note the build instructions:
> 	http://genome.ucsc.edu/admin/jk-install.html
> 
> 	All of the kent utilities print their usage message and command
> line options when run with no arguments.
> 
> 
> Regards,
> 
> ----------
> Ann Zweig
> UCSC Genome Bioinformatics Group
> http://genome.ucsc.edu
> 
> 
> 
> 
> 
> Clancy, Kevin wrote:
>> Dear Sirs
>> I am interested in using Tables to extract information on alternative
>> splice exons in human chromosome 21. So I would like to have a measure
>> of, for each exon seen by aligning ESTs along the genome, how many times
>> that exon is present in all the represented genomes and how many times
>> it binds to the upstream and downstream exons on either side of it.
>> Ideally I would like to use the information from the table to extract
>> all these alternative splice products from the genomic sequence with
>> some statistics on the prevalence of each exon triplet. Ultimately I'm
>> interested in generating a probability based tool to generate
>> alternative splice products.
>>
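One possible way to count exon triplets from spliced-EST rows is sketched below in Python. It is only an illustration under stated assumptions: the function names and the toy coordinates are invented, every gap between consecutive blocks is treated as an intron, and strand is ignored. Only internal blocks are counted, since the outer ends of the first and last blocks of an EST are usually not real exon boundaries, so each triplet is keyed by its four splice-site coordinates rather than by full block extents.

```python
from collections import Counter


def to_blocks(t_starts, block_sizes):
    """Convert parallel tStarts/blockSizes lists into (start, end) pairs."""
    return [(start, start + size) for start, size in zip(t_starts, block_sizes)]


def triplet_counts(alignments):
    """Count (upstream, middle, downstream) block triplets across alignments.

    Each triplet is keyed by the four coordinates that border the two gaps:
    end of the upstream block, start and end of the middle block, and start
    of the downstream block.
    """
    counts = Counter()
    for t_starts, block_sizes in alignments:
        blocks = to_blocks(t_starts, block_sizes)
        for left, mid, right in zip(blocks, blocks[1:], blocks[2:]):
            counts[(left[1], mid[0], mid[1], right[0])] += 1
    return counts


# Toy alignments with INVENTED coordinates, as (tStarts, blockSizes).
alignments = [
    ([100, 300, 500], [50, 40, 60]),  # blocks A-B-C
    ([120, 300, 500], [30, 40, 20]),  # same splice sites, different outer ends
    ([100, 500], [50, 60]),           # skips B; two blocks give no triplet
    ([100, 300, 700], [50, 40, 30]),  # A-B-D, a different downstream block
]

counts = triplet_counts(alignments)
for key, n in sorted(counts.items()):
    print(key, n)
```

Dividing each count by the total for the same middle block would give a rough prevalence for each triplet. A real analysis would also need to filter out small alignment gaps that are not introns.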
>> Looking at the Spliced ESTs/intronEST table, I can see that you have
>> nucleotide sizes, start positions and block sizes in both the query EST
>> and the chromosome. Should the sum of the block sizes add up to the
>> query EST size? If not, why is there a difference in the two?
>>
>> Secondly I have looked at the Sib Alt-Splicing/sib TX graph table and you
>> have a network representation of vertices and edges fields corresponding
>> to the EST but I don't quite understand how to use the information you
>> have there to tackle my problem. Any help or simple example you can
>> point me towards would be very appreciated.
>>
>> Finally are these the best tables for me to look at to try and generate
>> this type of information? If not, where would be a better table?
>> Thanks kevin
>>
>> Kevin Clancy, PhD
>> Senior Scientist, Informatic Sciences
>> Invitrogen Corp
>> Carlsbad, CA 92008
>> Phone: (240) 379 4401 x84401
>> Cell: (240) 417 8604
>> Email: kevin.clancy at invitrogen.com 
>>
>> _______________________________________________
>> Genome maillist  -  Genome at soe.ucsc.edu
>> http://www.soe.ucsc.edu/mailman/listinfo/genome
> 
> 
> 
> 




