[Genome] Understanding alternative splice data tables

Clancy, Kevin Kevin.Clancy at invitrogen.com
Mon Apr 14 15:03:39 PDT 2008


Hello Ann,
Thank you very much for your reply. I appreciate you pointing me to the
Spliced ESTs track; that is very useful indeed. However, I am still
interested in better understanding the data in the Sib Alt-Splicing
track. Based on the table schema descriptions, I think I understand part
of the Sib Alt-Splicing table. Could you please explain what the vTypes
digits in the table mean? They seem to take the values 0, 1, 2, and 3,
but I don't know whether there are others or what the values actually
stand for.
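
One way to see which vTypes codes actually occur is to tally them across many rows. The Python sketch below assumes the field is stored as comma-separated integers with a trailing comma, like the blockSizes field in the intronEst row quoted further down. The labels in VERTEX_TYPE_LABELS are an assumption based on my reading of the vertex-type enum in the kent source (soft/hard start/end) and should be checked against the source tree. The sample rows are hypothetical, not real table data.

```python
from collections import Counter


# ASSUMPTION: list fields look like "0,1,2,3," (comma-separated, trailing comma).
def parse_int_list(field: str) -> list[int]:
    """Split a comma-terminated integer list such as '0,1,2,3,' into ints."""
    return [int(tok) for tok in field.split(",") if tok.strip()]


# ASSUMPTION: labels follow my reading of the kent vertex-type enum; verify them.
VERTEX_TYPE_LABELS = {0: "soft start", 1: "hard start", 2: "soft end", 3: "hard end"}


def tally_vtypes(vtypes_fields: list[str]) -> Counter:
    """Count how often each vertex-type code appears across many table rows."""
    counts: Counter = Counter()
    for field in vtypes_fields:
        counts.update(parse_int_list(field))
    return counts


if __name__ == "__main__":
    # Hypothetical vTypes values from three rows; codes without a label are flagged.
    counts = tally_vtypes(["0,1,3,2,", "1,3,1,3,", "0,3,"])
    for code, n in sorted(counts.items()):
        print(code, VERTEX_TYPE_LABELS.get(code, "unrecognized"), n)
```

Any code printed as "unrecognized" would answer the "are there more values?" question for the rows you feed in.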

Second, could you please explain how the data in the evidence field are
organized? I can see some kind of nested {} structure there, but I don't
understand it.

For the example you provided, the evidence field begins with:
{11,{36,87,88,89,110,111,118,119,120,121,122,},}
and continues from there. What do these numbers refer to? At the moment
I think this means that, for the first edge, 11 ESTs support this start
site, and that they are elements 36, 87, 88, etc. of the mRNA Refs field.
Is the evidence for the alternative splicing contained in a third level
of bracketing within the evidence field?
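
If this reading is right (each edge contributes a group of the form {count,{id,id,...,},} and the ids point into the mRNA Refs list), the field can be parsed with a small brace parser. The Python sketch below is written under those assumptions, which nobody in this thread has confirmed. Treating the ids as 0-based positions in the mRNA Refs list is a further assumption, so lookup_refs is hypothetical. The parser also checks that each count matches the number of ids, which holds for the example above (11 ids).

```python
def parse_braced(text: str) -> list:
    """Parse nested '{...}' groups of comma-separated ints into nested lists."""
    stack: list[list] = [[]]  # stack[0] collects the top-level groups
    token = ""
    for ch in text:
        if ch == "{":
            stack.append([])
        elif ch in ",}":
            if token:
                stack[-1].append(int(token))
                token = ""
            if ch == "}":
                if len(stack) == 1:
                    raise ValueError("unmatched '}'")
                group = stack.pop()
                stack[-1].append(group)
        elif not ch.isspace():
            token += ch
    if token:
        stack[-1].append(int(token))
    if len(stack) != 1:
        raise ValueError("unmatched '{'")
    return stack[0]


def edge_evidence(text: str) -> list[tuple[int, list[int]]]:
    """Interpret each top-level group as (count, [ids]) and check the count.

    ASSUMPTION: the layout is {count,{id,id,...,},} per edge, as hypothesized.
    """
    edges = []
    for group in parse_braced(text):
        count, ids = group[0], group[1]
        if count != len(ids):
            raise ValueError(f"count {count} != {len(ids)} ids")
        edges.append((count, ids))
    return edges


def lookup_refs(ids: list[int], mrna_refs: list[str]) -> list[str]:
    """HYPOTHETICAL: assumes ids are 0-based positions in the mRNA Refs list."""
    return [mrna_refs[i] for i in ids]


if __name__ == "__main__":
    # The first edge's evidence, copied from the message above.
    sample = "{11,{36,87,88,89,110,111,118,119,120,121,122,},}"
    for count, ids in edge_evidence(sample):
        print(count, ids)
```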

Thanks
kevin

Kevin Clancy, PhD
Senior Scientist, Informatic Sciences
Invitrogen Corp
Carlsbad CA 92008
Phone: (240) 379 4401
Cell: (240) 417 8604
Email: kevin.clancy at invitrogen.com 


-----Original Message-----
From: Ann Zweig [mailto:ann at soe.ucsc.edu] 
Sent: Saturday, April 12, 2008 4:09 PM
To: Clancy, Kevin
Cc: 'genome at soe.ucsc.edu'
Subject: Re: [Genome] Understanding alternative splice data tables

Hello Kevin,

	You can read about the details behind a track (description, methods,
display, credits, references) by clicking the 'mini-button' to the left
of the track display, or by clicking the hyperlinked track name in the
track controls (below the display). The track that I think will be most
helpful to you is the "Spliced ESTs" track.

	The answer to your question about whether the sum of the block sizes
should add up to the query EST size is no. The sum of the block sizes
includes only the parts of the EST that actually align. Take, for
example, EST AA971065 on the hg18 assembly. The first 7 bases of the EST
(qStart = 7) do not align to the genome. That accounts for the
'mismatch': (300 + 45) + 7 = 352.

mysql> select * from chr21_intronEst where qName = 'AA971065'\G
*************************** 1. row ***************************
         bin: 660
     matches: 281
  misMatches: 0
  repMatches: 64
      nCount: 0
  qNumInsert: 0
 qBaseInsert: 0
  tNumInsert: 1
 tBaseInsert: 1784
      strand: +
       qName: AA971065
       qSize: 352
      qStart: 7
        qEnd: 352
       tName: chr21
       tSize: 46944323
      tStart: 9928611
        tEnd: 9930740
  blockCount: 2
  blockSizes: 300,45,
     qStarts: 7,307,
     tStarts: 9928611,9930695,
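
The arithmetic above can be checked mechanically. This Python sketch copies a subset of fields from the row above into a plain dict (a stand-in for however you fetch rows) and splits qSize into aligned bases, bases inserted between blocks, and unaligned bases before the first block and after the last one; the 7 unaligned leading bases are the "insertion at the beginning of the query" mentioned below. It also computes the genomic gap between consecutive blocks: here the single gap is 1784 bases, matching tBaseInsert and consistent with an intron. These relationships are standard PSL bookkeeping as I understand it; the helper names are my own.

```python
# Fields copied from the chr21_intronEst row for AA971065 (hg18) shown above.
row = {
    "matches": 281, "misMatches": 0, "repMatches": 64, "nCount": 0,
    "qBaseInsert": 0, "tBaseInsert": 1784,
    "qSize": 352, "qStart": 7, "qEnd": 352,
    "tStart": 9928611, "tEnd": 9930740,
    "blockSizes": [300, 45],
    "tStarts": [9928611, 9930695],
}


def query_accounting(r: dict) -> dict:
    """Split the query length into aligned, inserted, and unaligned-end bases."""
    return {
        "aligned": sum(r["blockSizes"]),
        "inserted": r["qBaseInsert"],              # query bases between blocks
        "unaligned_head": r["qStart"],             # before the first block
        "unaligned_tail": r["qSize"] - r["qEnd"],  # after the last block
    }


def target_gaps(r: dict) -> list[int]:
    """Genomic gaps between consecutive aligned blocks (candidate introns)."""
    starts, sizes = r["tStarts"], r["blockSizes"]
    return [starts[i + 1] - (starts[i] + sizes[i]) for i in range(len(starts) - 1)]


if __name__ == "__main__":
    parts = query_accounting(row)
    print(parts, sum(parts.values()), "of", row["qSize"])
    print("target gaps:", target_gaps(row), "tBaseInsert:", row["tBaseInsert"])
```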

	Note the blue vertical line at the beginning of this EST in the
browser image. This line denotes an "insertion at the beginning or end
of the query" (as noted on the track description page).

	It will definitely take some work to extract what you want, but I
think the data is there. You might look at some of the programs in our
source tree that make use of the txGraph format. The Genome Browser and
Blat software are free for academic, nonprofit, and personal use. A
license is required for commercial use.
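
For Kevin's original goal (how often each exon is joined to the exons on either side), the graph idea can be sketched independently of the exact table layout. The Python below assumes you have already converted a row's vertex and edge fields into a list of hypothetical Edge records (start vertex, end vertex, exon/intron type, supporting-evidence count). I have not checked how the txGraph or Sib Alt-Splicing columns map onto these, so that conversion is left out. It is a plain adjacency walk, not the approach used by the UCSC programs.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    """HYPOTHETICAL edge record; fill it from the table's edge fields yourself."""
    start: int    # index of the vertex where the edge starts
    end: int      # index of the vertex where the edge ends
    kind: str     # "exon" or "intron"
    support: int  # number of supporting ESTs/mRNAs


def exon_neighbors(edges: list[Edge]):
    """Map each exon to the exons joined to it by an intron edge, with support."""
    ending_at, starting_at = defaultdict(list), defaultdict(list)
    for e in edges:
        if e.kind == "exon":
            ending_at[e.end].append(e)
            starting_at[e.start].append(e)
    upstream, downstream = defaultdict(list), defaultdict(list)
    for intron in (e for e in edges if e.kind == "intron"):
        for up in ending_at[intron.start]:
            for down in starting_at[intron.end]:
                downstream[up].append((down, intron.support))
                upstream[down].append((up, intron.support))
    return upstream, downstream


def exon_triplets(edges: list[Edge]) -> list[tuple[Edge, Edge, Edge]]:
    """Enumerate (upstream, middle, downstream) exon paths through the graph."""
    upstream, downstream = exon_neighbors(edges)
    triplets = []
    for mid in (e for e in edges if e.kind == "exon"):
        for up, _ in upstream[mid]:
            for down, _ in downstream[mid]:
                triplets.append((up, mid, down))
    return triplets


if __name__ == "__main__":
    # Toy graph: exons A, B, C plus an intron that skips B (A joined to C).
    A, B, C = Edge(0, 1, "exon", 9), Edge(2, 3, "exon", 6), Edge(4, 5, "exon", 8)
    graph = [A, B, C, Edge(1, 2, "intron", 5), Edge(3, 4, "intron", 4),
             Edge(1, 4, "intron", 2)]
    up, down = exon_neighbors(graph)
    print("downstream of A:", down[A])
    print("triplets:", exon_triplets(graph))
```

The per-neighbor support counts are what a probability-based tool would start from; how to turn them into probabilities is left open here.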

How to download the software:
http://genome.cse.ucsc.edu/FAQ/FAQlicense#license3

You can obtain the source tree either via CVS:
	http://genome.ucsc.edu/admin/cvs.html
or a zip file:
	http://hgdownload.cse.ucsc.edu/admin/jksrc.zip

Please note the build instructions:
	http://genome.ucsc.edu/admin/jk-install.html

	All of the kent utilities print their usage message and command-line
options when run with no arguments.


Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

Clancy, Kevin wrote:
> Dear Sirs,
> I am interested in using Tables to extract information on alternative
> splice exons in human chromosome 21. For each exon seen by aligning
> ESTs along the genome, I would like a measure of how many times that
> exon is present in all the represented genomes and how many times it
> is spliced to the upstream and downstream exons on either side of it.
> Ideally I would like to use the information from the table to extract
> all these alternative splice products from the genomic sequence, with
> some statistics on the prevalence of each exon triplet. Ultimately I'm
> interested in generating a probability-based tool to generate
> alternative splice products.
> 
> Looking at the Spliced ESTs/intronEst table, I can see that you have
> nucleotide sizes, start positions, and block sizes in both the query EST
> and the chromosome. Should the sum of the block sizes add up to the
> query EST size? If not, why is there a difference between the two?
> 
> Second, I have looked at the Sib Alt-Splicing/sib TX graph table. You
> have a network representation of vertex and edge fields corresponding
> to the EST, but I don't quite understand how to use that information to
> tackle my problem. Any help, or a simple example you can point me
> toward, would be much appreciated.
> 
> Finally, are these the best tables for me to look at to generate this
> type of information? If not, which table would be better?
> Thanks, Kevin
> 
> Kevin Clancy, PhD
> Senior Scientist, Informatic Sciences
> Invitrogen Corp
> Carlsbad, CA 92008
> Phone: (240) 379 4401 x84401
> Cell: (240) 417 8604
> Email: kevin.clancy at invitrogen.com 