[Genome] Understanding alternative splice data tables
Clancy, Kevin
Kevin.Clancy at invitrogen.com
Mon Apr 14 15:03:39 PDT 2008
Hello Ann
Thank you very much for your reply. I appreciate you pointing me to the
Spliced ESTs track - that is very useful indeed. However I am still
interested in understanding better the data in the Sib Alt-Splicing
track as well. So I think I understand part of the Sib Alt-Splice table
based upon the table schema descriptions. Could you please provide an
explanation of the values for the vTypes digits in the table? They seem
to have the values 0, 1, 2, and 3, but I don't know if there are more or
what the values actually stand for.
Secondly, could you please provide an explanation of the organization of
the evidence field data? I can see there is some form of {}-organized
data there but I don't understand it.
For the example you provided, the evidence field begins with:
{11,{36,87,88,89,110,111,118,119,120,121,122,},}
And continues on. What do these numbers refer to? At the moment I think
this means that for the first edge, there are 11 ESTs that support this
start site and they are elements 36, 87, 88, etc. of the mRNA Refs field.
Is the evidence for the alternative splicing contained in a third level
of bracketing within the evidence field?
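To make my current reading concrete, here is a minimal Python sketch of how I am parsing the fragment quoted above. It assumes each record has the shape {count,{index,index,...,},} and that the inner numbers are positions in the mRNA Refs list; both of these are my guesses, not documented behavior, and the names parse_evidence and RECORD are my own.

```python
import re

# Assumed record shape: {count,{i,j,k,...,},}
# This is a guess based on the one example in this message,
# not a documented description of the evidence field.
RECORD = re.compile(r"\{(\d+),\{([\d,]*)\},?\}")


def parse_evidence(field):
    """Return a list of (count, [indices]) tuples, one per record found."""
    records = []
    for count, body in RECORD.findall(field):
        # The inner list ends with a trailing comma, so skip empty tokens.
        indices = [int(tok) for tok in body.split(",") if tok]
        records.append((int(count), indices))
    return records


example = "{11,{36,87,88,89,110,111,118,119,120,121,122,},}"
records = parse_evidence(example)
for count, indices in records:
    # If my reading is right, the leading number is the length of the list.
    print(count, len(indices), indices)
```

In this one record the leading number does equal the number of values in the inner list, which is why I suspect it is a count.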
Thanks
kevin
Kevin Clancy, PhD
Senior Scientist, Informatic Sciences
Invitrogen Corp
Carlsbad CA 92008
Phone: (240) 379 4401
Cell: (240) 417 8604
Email: kevin.clancy at invitrogen.com
-----Original Message-----
From: Ann Zweig [mailto:ann at soe.ucsc.edu]
Sent: Saturday, April 12, 2008 4:09 PM
To: Clancy, Kevin
Cc: 'genome at soe.ucsc.edu'
Subject: Re: [Genome] Understanding alternative splice data tables
Hello Kevin,
You can read about the details behind a track (description, methods,
display, credits, references) by pressing the 'mini-button' to the
left of the actual track display, or by clicking the hyperlinked
track name in the track controls (below the display). The track that I
think will be the most helpful to you is the "Spliced ESTs" track.
The answer to your question about the "sum of the block sizes adding up
to the query EST size" is no. The sum of the block sizes includes only
the parts of the EST that actually align. Take for example EST AA971065
on the hg18 assembly. The first 7 bases of the EST do not align to the
genome (qStart is 7). That accounts for the 'mismatch':
(300 + 45) + 7 = 352.
mysql> select * from chr21_intronEst where qName = 'AA971065'\G
*************************** 1. row ***************************
bin: 660
matches: 281
misMatches: 0
repMatches: 64
nCount: 0
qNumInsert: 0
qBaseInsert: 0
tNumInsert: 1
tBaseInsert: 1784
strand: +
qName: AA971065
qSize: 352
qStart: 7
qEnd: 352
tName: chr21
tSize: 46944323
tStart: 9928611
tEnd: 9930740
blockCount: 2
blockSizes: 300,45,
qStarts: 7,307,
tStarts: 9928611,9930695,
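As a quick check of the arithmetic above, here is a minimal Python sketch that recomputes it from the row just shown. It is only an illustration for this plus-strand row, it assumes qStart and qEnd are zero-based, half-open query coordinates, and the function name account_for_query is hypothetical rather than part of any kent utility.

```python
def account_for_query(q_size, q_start, q_end, block_sizes):
    """Split a query's length into aligned and unaligned parts.

    block_sizes is the comma-terminated string from the table,
    for example "300,45,".
    """
    sizes = [int(s) for s in block_sizes.split(",") if s]
    aligned = sum(sizes)
    return {
        "aligned": aligned,                          # bases inside blocks
        "unaligned_start": q_start,                  # bases before the first block
        "unaligned_end": q_size - q_end,             # bases after the last block
        "query_gaps": (q_end - q_start) - aligned,   # query bases between blocks
    }


# Values copied from the AA971065 row above.
parts = account_for_query(q_size=352, q_start=7, q_end=352, block_sizes="300,45,")
print(parts)

# The four parts together account for the whole query.
assert sum(parts.values()) == 352
```

For this row the aligned total of 345 also equals matches + misMatches + repMatches (281 + 0 + 64), and the zero query gap agrees with qBaseInsert being 0.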
Note in the browser image, the blue vertical line at the beginning of
this EST. This line denotes an "insertion at the beginning or end of the
query" (as noted on the track description page).
It definitely will take some work to extract what you want, but I think
the data is there. You might look at some of the programs in our source
tree that make use of the txGraph format. The Genome Browser and Blat
software are free for academic, nonprofit, and personal use. A license
is required for commercial use.
How to download the software:
http://genome.cse.ucsc.edu/FAQ/FAQlicense#license3
You can obtain the source tree either via CVS:
http://genome.ucsc.edu/admin/cvs.html
or a zip file:
http://hgdownload.cse.ucsc.edu/admin/jksrc.zip
Please note the build instructions:
http://genome.ucsc.edu/admin/jk-install.html
All of the kent utilities output their usage message and command line
options when run with no arguments.
Regards,
----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu
Clancy, Kevin wrote:
> Dear Sirs
> I am interested in using Tables to extract information on alternative
> splice exons in human chromosome 21. So I would like to have a measure,
> for each exon seen by aligning ESTs along the genome, of how many times
> that exon is present in all the represented genomes and how many times
> it binds to the upstream and downstream exons on either side of it.
> Ideally I would like to use the information from the table to extract
> all these alternative splice products from the genomic sequence with
> some statistics on the prevalence of each exon triplet. Ultimately I'm
> interested in generating a probability based tool to generate
> alternative splice products.
>
> Looking at the Spliced ESTs/intronEST table, I can see that you have
> nucleotide sizes, start positions and block sizes in both the query EST
> and the chromosome. Should the sum of the block sizes add up to the
> query EST size? If not, why is there a difference in the two?
>
> Secondly I have looked at the Sib Alt-Splicing/sib TX graph table and
> you have a network representation of vertices and edges fields
> corresponding to the EST but I don't quite understand how to use the
> information you have there to tackle my problem. Any help or simple
> example you can point me towards would be very appreciated.
>
> Finally are these the best tables for me to look at to try and generate
> this type of information? If not, where would be a better table?
> Thanks kevin
>
> Kevin Clancy, PhD
> Senior Scientist, Informatic Sciences
> Invitrogen Corp
> Carlsbad, CA 92008
> Phone: (240) 379 4401 x84401
> Cell: (240) 417 8604
> Email: kevin.clancy at invitrogen.com
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome