[Genome] [Fwd: intron question]
Rachel Harte
hartera at soe.ucsc.edu
Tue Nov 14 17:04:52 PST 2006
Eric,
Two of our developers just suggested alternative and easier ways to get
the sizes of introns. Both these methods involve using the Table
Browser which may be reached by clicking on the Tables link on the top
blue menu bar of the Browser pages. For the Table Browser, you need to
select the organism, assembly, track and table of interest. Make sure
that the table has exons defined in it. Then select genome as the
region. Then do either:
1) Select sequence as the output format and then press "get
output". Select genomic and press "submit". Select only Introns in the
Sequence Retrieval Options and press "get sequence". There is a program
available in the Genome Browser source code called faCount that you can
use to find the sizes of the introns. The source code is freely
available for academic, non-profit and personal use from here:
http://genome.ucsc.edu/FAQ/FAQlicense#license3
faCount is in the directory: src/utils/faCount/.
2) Select BED as the output from the Table Browser and press "get output".
Select Introns and press "get BED". The resulting output gives the
chromosome, intron start and intron end in the first three columns. BED
starts are 0-based and ends are not included, so the size of each intron
is the end minus the start; you could use awk to compute this from the
output.
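
As an alternative to awk, the BED approach in 2) can be sketched in Python.
This is only a minimal sketch: the function name intron_sizes, the
min_size/max_size parameters and the example lines are made up for
illustration and are not part of the Genome Browser source. It assumes
each data line has at least three whitespace-separated columns
(chromosome, start, end) and skips any "track" or "browser" header lines.

```python
import io


def intron_sizes(bed_lines, min_size=None, max_size=None):
    """Yield (chrom, start, end, size) for each BED line.

    min_size and max_size are optional inclusive bounds (hypothetical
    parameters, added here to select introns in a size range).
    """
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and optional BED header/comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # BED starts are 0-based and ends are exclusive, so no +1 is needed.
        size = end - start
        if min_size is not None and size < min_size:
            continue
        if max_size is not None and size > max_size:
            continue
        yield chrom, start, end, size


# Made-up example lines in the shape of Table Browser intron BED output.
example = (
    "chr1\t1000\t1500\tgeneA_intron_0\n"
    "chr1\t2000\t2100\tgeneA_intron_1\n"
)

for row in intron_sizes(io.StringIO(example), min_size=50, max_size=200):
    print(*row, sep="\t")
```

With the example lines above, only the second intron (size 100) falls in
the 50 to 200 range and is printed. To run this on a saved Table Browser
file, pass an open file handle in place of the io.StringIO object.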
I hope that this helps you.
Rachel
Rachel Harte UCSC Genome Bioinformatics Group
http://genome.ucsc.edu
On Tue, 14 Nov 2006, Rachel Harte wrote:
> Eric,
> It is not possible to get the intron sizes directly from the Browser. In
> order to do this, you will need to select a table that contains a gene set
> that you wish to use for each organism e.g. knownGene which contains
> genes from the Known Genes track on the human, rat and mouse Browser. The
> RefSeq Genes are a track that is available for most organisms. Both of
> these tables have columns, exonStarts and exonEnds, that are comma
> separated lists of the exon starts and ends. You would need to write
> a program to parse this information and get the intron sizes by
> subtracting the first item in exonEnds from the second item in
> exonStarts, then subtracting the second item in exonEnds from the third
> item in exonStarts etc. All start positions in the tables are on a 0-based
> scale so base 1 is represented as 0.
>
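The parsing of exonStarts and exonEnds described above can be sketched in
Python as follows. This is a rough sketch, not code from the Genome
Browser source: the function name and the example values are made up. It
assumes that each intron lies between the end of one exon and the start
of the next exon, that the lists are comma separated (a trailing comma is
allowed), and that starts are 0-based while ends are not included.

```python
def intron_sizes_from_exons(exon_starts, exon_ends):
    """Return the intron sizes for one gene, in genomic order.

    exon_starts and exon_ends are comma-separated strings such as
    "100,300,600," taken from the exonStarts and exonEnds columns.
    """
    # Drop the empty string left behind by a trailing comma.
    starts = [int(x) for x in exon_starts.split(",") if x]
    ends = [int(x) for x in exon_ends.split(",") if x]
    if len(starts) != len(ends):
        raise ValueError("exonStarts and exonEnds differ in length")
    # Intron i runs from the end of exon i to the start of exon i + 1.
    # With 0-based starts and exclusive ends, the difference is the size.
    return [starts[i + 1] - ends[i] for i in range(len(starts) - 1)]


# Made-up gene with three exons: 100-200, 300-400 and 600-700.
print(intron_sizes_from_exons("100,300,600,", "200,400,700,"))  # [100, 200]
```

A gene with a single exon has no introns, so the function returns an
empty list for it.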
> The contents of all the tables in our databases are downloadable from our
> Downloads server:
>
> http://hgdownload.cse.ucsc.edu/downloads.html
>
> If you click on an organism link, there is a list of assemblies. Each
> assembly has an Annotation database link which will lead to the site from
> which you can download the contents of tables for that assembly.
>
> I hope that this helps you. Please let us know if you have further
> questions. In future, please direct questions to the genome mailing list
> at genome at soe.ucsc.edu. Thank you.
>
> Rachel
>
> Rachel Harte UCSC Genome Bioinformatics Group
> http://genome.ucsc.edu
>
> > -------- Original Message --------
> > Subject: intron question
> > Date: Tue, 14 Nov 2006 13:46:03 -0500
> > From: Eric Lai <laie at mskcc.org>
> > To: cbseweb at cbse.ucsc.edu
> >
> >
> >
> > hi,
> >
> > i'm not sure if this goes to anyone in particular.
> >
> > we would like to extract the introns of a particular size range from all
> > the sequenced species.
> > is there a way to do that for the species that are loaded at UCSC,
> > or would you have any advice on how to obtain these datasets?
> >
> > thanks,
> > .eric
> >
> >
> >
> > ***************************
> > Eric Lai
> > Assistant Member, Sloan-Kettering Institute
> > 521 Rockefeller Research Labs
> > 1275 York Avenue, Box 252
> > New York, NY 10021
> >
> > ph: 212-639-5578
> > fax: 212 717-3604
> > site: http://www.mskcc.org/lai
> >
> >
> > --
> > Branwyn Stewart Wagman
> > Communications & Human Resources
> > Center for Biomolecular Science and Engineering (CBSE)
> > Institute for Quantitative Biomedical Research (QB3)
> > 501C Engineering 2 Building
> > UC Santa Cruz
> > 1156 High Street, MS: CBSE/ITI
> > Santa Cruz CA 95064
> > Tel: (831) 459-3077
> > Fax: (831) 459-1809
> > bwagman at soe.ucsc.edu
> > http://www.cbse.ucsc.edu
> >
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>