[Genome] 12 drosophila genome intron alignments

Vanessa Bauer vlb2 at cornell.edu
Mon Dec 11 08:50:44 PST 2006


Hello,

Below you will find a few e-mails that I exchanged with Angie Hinrichs
regarding intron alignments.  She was very helpful, and I now have a
better understanding of how your data are organized.  That being said,
I still have questions.  Basically, what we would like to have are
intron alignments specifically for the species in the melanogaster
subgroup.  With Angie's help I can now get such alignments directly,
but they are based on 15 species.  Will this lead to a bias toward
more conserved sequence being available?  In other words, would there
be more aligned intron nucleotides if the alignments were based on a
smaller, more closely related set of species?

thanks, Tessa Bauer DuMont





Hi Tessa,

I'm not sure if this will get you exactly what you're looking for, but
hopefully it will be a start: our Table Browser tool can make a track
of introns, and it can "intersect" (find the overlap between) tracks.
So it's possible to extract multi-species alignments from intronic
regions.

First, to make a custom track containing the introns of FlyBase genes:

0. Start on our test server, genome-test.cse.ucsc.edu, since the main
    server does not yet have alignments that include sechellia.

1. Open the Table Browser (click the Tables link in the top blue bar).

2. Make the following selections on the Table Browser main page:
    clade: insect
    genome: D. melanogaster
    assembly: Apr. 2004
    group: Genes and Gene Predictions
    track: FlyBase Genes
    region: genome
    output format: custom track
    [If an intersection or filter was defined previously, clear it]
    Click "get output".

3. If you like, you can change the name or description to something
    more descriptive than the defaults, like "flyBaseIntrons".
    Where it says "Create one BED record per:", choose Introns.
    Click "get custom track in Table Browser".

Now, to get MAF alignments in all intronic regions:

4. Make the following selections on the Table Browser main page:
    group: Comparative Genomics
    track: Conservation (15way)
    table: multiz15way
    output format: MAF
    intersection: Click "create".

5. Select your intron custom track as the secondary table:
    group: Custom Tracks
    track: flyBaseIntrons (or whatever name was used in step 3).
    Choose "Base-pair-wise intersection (AND) of Conservation and ..."
    as the method of finding overlap.
    Click submit.

6. Back on the Table Browser main page, click "get output".
    [You may want to download directly to a file -- in that case,
    before clicking "get output", enter a file name into the "output
    file" box.]

That will return multi-species alignments for all intronic regions, in
the MAF format (see http://genome.ucsc.edu/FAQ/FAQformat#format5 for a
format description).  If you are interested in only a subset of the
species, you can use "grep -v" or a similar program to remove the
species that you don't want.
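
As a rough illustration of that last step, here is a minimal Python
sketch (not part of the Table Browser) that keeps only selected species
in a MAF file.  Unlike a plain "grep -v", it also drops alignment
columns that become all-gap once the other species are removed.  The
species prefixes used below (dm2, droSec1, droYak2) are assumptions
about how the sequences are named in the "s" lines; check them against
your own download.  The function names are made up for this example.

```python
def _filter_block(block, keep):
    """Return the lines of one alignment block restricted to `keep`.

    Only the "a" header and the "s" (sequence) lines are retained;
    any other per-block lines (e.g. "i", "e", "q") are dropped.
    """
    header = [l for l in block if l[0] == "a"]
    rows = [l.split() for l in block if l.startswith("s ")]
    # An "s" line is: s src start size strand srcSize text
    # and src is assumed to look like "<assembly>.<sequence name>".
    rows = [r for r in rows if r[1].split(".")[0] in keep]
    if not rows:
        return []
    texts = [r[6] for r in rows]
    # Keep only columns with a non-gap character in some retained row.
    cols = [i for i in range(len(texts[0]))
            if any(t[i] != "-" for t in texts)]
    out = list(header)  # note: the original score no longer applies
    for r in rows:
        r[6] = "".join(r[6][i] for i in cols)
        out.append(" ".join(r))
    out.append("")  # blank line terminates the block
    return out


def filter_maf(lines, keep):
    """Yield MAF lines containing only the species prefixes in `keep`."""
    block = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            yield line              # pass header/comment lines through
        elif line.strip() == "":
            yield from _filter_block(block, keep) if block else []
            block = []
        else:
            block.append(line)
    if block:                       # file may lack a trailing blank line
        yield from _filter_block(block, keep)


if __name__ == "__main__":
    # Tiny made-up example; real input would come from the saved file.
    sample = [
        "##maf version=1",
        "a score=100.0",
        "s dm2.chr2L 10 4 + 100 AC-GT",
        "s droSec1.super_0 5 4 + 50 AC-GT",
        "s droVir3.scaffold_1 7 5 + 80 ACTGT",
        "",
    ]
    for out_line in filter_maf(sample, {"dm2", "droSec1", "droYak2"}):
        print(out_line)
```

To run it on real output, you would pass an open file instead of the
sample list, e.g. filter_maf(open("introns.maf"), {...}), where
"introns.maf" is whatever file name you entered in step 6.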

Hope that helps!

There is a really great email list to which questions about our data
and programs can be sent: genome at cse.ucsc.edu.  Several of my
colleagues rotate responsibilities for answering questions sent there,
and are very diligent about responding to questions within a day.
Soon I will be taking some extended leave, and during that time you'll
most definitely hear back more promptly from them than from me!  :)

Angie [not a Dr. but thanks]


On Wed, 6 Dec 2006, Vanessa Bauer wrote:

>  Hello Dr Hinrichs
>
>  My name is Tessa Bauer DuMont and I work with Prof. Chip Aquadro.  We are
>  working on a companion paper as part of the analysis of the 12 Drosophila
>  genomes.  Although it is not crucial for the completion of this paper, we
>  would like to obtain intron alignments associated with each of the coding
>  regions for which independent gene by gene alignments already exist.  Our
>  analysis focuses on the closely related species D.mel, D.sec and D.yak.   At
>  this point the whole genome alignments are as close as I have gotten to what
>  we want (an alignment of the introns for each gene), but it is not clear to
>  me how one would progress from here.  Do you know of any other groups that
>  may have already extracted the alignments for the introns, or do you have any
>  suggestions as to the easiest way to pull the introns out of the database?
>
>  Thank-you for your consideration,
>  Tessa
>

-- 
angie at soe.ucsc.edu
Software Developer, UCSC CBSE / Genome Bioinformatics Group
-- 
  "If a nation expects to be free and ignorant--it expects what never 
was and never will be."
-- Thomas Jefferson

