[Genome] Drosophila phylogenetic analysis

Rachel Harte hartera at soe.ucsc.edu
Wed Jan 3 16:54:49 PST 2007


Nicola,

If you would like to find homologs, there are several ways that you could
do this. Some of the fly Genome Browsers have a D. mel. Proteins track,
which aligns the Drosophila melanogaster proteins to that genome, but not
all of the fly browsers have this track. For 7 of the fly species, we do
not have complete browsers yet, only the alignments to D. melanogaster.
The net track shows the best-in-genome alignment of another fly species to
the genome in question. Intersecting fly genes with the net track for each
species would help you to find homologs, but the net track is likewise not
available for all Drosophila species.

The D. melanogaster conservation track (multiz15way) has multiple
alignments for all of the Drosophila species. So one possibility for
finding homologs is to do an intersection of the CDS regions of your genes of
interest with the multiz15way conservation track.

You can do this through the UCSC Table Browser, but the results of an
intersection show identifiers only for the first table and not the
second. I describe the Table Browser steps below. I think that Galaxy 2
could be used to do this intersection and keep identifiers from both
tables. Galaxy 2 adds data manipulation tools on top of the UCSC Table
Browser and is produced by a group at Penn State University. Go to:

http://www.bx.psu.edu/cgi-bin/trac.cgi

then click on the Galaxy 2 link.

To use the UCSC Table Browser, click on the Tables link on the top blue
bar. First you will need to get the coding exons for the genes that you
are interested in:

1) Select clade: Insect, genome: D. melanogaster, and the assembly of
interest.

2) Select Genes and Gene Prediction Tracks as the group and FlyBase Genes
as the track.

3) Make sure that you select genome as the region.

4) Then you can click on "paste list" or "upload list" to add a list of
gene names.

5) Select "custom track" as the output format and press "get output". On
the next page, select "Coding Exons" and press the "get custom track in
table browser" button. You could use Whole Gene instead, but coding
regions will be more conserved.
The custom track is in BED format. If you are unfamiliar with it, here is
a link that explains it:
http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#BED
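
If you save the custom track to a file, the BED lines can be read with a
few lines of Python. This is only a minimal sketch and not part of the
Table Browser: the function name parse_bed, the gene names and the
coordinates are made up for illustration. It assumes whitespace-separated
lines with at least chrom, start and end fields, where start is 0-based
and end is exclusive.

```python
def parse_bed(lines):
    """Yield (chrom, start, end, name) tuples, skipping header lines."""
    for line in lines:
        line = line.strip()
        # "track" and "browser" lines are custom track headers, not features.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else ""
        yield chrom, start, end, name


# Hypothetical coding-exon records, for illustration only.
sample = [
    'track name="codingExons" description="hypothetical example"',
    "chr2L\t100\t250\tCG0001-RA_cds_0",
    "chr2L\t400\t520\tCG0001-RA_cds_1",
]
exons = list(parse_bed(sample))
```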

Then you will need to intersect this with the conservation track data:

1) Select clade: Insect, genome: D. melanogaster, and the assembly of
interest.

2) Select Comparative Genomics as the group, Conservation as the track and
multiz15way as the table.

3) Press the "create" button for intersection and select "Custom Tracks"
as the group and select your custom track. Choose the base-pairwise AND
for the intersection or you can select for regions of the multiple
alignment that have a certain % overlap with the FlyBase genes.

4) Select MAF - multiple alignment format as the output format
and press the "get output" button.
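
To make the two intersection choices in step 3 concrete, here is a small
Python sketch of what they mean for a single pair of regions. It only
illustrates the idea and is not how the Table Browser is implemented; the
function names and coordinates are made up, and intervals are assumed to
be half-open (0-based start, exclusive end) as in BED.

```python
def base_and(a_start, a_end, b_start, b_end):
    """Return the bases shared by two half-open intervals, or None."""
    start, end = max(a_start, b_start), min(a_end, b_end)
    return (start, end) if start < end else None


def overlap_fraction(a_start, a_end, b_start, b_end):
    """Return the fraction of interval A that is covered by interval B."""
    hit = base_and(a_start, a_end, b_start, b_end)
    if hit is None:
        return 0.0
    return (hit[1] - hit[0]) / (a_end - a_start)


# Hypothetical alignment region 200-300 and coding exon 100-250.
shared = base_and(200, 300, 100, 250)            # bases in both regions
covered = overlap_fraction(200, 300, 100, 250)   # share of the alignment region
```

The base-pairwise AND keeps only the shared bases, while a percent
overlap threshold keeps or drops the whole alignment region depending on
how much of it the exon covers.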

This will give you just the regions of the multiple alignment that
intersect with FlyBase coding exons. You can extract the coordinates for
each region for each species. It will not tell you which genes intersected
with each region. As I mentioned, Galaxy 2 may be able to give you both
the multiple alignment region and the D. melanogaster FlyBase gene name.
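
If you prefer to script this yourself, one possible approach is to match
each MAF block back to the coding exons by coordinate. The Python below is
a rough sketch under several assumptions: the first "s" row of each block
is the D. melanogaster reference on the + strand, its source is named
"database.chrom", and exons are (chrom, start, end, name) tuples taken
from the BED custom track. The assembly names, sizes, sequences and gene
names are invented for the example.

```python
def parse_maf_blocks(lines):
    """Yield each alignment block as a list of parsed 's' rows."""
    block = []
    for line in lines:
        line = line.strip()
        if line == "a" or line.startswith("a "):
            # An "a" line starts a new alignment block.
            if block:
                yield block
            block = []
        elif line.startswith("s "):
            _, src, start, size, strand, src_size, text = line.split()
            block.append((src, int(start), int(size), strand, int(src_size), text))
    if block:
        yield block


def genes_for_block(block, exons):
    """Return sorted names of exons overlapping the block's first row."""
    src, start, size = block[0][:3]
    chrom = src.split(".", 1)[-1]   # "dm2.chr2L" -> "chr2L"
    end = start + size
    return sorted({name for c, s, e, name in exons
                   if c == chrom and s < end and start < e})


# Hypothetical MAF text and exon list, for illustration only.
maf = [
    "##maf version=1",
    "a score=1000.0",
    "s dm2.chr2L 200 10 + 22407834 ACGTACGTAC",
    "s droSim1.chr2L 180 10 + 22036055 ACGTACGAAC",
    "",
    "a score=500.0",
    "s dm2.chr2L 600 5 + 22407834 ACGTA",
]
exons = [("chr2L", 100, 250, "CG0001-RA"), ("chr2L", 400, 520, "CG0001-RA")]

annotated = [(block[0][0], block[0][1], genes_for_block(block, exons))
             for block in parse_maf_blocks(maf)]
```

Each entry of annotated pairs a reference region with the gene names
whose coding exons overlap it, so blocks with an empty list did not
intersect any of the listed exons.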

There are tutorials on using Galaxy 2 on the home page. If you have
any problems with it or further questions, please contact the group that
created Galaxy 2. They have mailing lists and contact details on the
bottom of the Galaxy home page.

I hope that this helps answer your question. Please let us know if you
have further questions.

Rachel

Rachel Harte
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu


On Sat, 30 Dec 2006, Neretti, Nicola wrote:

> To Whom It May Concern:
>
> I am running a response element analysis on sets of genes from
> Drosophila melanogaster. I would like to incorporate a phylogenetic
> component by including sequences from all the other Drosophila species
> available in the genome browser. However I noticed that searching
> through a gene ID only returns hits from melanogaster.
>
> How can I extract homologs from all the other species?
>
> Also, is there a way to do it in batch mode?
>
> Thanks a lot for your help.
>
> Best Regards,
>
> -Nicola Neretti
>
> Nicola Neretti
> Institute for Brain and Neural Systems
> Brown University
> Providence, RI 02912
> T. (401) 863-2187
> F. (401) 863-3494
> e-mail: nicola_neretti at brown.edu
>
> _______________________________________________
> Genome maillist  -  Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>

