[Genome] problem on mouse genome for promoter region prediction
Brooke Rhead
rhead at soe.ucsc.edu
Mon Nov 26 12:16:34 PST 2007
Hello again Xu Jerry,
There has been some further discussion among our engineers about your
question, which I would like to pass along to you.
First, it was pointed out that turnover in these regions is to be
expected. Here is one paper on the topic:
Evolutionary turnover of mammalian transcription start sites.
Genome Res. 2006 Jun;16(6):713-22. Epub 2006 May 10.
http://www.genome.org/cgi/content/abstract/16/6/713
Also, here is some input regarding liftOver vs. pslMap, and more on
turnover:
~~~
pslMap would be a lot better for this application. liftOver might also
simply drop some promoters, even though a few elements may be
conserved. Generally you do get a fair bit of turnover in regulatory
elements and transcription start sites between human and mouse. You
don't expect to see the level of conservation you'd get with coding
regions, even though the functionality may be as complex, and also
conserved. Regulatory protein binding elements can phase in and out of
existence pretty easily, and one phasing out can be compensated for by
another phasing in. The 12-fly paper (see Kellis M, Kent WJ on PubMed)
has some info on this too.
~~~
The paper referenced above is:
Discovery of functional elements in 12 Drosophila genomes using
evolutionary signatures. Nature. 2007 Nov 8;450(7167):219-32.
I hope this is helpful.
--
Brooke Rhead
UCSC Genome Bioinformatics Group
Brooke Rhead wrote:
> Hello Xu Jerry,
>
> Here is a similar previously-answered question:
>
> http://www.soe.ucsc.edu/pipermail/genome/2007-November/014963.html
>
> (This is specific to TFBS, but you could also use the liftOver utility
> for the Eponine TSS or FirstEF tracks. However, please see my note
> below about using liftOver -- you might want to use pslMap instead.)
>
> Alternatively, you could intersect one of the human promoter tracks with
> the human Conservation track to get the corresponding mouse regions, as
> described in this answer:
>
> http://www.soe.ucsc.edu/pipermail/genome/2007-June/013989.html
>
> (Note that multiz17way has been replaced by multiz28way in hg18. Also,
> there is another tool run by Penn State University called Galaxy that is
> very useful for working with MAF data, located here:
> http://main.g2.bx.psu.edu/ -- look under the "Fetch Alignments" tool on
> the left-hand side of the page.)
>
> In either case, you could make a custom track in the mouse browser and
> intersect it with the mouse regions you found.
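
The intersection step described above can be sketched outside the browser as well. This is a minimal illustration only, assuming both lists are already in the same assembly's coordinates and use BED-style 0-based, half-open intervals; the function name and the example coordinates are hypothetical, and the browser's own intersection feature may behave differently.

```python
from typing import List, Tuple

# (chrom, start, end), 0-based half-open, as in BED
Interval = Tuple[str, int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping portion of every pair of intervals from a and b."""
    out = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # keep only non-empty overlaps
                out.append((chrom_a, start, end))
    return sorted(out)


# Hypothetical coordinates, for illustration only.
mapped_promoters = [("chr1", 100, 200), ("chr2", 50, 80)]
my_mouse_regions = [("chr1", 150, 300), ("chr1", 10, 90), ("chr2", 70, 75)]
print(intersect(mapped_promoters, my_mouse_regions))
```

The pairwise loop is quadratic, which is fine for a short list of promoters; a sorted sweep would be the usual choice for whole-genome tracks.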
>
> Another option would be to go in the opposite direction: use your
> compiled list of mouse coordinates to create a custom track in the human
> browser (using liftOver, pslMap, or the Conservation track to
> convert to human coordinates), then intersect that with one of the
> promoter tracks on the human browser.
>
>
> NOTE about using liftOver:
>
> liftOver is fairly coarse (just maps start and end), which makes it more
> useful for same-species mapping than cross-species mapping. The program
> pslMap is much more detailed (it breaks the mapping down to the
> chainLink/gapless-block level), so it is better than liftOver for
> mapping gene-sized regions.
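
To make the distinction above concrete, here is a toy model of the two strategies. It is not how liftOver or pslMap is implemented; it only contrasts "map the start and end" with "map each gapless block". The alignment is a hypothetical list of same-strand gapless blocks, and all names and coordinates are made up for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical toy alignment: gapless blocks as (src_start, dst_start, size),
# same strand, 0-based half-open coordinates.
Block = Tuple[int, int, int]


def map_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map one source base to the destination, or None if it falls in a gap."""
    for src, dst, size in blocks:
        if src <= pos < src + size:
            return dst + (pos - src)
    return None


def map_endpoints(start: int, end: int, blocks: List[Block]) -> Optional[Tuple[int, int]]:
    """Coarse strategy: map only the first and last base, return one interval."""
    new_start = map_position(start, blocks)
    new_last = map_position(end - 1, blocks)
    if new_start is None or new_last is None:
        return None
    return (new_start, new_last + 1)


def map_by_blocks(start: int, end: int, blocks: List[Block]) -> List[Tuple[int, int]]:
    """Detailed strategy: map each overlapped gapless block separately."""
    pieces = []
    for src, dst, size in blocks:
        lo, hi = max(start, src), min(end, src + size)
        if lo < hi:
            pieces.append((dst + lo - src, dst + hi - src))
    return pieces


# Two blocks with a 10-base gap in the source and a 30-base gap in the destination.
toy_blocks = [(100, 1000, 50), (160, 1080, 40)]
print(map_endpoints(120, 180, toy_blocks))   # one span that hides the gap
print(map_by_blocks(120, 180, toy_blocks))   # two pieces, gap preserved
```

In this toy case the endpoint mapping reports a single 80-base span for a 60-base input, while the block-level mapping shows which parts actually align.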
>
> The pslMap program is available in our source tree, which is free for
> academic, nonprofit and personal use:
> http://hgdownload.cse.ucsc.edu/downloads.html#source_downloads
>
> Typical pslMap options for running with one of our .over.chain files
> (for example, converting mm8 to hg18 coordinates) are:
>
>   pslMap -chainMapFile -swapMap mm8.input.psl mm8ToHg18.over.chain.gz \
>       hg18.output.psl
>
> Note that the input to pslMap needs to be in PSL format, described here:
> http://genome.ucsc.edu/FAQ/FAQformat#format2 . If you already have
> coordinates in BED format, our program genePredToPsl -bedFormat can be used.
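
For readers who want to see what a PSL record for a simple BED interval looks like, the following is a minimal sketch, not a replacement for the UCSC conversion program. It assumes a single ungapped block, treats the interval itself as the query, and needs the chromosome size, which BED does not carry. The function name, the feature name, and the chromosome size in the example are hypothetical.

```python
def bed_to_psl_line(chrom: str, start: int, end: int, name: str,
                    strand: str, chrom_size: int) -> str:
    """Build one 21-column PSL line for a single-block BED interval."""
    size = end - start
    fields = [
        size, 0, 0, 0,                  # matches, misMatches, repMatches, nCount
        0, 0, 0, 0,                     # qNumInsert, qBaseInsert, tNumInsert, tBaseInsert
        strand,                         # strand of the query
        name, size, 0, size,            # qName, qSize, qStart, qEnd
        chrom, chrom_size, start, end,  # tName, tSize, tStart, tEnd
        1,                              # blockCount
        f"{size},",                     # blockSizes (comma-terminated list)
        "0,",                           # qStarts
        f"{start},",                    # tStarts
    ]
    return "\t".join(str(f) for f in fields)


# Hypothetical interval and chromosome size, for illustration only.
print(bed_to_psl_line("chr1", 1000, 1500, "prom1", "+", 200000000))
```

Because the single block spans the whole query, qStarts is 0 on either strand; multi-block BED12 records would need the per-block arithmetic that this sketch leaves out.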
>
> The output of pslMap can be uploaded as a custom track, and/or
> translated back into BED format using the program pslToBed.
>
> I hope this information is helpful. If you have further questions,
> please feel free to contact us again at the genome mailing list address.
>