[Genome] problem on mouse genome for promoter region prediction
Brooke Rhead
rhead at soe.ucsc.edu
Thu Nov 29 14:33:31 PST 2007
Hi Xu Jerry,
The surest way to evaluate the differences among the three methods is to
try each method and compare the results.
However, here are some general guidelines from one of our developers
that should at least give you an idea of the kinds of differences you
could expect:
1. liftOver is the coarsest -- it translates the start and end without
showing the structure of the chain used to do the alignment.
2. pslMap shows more detail -- it translates a region using each block
of the chain, introducing gaps where the chain has gaps.
3. Conservation (multiz) uses sequence from the same netted chains.
Occasionally (I have no idea how often), an alignment to the reference
that is broken up may be joined using a different species' alignment
to the reference (e.g. when the two aligned species are more similar
to each other than to the reference). I'm not sure how big of a
difference it makes, but theoretically it is better.
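
To make the contrast between points 1 and 2 concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the toy "chain" layout and the function names map_span and map_blocks are made up for this example, and neither function reproduces what liftOver or pslMap actually do internally (liftOver, for instance, applies further checks before accepting a mapping). It only shows a start/end mapping next to a block-by-block mapping over the same region.

```python
from typing import List, Optional, Tuple

# A toy "chain": gapless aligned blocks as (src_start, dst_start, size),
# sorted by src_start, all on the + strand. This layout is an assumption
# made for illustration; it is not the UCSC chain file format.
Block = Tuple[int, int, int]
Interval = Tuple[int, int]  # half-open [start, end)


def map_blocks(chain: List[Block], start: int, end: int) -> List[Interval]:
    """Block-level mapping: one output interval per overlapping gapless block."""
    mapped = []
    for src, dst, size in chain:
        lo = max(start, src)
        hi = min(end, src + size)
        if lo < hi:  # the region overlaps this block
            mapped.append((dst + (lo - src), dst + (hi - src)))
    return mapped


def map_span(chain: List[Block], start: int, end: int) -> Optional[Interval]:
    """Coarse mapping: keep only the outer start and end of the mapped pieces."""
    pieces = map_blocks(chain, start, end)
    if not pieces:
        return None  # nothing in the region aligns
    return (pieces[0][0], pieces[-1][1])


# Three blocks with gaps between them on one or both sides.
chain = [(100, 1000, 50), (170, 1060, 30), (200, 1100, 40)]

print(map_span(chain, 120, 220))    # (1020, 1120)
print(map_blocks(chain, 120, 220))  # [(1020, 1050), (1060, 1090), (1100, 1120)]
```

The coarse result is a single interval that hides the two gaps, while the block-level result keeps them visible, which matters when only part of a promoter is conserved.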
--
Brooke Rhead
UCSC Genome Bioinformatics Group
xu_jianzhen wrote:
> Hello Brooke,
> First, I would like to thank you and the other members of the UCSC team
> for providing such useful information. After carefully reading your
> response, I decided to analyse my regions with the help of Galaxy.
> I intersected one of the human promoter tracks with the human 28-way
> conservation track to get the corresponding mouse regions, then
> intersected those with my own regions. It works well.
> Second, one question still puzzles me. Since there are three different
> approaches (liftOver, pslMap, and the conservation track, as I did)
> to cross-species annotation, are there any significant differences
> among the results? I am especially interested in how the results from
> intersecting with the conservation track compare with those from pslMap.
> I hope you can explain this issue.
>
>
> Regards,
>
> Xu Jerry
>
>
> > Hello again Xu Jerry,
> >
> > There has been some further discussion among our engineers about your
> > question, which I would like to pass along to you.
> >
> > First, it was pointed out that turnover in these regions is to be
> > expected. Here is one paper on the topic:
> >
> > Evolutionary turnover of mammalian transcription start sites.
> > Genome Res. 2006 Jun;16(6):713-22. Epub 2006 May 10.
> > http://www.genome.org/cgi/content/abstract/16/6/713
> >
> > Also, here is some input regarding liftOver vs. pslMap, and more on
> > turnover:
> > ~~~
> > pslMap would be a lot better for this application. LiftOver might also
> > just drop some promoters too, even though a few elements may be
> > conserved. Generally you do get a fair bit of turnover in regulatory
> > elements and transcription start sites between human and mouse. You
> > don't expect to see the level of conservation you'd get with coding
> > regions, even though the functionality may be as complex, and also
> > conserved. Regulatory protein binding elements can phase in and out of
> > existence pretty easily, and one phasing out can be compensated for by
> > another phasing in. The 12-fly paper (see Kellis M, Kent WJ on pubMed)
> > has some info on this too.
> > ~~~
> >
> > The paper referenced above is:
> >
> > Discovery of functional elements in 12 Drosophila genomes using
> > evolutionary signatures. Nature. 2007 Nov 8;450(7167):219-32
> >
> > I hope this is helpful.
> >
> > --
> > Brooke Rhead
> > UCSC Genome Bioinformatics Group
> >
> >
> > Brooke Rhead wrote:
> > > Hello Xu Jerry,
> > >
> > > Here is a similar previously-answered question:
> > >
> > > http://www.soe.ucsc.edu/pipermail/genome/2007-November/014963.html
> > >
> > > (This is specific to TFBS, but you could also use the liftOver utility
> > > for the Eponine TSS or FirstEF tracks. However, please see my note
> > > below about using liftOver -- you might want to use pslMap instead.)
> > >
> > > Alternatively, you could intersect one of the human promoter tracks with
> > > the human Conservation track to get the corresponding mouse regions, as
> > > described in this answer:
> > >
> > > http://www.soe.ucsc.edu/pipermail/genome/2007-June/013989.html
> > >
> > > (Note that multiz17way has been replaced by multiz28way in hg18. Also,
> > > there is another tool run by Penn State University called Galaxy that is
> > > very useful for working with MAF data, located here:
> > > http://main.g2.bx.psu.edu/ -- look under the "Fetch Alignments" tool on
> > > the left-hand side of the page.)
> > >
> > > In either case, you could make a custom track in the mouse browser and
> > > intersect it with the mouse regions you found.
> > >
> > > Another option would be to go in the opposite direction: use your
> > > compiled list of mouse coordinates to create a custom track in the human
> > > browser (either using liftOver, pslMap or the Conservation track to
> > > convert to human coordinates), then intersect that with one of the
> > > promoter tracks on the human browser.
> > >
> > >
> > > NOTE about using liftOver:
> > >
> > > liftOver is fairly coarse (just maps start and end), which makes it more
> > > useful for same-species mapping than cross-species mapping. The program
> > > pslMap is much more detailed (it breaks the mapping down to the
> > > chainLink/gapless-block level), so it is better than liftOver for
> > > mapping gene-sized regions.
> > >
> > > The pslMap program is available in our source tree, which is free for
> > > academic, nonprofit and personal use:
> > > http://hgdownload.cse.ucsc.edu/downloads.html#source_downloads
> > >
> > > Typical pslMap options for running with one of our .over.chain files
> > > (for example, converting mm8 to hg18 coordinates) are:
> > >
> > >   pslMap -chainMapFile -swapMap mm8.input.psl mm8ToHg18.over.chain.gz \
> > >       hg18.output.psl
> > >
> > > Note that the input to pslMap needs to be in PSL format, described here:
> > > http://genome.ucsc.edu/FAQ/FAQformat#format2 . If you already have
> > > coordinates in BED format, our program genePredToPsl -bedFormat can be used.
> > >
> > > The output of pslMap can be uploaded as a custom track, and/or
> > > translated back into BED format using the program pslToBed.
> > >
> > > I hope this information is helpful. If you have further questions,
> > > please feel free to contact us again at the genome mailing list address.
> > >
> >
>
>
>