[Genome] Questions about the conservation tracks

Kate Rosenbloom kate at soe.ucsc.edu
Mon Aug 6 11:52:41 PDT 2007


Hello Jorma,

Regarding the multiple alignment display: the green graphs
do represent pairwise alignment quality, but they are
not phastCons scores.  They are computed on the fly from
pairwise alignments extracted from the multiple alignment,
using the multiz scoring algorithm.
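
As a rough illustration of what "extracting a pairwise alignment from the
multiple alignment" means, here is a minimal Python sketch. It only projects
two rows of an alignment block onto each other by dropping columns where both
rows are gaps. The function name and the example rows are made up for
illustration, and the multiz scoring step itself is not reproduced here.

```python
def project_pairwise(ref_row: str, other_row: str, gap: str = "-") -> tuple[str, str]:
    """Project two rows of a multiple alignment onto a pairwise alignment.

    Columns in which both rows contain a gap carry no pairwise
    information, so they are dropped; all other columns are kept as-is.
    """
    if len(ref_row) != len(other_row):
        raise ValueError("alignment rows must have equal length")
    kept = [(a, b) for a, b in zip(ref_row, other_row)
            if not (a == gap and b == gap)]
    ref = "".join(a for a, _ in kept)
    other = "".join(b for _, b in kept)
    return ref, other


# Hypothetical rows taken from one block of a multiple alignment.
ref, other = project_pairwise("AC--GT-A", "A-C-GTTA")
print(ref)
print(other)
```

The gap-only column disappears because the gaps there were introduced by some
third species in the multiple alignment, not by either of the two rows.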

The parameter settings for the 28way are all available from
the track description and downloads area.  Specifically:

The model file, which contains the background frequencies,
substitution matrix, phylogenetic tree and branch lengths, is here:
   http://hgdownload.cse.ucsc.edu/goldenPath/hg18/phastCons28way/28way.mod
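
If you want to inspect such a model file programmatically, a minimal Python
sketch along these lines may help. It assumes the usual phast tree-model
layout of "KEY: value" lines with the rate matrix rows following a bare
RATE_MAT: line. The sample text and all numbers in it are invented for
illustration and are not the contents of 28way.mod.

```python
SAMPLE_MOD = """\
ALPHABET: A C G T
ORDER: 0
SUBST_MOD: REV
BACKGROUND: 0.3 0.2 0.2 0.3
RATE_MAT:
  -1.0 0.2 0.6 0.2
  0.3 -1.0 0.1 0.6
  0.6 0.1 -1.0 0.3
  0.2 0.6 0.2 -1.0
TREE: ((human:0.1,chimp:0.1):0.2,mouse:0.5);
"""


def parse_mod(text: str) -> dict:
    """Parse the 'KEY: value' layout of a tree-model (.mod) file."""
    model: dict = {}
    rate_rows: list[list[float]] = []
    in_matrix = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        # Header lines look like 'BACKGROUND: ...'; matrix rows have no colon.
        if sep and key.isupper() and key.replace("_", "").isalpha():
            in_matrix = key == "RATE_MAT"
            if not in_matrix:
                model[key] = value.strip()
        elif in_matrix:
            rate_rows.append([float(x) for x in line.split()])
    model["RATE_MAT"] = rate_rows
    if "BACKGROUND" in model:
        model["BACKGROUND"] = [float(x) for x in model["BACKGROUND"].split()]
    return model


model = parse_mod(SAMPLE_MOD)
print(model["BACKGROUND"])   # background frequencies
print(model["TREE"])         # Newick tree with branch lengths
```

The tree is left as a Newick string; a dedicated tree library would be the
natural next step if you need the individual branch lengths.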

The other parameters (listed in the track description) are:

expected-length=45, target-coverage=.3, rho=.31

These are not the same parameters used in the Genome Research paper --
we generate new parameters with each alignment to approximate
5% coverage of conserved elements over the human genome, with 70%
coverage of coding sequence (as reported by our 'featureBits' utility).
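
To make the two coverage figures concrete, here is a small Python sketch of
the underlying arithmetic: the fraction of the genome covered by conserved
elements, and the fraction of coding bases that fall inside conserved
elements. This is not the featureBits utility, only the same kind of
calculation on half-open (start, end) intervals; the intervals and the genome
size below are made up.

```python
def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def total_bases(intervals):
    """Number of distinct bases covered by an interval set."""
    return sum(end - start for start, end in merge(intervals))


def overlap_bases(a, b):
    """Number of bases covered by both interval sets."""
    a, b = merge(a), merge(b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            shared += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


# Hypothetical data on a single 1000-base sequence.
elements = [(10, 20), (15, 30), (50, 60)]   # conserved elements
cds = [(25, 55)]                            # coding sequence
genome_size = 1000

genome_fraction = total_bases(elements) / genome_size
cds_fraction = overlap_bases(elements, cds) / total_bases(cds)
print(f"genome covered by elements: {genome_fraction:.1%}")
print(f"coding bases in elements:   {cds_fraction:.1%}")
```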

For the 17way, the parameters were:
expected-length=14, target-coverage=.008, rho=.28
(this will be added to the track description),
and the model file (same as that used for ENCODE alignments):
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/phastCons17way/elliotsEncode.mod
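
For anyone trying to replicate the tracks, the following Python sketch shows
one way the parameters above might be turned into a phastCons command line.
It only builds and prints the argument list; it does not run anything. The
alignment and output file names are placeholders, and whether the production
runs passed a single model or a conserved/non-conserved pair is not stated in
this message, so treat the exact invocation as an assumption.

```python
import shlex

# Parameter sets quoted in this message; model files are the local
# names of the downloads linked above.
PARAMS = {
    "28way": {"expected_length": 45, "target_coverage": 0.3,
              "rho": 0.31, "model": "28way.mod"},
    "17way": {"expected_length": 14, "target_coverage": 0.008,
              "rho": 0.28, "model": "elliotsEncode.mod"},
}


def phastcons_args(track: str, alignment: str, bed_out: str) -> list[str]:
    """Build a phastCons argument list for one alignment file."""
    p = PARAMS[track]
    return [
        "phastCons",
        "--expected-length", str(p["expected_length"]),
        "--target-coverage", str(p["target_coverage"]),
        "--rho", str(p["rho"]),
        "--msa-format", "MAF",          # assumes the input is a MAF file
        "--most-conserved", bed_out,    # conserved elements go here
        alignment,
        p["model"],
    ]


# Placeholder file names; the scores themselves are written to stdout.
print(shlex.join(phastcons_args("28way", "chr9.maf", "chr9.mostCons.bed")))
```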

Our methods for generating phastCons parameters for
'deep' alignments are based on recommendations in
the "Working with Large Data Sets" section of
Adam Siepel's phastCons 'HOWTO':
    http://compgen.bscb.cornell.edu/~acs/phastCons-HOWTO.html
We begin with the last set used (e.g. for the 28way, I started
with the 17way parameters).  The resulting coverage is assessed,
and the target-coverage is adjusted to meet the criteria above.
The number of conserved elements and smoothness of the conservation
graph are then used to adjust the expected-length parameter.
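
The coverage part of this tuning can be pictured as a simple feedback step,
sketched below in Python. Please note the assumptions: the proportional
rescaling rule, the tolerance, and the clamping bounds are all made up for
illustration. The procedure described above only says that target-coverage
is adjusted until the coverage criteria are met, and each adjustment in
practice means re-running phastCons and re-measuring coverage.

```python
def next_target_coverage(current: float, measured: float,
                         goal: float = 0.05, tolerance: float = 0.002) -> float:
    """Propose the next target-coverage value to try.

    current   -- target-coverage used in the last run
    measured  -- fraction of the genome covered by the resulting elements
    goal      -- desired genome coverage (5% in the message above)
    tolerance -- hypothetical: how close to the goal counts as done
    """
    if measured <= 0:
        raise ValueError("measured coverage must be positive")
    if abs(measured - goal) <= tolerance:
        return current                     # close enough, keep the setting
    # Assumed rule: scale the parameter by the ratio of goal to outcome.
    proposal = current * goal / measured
    return min(max(proposal, 1e-4), 0.99)  # keep the value in a sane range


# Hypothetical run: target-coverage 0.3 yielded 6% genome coverage.
print(next_target_coverage(0.3, 0.06))
```

The expected-length adjustment is left out because it is judged from the
number of elements and the smoothness of the graph rather than from a single
number.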

Regarding the region you asked about 
(hg18.chr9:125,250,933-125,256,5300) -- it appears that the conservation
peak between the second and third elements may not be above threshold
for selection of conserved elements -- I suggest you contact Adam
for a more authoritative interpretation.

Hope this helps!

   Kate
---
Kate Rosenbloom
UCSC Genome Bioinformatics

JJ de Ronde wrote:
> Dear reader,
> 
> We are using the conservation scores and most-conserved tracks you
> supply in the genome browser for various purposes and are now trying to
> get a better understanding of what we are using exactly. To this purpose
> we would also like to replicate the results we get in the genome browser
> using the phastCons software. Adam Siepel redirected us to you regarding
> the following and we would be grateful if you could answer a couple of
> questions for us about the (parameter) settings that you are using.
> 
> - First of all: which parameter settings are you using for the 17-way
> and 28-way conservation tracks (smoothness and coverage constraints,
> rho, branch lengths, background frequencies, phylogenetic tree,
> substitution matrix)?
> Are the background frequencies, rho scaling factor, branch lengths and
> substitution matrix estimated as described in Siepel's paper
> ('Evolutionarily conserved elements in vertebrate, insect, worm, and
> yeast genomes', Genome Research 2005)?
> How do you derive the smoothness and coverage constraints?
> If, for example, you would add an 18th species to the 17-way
> conservation, would you re-tune all the settings?
> 
> - As I understand it, the 'most conserved' track displays the Viterbi
> path conserved states. This would usually correspond to a high
> (posterior) conservation score. I've attached an image displaying some
> results we got with the genome browser. Between the 2nd and 3rd
> conserved elements there seems to be a high conservation score (top
> graph) but no conserved element. When would a case like this arise?
> Also: what do the little green graphs represent? Are those the pairwise
> phastCons scores (reference vs. species)?
> The top (blue) graph is a representation of the posterior probabilities
> (= conservation score, i.e. P(state(i) = j | data X, model parameters),
> where state(i) = j means that the state in the path at position i
> is j), right?
> 
> Thanks a lot for your time,
> Jorma de Ronde
> 
> p.s. The image comes from the following URL:
> http://genome.ucsc.edu/cgi-bin/hgTracks?org=Human&db=hg18&hgsid=95943147
> 


