[Genome] differences between conservation scores generated from the Table Browser and downloaded from the FTP site

Kayla Smith kayla at soe.ucsc.edu
Wed May 23 17:50:38 PDT 2007


Dear Zhuo Fang,

I downloaded 
http://hgdownload.cse.ucsc.edu/goldenPath/dm2/database/chr4.pp.gz from 
our website.  The line numbers and values that you have pasted below do 
not agree with what I see in the file.

Here is what you have:

 > 1834	0.677
 > 1835	0.493
 > 1836	0.899
 > 1837	0.909
 > 1838	0.903
 > 1839	0.916
 > 1840	0.953

However, I found those values on lines 873230-873236 of the chr4.pp
file.  As you say, those numbers do not agree with the Table Browser
results. This is unsurprising, since this is the wrong place in the file.

I'm not able to figure out why you were looking at lines 873230-873236
of chr4.pp. However, one thing to note is that the first line of chr4.pp
is a header that reads "fixedStep chrom=chr4 start=1700 step=1".  This
is how you know that the first value in the file corresponds to genomic
coordinate 1700, so there may be an offset between line numbers in the
file and genomic coordinates.
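Because each value's genomic position follows from the most recent fixedStep header, computing positions can be less error-prone than counting lines by hand. Below is a minimal, hypothetical reader (the name `read_fixed_step` is illustrative, not a UCSC tool). It assumes the wiggle fixedStep convention of a 1-based `start` and a fixed `step` between consecutive values, restarts the position counter at every fixedStep header, skips `track` and comment lines, and ignores `span` and variableStep data.

```python
from typing import Dict, Iterable


def read_fixed_step(lines: Iterable[str]) -> Dict[str, Dict[int, float]]:
    """Map chrom -> {1-based position: score} for fixedStep wiggle text.

    Each 'fixedStep' header starts a new block: the first value after it
    sits at `start`, and each following value is `step` bases further on.
    """
    scores: Dict[str, Dict[int, float]] = {}
    chrom, pos, step = None, 0, 1
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "track")):
            continue  # blank, comment, or track definition line
        if line.startswith("fixedStep"):
            # Parse key=value pairs such as chrom=chr4 start=1700 step=1
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            chrom = fields["chrom"]
            pos = int(fields["start"])
            step = int(fields.get("step", 1))
            scores.setdefault(chrom, {})
            continue
        if chrom is None:
            raise ValueError("data line before any fixedStep header")
        scores[chrom][pos] = float(line)
        pos += step  # next value is `step` bases downstream
    return scores
```

To read the downloaded file directly, the compressed file can be opened inside a `with gzip.open("chr4.pp.gz", "rt") as fh:` block and passed to `read_fixed_step(fh)["chr4"]`, after which positions 1833 to 1839 can be looked up by coordinate rather than by line number.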

To find the lines of chr4.pp that correspond to the Table Browser
results you list (using chr4:1833-1839 in the Table Browser), I opened
the file and scrolled down to the 141st line.  Here are the data I found
at that location (lines 135-141 of chr4.pp):

0.015
0.014
0.002
0.001
0.006
0.006
0.008

This agrees with the Table Browser results (minus the loss of resolution
due to compression).  I scrolled down to the 141st line of the file
because I subtracted 1700 (the offset) from your end coordinate 1840 and
then added 1, since the first line of chr4.pp isn't data.
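Both checks in the paragraph above can be reproduced with a short sketch. `line_number` assumes a single fixedStep block with one header line, so it would be off if another fixedStep line appeared before the position of interest. `quantize` is a hypothetical model of the Table Browser's reduced resolution, not UCSC's documented method: it assumes values are stored as integers 0 to 127 scaled by a data range and truncated rather than rounded, with the range of 0.952 inferred from 0.00749606 x 127, not taken from UCSC documentation. Under those assumptions it turns the seven file values above into the seven Table Browser values.

```python
import math


def line_number(pos: int, start: int = 1700, header_lines: int = 1) -> int:
    """Return the 1-based line of chr4.pp holding 1-based genomic position `pos`.

    Assumes one fixedStep block: `header_lines` non-data lines, then one
    value per base beginning at `start`.
    """
    return header_lines + (pos - start) + 1


# Hypothetical model of the Table Browser's compression: integers 0..127
# scaled by a data range, truncated toward zero.
# Assumption: the range 0.952 is inferred from 0.00749606 * 127.
DATA_RANGE = 0.952


def quantize(value: float, data_range: float = DATA_RANGE) -> float:
    step = data_range / 127
    return math.floor(value / step) * step


file_values = [0.015, 0.014, 0.002, 0.001, 0.006, 0.006, 0.008]  # lines 135-141
print(line_number(1833), line_number(1839))  # 135 141
print(["%g" % quantize(v) for v in file_values])
# ['0.0149921', '0.00749606', '0', '0', '0', '0', '0.00749606']
```

For a single block the line number works out to `pos - start + 2`, which for position 1839 gives the same 141 as 1840 - 1700 + 1.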

I hope this is helpful to you.  Please don't hesitate to contact the 
genome mailing list again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group


Zhuo Fang wrote:
> Dear Sir/Madam,
> 
> I encountered a problem when analyzing phastCons conservation scores.
> There are two ways to get the phastCons scores of a certain region: one
> is to generate them from the Table Browser, and the other is to download
> the phastCons score file for the whole genome from the FTP site and
> extract the corresponding scores. I am listing example results from these
> two operations.
> 
> For the Table Browser, I selected:
> assembly: D. melanogaster--Apr. 2004
> track: Conservation
> table: phastCons15ways
> position: chr4:1833-1939
> Then the results are:
> 
> 1833	0.0149921
> 1834	0.00749606
> 1835	0
> 1836	0
> 1837	0
> 1838	0
> 1839	0.00749606
> 
> For the genome-wide file, I downloaded the data from
> http://hgdownload.cse.ucsc.edu/goldenPath/dm2/phastCons15way/
> and then extracted the conservation scores for region chr4:1833-1939
> (since the FTP file is one-based, I actually extracted chr4:1834-1940 from
> file chr4.pp). The results are:
> 
> 1834	0.677
> 1835	0.493
> 1836	0.899
> 1837	0.909
> 1838	0.903
> 1839	0.916
> 1840	0.953
> 
> As you see, the scores are totally different. Moreover, the scores in
> the first result indicate no conservation, while those in the second
> indicate conservation. I found that this is not an isolated case.
> 
> So could you please let me know why this happens and how I can deal
> with the problem?
> 
> Thanks and best regards,
> 
> Zhuo Fang
> 
> -------
> Zhuo Fang, Ph.D.
> AG Nikolaus Rajewsky
> Division of Systems Biology
> Max Delbruck Center for Molecular Medicine (MDC)
> Robert-Rössle-Str. 10 D-13125 Berlin, Germany
> TEL: +49-30-9406-2990
> 
> _______________________________________________
> Genome maillist  -  Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome


