[Genome] differences between conservation scores generated from the Table Browser and downloaded from the FTP site
Kayla Smith
kayla at soe.ucsc.edu
Wed May 23 17:50:38 PDT 2007
Dear Zhuo Fang,
I downloaded
http://hgdownload.cse.ucsc.edu/goldenPath/dm2/database/chr4.pp.gz from
our website. The line numbers and values that you have pasted below do
not agree with what I see in the file.
Here is what you have:
> 1834 0.677
> 1835 0.493
> 1836 0.899
> 1837 0.909
> 1838 0.903
> 1839 0.916
> 1840 0.953
However, I found those values on lines 873230-873236 of the
chr4.pp file. As you say, those numbers do not agree with the Table
Browser results. This is unsurprising, since this is the wrong place in
the file.
I'm not able to figure out why you were looking at positions
873230-873236 in chr4.pp. However, one thing to note is that the
first line of chr4.pp is a header that reads "fixedStep chrom=chr4
start=1700 step=1". This is how you know that the file starts at
genomic coordinate 1700, so there may be an offset problem in how you
extracted the scores.
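Because the offset is declared in the file itself, it can be safer to parse the declaration lines than to count lines by hand. The following is a minimal Python sketch (not a UCSC tool) that maps each 1-based genomic coordinate to its score. It assumes the file follows the wiggle fixedStep format with one value per data line and no variableStep sections; the function name `parse_fixed_step` is hypothetical. It also honors any later fixedStep lines, in case a file contains more than one block.

```python
from typing import Dict, Iterable, Optional


def parse_fixed_step(lines: Iterable[str]) -> Dict[int, float]:
    """Map 1-based genomic coordinates to scores in a fixedStep wiggle file.

    Assumes only fixedStep declaration lines (no variableStep) and one
    value per data line.
    """
    scores: Dict[int, float] = {}
    pos: Optional[int] = None  # coordinate of the next data line
    step = 1
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("track", "#")):
            continue  # skip blank, track and comment lines
        if line.startswith("fixedStep"):
            # e.g. "fixedStep chrom=chr4 start=1700 step=1"
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            pos = int(fields["start"])  # wiggle start is 1-based
            step = int(fields.get("step", "1"))
            continue
        if pos is None:
            raise ValueError("data line found before any fixedStep header")
        scores[pos] = float(line)
        pos += step
    return scores


# Tiny example with two blocks (values made up for illustration).
example = [
    "fixedStep chrom=chr4 start=1700 step=1",
    "0.1",
    "0.2",
    "0.3",
    "fixedStep chrom=chr4 start=2000 step=1",
    "0.9",
]
scores = parse_fixed_step(example)
print(scores[1700], scores[1702], scores[2000])  # 0.1 0.3 0.9
```

To use it on the downloaded file, one could pass `gzip.open("chr4.pp.gz", "rt")` as `lines`. Storing every position in a dict is simple but memory-hungry for large chromosomes; streaming only the wanted range would scale better.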
To find the lines of chr4.pp that correspond to the Table Browser
results you list (using chr4:1833-1839 in the Table Browser), I opened
the file and scrolled down to the 141st line. Here are the data I
found at that location (lines 135-141 of chr4.pp):
0.015
0.014
0.002
0.001
0.006
0.006
0.008
This agrees with the Table Browser results (apart from the loss of
resolution due to compression). I scrolled down to the 141st line of
the file because I subtracted 1700 (the offset) from your end
coordinate 1840 and then added 1, since the first line of chr4.pp
isn't data.
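The line arithmetic above can be written out as a small helper. This is a hedged sketch assuming a single fixedStep block with step=1 and exactly one header line, as at the start of chr4.pp; `line_number` and `values_for_range` are hypothetical names. For a 1-based coordinate c, the score sits on line c - 1700 + 2, so the unshifted end coordinate 1839 gives line 141, the same result as 1840 - 1700 + 1.

```python
from itertools import islice
from typing import Iterable, List


def line_number(coord: int, start: int = 1700, header_lines: int = 1) -> int:
    """1-based line number holding the score for 1-based coordinate `coord`.

    Assumes one fixedStep block with step=1 preceded by `header_lines` lines.
    """
    if coord < start:
        raise ValueError(f"coordinate {coord} precedes file start {start}")
    return coord - start + 1 + header_lines


def values_for_range(lines: Iterable[str], first: int, last: int,
                     start: int = 1700, header_lines: int = 1) -> List[float]:
    """Scores for coordinates first..last inclusive, read by line position."""
    lo = line_number(first, start, header_lines)
    hi = line_number(last, start, header_lines)
    # islice is 0-based and end-exclusive, so line n sits at index n - 1.
    return [float(text) for text in islice(lines, lo - 1, hi)]


print(line_number(1833), line_number(1839))  # 135 141
```

To read the real file, one could pass `gzip.open("chr4.pp.gz", "rt")` as `lines`. If the file contains additional fixedStep lines before the region of interest, this simple line counting no longer holds, and the declarations themselves have to be read.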
I hope this is helpful to you. Please don't hesitate to contact the
genome mailing list again if you require further assistance.
Kayla Smith
UCSC Genome Bioinformatics Group
Zhuo Fang wrote:
> Dear Sir/Madam,
>
> I encountered a problem when analyzing phastCons conservation
> scores. There are two ways to get the phastCons scores of a given region.
> One is to generate them with the Table Browser; the other is to download
> the genome-wide phastCons score files from the FTP site and extract the
> corresponding scores. Below are example results from these two
> approaches.
>
> For the table browser, I selected:
> assembly: D. melanogaster--Apr. 2004
> track: Conservation
> table: phastCons15ways
> position: chr4:1833-1939
> Then the results are:
>
> 1833 0.0149921
> 1834 0.00749606
> 1835 0
> 1836 0
> 1837 0
> 1838 0
> 1839 0.00749606
>
> For the genome-wide file, I downloaded the data from
> http://hgdownload.cse.ucsc.edu/goldenPath/dm2/phastCons15way/
> Then I extracted the conservation scores for region chr4:1833-1939 (since
> the FTP file is one-based, I actually extracted chr4:1834-1940 from the file
> chr4.pp), and the results are:
>
> 1834 0.677
> 1835 0.493
> 1836 0.899
> 1837 0.909
> 1838 0.903
> 1839 0.916
> 1840 0.953
>
> As you can see, the scores are totally different. Moreover, the scores
> in the first result indicate no conservation, while those in the second
> indicate conservation. This is not an isolated case.
>
> So could you please let me know why this happens and how I can deal
> with the problem?
>
> Thanks and best regards,
>
> Zhuo Fang
>
> -------
> Zhuo Fang, Ph.D.
> AG Nikolaus Rajewsky
> Division of Systems Biology
> Max Delbruck Center for Molecular Medicine (MDC)
> Robert-Rössle-Str. 10 D-13125 Berlin, Germany
> TEL: +49-30-9406-2990
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome