[Genome] hg17 phastcons
Yael Altuvia
yaelal at md.huji.ac.il
Sun Jun 10 23:28:31 PDT 2007
Hi,
1. We need to download the phastCons17way values for the whole human genome.
We downloaded the phastCons17way data and found that the conservation
scores in the downloaded file differ from the values extracted directly,
by coordinates, from the browser table phastCons17way.
For example, values extracted from the browser table:
track type=wiggle_0 name="Conservation" description="Vertebrate Multiz Alignment & Conservation"
# output date: 2007-06-11 05:40:18 UTC
# chrom specified: chr22
# position specified: 14440502-14440532
# (Worst case: 0) The original source data
# (before querying and compression) is available at
# http://hgdownload.cse.ucsc.edu/downloads.html
variableStep chrom=chr22 span=1
14440502 0
14440503 0
14440504 0
14440505 0
14440506 0
14440507 0
14440508 0
14440509 0
14440510 0
14440511 0
14440512 0
14440513 0
14440514 0
14440515 0
14440516 0
14440517 0
14440518 0
14440519 0
14440520 0
14440521 0
14440522 0
14440523 0
14440524 0
14440525 0
14440526 0
14440527 0
14440528 0
14440529 0
14440530 0
14440531 0
14440532 0
Values extracted from the downloaded chr22.gz (downloaded from
http://hgdownload.cse.ucsc.edu/goldenPath/hg17/phastCons17way/).
Assuming that there are no missing values, we added the coordinates
ourselves, counting up from 14430001 as given by the first declaration
line:
fixedStep chrom=chr22 start=14430001 step=1
14440502 0.551
14440503 0.628
14440504 0.659
14440505 0.671
14440506 0.678
14440507 0.679
14440508 0.676
14440509 0.681
14440510 0.680
14440511 0.676
14440512 0.669
14440513 0.655
14440514 0.621
14440515 0.581
14440516 0.535
14440517 0.480
14440518 0.436
14440519 0.325
14440520 0.150
14440521 0.126
14440522 0.116
14440523 0.090
14440524 0.080
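
The coordinate assignment described above is only valid if a single
fixedStep declaration covers the whole file. The wiggle format allows a
file to contain many fixedStep declaration lines, each restarting the
coordinate at its own start= value, so counting from the first declaration
alone can drift away from the true positions. A minimal Python sketch that
resets the counter at every declaration is shown here. The function names
iter_fixed_step and iter_fixed_step_file are illustrative only (they are
not part of any UCSC tool), and the sketch assumes one value per data line.

```python
import gzip


def iter_fixed_step(lines):
    """Yield (chrom, position, score) from fixedStep wiggle text lines.

    The position is reset whenever a new fixedStep declaration appears,
    so gaps between blocks do not shift later coordinates.
    """
    chrom, pos, step = None, None, 1
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments and track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        if line.startswith("fixedStep"):
            # Parse key=value fields such as chrom=chr22 start=14430001 step=1.
            fields = dict(tok.split("=", 1) for tok in line.split()[1:])
            chrom = fields["chrom"]
            pos = int(fields["start"])
            step = int(fields.get("step", 1))
            continue
        if chrom is None:
            raise ValueError("data line found before any fixedStep declaration")
        yield chrom, pos, float(line)
        pos += step


def iter_fixed_step_file(path):
    """Same as iter_fixed_step, reading from a gzip-compressed file."""
    with gzip.open(path, "rt") as handle:
        yield from iter_fixed_step(handle)
```

With a local copy of the download, the scores for the region in question
could be listed with something like:

    for chrom, pos, score in iter_fixed_step_file("chr22.gz"):
        if 14440502 <= pos <= 14440532:
            print(pos, score)

Positions that never appear in the output are simply absent from the file.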
Also, extracting all of the chr22 values from the browser gives:
# chrom specified: chr22
# position specified: 1-49554710
while the downloaded chr22.gz file has only 33939176 lines.
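
A raw line count mixes declaration lines with data lines, so it does not
directly say how many bases have a score. The following Python sketch
separates the two and reports the span between the first and last scored
position. It assumes a fixedStep wiggle file with step=1, one value per
line, and blocks in ascending, non-overlapping order; the function name
summarize_fixed_step is hypothetical.

```python
def summarize_fixed_step(lines):
    """Count declaration blocks and data values in fixedStep wiggle text.

    Returns a dict with the number of blocks, the number of values, the
    first and last scored position, and the number of positions between
    them that carry no value (assuming step=1 and ascending blocks).
    """
    blocks = 0
    values = 0
    first = last = None
    pos, step = None, 1
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments and track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        if line.startswith("fixedStep"):
            fields = dict(tok.split("=", 1) for tok in line.split()[1:])
            pos = int(fields["start"])
            step = int(fields.get("step", 1))
            blocks += 1
            continue
        if pos is None:
            raise ValueError("data line found before any fixedStep declaration")
        values += 1
        if first is None:
            first = pos
        last = pos
        pos += step
    unscored = None
    if first is not None:
        unscored = (last - first + 1) - values
    return {
        "blocks": blocks,
        "values": values,
        "first": first,
        "last": last,
        "unscored": unscored,
    }
```

For a gzip-compressed download this could be called as
summarize_fixed_step(gzip.open("chr22.gz", "rt")) after importing gzip. A
block count above 1 would mean the file has gaps between scored regions.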
Which values should we use, and where should we download them from?
2. What is the policy regarding regions with missing data? Are they
assigned 0?
Thanks,
Yael