[Genome] What is difference?

Hiram Clawson hiram at soe.ucsc.edu
Wed Nov 1 16:42:20 PST 2006


Good Afternoon Hairong:

Please note, you were looking at the NCBI fasta file that contains the contigs
for chr1.  To see the assembled chr1, use this file:
ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/Assembled_chromosomes/hs_ref_chr1.fa.gz

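It is identical to the UCSC chr1 sequence, apart from letter case.  The UCSC
sequence is soft-masked: repeats found by RepeatMasker and simple repeats are
written in lower case, which is why the upper/lower counts below differ: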

$ faSize chr1.fa.gz
247249719 bases (22250000 N's 224999719 real 115286666 upper 109713053 lower)
$ faSize hs_ref_chr1.fa.gz
247249719 bases (22250000 N's 224999719 real 224999719 upper 0 lower)
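
To make the categories in those counts concrete, here is a minimal Python
sketch of the same kind of tally.  It is not faSize's implementation;
count_bases is a hypothetical helper, and treating a lower-case "n" as an N
is an assumption.  "real" is taken to be every base that is not an N, which
agrees with the output above (upper + lower = real).

```python
def count_bases(lines):
    """Tally N, upper-case and lower-case bases over FASTA lines.

    Hypothetical helper for illustration only, not faSize itself.
    Header lines (starting with '>') and line breaks are ignored.
    """
    counts = {"N": 0, "upper": 0, "lower": 0}
    for line in lines:
        if line.startswith(">"):
            continue  # FASTA header, not sequence
        for base in line.strip():
            if base in "Nn":  # assumption: 'n' is counted as an N
                counts["N"] += 1
            elif base.isupper():
                counts["upper"] += 1
            elif base.islower():
                counts["lower"] += 1
    # "real" bases are the non-N bases; total is everything.
    counts["real"] = counts["upper"] + counts["lower"]
    counts["total"] = counts["real"] + counts["N"]
    return counts
```

A soft-masked file and an unmasked file of the same sequence should give the
same "total", "N" and "real" values and differ only in "upper" and "lower".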

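The file sizes differ because the line lengths in the two files differ (your
wc output is consistent with 50 bases per line in the UCSC file and 70 in the
NCBI file), so the files hold different numbers of newline characters, and the
header lines differ too.  You need to compare the actual sequence, not the
text of the FASTA files.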
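
One way to compare the actual sequence is to hash it after dropping header
lines and line breaks and folding everything to upper case, so that neither
line length nor masking affects the result.  The Python sketch below does
that under those assumptions; open_text and sequence_digest are hypothetical
helper names, and the file names in the usage note are the ones from this
thread.

```python
import gzip
import hashlib


def open_text(path):
    """Open a FASTA file as text, handling gzip-compressed (.gz) files."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def sequence_digest(path):
    """Return (base_count, md5_hex) for the sequence in a FASTA file.

    Header lines and line breaks are skipped and bases are upper-cased,
    so line length and soft-masking do not change the result.
    """
    md5 = hashlib.md5()
    count = 0
    with open_text(path) as handle:
        for line in handle:
            if line.startswith(">"):
                continue  # FASTA header, not sequence
            bases = line.strip().upper()
            md5.update(bases.encode("ascii"))
            count += len(bases)
    return count, md5.hexdigest()
```

If sequence_digest("chr1.fa.gz") equals sequence_digest("hs_ref_chr1.fa.gz"),
the two files hold the same bases.  The sketch assumes one sequence per file;
a multi-record file would have all of its records hashed together.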

--Hiram

Hairong Wei wrote:
> To Whom it may concern:
> 
> I downloaded the human chr1 assembly sequence from your FTP site (
> ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/chromosomes ) and from NCBI's
> FTP site ( ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/CHR_01 ), and then looked
> at the file sizes. They are different.
> 
> wireless-162:~/wicell/human_genome_assembly hwei$ wc chr1.fa
>  4944996 4944996 252194720 chr1.fa
> wireless-162:~/wicell/human_genome_assembly hwei$ wc hs_ref_chr1.fa
>  3532140 3532148 250781962 hs_ref_chr1.fa
> wireless-162:~/wicell/human_genome_assembly hwei$
> 
> Why are they different? Are the assembled chromosome sequences provided on
> your website different from NCBI's?
> 
> Hairong Wei
> Wicell Research Institute
> 608-890-1533

