[Genome] What is the difference?
Hiram Clawson
hiram at soe.ucsc.edu
Wed Nov 1 16:42:20 PST 2006
Good Afternoon Hairong:
Please note that you were looking at the NCBI FASTA file that contains the contigs
for chr1. To see the assembled chr1, use this file:
ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/Assembled_chromosomes/hs_ref_chr1.fa.gz
It is identical to the UCSC chr1 sequence, except that the UCSC sequence is
soft-masked (shown in lowercase) for repeats found by RepeatMasker and for simple repeats:
$ faSize chr1.fa.gz
247249719 bases (22250000 N's 224999719 real 115286666 upper 109713053 lower)
$ faSize hs_ref_chr1.fa.gz
247249719 bases (22250000 N's 224999719 real 224999719 upper 0 lower)
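To make the faSize summary lines concrete, here is a rough Python sketch of how such counts could be tallied. It is a hypothetical helper, not the faSize source from the UCSC kent utilities. It assumes that N and n both count as N, that "real" means the non-N bases, and that the upper/lower split covers only real bases, which agrees with the output above (115286666 + 109713053 = 224999719).

```python
import gzip
import string

# Translation table that deletes every lowercase ASCII letter; comparing
# lengths before and after counts soft-masked bases without a Python loop.
_DROP_LOWER = str.maketrans("", "", string.ascii_lowercase)


def open_fasta(path):
    """Open a plain or gzip-compressed FASTA file as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def fasta_counts(lines):
    """Summarise FASTA sequence lines in a faSize-like way (hypothetical helper).

    Header lines (starting with '>') are skipped. Both 'N' and 'n' count as N;
    'real' bases are the non-N bases, split into upper and lower case.
    Assumes every non-N sequence character is a letter.
    """
    total = n_count = lower = 0
    for line in lines:
        if line.startswith(">"):
            continue  # header text is not sequence
        seq = line.strip()  # drop the newline, so line width does not matter
        n_here = seq.count("N") + seq.count("n")
        lower_here = len(seq) - len(seq.translate(_DROP_LOWER))
        total += len(seq)
        n_count += n_here
        lower += lower_here - seq.count("n")  # lowercase n was already counted as N
    real = total - n_count
    return {"bases": total, "N": n_count, "real": real,
            "upper": real - lower, "lower": lower}
```

Usage would look like `with open_fasta("chr1.fa.gz") as fh: print(fasta_counts(fh))`; on the NCBI file the "lower" count should come out as 0, as in the faSize output above.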
The line lengths in the two files also differ, so the file sizes differ even though
the sequence is the same. You need to compare the actual sequence, not the text FASTA files.
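A minimal sketch of "comparing the actual sequence" in Python: skip header lines, drop line breaks, fold case so the UCSC soft-masking is ignored, and hash what remains. The function names are hypothetical, and the sketch assumes one record per file, as with these single-chromosome files. Hashing with SHA-256 as the lines stream past avoids holding the whole 247 Mb sequence in memory, and because the hash is fed the concatenated bases, the line width of each file has no effect.

```python
import gzip
import hashlib


def open_fasta(path):
    """Open a plain or gzip-compressed FASTA file as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def sequence_digest(lines, ignore_case=True):
    """SHA-256 of the sequence with headers and line breaks removed.

    With ignore_case=True, lowercase (soft-masked) bases are uppercased first,
    so repeat masking does not affect the result.
    """
    digest = hashlib.sha256()
    for line in lines:
        if line.startswith(">"):
            continue  # header text differs between UCSC and NCBI; skip it
        seq = line.strip()  # removing the newline makes line width irrelevant
        if ignore_case:
            seq = seq.upper()
        digest.update(seq.encode("ascii"))
    return digest.hexdigest()


def same_sequence(path_a, path_b):
    """True if two single-record FASTA files hold the same bases, ignoring case."""
    with open_fasta(path_a) as a, open_fasta(path_b) as b:
        return sequence_digest(a) == sequence_digest(b)
```

If the two chr1 files agree base for base, as stated above, `same_sequence("chr1.fa.gz", "hs_ref_chr1.fa.gz")` would return True, while comparing `sequence_digest(..., ignore_case=False)` for the two files would expose the masking difference.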
--Hiram
Hairong Wei wrote:
> To Whom it may concern:
>
> I downloaded the human chr1 assembly sequence from your FTP site
> (ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/chromosomes) and from NCBI's
> FTP site (ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/CHR_01), and when I
> looked at the file sizes, they were different.
>
> wireless-162:~/wicell/human_genome_assembly hwei$ wc chr1.fa
> 4944996 4944996 252194720 chr1.fa
> wireless-162:~/wicell/human_genome_assembly hwei$ wc hs_ref_chr1.fa
> 3532140 3532148 250781962 hs_ref_chr1.fa
> wireless-162:~/wicell/human_genome_assembly hwei$
>
> Why are they different? Are the assembled chromosome sequences provided on
> your website different from NCBI's?
>
> Hairong Wei
> Wicell Research Institute
> 608-890-1533
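The `wc` figures in the question can be reproduced from the base count reported by faSize (247249719) alone, which supports the explanation that only the line wrapping differs. The sketch below assumes that every line, including the last, ends in a newline and that the UCSC header is the 6-byte line `>chr1`. Under those assumptions the line counts are consistent with 50 bases per line in the UCSC file and 70 in the NCBI file; both widths are inferred from the numbers, not stated in the thread. The function names are hypothetical.

```python
def fasta_line_count(bases, width, header_lines=1):
    """Lines `wc -l` would report for one record wrapped at `width` bases per line."""
    seq_lines = -(-bases // width)  # integer ceiling division
    return header_lines + seq_lines


def fasta_byte_count(bases, width, header_bytes):
    """Bytes `wc -c` would report: header (with newline) + bases + one newline per sequence line."""
    seq_lines = -(-bases // width)
    return header_bytes + bases + seq_lines


BASES = 247249719  # chr1 length reported by faSize for both files
print(fasta_line_count(BASES, 50))                    # UCSC file, assuming 50 bases per line
print(fasta_line_count(BASES, 70))                    # NCBI file, assuming 70 bases per line
print(fasta_byte_count(BASES, 50, len(">chr1\n")))    # UCSC file size in bytes
```

The first two values match the 4944996 and 3532140 lines reported by `wc` above, and the third matches the 252194720 bytes of chr1.fa. The NCBI header is longer than `>chr1`, so its byte count is not reproduced here.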