[Genome] Unplaced portions of the Human genome release
Rachel Harte
hartera at soe.ucsc.edu
Mon Jan 15 17:38:40 PST 2007
Dear Zayed,
Yes, the chr*_random.gz files contain the unplaced sequences. However,
the contigs are joined in no particular order, separated by large gaps of
Ns, to form these artificial chromosomes. You can find the contig
sequences and how they relate to the random chromosomes if you go to the
Full Data Set link for an assembly, which is the bigZips directory.
For hg18, for example, it is:
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/
The contigAgp.zip file contains AGP files for each of the chromosomes, and
in these you can see how the contigs were put together to form the random
chromosomes. The AGP file format is specified here:
http://www.ncbi.nlm.nih.gov/genome/guide/Assembly/AGP_Specification.html
The contigFa.zip and contigFaMasked.zip files contain the individual contig
sequences that make up the chromosomes in the genome assembly.
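As a rough illustration of how the AGP files tie contigs to a random
chromosome, here is a minimal Python sketch that reads AGP lines and
separates contig components from gaps. It assumes the tab-separated column
layout described in the AGP specification linked above (object, start, end,
part number, component type, then either component or gap fields). The
sample lines, the contig names contig_A and contig_B, and the helper
parse_agp are made up for illustration and are not taken from the hg18 files.

```python
from typing import Iterable, Iterator, NamedTuple, Union


class Component(NamedTuple):
    """A contig placed on an object such as chr1_random."""
    obj: str
    obj_start: int
    obj_end: int
    component_id: str
    comp_start: int
    comp_end: int
    orientation: str


class Gap(NamedTuple):
    """A run of Ns between two placed contigs."""
    obj: str
    obj_start: int
    obj_end: int
    length: int
    gap_type: str


def parse_agp(lines: Iterable[str]) -> Iterator[Union[Component, Gap]]:
    """Yield one record per AGP line, skipping blank and comment lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        obj, start, end = f[0], int(f[1]), int(f[2])
        # Column 5 is the component type; N and U mark gap lines.
        if f[4] in ("N", "U"):
            yield Gap(obj, start, end, int(f[5]), f[6])
        else:
            yield Component(obj, start, end, f[5], int(f[6]), int(f[7]), f[8])


# Hypothetical sample data, NOT real hg18 content.
SAMPLE_AGP = (
    "chr1_random\t1\t1000\t1\tF\tcontig_A\t1\t1000\t+\n"
    "chr1_random\t1001\t51000\t2\tN\t50000\tcontig\tno\n"
    "chr1_random\t51001\t53000\t3\tF\tcontig_B\t1\t2000\t-\n"
)

records = list(parse_agp(SAMPLE_AGP.splitlines()))
contigs = [r.component_id for r in records if isinstance(r, Component)]
gap_bases = sum(r.length for r in records if isinstance(r, Gap))
print(contigs, gap_bases)
```

To use this on the real data, you would unzip contigAgp.zip and pass an open
file handle for the relevant AGP file to parse_agp in place of the sample
lines. The contig identifiers it reports should then correspond to the
sequence names in contigFa.zip, although it is worth checking that against
the files themselves.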
I hope that this helps you. Please let us know if you have further
questions.
Rachel
Rachel Harte
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu
On Tue, 16 Jan 2007, zayed albertyn wrote:
> Dear Genome Mailing List,
>
> I am trying to find a comprehensive collection of all human genome
> sequences that are not in the current assembly. These may be unplaced
> contigs or unfinished sequences that show no significant match to the
> currently published sequence. Perhaps I can obtain these from
> genome.ucsc.edu or the NCBI ftp site.
> Would the chr*_random.fa.gz on the GoldenPath download site be what I am
> looking for?
>
> Thanks for the help.
>
> Zayed
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>