[Genome] N_blocks of the same length
Angie Hinrichs
angie at soe.ucsc.edu
Fri Jan 25 11:25:38 PST 2008
Hi Petr,
All of those are assembly gaps of unknown size, mostly unbridged. You
can see the details of the assembly in AGP files available for
download. Assuming you are using the latest human assembly, hg18:
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/chromAgp.zip
The file chr1.agp shows the construction of chr1:
chr1 1 616 1 F AP006221.1 36116 36731 -
chr1 617 167280 2 F AL627309.15 241 166904 +
chr1 167281 217280 3 N 50000 clone yes
chr1 217281 257582 4 F AP006222.1 1 40302 +
chr1 257583 307582 5 N 50000 contig no
chr1 307583 461231 6 F AL732372.15 1 153649 +
chr1 461232 511231 7 N 50000 contig no
chr1 511232 622780 8 F AC114498.2 1 111549 +
...
If you extract the lines whose 5th column is N, you will see all 42
gaps. On hg18 chr1, most gaps are 50000 bases long, one is 60000 bases,
and the centromere and heterochromatin gaps are very large. More info
on the AGP format is available here:
http://www.ncbi.nlm.nih.gov/genome/guide/Assembly/AGP_Specification.html
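As a rough illustration of pulling out the gap lines, here is a minimal Python sketch. It assumes whitespace-separated AGP lines laid out like the chr1.agp excerpt above (for N lines: object, start, end, part number, N, gap length, gap type, linkage). The names Gap, agp_gaps and SAMPLE_AGP are made up for this example and are not part of any UCSC tool.

```python
from typing import Iterable, List, NamedTuple


class Gap(NamedTuple):
    chrom: str
    start: int      # 1-based, inclusive (AGP column 2)
    end: int        # 1-based, inclusive (AGP column 3)
    length: int     # gap length in bases (AGP column 6)
    gap_type: str   # e.g. "clone", "contig" (AGP column 7)
    linkage: str    # "yes" or "no" (AGP column 8)


def agp_gaps(lines: Iterable[str]) -> List[Gap]:
    """Collect the AGP lines whose 5th column is N (gap lines)."""
    gaps = []
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments, and anything that is not an N line.
        if len(fields) < 8 or fields[0].startswith("#") or fields[4] != "N":
            continue
        gaps.append(Gap(fields[0], int(fields[1]), int(fields[2]),
                        int(fields[5]), fields[6], fields[7]))
    return gaps


# The first lines of chr1.agp as quoted in this message.
SAMPLE_AGP = """\
chr1 1 616 1 F AP006221.1 36116 36731 -
chr1 617 167280 2 F AL627309.15 241 166904 +
chr1 167281 217280 3 N 50000 clone yes
chr1 217281 257582 4 F AP006222.1 1 40302 +
chr1 257583 307582 5 N 50000 contig no
chr1 307583 461231 6 F AL732372.15 1 153649 +
chr1 461232 511231 7 N 50000 contig no
chr1 511232 622780 8 F AC114498.2 1 111549 +
"""

for gap in agp_gaps(SAMPLE_AGP.splitlines()):
    print(gap.chrom, gap.start, gap.end, gap.length, gap.gap_type, gap.linkage)
```

For the full file, something like agp_gaps(open("chr1.agp")) should work the same way, assuming the file has been unpacked from chromAgp.zip.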
Hope that helps,
Angie
On Thu, 24 Jan 2008, Pancoska Petr wrote:
> For computational reasons, I need to split chromosome sequence(s)
> into rationally selected blocks (max 150KB per file) while keeping
> the position information correct. Going through the sequence of
> chr1 downloaded from the ftp site (chr1.fa.gz), I located the
> N-blocks and listed them for review, since their boundaries are
> rational places to split the chromosome. I found it strange that
> these "missing" parts are systematically (and exactly) segments of
> N's, each 50KB long. I thought I had an error in my routine, so I
> went through a few of them by hand in the original file, and those
> few I checked were indeed 50KB each. Is this something I overlooked
> in the available info, or is there a reason for it?
>
> Thanks for help, Petr Pancoska
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>
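The N-block scan and block splitting described in the quoted message could be sketched roughly as follows. This is only an illustration, not the poster's actual routine: it assumes the chromosome has already been read into one string with the FASTA header and line breaks removed, and the function names n_runs and split_blocks are invented here. Coordinates are reported 1-based and inclusive so they can be compared with the AGP start and end columns.

```python
import re
from typing import Iterator, List, Tuple


def n_runs(seq: str, min_len: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) for each run of N's, 1-based inclusive."""
    return [(m.start() + 1, m.end())
            for m in re.finditer(r"[Nn]+", seq)
            if m.end() - m.start() >= min_len]


def split_blocks(seq: str, max_len: int = 150_000) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, subsequence) for each stretch between N runs,
    cut into pieces of at most max_len bases. Positions refer to the
    original chromosome, 1-based inclusive."""
    for m in re.finditer(r"[^Nn]+", seq):
        for offset in range(m.start(), m.end(), max_len):
            stop = min(offset + max_len, m.end())
            yield offset + 1, stop, seq[offset:stop]


# Toy sequence standing in for a chromosome.
toy = "ACGT" + "N" * 5 + "ACGTACGTAC" + "NN" + "GG"
print(n_runs(toy))
for start, end, piece in split_blocks(toy, max_len=4):
    print(start, end, piece)
```

Splitting only at N runs and at the size limit keeps every block's start position tied to the original chromosome, which is the property the question asks for.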