[Genome] Lowercase in .net.axt vs .fa files
Hiram Clawson
hiram at soe.ucsc.edu
Thu Aug 16 09:19:52 PDT 2007
Good Morning Martin:
The lower case masking you find in the hg18 genome is from RepeatMasker (plus simple repeats).
The lower case masking you find in the net.axt files is from WindowMasker (plus simple repeats).
The alignments between fr2 and hg18 used the NCBI WindowMasker program to
mask out repeats. You can see the description of the WindowMasker operation
on our development browser:
http://genome-test.cse.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&g=windowmaskerSdust
WindowMasker was used for both organisms because RepeatMasker masked out only 7%
of the Fugu genome, whereas WindowMasker masked out 29%.
For the Human genome, WindowMasker masks out approximately the same amount
of the genome (45%) as RepeatMasker does (48%), but WindowMasker does not mark exactly
the same sequence.
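
If you want to check masked fractions like these on your own files, here is a
minimal Python sketch that counts lower case bases in FASTA-formatted lines.
The function name masked_fraction is made up for illustration, and skipping
N/n bases is an assumption on my part; the percentages quoted above may have
been computed differently.

```python
def masked_fraction(fasta_lines):
    """Return the fraction of lower case (soft-masked) bases.

    Assumption: N/n bases are left out of both the numerator and
    the denominator.
    """
    lower = 0
    total = 0
    for line in fasta_lines:
        line = line.strip()
        # Skip blank lines and FASTA header lines.
        if not line or line.startswith(">"):
            continue
        for base in line:
            if base in "Nn":
                continue
            total += 1
            if base.islower():
                lower += 1
    return lower / total if total else 0.0


# Tiny made-up record, not real genome data.
example = [">chrTest", "ACGTacgtNN", "ACgt"]
print(masked_fraction(example))
```

For a real chromosome you would pass an open file handle (for example from
gzip.open(path, "rt")) instead of the list.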
If you need the WindowMasker (plus simple repeats) masked sequence in the hg18 assembly,
we can make that available.
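
In the meantime, the masked positions can be read directly from the axt blocks.
The following is only a sketch: it assumes the usual axt layout, in which the
header fields are alignment number, primary chromosome, primary start, primary
end, and so on, with 1-based inclusive coordinates, and in which "-" marks a
gap that uses up no base in the primary sequence. The function name
lowercase_intervals is hypothetical.

```python
def lowercase_intervals(header, primary_seq):
    """List (chrom, start, end) for each lower case run in the primary
    sequence of one axt block, in 1-based inclusive coordinates."""
    fields = header.split()
    chrom = fields[1]
    pos = int(fields[2])  # coordinate of the first primary base
    intervals = []
    run_start = None
    for ch in primary_seq:
        if ch == "-":
            # Gap in the primary sequence: no coordinate is consumed,
            # and an open lower case run is not interrupted.
            continue
        if ch.islower():
            if run_start is None:
                run_start = pos
        elif run_start is not None:
            intervals.append((chrom, run_start, pos - 1))
            run_start = None
        pos += 1
    if run_start is not None:
        intervals.append((chrom, run_start, pos - 1))
    return intervals


# Header and the visible start of the hg18 line from the message below;
# the rest of that line is truncated there, so only this prefix is used.
header = "0 chr10 82862 84100 chrUn 370283108 370284351 + 70025"
prefix = "GCATactataaaaatgctttaaaac--gCAGCA"
for interval in lowercase_intervals(header, prefix):
    print(interval)
```

The resulting intervals can then be compared against the RepeatMasker
annotations for the same region.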
--Hiram
Martin Frith wrote:
> Hello:
>
> the .net.axt files include lowercase sequence, but this does not seem
> to correspond to annotated repeats, or to lowercase sequence in the
> original chromosome files.
>
> For example:
>
> http://hgdownload.cse.ucsc.edu/goldenPath/hg18/vsFr2/axtNet/chr10.hg18.fr2.net.axt.gz
>
> includes:
>
> 0 chr10 82862 84100 chrUn 370283108 370284351 + 70025
> GCATactataaaaatgctttaaaac--gCAGCA...
> gtacgtTATATAAGAAGTTTAATATTGACAAGA...
>
> but the corresponding region is uppercase in:
> http://hgdownload.cse.ucsc.edu/goldenPath/hg18/chromosomes/chr10.fa.gz
>
> and this region of hg18 has no annotated repeats in the genome browser.
>
> So where does the lowercase in the .net.axt files come from?
>
> Have a nice day,
> Martin Frith
> http://www.cbrc.jp/~martin/