[Genome] RepeatMasker track on Lizard genome (Feb 2007)
Angie Hinrichs
angie at soe.ucsc.edu
Fri Jan 25 11:05:57 PST 2008
Hi Vladimir,
repeatmasker.org is running the latest version (open-3-1-9), with the
latest library from RepBase Update (20071204). The lizard
RepeatMasker track was generated back in early 2007 with the latest
versions at the time: open-3-1-6 and 20061006. So it is possible that
recent enhancements in the library and/or RepeatMasker program could
explain the difference.
Also, RepeatMasker results can vary based on the sequence boundaries
because it factors in things like GC%. We chop up the sequence into
500,000-base chunks and then run RepeatMasker on each chunk in a
compute cluster. Your range falls into the middle of such a chunk, so
the LINE is not split across a chunk boundary; it was simply not
recognized by the late-2006 versions.
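To illustrate the chunk-boundary check, here is a minimal Python sketch (not the actual UCSC pipeline code). It assumes the 500,000-base chunks are non-overlapping and start at the first base of the scaffold, which is an assumption made only for this example; the function names are hypothetical.

```python
CHUNK_SIZE = 500_000  # chunk length described above


def chunk_index(pos, chunk_size=CHUNK_SIZE):
    """Return the 0-based chunk number containing 1-based position pos.

    Assumes non-overlapping chunks starting at base 1 of the scaffold.
    """
    return (pos - 1) // chunk_size


def offset_in_chunk(pos, chunk_size=CHUNK_SIZE):
    """Return the 0-based offset of pos within its chunk."""
    return (pos - 1) % chunk_size


def spans_boundary(start, end, chunk_size=CHUNK_SIZE):
    """True if the 1-based inclusive range start..end crosses a chunk boundary."""
    return chunk_index(start, chunk_size) != chunk_index(end, chunk_size)


# The region from the question: scaffold_68:1,284,500-1,285,440
start, end = 1_284_500, 1_285_440
print(chunk_index(start), chunk_index(end))  # 2 2
print(spans_boundary(start, end))            # False
print(offset_in_chunk(start))                # 284499
```

Under that assumption, both ends of the region land in the same chunk (the third one, covering bases 1,000,001-1,500,000), well away from either edge.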
Arian Smit and Robert Hubley (of repeatmasker.org / ISB) are the
creators of RepeatMasker, so they could give a more authoritative
answer.
Angie
On Fri, 25 Jan 2008, Vladimir Kuryshev wrote:
> Dear UCSC gurus,
>
> Would you check pls why RepeatMasker track doesn't show relevant
> information on the lizard genome?
>
> E.g., take a look at a region:
> scaffold_68:1,284,500-1,285,440
>
> contains clear part of LINE:
> sequences:            1
> total length:      1941 bp  (1941 bp excl N/X-runs)
> GC level:         50.90 %
> bases masked:      1061 bp ( 54.66 %)
> ==================================================
>                number of      length   percentage
>                elements*    occupied  of sequence
> --------------------------------------------------
> Retroelements          1      1094 bp    56.36 %
>    SINEs:              0         0 bp     0.00 %
>    Penelope            0         0 bp     0.00 %
>    LINEs:              1      1094 bp    56.36 %
>     CRE/SLACS          0         0 bp     0.00 %
>     L2/CR1/Rex         1      1094 bp    56.36 %
> ..
> This is a Repeatmasker output (http://www.repeatmasker.org/).
>
> I would appreciate your feedback with some explanations.
>
> wbw,
> Vladimir
>