[Genome] Fwd: Repeat consensus length inconsistency?

Angie Hinrichs angie at soe.ucsc.edu
Wed Aug 15 09:29:31 PDT 2007


---------- Forwarded message ----------
Date: Wed, 15 Aug 2007 02:24:47 -0700
From: Arian Smit <asmit at systemsbiology.org>
To: mwmessin at ucsd.edu
Subject: FW: [Fwd: Fwd: Repeat consensus length inconsistency?]

Hi Michael,

The database has consensus sequences for (the 3' ends of) two L2 subfamilies
(L2a and L2b), which differ in sequence and length. RepeatMasker always
collapses them in its output, naming them both L2, for no good reason really.
We'll fix that.
Thanks for the heads up,

Arian
-- 
Arian F.A. Smit PhD
Institute for Systems Biology
1441 North 34th Street
Seattle, WA 98103-8904
(206) 732-1271


-------- Original Message --------
Subject: [Genome] Repeat consensus length inconsistency?
Date: Tue, 14 Aug 2007 02:13:36 -0700 (PDT)
From: mwmessin at ucsd.edu
To: genome at soe.ucsc.edu

Hello,

I have a question regarding the consensus length of repeat instances in
the RepeatMasker data set (I am using the one for the hg18 assembly). I
noticed that the consensus length of repeat instances, computed as
"repEnd" - "repLeft" for '+' strands and "repEnd" - "repStart" for '-'
strands, differs within the same repeat family and subfamily. The most
notable cases are LINEs such as L2, which has 127241 instances with a
consensus length of 3378 bp and 276861 instances with a consensus length
of 3419 bp.
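
The length calculation described above can be sketched in Python as
follows. This is a minimal illustration, not the script used to produce
the counts quoted here. It assumes the rmsk table convention that the
"bases remaining" value is stored as a negative number (repLeft on '+'
rows, repStart on '-' rows); the function names and the sample rows are
hypothetical.

```python
from collections import Counter


def consensus_length(strand, rep_start, rep_end, rep_left):
    """Return the consensus length implied by one rmsk-style row.

    Assumption: the count of consensus bases after the match is stored
    as a negative number, in repLeft for '+' rows and in repStart for
    '-' rows, so subtracting it from repEnd adds the remaining bases.
    """
    if strand == "+":
        return rep_end - rep_left
    if strand == "-":
        return rep_end - rep_start
    raise ValueError("unexpected strand: %r" % (strand,))


def tally_lengths(rows):
    """Count instances per (repName, implied consensus length)."""
    counts = Counter()
    for rep_name, strand, rep_start, rep_end, rep_left in rows:
        length = consensus_length(strand, rep_start, rep_end, rep_left)
        counts[(rep_name, length)] += 1
    return counts


# Made-up rows for illustration only:
# (repName, strand, repStart, repEnd, repLeft)
rows = [
    ("L2", "+", 2900, 3300, -78),
    ("L2", "-", -119, 3300, 3100),
    ("L2", "+", 3100, 3419, 0),
]

for (name, length), n in sorted(tally_lengths(rows).items()):
    print(name, length, n)
```

A tally like this that shows more than one length for a single repName
is the kind of split described for L2.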

I understand that insertions and deletions introduce some error, but in
this example there appear to be two clearly distinct populations of L2s,
based on their consensus length. This makes plots along the consensus
base axis less meaningful, because the differing consensus alignments
introduce "double-vision" artifacts.

Essentially, my question is: why isn't there a single consensus? And how
did these populations with different consensus lengths get put into the
same family/subfamily?

Cheers,

Michael Messing
Computational and Mathematical Biology
Genome Institute of Singapore
60 Biopolis Street, Genome, #02-01
Singapore 138672

------ End of Forwarded Message

