[Genome] Lift Genome Annotations: mm3 2 mm6 (fwd)
Rachel Harte
hartera at soe.ucsc.edu
Thu Mar 15 16:32:53 PDT 2007
Enrique,
Please note that, in my earlier e-mail, I should have said that
"Larger regions may NOT always be lifted to mm6 due to changes in the
assembly." (at the end of the fourth paragraph).
Also, one of our engineers said that she has had good success with using
single base positions as input to the LiftOver tool.
Sorry for the errors.
Rachel
Rachel Harte
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu
---------- Forwarded message ----------
Date: Thu, 15 Mar 2007 15:56:13 -0700 (PDT)
From: Rachel Harte <hartera at soe.ucsc.edu>
To: emuro at ohri.ca
Cc: genome at soe.ucsc.edu
Subject: Re: [Genome] Lift Genome Annotations: mm3 2 mm6
Enrique,
It could be that there is a big change between the mm3 and the mm6 assemblies
and a lot of regions have been rearranged or gaps have been filled in so
it is not possible to lift all your regions from mm3 to mm6. There are
over 60,000 chains for the mm3 to mm6 lift. Some chromosomes are better
than others for having large chain alignments between assemblies. chr1,
for instance, does not have any really large regions that align between
the two assemblies. Another thing to remember is that the mouse chromosomes
are very acrocentric (the centromere sits close to one end of the
chromosome). The centromere region is difficult to sequence because it is
repetitive, so it often appears as a gap in assemblies. In mm6, the first
3 megabases of chr3 are a gap; in mm8, this region is labelled as the
centromere. So for this region there will be no liftOver chains.
One thing you could try is changing the parameters on the liftOver tool to be
less stringent. So you could try lowering the minimum ratio of bases that must map. Checking
the "Allow multiple output regions" box will allow a region from mm3 to be
mapped to multiple regions on mm6 so that may help too.
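If you run the command-line liftOver utility rather than the web form, the
same two settings correspond to its -minMatch and -multiple options. The
following is only a sketch of how such a command could be assembled in
Python: the helper name and all file names (including the chain file name)
are hypothetical placeholders, and the value 0.5 is an arbitrary example,
not a recommendation.

```python
def liftover_command(bed_in, chain, bed_out, unmapped,
                     min_match=0.95, multiple=False):
    """Build an argument list for the command-line liftOver utility.

    min_match is the minimum ratio of bases that must remap; multiple=True
    adds -multiple so one input region may map to several output regions.
    """
    cmd = ["liftOver", f"-minMatch={min_match}"]
    if multiple:
        cmd.append("-multiple")
    # Positional arguments: oldFile map.chain newFile unMapped
    cmd += [bed_in, chain, bed_out, unmapped]
    return cmd


# Hypothetical file names, for illustration only.
print(" ".join(liftover_command("regions.mm3.bed", "mm3ToMm6.over.chain",
                                "regions.mm6.bed", "unmapped.bed",
                                min_match=0.5, multiple=True)))
```

The resulting list can be passed to subprocess.run if the liftOver binary
and a suitable chain file are available locally.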
For the region from mm3 that you specified below, the start position is
equal to the end position. BED format uses 0-based starts, so base 1 of a
chromosome has start 0 and you need to subtract 1 from the start, e.g.
chr4 2146570 2146571
If you use start=end for the BED format, then LiftOver will always give
the error, "Deleted in New".
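The conversion described above can be sketched in a few lines of Python.
This is only an illustration; the function name is made up here, and it
assumes your input positions are 1-based single bases.

```python
def position_to_bed(chrom, pos_1based):
    """Return a tab-separated BED line covering one 1-based base."""
    if pos_1based < 1:
        raise ValueError("1-based positions start at 1")
    start = pos_1based - 1  # BED starts are 0-based
    end = pos_1based        # BED ends are exclusive, so end = 1-based position
    return f"{chrom}\t{start}\t{end}"


print(position_to_bed("chr4", 2146571))
```

For the example position this gives start 2146570 and end 2146571, so the
interval is one base long rather than empty.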
Are all your regions perhaps SNPs of 1 base in size? The liftOver tool may
not work so well with these. In this case, you should try using regions
that flank the 1 base positions. You say that you tried 10k regions but
you could also try smaller regions such as 5k or 1k and see if that would
work. Larger regions may always be lifted to mm6 due to changes in the
assembly.
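As a rough sketch of the flanking-region idea, the Python below builds a BED
line that extends a fixed number of bases on each side of a 1-based
position. The function name and the default flank of 500 bases are
assumptions made for this example; the original position is kept in the
name column so it can be traced after lifting. Chromosome lengths are not
known here, so the end coordinate is not clamped.

```python
def flank_to_bed(chrom, pos_1based, flank=500):
    """Return a BED4 line spanning `flank` bases either side of a position."""
    if pos_1based < 1:
        raise ValueError("1-based positions start at 1")
    start = max(0, pos_1based - 1 - flank)  # 0-based start, clamped at 0
    end = pos_1based + flank                # exclusive end, not clamped
    name = f"{chrom}:{pos_1based}"          # remember the original base
    return f"{chrom}\t{start}\t{end}\t{name}"


print(flank_to_bed("chr4", 2146571))
```

Note that when a flanked region lifts successfully, the original base is
not guaranteed to sit at the same offset in the new region if the two
assemblies differ by insertions or deletions inside it.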
I took a look at the example region below on the archived mouse mm3 Genome
Browser:
http://genome-mm3.cse.ucsc.edu/cgi-bin/hgGateway
This region is in a gap, so the sequence there is N and it will not be
possible to lift this region over to another assembly. Lifts are
calculated based on sequence alignments between the assemblies.
You could also check to see how many of your regions occur in gaps. To do
this, you can go to the above archive URL for mm3 and click on the
"Add Your Own Custom Tracks" button to load your regions as a custom
track. Then, if you go to the Table Browser by clicking on the "Tables"
link on the top blue bar, you can intersect this track with the gap track
and see how many regions fall in gaps.
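If you would rather do this check offline, the same intersection can be
sketched in Python. This assumes you have already obtained the gap
coordinates as (chrom, start, end) tuples in 0-based, half-open form and
that gaps on one chromosome do not overlap each other. The function names
and the coordinates in the example are made up for illustration.

```python
from bisect import bisect_right


def build_gap_index(gaps):
    """Group gap intervals by chromosome and sort them by start."""
    index = {}
    for chrom, start, end in gaps:
        index.setdefault(chrom, []).append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def in_gap(index, chrom, start, end):
    """Return True if the region [start, end) overlaps any gap."""
    intervals = index.get(chrom, [])
    # Last gap whose start lies before the end of the region.
    i = bisect_right(intervals, (end - 1, float("inf"))) - 1
    return i >= 0 and intervals[i][1] > start


# Made-up coordinates, for illustration only.
gaps = build_gap_index([("chrA", 0, 3000), ("chrA", 9000, 9500)])
regions = [("chrA", 100, 101), ("chrA", 3000, 3001), ("chrA", 9499, 9600)]
n_in_gaps = sum(in_gap(gaps, c, s, e) for c, s, e in regions)
print(n_in_gaps, "of", len(regions), "regions overlap a gap")
```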
Finally, using the BLAT tool to align the flanking regions of single base
positions from mm3 to the mm6 assembly is another approach that you could
try.
I hope that this helps. Please let us know if you have further questions.
Rachel
Rachel Harte
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu
> > ------------------------------------------------------------------------
> >
> > Subject:
> > Lift Genome Annotations: mm3 2 mm6
> > From:
> > "Muro, Enrique" <emuro at ohri.ca>
> > Date:
> > Thu, 15 Mar 2007 13:50:10 -0400
> > To:
> > <genome at soe.ucsc.edu>
> >
> >
> >
> > Hi,
> >
> > I am trying to convert 16,224 positions from mouse mm3 to mouse mm6
> > using http://genome.ucsc.edu/cgi-bin/hgLiftOver
> > I have a file with 16,224 lines in BED format for the positions.
> > Example of a line:
> > chr4 2146571 2146571
> >
> > The web server is telling me that "Conversion failed on 16224 records",
> > giving the following explanation of the failure messages:
> >
> > "Deleted in new:
> > None of sequence intersects with any alignment chain for the region
> > Partially deleted in new:
> > Sequence intersects with part of a single alignment chain in the region
> > Split in new
> > Sequence partially intersects multiple chains in the region
> > Duplicated in new
> > Sequence completely intersects multiple chains in the region
> > Boundary problem
> > Missing start or end base in an exon"
> >
> >
> > I also tried with ranges of 10K nt, in case it fails because I am
> > interested in
> > a single genomic position.
> >
> > Do you know what could be happening?
> > Thanks in advance,
> >
> > Enrique
> >
> > ------------------------------------------------------------------------
> >
>