[Genome] hg17 to hg18 liftOver and unmapped regions

Arkady bamboowarrior at gmail.com
Fri May 23 18:48:49 PDT 2008


Hi folks.

I'm looking at microarray data from the Affymetrix whole-genome tiling array
(Kapranov et al. 2007), whose coordinates are on the May 2004 (hg17) build.
(It also has a track in the Genome Browser, "Affy Phase 3" or some such.)

I took Affymetrix's original list of transcribed fragments for HDFs (genomic
ranges, one file per chromosome) and converted it to position format
(chrN:start-end). I then ran it all as a single file through liftOver from
hg17 to hg18, and for the most part it seems to have worked. However, about 20
regions came back unmapped. Lowering -minMatch substantially reduces that
number, but a couple of regions don't seem to go away.
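For reference, the conversion step might look something like the Python sketch below. It assumes each input line holds whitespace-separated chrom, start and end columns (a hypothetical layout; the actual Affymetrix files may differ). UCSC position strings are 1-based and fully closed while BED starts are 0-based, so a 0-based start is shifted by one. The helper min_bases_to_remap is also hypothetical: it gives a rough count of bases that must lift over to pass -minMatch, assuming the threshold is the ratio of remapped bases to region length.

```python
import math


def range_to_position(chrom: str, start: int, end: int, zero_based: bool = False) -> str:
    """Format a range as a UCSC position string (1-based, fully closed)."""
    if zero_based:
        start += 1  # BED-style 0-based start -> 1-based start; end is unchanged
    if start > end:
        raise ValueError(f"empty range {chrom}:{start}-{end}")
    return f"{chrom}:{start}-{end}"


def convert_lines(lines, zero_based: bool = False):
    """Yield position strings from 'chrom start end' lines (hypothetical layout)."""
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment/header lines
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        yield range_to_position(chrom, start, end, zero_based)


def min_bases_to_remap(start: int, end: int, min_match: float = 0.95) -> int:
    """Rough number of bases of a 1-based closed range that must remap.

    Assumes -minMatch is compared against (bases remapped / region length).
    """
    length = end - start + 1
    return math.ceil(min_match * length)
```

For the 102-bp fragment chr1:103788371-103788472, this works out to 97 bases at liftOver's default -minMatch of 0.95 versus 41 bases at 0.4.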

The first one:
#Deleted in new
chr1:103788371-103788472

If I get the DNA for that fragment (an exon of RNPC3) from the May 2004
assembly and BLAT it against March 2006, the highest-scoring hit has 96.9%
identity (span=94, qsize=102) at:
chr3:70,258,127-70,258,220
That region aligns with RNPC3 in several other species. Why isn't it mapped
in the chain file?

Here are the rest (with -minMatch=0.4):
#Deleted in new
chr1:103789464-103789522 # doesn't map well
#Deleted in new
chr1:103789976-103790082 # does map well, also to chr3
#Deleted in new
chr1:103790783-103790844 # does map well, to chr3
#Deleted in new
chr1:103791990-103792116
#Deleted in new
chr1:103796005-103796100
#Deleted in new
chr1:103797938-103797998
#Deleted in new
chrX:114795577-114795630 # maps equally well to 7 locations on chrX

I didn't check the three above that have no comments.
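To tally the unmapped output by failure reason, something like the following Python sketch could work. It assumes the layout shown above, where each "#reason" line precedes the region it describes. The function names parse_unmapped and parse_position are hypothetical; trailing "# ..." notes like the hand-added ones above are dropped, and parse_position also accepts Genome Browser-style commas (e.g. chr3:70,258,127-70,258,220).

```python
from collections import defaultdict


def parse_unmapped(lines):
    """Group liftOver's unmapped output by reason.

    Assumes a '#reason' comment line precedes each failed region.
    """
    by_reason = defaultdict(list)
    reason = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            reason = line[1:].strip()  # e.g. "Deleted in new"
            continue
        region = line.split("#", 1)[0].strip()  # drop trailing hand-written notes
        by_reason[reason or "unknown"].append(region)
    return dict(by_reason)


def parse_position(pos: str):
    """Split 'chrN:start-end' (commas allowed) into (chrom, start, end)."""
    chrom, span = pos.replace(",", "").split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)
```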

I suspect I'm misunderstanding how the chains are built. I suppose I could
write a script to BLAT the "deleted" transcribed fragments and take the best
hits as their mappings, but that feels like a bit of a hack.
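If I do go the BLAT route, the selection step could be sketched as below. It assumes the hits are saved as tab-separated PSL (BLAT's standard 21-column output) and scores each hit with a formula modeled on UCSC's pslScore for nucleotide alignments (matches + repMatches/2 - misMatches - qNumInsert - tNumInsert); treat that formula as an assumption. best_hits and the 0.9 query-coverage cutoff are hypothetical choices. A fragment with more than one top-scoring target, like the chrX one above, keeps all of them rather than being forced to a single location.

```python
from collections import defaultdict


def psl_score(f):
    """Score modeled on UCSC's pslScore for nucleotide hits (assumption)."""
    matches, mismatches, rep_matches = int(f[0]), int(f[1]), int(f[2])
    q_num_insert, t_num_insert = int(f[4]), int(f[6])
    return matches + rep_matches // 2 - mismatches - q_num_insert - t_num_insert


def best_hits(psl_lines, min_coverage=0.9):
    """Return {query: (score, [target positions])} with only the top-scoring hits.

    More than one position in the list means the fragment has no unique mapping.
    """
    hits = defaultdict(list)
    for line in psl_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 21 or not f[0].isdigit():
            continue  # skip psLayout header lines and blanks
        q_name, q_size = f[9], int(f[10])
        q_start, q_end = int(f[11]), int(f[12])
        if (q_end - q_start) / q_size < min_coverage:
            continue  # too little of the query is aligned
        # PSL tStart is 0-based; report a 1-based, closed position string.
        pos = f"{f[13]}:{int(f[15]) + 1}-{f[16]}"
        hits[q_name].append((psl_score(f), pos))
    best = {}
    for q_name, scored in hits.items():
        top = max(score for score, _ in scored)
        best[q_name] = (top, [pos for score, pos in scored if score == top])
    return best
```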

Thanks for the help.

Cheers,
John Woods

Center for Systems and Synthetic Biology
Institute for Cellular and Molecular Biology
The University of Texas at Austin

