[Genome] Generating Vertebrate Multiz Alignment Data

Katherine Gurdziel gurdziel at wi.mit.edu
Tue Jan 22 06:46:50 PST 2008


Hello and thank you for taking the time to answer my question.

I am trying to generate cross-species coordinates between human and four
other species that are consistent with the Vertebrate Multiz Alignment &
PhastCons Conservation data, and I think I might be overlooking something.
After reading your documentation and the mailing-list archives, I concluded
that the best method would be liftOver, filling in the regions that liftOver
does not map with data extracted from the UCSC maf files (downloaded from
goldenPath/hg18/multiz28way/). My understanding of the data pipeline is
that both the liftOver chain files and the maf files are derived from the
axt pairwise alignments.

With minMatch=0.01, liftOver returns coordinates for the majority of
regions. As expected, there are cases where liftOver fails because the
region maps to several separate locations. When liftOver fails, I extract
the relevant data from the maf files. For regions with multiple hits, we had
planned to use the maf-derived coordinates spanning the longest match. In
the following example, the best match seems to be on mm8.chr17. However,
when I compare the maf data to the browser's Vertebrate Multiz Alignment &
PhastCons Conservation page, the page shows only chr6 coordinates.
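As a rough sketch of the fallback step, the regions liftOver could not map can be read back from the unMapped file it writes and grouped by failure reason, so that only the split regions go on to the maf extraction. This assumes each unmapped record is preceded by a comment line such as "#Split in new" or "#Deleted in new"; the function name parse_unmapped is hypothetical, and the example record is simply the chr9 region from this message.

```python
from collections import defaultdict

def parse_unmapped(lines):
    """Group BED records from a liftOver unMapped file by failure reason.

    Assumption: each record is preceded by a '#...' comment naming the reason
    (e.g. '#Split in new'). Only chrom, start and end are kept.
    """
    by_reason = defaultdict(list)
    reason = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            reason = line[1:].strip()  # applies to the record(s) that follow
            continue
        chrom, start, end = line.split()[:3]
        by_reason[reason].append((chrom, int(start), int(end)))
    return dict(by_reason)

# Hypothetical unMapped contents for the region discussed in this message.
unmapped = ["#Split in new\n", "chr9\t4518\t4806\n"]
groups = parse_unmapped(unmapped)
maf_fallback = groups.get("Split in new", [])
print(maf_fallback)  # [('chr9', 4518, 4806)]
```

Keeping the reason alongside each region makes it easy to treat deleted regions (no alignment at all) differently from split ones (several candidate alignments).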

Human hg18 coordinates: chr9 4518 4806
Mouse lines extracted from the chr9.maf file (maf version=1 scoring=autoMZ.v1):

s mm8.chr6 28035902 14 - 149525685 GGGACTCAGTTCTT
s mm8.chr17 29158511 143 - 95177420 TATTCCGCCGTGTTGCTGTTTTCTCTCAGGATCCTTTGAAGGAAGAAGCTCTTCAGCAAAG---CC-----AGCA---GATGGCTCATACAG-------------------------------------------TCCCGACTCCATGAAGAGGAATTAGGC------------------------ACAGGCCTGGGACACC--TGCCCAGCCCCCT-GCACCT
s mm8.chr17 29158688 47 - 95177420 CCGAGGAGCCCCTGCCAGCTG-CT-CTGCC------ATGTG----------GGAG-AATGGCCAAA-----------------------------
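The "longest match" rule can be sketched by summing the size field of each mouse s line (aligned bases in the source, gaps not counted) per chromosome and keeping the chromosome with the largest total. Reading "longest" as "most aligned bases per chromosome" is an assumption, and the sketch does not check whether the blocks are collinear or close together. The function names are hypothetical; the alignment text column is left out of the example lines for brevity, since only the first six fields are used.

```python
from collections import defaultdict

def parse_s_line(line):
    """Split a MAF 's' line into src, start, size, strand, srcSize (text ignored)."""
    fields = line.split()
    if len(fields) < 6 or fields[0] != "s":
        raise ValueError(f"not a MAF s line: {line!r}")
    src, start, size, strand, src_size = fields[1:6]
    return src, int(start), int(size), strand, int(src_size)

def best_chrom(s_lines, species):
    """Return (chrom, aligned_bases) for the species' chromosome with most bases."""
    totals = defaultdict(int)
    for line in s_lines:
        src, _start, size, _strand, _src_size = parse_s_line(line)
        db, _, chrom = src.partition(".")  # 'mm8.chr17' -> ('mm8', '.', 'chr17')
        if db == species:
            totals[chrom] += size
    if not totals:
        return None  # species absent from these lines
    chrom = max(totals, key=totals.get)
    return chrom, totals[chrom]

mouse_lines = [
    "s mm8.chr6 28035902 14 - 149525685",
    "s mm8.chr17 29158511 143 - 95177420",
    "s mm8.chr17 29158688 47 - 95177420",
    "s mm8.chr17 29158735 7 - 95177420",
]
print(best_chrom(mouse_lines, "mm8"))  # ('chr17', 197)
```

With these four lines, chr17 totals 143 + 47 + 7 = 197 aligned bases against 14 on chr6.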
s mm8.chr17 29158735 7 - 95177420 --GCAC----TCT


Data on the browser's Vertebrate Multiz Alignment & PhastCons Conservation
(28 Species) page, reached from the UCSC Genome Browser on Human Mar. 2006
Assembly with position/search chr9:4518-4806:

chr6:121,489,770-121,489,784
chr6:121,488,603-121,489,769    Unaligned
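Before comparing the two lists, the coordinate conventions have to match: maf s lines use 0-based starts, and for '-' strand rows the start is counted on the reverse-complemented chromosome, while the browser displays 1-based, fully closed positions. A minimal conversion sketch follows; the function names are hypothetical. Applied to the mm8.chr6 line above (start 28035902, size 14, strand -, srcSize 149525685), it gives chr6:121,489,770-121,489,783, which has the same start as the browser's chr6 entry.

```python
def maf_to_forward(start, size, strand, src_size):
    """Convert a MAF s-line interval to 0-based, half-open forward-strand coords.

    For strand '-', MAF counts 'start' on the reverse-complemented sequence.
    """
    if strand == "+":
        return start, start + size
    if strand == "-":
        return src_size - (start + size), src_size - start
    raise ValueError(f"bad strand: {strand!r}")

def parse_browser_position(pos):
    """Parse 'chr6:121,489,770-121,489,784' (1-based, closed) to 0-based half-open."""
    chrom, _, span = pos.partition(":")
    first, _, last = span.replace(",", "").partition("-")
    return chrom, int(first) - 1, int(last)

def to_browser_position(chrom, start, end):
    """Format a 0-based half-open interval in the browser's 1-based style."""
    return f"{chrom}:{start + 1:,}-{end:,}"

start, end = maf_to_forward(28035902, 14, "-", 149525685)
print(to_browser_position("chr6", start, end))  # chr6:121,489,770-121,489,783
```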

Could you please explain how the Genome Browser's Vertebrate Multiz
Alignment & PhastCons Conservation data is generated? I would also
appreciate suggestions on how to extract cross-species coordinates that
match the information shown in the browser as closely as possible.
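One possible approach, sketched here under stated assumptions, is to project the human query interval through each maf block column by column instead of taking whole blocks: walk the hg18 row and the other species' row together and keep the other species' positions aligned to human bases inside the query. The function name and the small alignment in the example are hypothetical, the result is in the row's own maf coordinates (reverse-strand if its strand is '-', so it still needs the srcSize conversion), and this is not necessarily how the browser builds its display.

```python
def project_interval(ref_start, ref_text, other_start, other_text, q_start, q_end):
    """Map a 0-based half-open reference interval onto another row of one MAF block.

    Returns the (start, end) span, in the other row's MAF coordinates, of the
    bases aligned to reference bases in [q_start, q_end), or None if none are.
    """
    if len(ref_text) != len(other_text):
        raise ValueError("rows of one block must have equal text length")
    ref_pos, other_pos = ref_start, other_start
    hits = []
    for r, o in zip(ref_text, other_text):
        # Count only columns where both rows have a base and the reference
        # position falls inside the query interval.
        if r != "-" and o != "-" and q_start <= ref_pos < q_end:
            hits.append(other_pos)
        if r != "-":
            ref_pos += 1  # advance past a reference base
        if o != "-":
            other_pos += 1  # advance past an other-species base
    if not hits:
        return None
    return min(hits), max(hits) + 1

# Hypothetical two-row block: reference starts at 100, other row at 50.
print(project_interval(100, "ACGT--ACGT", 50, "AC-TTTAC-T", 101, 107))  # (51, 57)
```

Taking the min/max span keeps any other-species insertions that fall between aligned bases; returning the individual hits instead would exclude them.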

Sincerely,

Katherine Gurdziel
Bioinformatics Analyst
Bioinformatics and Research Computing
Whitehead Institute Room 215
Nine Cambridge Center
Cambridge, MA 02142



