[Genome] Does MAF format for multiz17way with ref hg18 not have "i" or "e" lines?
Ann Zweig
ann at soe.ucsc.edu
Wed Feb 21 12:05:34 PST 2007
Hello Andrew,
There are two sets of MAF files: one set is the standard MAF format,
and the other set is annotated with gaps and frames (these have the "i"
rows you are looking for). The standard MAF set is the only set
available for download from our download server. We use the annotated
MAF set for display in the Conservation track. It is too large to place
on our download server and is for internal use only. However, the Table
Browser tool on our website has access to these annotated files (and
the associated table), so you are welcome to use the Table Browser to get
the information you need.
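A quick way to tell which kind of file you have is to check whether any line starts with "i" or "e". The following is a minimal Python sketch. It assumes the MAF text is already loaded into a string, and the function names `maf_line_types` and `is_annotated` are hypothetical helpers, not part of any UCSC tool.

```python
from collections import Counter

def maf_line_types(maf_text: str) -> Counter:
    """Count MAF lines by their leading type letter ('a', 's', 'i', 'e', ...)."""
    counts = Counter()
    for line in maf_text.splitlines():
        line = line.strip()
        # Skip blank lines and '#' header/comment lines.
        if not line or line.startswith("#"):
            continue
        counts[line.split()[0]] += 1
    return counts

def is_annotated(maf_text: str) -> bool:
    """True if the text carries 'i' or 'e' lines (the annotated variant)."""
    counts = maf_line_types(maf_text)
    return counts["i"] > 0 or counts["e"] > 0

# Lines taken from the Table Browser example in this message.
standard = """##maf version=1
a score=-3709.000000
s hg18.chrX 151196517 5 + 154913754 TG---TTA
s rheMac2.chrX 150362366 5 + 153947521 TG---TTA
"""
annotated = standard + "i rheMac2.chrX C 0 C 0\n"
print(is_annotated(standard), is_annotated(annotated))  # False True
```

Run over a downloaded multiz17way file, this should report no "i" or "e" lines, consistent with the downloads being the standard set.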
Follow these steps to search the annotated MAF files and tables using
the Table Browser:
1. Navigate to the Table Browser.
Click the "Tables" link in the Genome Browser navigation bar.
2. Configure Table Browser like so:
Choose your clade, genome and assembly.
group = Comparative Genomics
track = Conservation
table = multiz17way
region = paste your region of interest in the "position" box
output format = MAF
output file = name of file to save
3. Press "get output".
For this example, I will use the very small (5 bp) region of
chrX:151196518-151196522 (in the hg18 assembly). In this case, the
output looks like this (complete with "i" rows):
##maf version=1
a score=-3709.000000
s hg18.chrX 151196517 5 + 154913754 TG---TTA
s rheMac2.chrX 150362366 5 + 153947521 TG---TTA
i rheMac2.chrX C 0 C 0
s rn4.chrX 158483014 8 + 160699376 GATCCCCT
i rn4.chrX I 7076 C 0
s mm8.chrX 68761810 8 + 165556469 TATCCCCT
i mm8.chrX C 0 C 0
s oryCun1.scaffold_188228 4873 8 - 8342 AGTTCTAA
i oryCun1.scaffold_188228 C 0 C 0
s panTro1.chrX 156403756 3 + 160174553 -----TTA
i panTro1.chrX I 120 C 0
s bosTau2.scaffold5997 4279 8 - 49150 GGTCCTCA
i bosTau2.scaffold5997 C 0 C 0
s canFam2.chrX 123312605 8 + 126883977 AATTCATA
i canFam2.chrX C 0 N 0
s echTel1.scaffold_237330 23987 8 + 24714 GGTCTTCA
i echTel1.scaffold_237330 C 0 C 0
s loxAfr1.scaffold_92555 428 8 - 6223 GGTCTTCA
i loxAfr1.scaffold_92555 C 0 C 0
e danRer3.chrNA 102247170 0 + 253521007 I
e monDom4.chrX 34125699 23099 + 60718501 I
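The "i" and "e" lines above can be split into named fields. The following is a minimal Python sketch, assuming the field order given in the UCSC MAF format description ("i": src, leftStatus, leftCount, rightStatus, rightCount; "e": src, start, size, strand, srcSize, status). The one-letter status meanings are paraphrased from memory and should be checked against the current spec. The class and function names are hypothetical.

```python
from dataclasses import dataclass

# Status letters (paraphrased; verify against the UCSC MAF documentation).
STATUS = {
    "C": "contiguous with the adjacent block",
    "I": "intervening (non-aligned) bases between blocks",
    "N": "first sequence from this chrom/scaffold",
    "n": "first sequence, bridged by another chrom/scaffold",
    "M": "missing data (Ns) before/after this block",
    "T": "sequence already used in a previous block (tandem duplication)",
}

@dataclass
class InfoLine:
    src: str
    left_status: str
    left_count: int
    right_status: str
    right_count: int

@dataclass
class ELine:
    src: str
    start: int
    size: int
    strand: str
    src_size: int
    status: str

def parse_i(line: str) -> InfoLine:
    # "i src leftStatus leftCount rightStatus rightCount"
    _, src, ls, lc, rs, rc = line.split()
    return InfoLine(src, ls, int(lc), rs, int(rc))

def parse_e(line: str) -> ELine:
    # "e src start size strand srcSize status"
    _, src, start, size, strand, src_size, status = line.split()
    return ELine(src, int(start), int(size), strand, int(src_size), status)

rn4 = parse_i("i rn4.chrX I 7076 C 0")
print(rn4.left_count, STATUS[rn4.left_status])
mono = parse_e("e monDom4.chrX 34125699 23099 + 60718501 I")
print(mono.src, mono.size, STATUS[mono.status])
```

Read this way, the rn4 line says there are 7076 unaligned rat bases between the previous block and this one, while the sequence after this block continues contiguously.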
If you have several regions of interest, you can make a custom track
with all of your regions. Then intersect your custom track with the
multiz17way table to get the MAF output for all regions at once.
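For many regions, the custom track can be a simple BED file. Note that the "position" box uses 1-based, fully closed coordinates, whereas BED (and the MAF "s" lines above) use 0-based starts. That is why chrX:151196518-151196522 appears in the MAF output as start 151196517, size 5. The following is a minimal Python sketch that converts position strings into a BED custom track. The function names, the track name, and the second example region are hypothetical.

```python
def position_to_bed(position: str) -> str:
    """Convert a 1-based, closed 'chrom:start-end' position string
    (commas allowed) into a 0-based, half-open BED line."""
    chrom, span = position.split(":")
    start, end = (int(x.replace(",", "")) for x in span.split("-"))
    return f"{chrom}\t{start - 1}\t{end}"

def custom_track(positions, name="myRegions") -> str:
    """Build a BED custom track with a 'track' header line."""
    header = f'track name={name} description="regions of interest"'
    return "\n".join([header] + [position_to_bed(p) for p in positions]) + "\n"

# The first region is the example above; the second is made up.
print(custom_track(["chrX:151196518-151196522", "chrX:151,200,001-151,200,100"]))
```

The resulting text can be pasted or uploaded as a custom track, then used in the Table Browser intersection.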
To create a custom track:
http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#CustomTracks
To intersect your custom track with the multiz17way table:
http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#Intersection
This should be enough to get you started. Be sure to let us know if
you run into any trouble or have more questions.
Regards,
----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu
Andrew Smith wrote:
> I've been using the multiz17way alignments with different species and
> assemblies as the reference. Today I thought I needed to use the "i" lines because I was
> manipulating the blocks in a way that would require looking to the adjacent
> blocks and merging them. In particular, I wanted to know if the sequence before
> or after a particular block is contiguous with the one in the block I'm
> examining, and your online help for MAF format says this is contained in the
> information line (starting with "i"). But I can't find these lines in hg18, hg17
> or mm8. I didn't look at every line of every file, but checked several
> chromosomes. When the sequences are contiguous, it is usually easy to spot
> because the end of one block is the start of the next (depending on strand). But
> this is not always the case, e.g., something may be either deleted, or aligned
> to another chromosome.
>
> So are there still "i" lines in the MAF files? And if not, do you have any
> suggestions?
>
> If from my question it seems like I don't understand these files, point me to
> more info please.
>
> Thanks,
> Andrew
>
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
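Without "i" lines, the end-equals-start check described in the question can serve as a heuristic. The following is a minimal Python sketch. It assumes two "s" lines for the same species taken from consecutive blocks in reference order. MAF coordinates on the "-" strand are counted on the reverse complement, so the same test applies to either strand as long as both lines share it. This heuristic cannot distinguish the deletion or rearrangement cases mentioned in the question, which is what the "i" and "e" lines record. The names are hypothetical, and the second "s" line is made up.

```python
from typing import NamedTuple

class SLine(NamedTuple):
    src: str
    start: int     # 0-based start on the given strand
    size: int      # number of non-gap bases in the block
    strand: str
    src_size: int

def parse_s(line: str) -> SLine:
    # "s src start size strand srcSize text"
    _, src, start, size, strand, src_size, _text = line.split()
    return SLine(src, int(start), int(size), strand, int(src_size))

def looks_contiguous(prev: SLine, nxt: SLine) -> bool:
    """Heuristic: the next block begins exactly where the previous one ended."""
    return (prev.src == nxt.src
            and prev.strand == nxt.strand
            and prev.start + prev.size == nxt.start)

prev = parse_s("s mm8.chrX 68761810 8 + 165556469 TATCCCCT")
# Hypothetical next block for mm8, for illustration only.
nxt = parse_s("s mm8.chrX 68761818 3 + 165556469 A-CG")
print(looks_contiguous(prev, nxt))  # True
```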