[Genome] Does MAF format for multiz17way with ref hg18 not have "i" or "e" lines?

Ann Zweig ann at soe.ucsc.edu
Wed Feb 21 12:05:34 PST 2007


Hello Andrew,

	There are two sets of MAF files: one set is the standard MAF format, 
and the other set is annotated with gaps and frames (these have the "i" 
rows you are looking for).  The standard MAF set is the only set 
available for download from our download server.  We use the annotated 
MAF set for display in the Conservation track. It is too large to place 
on our download server and is for internal use only.  However, the Table 
Browser tool on our website has access to these annotated files (and 
associated table), so you are welcome to use the Table Browser to get 
the information you need.

	Follow these steps to search the annotated MAF files and tables using 
the Table Browser:

1. Navigate to the Table Browser.
   Click the "Tables" link in the navigation bar.

2. Configure the Table Browser as follows:
   Choose your clade, genome and assembly.
   group = Comparative Genomics
   track = Conservation
   table = multiz17way
   region = paste your region of interest in the "position" box
   output format = MAF
   output file = name of file to save

3. Press "get output".


For this example, I will use the very small (5 bp) region of 
chrX:151196518-151196522 (in the hg18 assembly).  In this case, the 
output looks like this (complete with "i" rows):

##maf version=1
a score=-3709.000000
s hg18.chrX               151196517 5 + 154913754 TG---TTA
s rheMac2.chrX            150362366 5 + 153947521 TG---TTA
i rheMac2.chrX            C 0 C 0
s rn4.chrX                158483014 8 + 160699376 GATCCCCT
i rn4.chrX                I 7076 C 0
s mm8.chrX                 68761810 8 + 165556469 TATCCCCT
i mm8.chrX                C 0 C 0
s oryCun1.scaffold_188228      4873 8 -      8342 AGTTCTAA
i oryCun1.scaffold_188228 C 0 C 0
s panTro1.chrX            156403756 3 + 160174553 -----TTA
i panTro1.chrX            I 120 C 0
s bosTau2.scaffold5997         4279 8 -     49150 GGTCCTCA
i bosTau2.scaffold5997    C 0 C 0
s canFam2.chrX            123312605 8 + 126883977 AATTCATA
i canFam2.chrX            C 0 N 0
s echTel1.scaffold_237330     23987 8 +     24714 GGTCTTCA
i echTel1.scaffold_237330 C 0 C 0
s loxAfr1.scaffold_92555        428 8 -      6223 GGTCTTCA
i loxAfr1.scaffold_92555  C 0 C 0
e danRer3.chrNA           102247170 0 + 253521007 I
e monDom4.chrX             34125699 23099 +  60718501 I
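
To read the "i" rows in output like the above programmatically, a minimal
Python sketch could look like the following. It assumes the field layout
described in the online MAF format help (source name, left status, left
count, right status, right count). The function names parse_i_line and
contiguous_on_both_sides are only illustrative and are not part of any
UCSC tool.

```python
# Meanings of the status characters on "i" lines, as described in the
# UCSC MAF format help. The wording here is paraphrased.
I_STATUS = {
    "C": "contiguous with the adjacent block",
    "I": "intervening bases between this block and the adjacent one",
    "N": "first sequence from this source chrom or scaffold",
    "n": "first sequence from this source, bridged by another alignment",
    "M": "missing data (Ns) before or after this block",
    "T": "sequence already used in a previous block",
}


def parse_i_line(line):
    """Split one "i" line into its source name and left/right context."""
    fields = line.split()
    if len(fields) != 6 or fields[0] != "i":
        raise ValueError("not an 'i' line: %r" % line)
    _, src, left_status, left_count, right_status, right_count = fields
    return {
        "src": src,
        "left": (left_status, int(left_count)),
        "right": (right_status, int(right_count)),
    }


def contiguous_on_both_sides(info):
    """True if the sequence continues directly into both neighbouring blocks."""
    return info["left"][0] == "C" and info["right"][0] == "C"


# A few "i" rows copied from the example output above.
example = [
    "i rheMac2.chrX            C 0 C 0",
    "i rn4.chrX                I 7076 C 0",
    "i panTro1.chrX            I 120 C 0",
    "i canFam2.chrX            C 0 N 0",
]

for line in example:
    info = parse_i_line(line)
    left_status, left_count = info["left"]
    print(info["src"], "left:", I_STATUS[left_status], left_count,
          "contiguous both sides:", contiguous_on_both_sides(info))
```

In this block, for instance, rn4 has status "I" with a count of 7076 on the
left, so the rat sequence is not contiguous with the preceding block, while
rheMac2 is contiguous on both sides.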


	If you have several regions of interest, you can make a custom track 
with all of your regions.  Then intersect your custom track with the 
multiz17way table to get the MAF output for all regions at once.

To create a custom track: 
http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#CustomTracks

To intersect your custom track with the multiz17way table: 
http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#Intersection
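
As a rough sketch of the custom track step, the Python below turns a list
of browser-style positions into BED lines that can be pasted in as a custom
track. It assumes positions are written as chrom:start-end with a 1-based
start, as in the "position" box, and converts them to the 0-based start
that BED uses. The track name "myRegions", the description text and the
function names are placeholders chosen for illustration.

```python
import re

# Matches positions such as chrX:151196518-151196522 (commas are stripped first).
# Assumes chromosome/scaffold names contain only letters, digits and underscores.
POSITION_RE = re.compile(r"^(\w+):(\d+)-(\d+)$")


def position_to_bed(position):
    """Convert a 1-based browser position to a 0-based, half-open BED line."""
    match = POSITION_RE.match(position.strip().replace(",", ""))
    if match is None:
        raise ValueError("unrecognised position: %r" % position)
    chrom = match.group(1)
    start = int(match.group(2))
    end = int(match.group(3))
    # BED starts are 0-based, so subtract 1 from the start; the end is unchanged.
    return "%s\t%d\t%d" % (chrom, start - 1, end)


def make_custom_track(positions, name="myRegions"):
    """Build custom track text: one track line followed by one BED line per region."""
    lines = ['track name="%s" description="Regions of interest"' % name]
    lines.extend(position_to_bed(p) for p in positions)
    return "\n".join(lines) + "\n"


# The 5 bp region used in the example above.
print(make_custom_track(["chrX:151196518-151196522"]))
```

Note that the converted start, 151196517, matches the start shown on the
hg18 "s" row in the example output, since MAF also uses 0-based starts.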

	This should be enough to get you started.  Be sure to let us know if 
you run into any trouble or have more questions.

Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu




Andrew Smith wrote:
> I've been using the multiz17way alignments, referenced on different species and 
> assemblies. Today I thought I needed to use the "i" lines because I was 
> manipulating the blocks in a way that would require looking to the adjacent 
> blocks and merging them. In particular, I wanted to know if the sequence before 
> or after a particular block is contiguous with the one in the block I'm 
> examining, and your online help for MAF format says this is contained in the 
> information line (starting with "i"). But I can't find these lines in hg18, hg17 
> or mm8. I didn't look at every line of every file, but checked several 
> chromosomes. When the sequences are contiguous, it is usually easy to spot 
> because the end of one block is the start of the next (depending on strand). But 
> this is not always the case, e.g., something may be either deleted, or aligned 
> to another chromosome.
> 
> So are there still "i" lines in the MAF files? And if not, do you have any 
> suggestions?
> 
> If from my question it seems like I don't understand these files, point me to 
> more info please.
> 
> Thanks,
> Andrew
> 
> _______________________________________________
> Genome maillist  -  Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

