[Genome] maf2fasta

Angie Hinrichs angie at soe.ucsc.edu
Tue Jun 19 10:29:11 PDT 2007


Hi Navin,

maf2fasta is actually from Webb Miller's group at PSU, so I suggest 
you contact them -- http://www.bx.psu.edu/miller_lab/ has a contacts 
link.

I turned the illustrative test case in your email into actual files 
(attached) and ran maf2fasta, but the output appeared correct:

% maf2fasta /tmp/tmpRef.fa /tmp/tmp.maf fasta
>A
AAATTTGGG
>B
AAATTTGGG
>C
AAA---GGG
>D
AAATTTGGG

-- so unfortunately, those inputs are too simple to provoke the bug 
you are seeing.  To help the Miller group debug, I would suggest 
isolating the smallest input on which you see the bug and sending it 
(including the reference fasta) to them, so they can reproduce the 
problem and step through the code locally.
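
While narrowing down a failing input, it may help to have an 
independent check of what the output should look like.  The Python 
below is only a minimal sketch, not maf2fasta's actual algorithm.  The 
names parse_blocks and maf_to_fasta are made up for illustration, and 
it assumes the blocks are already in reference order and contiguous, 
with one row per species per block and no need for reverse-strand 
handling.  A species absent from a block is padded with '-' for that 
block's width.

```python
def parse_blocks(maf_text):
    """Return a list of {species: aligned_text} dicts, one per MAF block."""
    blocks, current = [], {}
    for line in maf_text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "a":
            # A blank line or a new 'a' line closes the block in progress.
            if current:
                blocks.append(current)
                current = {}
        elif fields[0] == "s":
            # s <src> <start> <size> <strand> <srcSize> <text>
            current[fields[1]] = fields[6]
    if current:
        blocks.append(current)
    return blocks


def maf_to_fasta(maf_text):
    """Concatenate blocks per species, padding absent species with '-'."""
    blocks = parse_blocks(maf_text)
    species = []  # order of first appearance
    for block in blocks:
        for name in block:
            if name not in species:
                species.append(name)
    seqs = {name: [] for name in species}
    for block in blocks:
        width = len(next(iter(block.values())))  # alignment columns in block
        for name in species:
            seqs[name].append(block.get(name, "-" * width))
    return "".join(f">{n}\n{''.join(parts)}\n" for n, parts in seqs.items())


# Same content as the attached test MAF.
EXAMPLE_MAF = """##maf version=1 scoring=blastz
a score=12.0
s A 0 3 + 9 AAA
s B 0 3 + 9 AAA
s C 0 3 + 6 AAA
s D 0 3 + 9 AAA

a score=9.0
s A 3 3 + 9 TTT
s B 3 3 + 9 TTT
s D 3 3 + 9 TTT

a score=12.0
s A 6 3 + 9 GGG
s B 6 3 + 9 GGG
s C 3 3 + 6 GGG
s D 6 3 + 9 GGG
"""

print(maf_to_fasta(EXAMPLE_MAF), end="")
```

On the attached test MAF this puts the gap in C (AAA---GGG) and leaves 
D intact, matching the maf2fasta output above.  Running the same 
comparison on progressively smaller slices of the real alignment is 
one way to find the smallest block set where the two disagree.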

Hope that helps,

Angie

On Tue, 19 Jun 2007, Navin Elango wrote:

> Dear UCSC members,
> 
> I tried to use maf2fasta on a four species alignment and found the following
> problem.
> 
> Let's say that the four species are A, B, C, D and the MAF file looks like
> the following (I might be messing up the one-based or zero-based index, but
> the point is that in the second block species C is missing):
> 
> a score=1234
> s A 1 3 + 9 AAA
> s B 1 3 + 9 AAA
> s C 1 3 + 6 AAA
> s D 1 3 + 9 AAA
> 
> a score=2345
> s A 4 3 + 9 TTT
> s B 4 3 + 9 TTT
> s D 4 3 + 9 TTT
> 
> s A 7 3 + 9 GGG
> s B 7 3 + 9 GGG
> s C 7 3 + 6 GGG
> s D 7 3 + 9 GGG
> 
> When I do maf2fasta, the output is
> 
> >A
> AAATTTGGG
> >B
> AAATTTGGG
> >C
> AAATTTGGG
> >D
> AAA---GGG
> 
> The gap which is supposed to be in the third species is actually found in the
> fourth species.
> 
> Am I missing something? Do I have to process the MAF file before I send it
> into maf2fasta? Could you please let me know how to fix this?
> 
> Any help will be greatly appreciated!
> 
> Thanks,
> Navin.
-------------- next part --------------
>A
AAATTTGGG
-------------- next part --------------
##maf version=1 scoring=blastz
a score=12.0
s A 0 3 + 9 AAA
s B 0 3 + 9 AAA
s C 0 3 + 6 AAA
s D 0 3 + 9 AAA

a score=9.0
s A 3 3 + 9 TTT
s B 3 3 + 9 TTT
s D 3 3 + 9 TTT

a score=12.0
s A 6 3 + 9 GGG
s B 6 3 + 9 GGG
s C 3 3 + 6 GGG
s D 6 3 + 9 GGG


