[Genome] BLAT Search Genome: hgc XML output?

Hiram Clawson hiram at soe.ucsc.edu
Mon Mar 3 23:17:11 PST 2008


Good Evening John:

This is an interesting problem you present here.

I haven't seen references to XML outputs for blat.
As you are no doubt aware, the fundamental output of
blat is the psl file format:
http://genome.ucsc.edu/FAQ/FAQformat#format2

Psl is one of the output options on the blat web page.  If you are
running blat locally, you will get this same psl output.  See also
the blat usage restrictions:
http://genome.ucsc.edu/FAQ/FAQblat#blat3
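
Since psl is a tab-separated text format, it can be read directly without an XML layer. The following is a minimal Python sketch, not a UCSC tool: it assumes the standard 21-column psl layout described in the FAQ linked above, and the names PslRecord and parse_psl_line are made up for illustration. It keeps only a subset of the columns and expects the caller to skip any header lines.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PslRecord:
    """A subset of the fields in one psl alignment line (hypothetical helper type)."""
    matches: int
    strand: str
    q_name: str
    q_size: int
    q_start: int
    q_end: int
    t_name: str
    t_size: int
    t_start: int
    t_end: int
    block_sizes: List[int]
    q_starts: List[int]
    t_starts: List[int]


def _int_list(field: str) -> List[int]:
    # psl list columns are comma-separated with a trailing comma, e.g. "30,29,"
    return [int(x) for x in field.split(",") if x]


def parse_psl_line(line: str) -> PslRecord:
    """Parse one tab-separated psl data line (header lines must be skipped by the caller)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 21:
        raise ValueError(f"expected at least 21 psl columns, got {len(cols)}")
    return PslRecord(
        matches=int(cols[0]),
        strand=cols[8],
        q_name=cols[9],
        q_size=int(cols[10]),
        q_start=int(cols[11]),
        q_end=int(cols[12]),
        t_name=cols[13],
        t_size=int(cols[14]),
        t_start=int(cols[15]),
        t_end=int(cols[16]),
        block_sizes=_int_list(cols[18]),
        q_starts=_int_list(cols[19]),
        t_starts=_int_list(cols[20]),
    )
```

The block_sizes, q_starts and t_starts lists together describe each ungapped aligned block, which is the information needed to separate aligned from unaligned sequence.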

At the command line, to get ASCII output similar to what you see
in the hgc click-through, we use pslPretty, a command line utility
from the kent source tree.  It combines a psl file with the two
sequences to produce the side-by-side alignment picture.
http://genome.ucsc.edu/admin/cvs.html
http://genome.ucsc.edu/admin/jk-install.html
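
As a rough illustration, pslPretty could be driven from a script as sketched below. This assumes the argument order "pslPretty in.psl target query pretty.out" and that pslPretty is on your PATH; check both by running pslPretty with no arguments, which prints its usage. The Python function names and the file names in the usage note are hypothetical.

```python
import subprocess
from typing import List


def pslpretty_command(psl_path: str, target_path: str,
                      query_path: str, out_path: str) -> List[str]:
    """Build the pslPretty argument list.

    Assumed argument order: psl file, target sequence, query sequence, output.
    Verify against the usage message printed by running pslPretty with no arguments.
    """
    return ["pslPretty", psl_path, target_path, query_path, out_path]


def run_pslpretty(psl_path: str, target_path: str,
                  query_path: str, out_path: str) -> None:
    """Run pslPretty and raise CalledProcessError if it exits with an error."""
    cmd = pslpretty_command(psl_path, target_path, query_path, out_path)
    subprocess.run(cmd, check=True)
```

For example, run_pslpretty("hits.psl", "target.2bit", "query.fa", "pretty.out") would write the side-by-side alignments to pretty.out, provided the files exist and the assumed usage matches your build.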

You will find numerous psl manipulation commands in the kent source
tree.  Here is a listing of these commands from our bin directory:
pslCDnaFilter   pslFilterPrimers  pslPairs        pslSimp           pslToPslx
pslCat          pslGlue           pslPartition    pslSort           pslToXa
pslCheck        pslHisto          pslPretty       pslSortAcc        pslUniq
pslCoverage     pslHitPercent     pslQuickFilter  pslSplitOnTarget  pslUnpile
pslDiff         pslIntronsOnly    pslRecalcMatch  pslStats          pslxToFa
pslDropOverlap  pslMap            pslReps         pslSwap
pslFilter       pslMrnaCover      pslSelect       pslToBed

Each command will indicate how to use it if run with no arguments.

I don't know if these will help, but since psl is the fundamental
output from blat, I'm guessing one of these formatting and filtering
commands might get you what you want.  To work with them locally you
would also need the .2bit sequence files for the genomes of interest.
Those can be fetched from the downloads server.
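
If none of the existing commands fits, the aligned and unaligned portions can also be pulled out of a sequence directly from the psl block columns. The sketch below is an assumption-laden example, not part of the kent tree: split_query is a hypothetical name, it takes the blockSizes and qStarts values from one psl line as integer lists, and it expects the sequence in the orientation those coordinates refer to (for a "-" strand hit, qStarts are relative to the reverse complement of the query, so pass the reverse-complemented sequence).

```python
from typing import List, Tuple


def split_query(sequence: str, block_sizes: List[int],
                q_starts: List[int]) -> List[Tuple[str, str]]:
    """Split a sequence into ("aligned" | "unaligned", subsequence) pieces.

    block_sizes and q_starts come from the blockSizes and qStarts psl columns,
    already converted to integers, with blocks in increasing coordinate order.
    """
    segments: List[Tuple[str, str]] = []
    pos = 0  # end of the previous aligned block
    for size, start in zip(block_sizes, q_starts):
        if start > pos:
            # bases between the previous block and this one are unaligned
            segments.append(("unaligned", sequence[pos:start]))
        segments.append(("aligned", sequence[start:start + size]))
        pos = start + size
    if pos < len(sequence):
        # trailing bases after the last block
        segments.append(("unaligned", sequence[pos:]))
    return segments
```

The same function works for the target side if you pass the target sequence with the tStarts list in place of qStarts.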

--Hiram

John Woods wrote:
> I've Googled quite a bit for this, but no luck. I also see that 
> BioPython has a BLAT parser which will output XML, which leads me to 
> believe that the answer to my question will be no. But I thought I 
> should ask anyway--before reinventing the wheel.
> 
> So, the BLAT Search Results outputs this nice list with links to browser 
> and details. If I click on "details" for a hit, I get cgi-bin/hgc, which 
> highlights--in both the query and the match--the portions of the 
> sequence which align well.
> 
> Is it possible to get XML output for this page? Particularly, I'd like 
> to be able to extract both the aligned and unaligned portions of sequence 
> (from both query and result) in some sort of array or list.
> 
> Alternatively, is there pal2nal functionality for this script?
> 
> And finally, if the answer to both of those is no, any recommendations 
> for parsers? I can't find the docs on BioPython's BLAT parser, sadly.
> 
> Thanks very much!
> 
> Cheers,
> John Woods
> 
> Institute for Cellular and Molecular Biology
> The University of Texas at Austin

