[Genome] BLAT Search Genome: hgc XML output?
Hiram Clawson
hiram at soe.ucsc.edu
Mon Mar 3 23:17:11 PST 2008
Good Evening John:
This is an interesting problem you present here.
I haven't seen references to XML outputs for blat.
As you are no doubt aware, the fundamental output of
blat is the psl file format:
http://genome.ucsc.edu/FAQ/FAQformat#format2
You can select psl as one of the output types on the blat web page.
If you are running blat locally, you will also get psl output.
See also the blat usage restrictions:
http://genome.ucsc.edu/FAQ/FAQblat#blat3
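If you end up parsing psl yourself, here is a minimal Python sketch of reading one alignment line. It assumes the standard 21 tab-separated psl columns with the header lines already removed (for example by running blat with -noHead). The PslRecord class, the parse_psl_line function and the example line are hypothetical and made up for illustration; they are not part of the kent source tree and the line is not a real alignment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PslRecord:
    """A subset of the 21 psl columns (names here are illustrative)."""
    matches: int
    mismatches: int
    strand: str
    q_name: str
    q_size: int
    q_start: int
    q_end: int
    t_name: str
    t_size: int
    t_start: int
    t_end: int
    block_sizes: List[int]
    q_starts: List[int]
    t_starts: List[int]


def _int_list(field: str) -> List[int]:
    # psl list columns are comma-separated with a trailing comma: "30,30,"
    return [int(x) for x in field.split(",") if x]


def parse_psl_line(line: str) -> PslRecord:
    """Parse one headerless psl line into a PslRecord."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 21:
        raise ValueError("expected 21 psl columns, got %d" % len(cols))
    return PslRecord(
        matches=int(cols[0]),
        mismatches=int(cols[1]),
        strand=cols[8],
        q_name=cols[9],
        q_size=int(cols[10]),
        q_start=int(cols[11]),
        q_end=int(cols[12]),
        t_name=cols[13],
        t_size=int(cols[14]),
        t_start=int(cols[15]),
        t_end=int(cols[16]),
        block_sizes=_int_list(cols[18]),
        q_starts=_int_list(cols[19]),
        t_starts=_int_list(cols[20]),
    )


# Made-up example line: two 30-base blocks separated by a gap.
EXAMPLE = "\t".join([
    "59", "1", "0", "0", "1", "10", "1", "100", "+",
    "query1", "80", "5", "75", "chr1", "1000", "200", "360",
    "2", "30,30,", "5,45,", "200,330,",
])

record = parse_psl_line(EXAMPLE)
print(record.q_name, record.block_sizes, record.q_starts, record.t_starts)
```

The block sizes together with qStarts and tStarts give the aligned stretches in query and target; anything between consecutive blocks is unaligned.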
At the command line, to get ASCII output similar to what you see
in the hgc click-through, we use the pslPretty utility from the
kent source tree. It combines a psl file with the two sequences to
produce the side-by-side alignment picture.
http://genome.ucsc.edu/admin/cvs.html
http://genome.ucsc.edu/admin/jk-install.html
You will find numerous psl manipulation commands in the kent source
tree. Here is a listing of these commands from our bin directory:
pslCDnaFilter   pslFilterPrimers  pslPairs        pslSimp           pslToPslx
pslCat          pslGlue           pslPartition    pslSort           pslToXa
pslCheck        pslHisto          pslPretty       pslSortAcc        pslUniq
pslCoverage     pslHitPercent     pslQuickFilter  pslSplitOnTarget  pslUnpile
pslDiff         pslIntronsOnly    pslRecalcMatch  pslStats          pslxToFa
pslDropOverlap  pslMap            pslReps         pslSwap
pslFilter       pslMrnaCover      pslSelect       pslToBed
Each command will indicate how to use it if run with no arguments.
I don't know if these will help, but I'm guessing that since the psl
output is the fundamental output from blat, one of these formatting
and filtering commands might get what you want. You would also need
the .2bit sequence files for the genomes of interest to work with them
locally.
Those can be fetched from the downloads server.
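Once you have a sequence and the block coordinates from a psl line, splitting it into aligned and unaligned pieces is a short loop. This is only a sketch, assuming a plus-strand alignment (for minus-strand hits the psl qStarts refer to the reverse-complemented query) and that the sequence is already loaded as a Python string. The split_by_blocks function and the toy sequence are made up for illustration.

```python
from typing import List, Tuple

Segment = Tuple[str, int, int, str]  # (kind, start, end, subsequence)


def split_by_blocks(seq: str, starts: List[int], sizes: List[int]) -> List[Segment]:
    """Split seq into alternating aligned and unaligned segments.

    starts and sizes are the psl block starts and block sizes for this
    sequence, 0-based, in increasing order (plus strand assumed).
    """
    segments: List[Segment] = []
    pos = 0
    for start, size in zip(starts, sizes):
        if start > pos:
            # Bases between the previous block and this one are unaligned.
            segments.append(("unaligned", pos, start, seq[pos:start]))
        end = start + size
        segments.append(("aligned", start, end, seq[start:end]))
        pos = end
    if pos < len(seq):
        # Trailing bases after the last block.
        segments.append(("unaligned", pos, len(seq), seq[pos:]))
    return segments


# Toy query: blocks at 3..6 and 9..12.
toy_segments = split_by_blocks("AAACCCGGGTTT", [3, 9], [3, 3])
for kind, start, end, sub in toy_segments:
    print(kind, start, end, sub)
```

The same function would work for the target if you first cut out the tStart..tEnd slice and subtract tStart from each tStarts value.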
--Hiram
John Woods wrote:
> I've Googled quite a bit for this, but no luck. I also see that
> BioPython has a BLAT parser which will output XML, which leads me to
> believe that the answer to my question will be no. But I thought I
> should ask anyway--before reinventing the wheel.
>
> So, the BLAT Search Results outputs this nice list with links to browser
> and details. If I click on "details" for a hit, I get cgi-bin/hgc, which
> highlights--in both the query and the match--the portions of the
> sequence which align well.
>
> Is it possible to get XML output for this page? Particularly, I'd like
> to be able to extract both the aligned and unaligned portions of sequence
> (from both query and result) in some sort of array or list.
>
> Alternatively, is there pal2nal functionality for this script?
>
> And finally, if the answer to both of those is no, any recommendations
> for parsers? I can't find the docs on BioPython's BLAT parser, sadly.
>
> Thanks very much!
>
> Cheers,
> John Woods
>
> Institute for Cellular and Molecular Biology
> The University of Texas at Austin