[Genome] Blat question

Galt Barber galt at soe.ucsc.edu
Fri Jan 25 11:24:20 PST 2008


The UCSC web site does not use any explicit masking at all
for DNA (gfServer does have a default repMatch, which
essentially masks out over-used tiles so that they don't
participate in a hit), so we would not expect a run with both
query and target masking to return identical results.
The difference you see is not surprising.

FYI, for standalone blat runs where target genome masking
is required, we generally make and use a .ooc file,
especially if the queries will be run many times.

I have sent a note to Jim Kent, the author,
about the negative qInsert value in column 6.
I suspect it would not be there if the -qMask option
were left out.
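
As a quick check on where the -93 comes from, the block columns of the
psl line quoted below can be compared directly. This is only a sketch
of the arithmetic, not BLAT's own code; the helper name gaps() is made
up for illustration.

```python
# Values copied from the psl line quoted below:
# blockSizes, qStarts and tStarts (the last three columns).
block_sizes = [84, 192, 149]
q_starts = [2, 86, 185]
t_starts = [17, 460266, 986429]

def gaps(starts, sizes):
    """Distance from the end of each block to the start of the next one."""
    return [starts[i + 1] - (starts[i] + sizes[i])
            for i in range(len(starts) - 1)]

q_gaps = gaps(q_starts, block_sizes)
t_gaps = gaps(t_starts, block_sizes)

print(q_gaps, sum(q_gaps))   # gaps between blocks on the query
print(t_gaps, sum(t_gaps))   # gaps between blocks on the target
```

The query gaps come out as 0 and -93: the third block starts at query
position 185, but the second block already runs to 278, so the two
blocks overlap by 93 bases in the query. The target gaps sum to 986136,
which matches column 8.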

We don't really use the "HSP" terminology when discussing BLAT.
The closest analogue is a set of tile index hits
that lie on the same diagonal (i.e. tpos - qpos = constant).
By default BLAT requires two diagonal hits with an insertion
or deletion of up to 2 bp.  These settings can be tweaked with
command-line options, but the defaults work well.
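
To illustrate the diagonal idea only (this is not how BLAT is
implemented, and it ignores the small indel tolerance by using exact
diagonals), here is a minimal sketch. The function names and the hit
positions are made up for the example.

```python
from collections import defaultdict

def hits_by_diagonal(hits):
    """Group (qpos, tpos) tile hits by diagonal, i.e. by tpos - qpos."""
    diagonals = defaultdict(list)
    for qpos, tpos in hits:
        diagonals[tpos - qpos].append((qpos, tpos))
    return dict(diagonals)

def candidate_diagonals(hits, min_hits=2):
    """Keep only diagonals that carry at least min_hits tile hits."""
    return {d: sorted(h)
            for d, h in hits_by_diagonal(hits).items()
            if len(h) >= min_hits}

# Made-up hit positions, for illustration only.
example_hits = [(0, 1000), (11, 1011), (40, 5000), (22, 1022)]
print(candidate_diagonals(example_hits))
```

Three of the hits share the diagonal 1000 and are kept; the lone hit
on diagonal 4960 is dropped.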

BLAST only gives exons, but BLAT tries to chain exons together
to make a complete gene or RNA alignment.  You can force
BLAT not to chain by specifying -out=blast etc., although
the psl output format is probably better for many purposes.
Because BLAT can return multiple psl alignments for one
query, some of these alignments may overlap.  This is
expected and even useful.  You can use various utilities
to filter your psl files; pslCDnaFilter and pslReps are
two examples.
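
If you just want to see which psl alignments of one query overlap
before choosing a filter, something like the following would do. It is
a rough sketch and not a substitute for pslCDnaFilter or pslReps; the
function names are made up, and the two psl records are invented for
illustration (they are not real BLAT output).

```python
def parse_psl_line(line):
    """Pick a few of the 21 whitespace-separated psl columns out of one line."""
    f = line.split()
    return {
        "matches": int(f[0]),
        "qName": f[9],
        "qStart": int(f[11]),
        "qEnd": int(f[12]),
        "tName": f[13],
    }

def query_overlap(a, b):
    """Number of query bases covered by both alignments (0 if none)."""
    if a["qName"] != b["qName"]:
        return 0
    return max(0, min(a["qEnd"], b["qEnd"]) - max(a["qStart"], b["qStart"]))

# Made-up psl records for illustration; not real BLAT output.
lines = [
    "300 5 0 0 0 0 0 0 + q1 400 0 305 chrA 1000000 100 405 1 305, 0, 100,",
    "120 2 0 0 0 0 0 0 + q1 400 250 372 chrB 2000000 500 622 1 122, 250, 500,",
]
records = [parse_psl_line(l) for l in lines]
print(query_overlap(records[0], records[1]))
```

Here the two alignments share query positions 250 to 305.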

Please see the BLAT FAQ
 http://genome.ucsc.edu/FAQ/FAQblat

-Galt


On Fri, 25 Jan 2008, Wes Barris wrote:

> Hi,
>
> I have found a number of cases where stand-alone blat is returning
> curious results.  If I blat a masked version of NCBI sequence EE753431
> against the latest (2006) version of the bovine genome on the UCSC
> web site, the best hit is against BTA9.
>
> If I blat this same sequence using the stand-alone software (v34), the best
> hit is against BTA7 and the psl file contains overlapping HSPs.  Here
> is the command line that I used and the psl result:
>
> blat db.fa query.fa -t=dna -q=dna -qMask=lower -mask=lower -minIdentity=94 junk.psl
>
> 402     23      0       0       1       -93     2       986136  +       EE753431    436     2       334     BTA7.nib:86197000-87184000 987000 17       986578  3       84,192,149,     2,86,185,
> 17,460266,986429,
>
> Notice the negative number in column 6.  Is this expected?
>
> I have attached a small portion of BTA9 and the query.
> --
> Wes Barris <wes.barris at csiro.au>
>

>EE753431 cluster=30072
GCgagtcccttggactgcaaggagatccaacctgtccatcctaaaggagaccagtcctgg
gtgttcattggaaggactgatgctgaggctgaaactccagtactttggccacctcatgcg
acgagttgacttattggaaaagactctgatgctgggagggattgGGGGCAGGAGGAGAAG
GGGACGACAGAAGATAAGATGGCTGGATGGCATCACTGACTCGATGCACATGAGTTTGGG
TGAACTCCGGGAGTTGGTGatggacagggaggcctggcgtgctacgattcatgggatcac
aaagagttggacatgactgagcaactGAACTGAATTGAACTAAACTCTAGGTGGTGTCCA
GCATTGCTTAACAGTCTGTGCACATGCTACATAAATGTGATACTGTTTGTTTTTTTAAAA
ATAATTTTACTTCTTT



