[Genome] aligning short sequences with BLAT

Galt Barber galt at soe.ucsc.edu
Mon Jan 28 15:38:22 PST 2008


Hi Robert,

I got this to work:

blat -t=dna -q=dna -tileSize=6 -stepSize=3 -minMatch=1 -repMatch=1000000 \
  -minScore=0 -minIdentity=80 -noHead -out=psl database.fa query.fa \
  output.psl

The two changes that mattered were removing -fine and
adding -minScore=0.  (minScore seems to need to be 11 or less,
presumably because an 11-base query cannot score above 11
and the default minScore is 30.)
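
A small Python sketch of the arithmetic behind this, offered as a guess at
why the threshold sits at 11 rather than as a description of BLAT's
internals. The function names are made up for illustration, and the scoring
rule (one point per matching base, nothing more for a perfect ungapped hit)
is a simplifying assumption. The sequences are the ones from Robert's
message below.

```python
# Hypothetical helpers; not part of BLAT.

DEFAULT_MIN_SCORE = 30  # documented default for blat's -minScore option


def min_guaranteed_query_size(tile_size, step_size):
    """Query size for a guaranteed match, using the FAQ formula
    quoted in the original message: 2 * stepSize + tileSize - 1."""
    return 2 * step_size + tile_size - 1


def max_possible_score(query_size):
    """Assumed upper bound on the score: a perfect, ungapped alignment
    earns one point per query base and nothing is subtracted."""
    return query_size


target = "TACTGGATTCCGAGACCACACGCGTCGTAG"  # the 30-mer in database.fa
query = "CCGAGACCACA"                      # the 11-mer in query.fa

needed = min_guaranteed_query_size(tile_size=6, step_size=3)
best = max_possible_score(len(query))

print("query length:", len(query))
print("length needed for a guaranteed tile hit:", needed)
print("best possible score:", best)
print("passes default minScore:", best >= DEFAULT_MIN_SCORE)
print("exact substring of target:", query in target)
```

Under these assumptions the query is just long enough to satisfy the FAQ
formula, but even a perfect hit cannot reach the default score cutoff,
which would be consistent with any -minScore above 11 giving empty output.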

Some users report that -fine finds extra alignments,
but it may also lose some others.

In any case, Jim Kent, the author of BLAT, is working
on a new short exact-match program which might be helpful
if/when it's released.

What you are doing here is stretching BLAT to its absolute limits
of sensitivity, plus you are using zillions of tiny targets.

If both of these sets are from a genome,
another approach might be to blat the sets
to the genome, and then see where their
alignment coordinates overlap to create
a match between the sets.
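
A rough Python sketch of that coordinate-overlap idea, assuming each set
has already been aligned to the genome with -noHead -out=psl. The function
names are hypothetical, strand is ignored, and the pairwise comparison is
quadratic per chromosome, so for large sets a sorted sweep or an interval
index would be a better fit.

```python
from collections import defaultdict


def read_psl_hits(lines):
    """Parse headerless PSL lines into (qName, tName, tStart, tEnd) tuples.

    PSL columns are tab-separated; qName is column 10, tName is column 14,
    and tStart/tEnd are columns 16 and 17 (0-based, half-open).
    """
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 21:
            continue  # skip blank or malformed lines
        hits.append((fields[9], fields[13], int(fields[15]), int(fields[16])))
    return hits


def overlapping_pairs(hits_a, hits_b):
    """Return (qName_a, qName_b, tName, start, end) for each pair of hits
    from the two sets that share at least one genome base."""
    by_target = defaultdict(list)
    for q_name, t_name, start, end in hits_b:
        by_target[t_name].append((start, end, q_name))

    pairs = []
    for q_a, t_name, start_a, end_a in hits_a:
        for start_b, end_b, q_b in by_target.get(t_name, []):
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # half-open intervals: touching ends do not overlap
                pairs.append((q_a, q_b, t_name, start, end))
    return pairs


if __name__ == "__main__":
    import sys

    # Usage (file names are placeholders): python overlap.py setA.psl setB.psl
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
            for pair in overlapping_pairs(read_psl_hits(fa), read_psl_hits(fb)):
                print("\t".join(str(x) for x in pair))
```

Each output row pairs a sequence from the first set with one from the
second wherever their genome alignments intersect.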

-Galt


On Mon, 28 Jan 2008, Robert Hunter wrote:

> Hello.  I'm having difficulty using standalone BLAT (version 3.4)
> to align short DNA sequences.  My targets are 30-mers and the
> queries are 11-mers.  Matches should be anything over 80% identity.  I
> followed the guidelines listed at the BLAT FAQ, for "Using Blat for
> short sequences with maximum sensitivity," however I am still unable
> to produce matches, even with exact queries.
>
> E.g.,
>
> I have database.fa with a single entry:
>
> >TEST1|offset|123|
> TACTGGATTCCGAGACCACACGCGTCGTAG
>
>
> ...and a query.fa with a single entry:
>
> >TEST2|offset|456|
> CCGAGACCACA
>
>
> Using the following command, I expect to get a match in
> output.psl; however, BLAT returns no results (an empty output file).
>
> blat -t=dna -q=dna -fine -tileSize=6 -stepSize=3 -minMatch=1
> -repMatch=1000000 -noHead -out=psl database.fa query.fa output.psl
>
> According to the FAQ, a guaranteed match should occur when the query size is:
>  2 * stepSize + tileSize - 1
>
> Am I doing something wrong?  Any suggestions would be greatly appreciated.
>
> --
> RH
> _______________________________________________
> Genome maillist  -  Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>

