[Genome] BLAT short sequence queries ( 12mer to 30mer )

Ann Zweig ann at soe.ucsc.edu
Fri Oct 6 13:37:38 PDT 2006


Hello John,

	I have consulted one of our BLAT experts, and he has the following to say 
regarding your question:

s=stepsize, t=tilesize

min guaranteed hit = 2s + t - 1 = 12

To reach 12 bp, he needs
2s + t = 12+1 = 13
if t=9, then s=2
if t=7, then s=3
if t=5, then s=4
Stop here, because values of t < s do not work (the next step, s=5, would need t=3).
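
As a rough sketch of the arithmetic above (the function name is made up for illustration and is not part of BLAT), this lists every tilesize/stepsize pair that gives a chosen minimum guaranteed hit under the formula 2s + t - 1, stopping once t would drop below s. Note that it also lists t=11, s=1, which satisfies the formula but is not among the options above.

```python
def tile_step_choices(min_hit):
    """Return (tileSize, stepSize) pairs with 2*s + t - 1 == min_hit and t >= s."""
    choices = []
    for s in range(1, min_hit):
        t = min_hit + 1 - 2 * s  # solve 2s + t - 1 = min_hit for t
        if t < s:
            break  # per the note above, values of t < s do not work
        choices.append((t, s))
    return choices


for t, s in tile_step_choices(12):
    print(f"tileSize={t} stepSize={s}")
```

This only reproduces the formula; it does not check whether a given BLAT build accepts each tile size.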

These settings all look like about 2-3x the RAM usage and 2x the runtime.  The RAM 
usage factor is 1/s, and you want this small for less RAM usage, so the bigger s 
is, the better for RAM usage.
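
To put numbers on the 1/s factor, here is a small sketch that compares a stepsize against a baseline stepsize of 11. It assumes RAM usage scales exactly as 1/s and ignores everything else that contributes to index size, so treat the result as a rough ratio only; the function name is hypothetical.

```python
def relative_index_ram(step_size, baseline_step=11):
    """RAM relative to the baseline, assuming usage is proportional to 1/stepSize."""
    return baseline_step / step_size


for s in (5, 4, 3, 2):
    print(f"stepSize={s}: about {relative_index_ram(s):.1f}x the baseline RAM")
```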

But there is a sensitivity factor too.  By going to stepsize 5 instead of 11
(with tilesize 11), the RAM usage more than doubles, but then 2s+t = 21, so by
the formula above he can be guaranteed to pick up hits of 2s+t-1 = 20 bp.  One
could also increase sensitivity by tweaking the query with variations and
re-searching the hit-lists.
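
One possible reading of "tweaking the query with variations" is to generate every single-base substitution of a short query and search each variant as an exact match. The sketch below assumes that reading and a plain ACGT alphabet; it is an illustration, not something BLAT does for you.

```python
def single_mismatch_variants(seq, alphabet="ACGT"):
    """Return every sequence that differs from seq by exactly one substitution."""
    seq = seq.upper()
    variants = []
    for i, base in enumerate(seq):
        for alt in alphabet:
            if alt != base:
                # swap in the alternative base at position i only
                variants.append(seq[:i] + alt + seq[i + 1:])
    return variants


variants = single_mismatch_variants("ACGTACGTACGT")
print(len(variants), "variants, for example", variants[0])
```

A query of length n over ACGT yields 3n variants, so the number of searches grows quickly for a large set of queries.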

And there is a performance factor too.  The index span is related to tilesize:
the bigger the tile, the more patterns the genome is fragmented into, so the
index holds more hit-lists of shorter lengths.  That helps performance, because
the tiles in your query then hit a smaller number of shorter hit-lists.
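
A back-of-the-envelope model of this effect, assuming the genome were uniform random sequence (real genomes are not, so actual hit-lists are far more uneven): there are 4^t possible tiles, and roughly genome_size/s positions get indexed. The genome size used here is a hypothetical round number, not a measured value.

```python
def mean_hit_list_length(genome_size, tile_size, step_size):
    """Average hit-list length under a uniform-random-sequence assumption."""
    indexed_tiles = genome_size // step_size  # one tile indexed every step_size bases
    distinct_tiles = 4 ** tile_size           # possible DNA tiles of this length
    return indexed_tiles / distinct_tiles


GENOME_SIZE = 120_000_000  # hypothetical round figure, for illustration only
for t, s in [(9, 2), (7, 3), (5, 4)]:
    mean_len = mean_hit_list_length(GENOME_SIZE, t, s)
    print(f"tileSize={t} stepSize={s}: about {mean_len:.0f} entries per hit-list")
```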

	Keep in mind that there may be a point where you need to search 
chromosome-by-chromosome if you make your stepsize and tilesize too small.
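
If you do end up searching chromosome-by-chromosome, one way to organize it is to build one blat command per chromosome file. The sketch below only builds the argument lists and does not run anything. The file names are hypothetical, the tilesize/stepsize values are just one of the combinations above, and whether your BLAT build accepts those values is not checked here.

```python
from pathlib import Path


def blat_commands(chrom_files, query, out_dir, tile_size=7, step_size=3):
    """Build one blat argument list per chromosome file (nothing is executed)."""
    commands = []
    for chrom in chrom_files:
        # one .psl output per chromosome, named after the input file
        out = Path(out_dir) / (Path(chrom).stem + ".psl")
        commands.append([
            "blat", str(chrom), str(query),
            f"-tileSize={tile_size}", f"-stepSize={step_size}",
            str(out),
        ])
    return commands


# hypothetical file names, for illustration only
for cmd in blat_commands(["chr2L.fa", "chr2R.fa"], "queries.fa", "psl_out"):
    print(" ".join(cmd))
```

Each list could then be passed to subprocess.run once you are happy with the settings.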

	This previously-answered mail list question may also be of some help to you: 
http://www.cse.ucsc.edu/pipermail/genome/2004-September/005612.html

	I hope this is helpful to you.  Feel free to write back to the list if these 
suggestions don't work for you.

Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu



John Major wrote:
> Hello-
> 
> I have a large amount of 12-30bp sequences which I'd like to find exact 
> (or 1 bp mismatch) alignments to the Drosophila genome.  I have a local 
> install of BLAT and have played with the parameters, but can't seem to 
> return any hits for the shorter sequences (in the 12-15bp range mostly).
> I have run the s/w w/out the 11.ooc file, and with  tileSize=8, 
> minMatch=2, stepSize=5, but with no luck. 
> 
> I see in a post from 2002 that BLAT can't handle sequence searches below 
> 21bp in length.  Is this still true?
> Are there recommended settings to run BLAT on short sequences?
> 
> Thanks,
> John
> _______________________________________________
> Genome maillist  -  Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

