[Genome] BLAT short sequence queries ( 12mer to 30mer )
Ann Zweig
ann at soe.ucsc.edu
Fri Oct 6 13:37:38 PDT 2006
Hello John,
I have consulted one of our BLAT experts, and he has the following to say
regarding your question:
s=stepsize, t=tilesize
min guaranteed hit = 2s + t - 1 = 12
To reach 12 bp, he needs
2s + t = 12+1 = 13
if t=9, then s=2
if t=7, then s=3
if t=5, then s=4
Stop here, because values of t < s do not work.
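
The enumeration above can be reproduced with a short script. This is only a
sketch: the function name tile_step_options is made up for illustration and is
not part of BLAT. It simply lists every (t, s) pair with 2s + t - 1 equal to
the target length and t >= s.

```python
def tile_step_options(min_hit):
    """Return (tilesize, stepsize) pairs with 2*s + t - 1 == min_hit and t >= s."""
    target = min_hit + 1  # 2s + t must equal min_hit + 1
    options = []
    for s in range(1, target // 2 + 1):
        t = target - 2 * s
        if t >= s:  # settings with t < s do not work
            options.append((t, s))
    return options


for t, s in tile_step_options(12):
    print(f"tilesize={t} stepsize={s}")
```

For a 12 bp target this also reports t=11, s=1, which satisfies the same
formula but has the smallest stepsize and therefore the highest RAM cost.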
This all looks like about 2-3x the RAM usage and 2x the runtime. The RAM
usage factor is 1/s, and you want that factor small. So the bigger s is, the
better for RAM usage.
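
As a rough illustration of the 1/s factor, the relative change in RAM when
only the stepsize changes can be estimated as below. This is a sketch under
the assumption that nothing else contributes; relative_ram is a hypothetical
helper name, and real usage will also depend on tilesize and genome size.

```python
def relative_ram(old_step, new_step):
    """Estimate the RAM ratio new/old when usage scales as 1/stepsize."""
    if old_step <= 0 or new_step <= 0:
        raise ValueError("stepsize must be positive")
    return old_step / new_step


# For instance, going from stepsize 11 to stepsize 5:
print(relative_ram(11, 5))
```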
But there is a sensitivity factor too. By going to stepsize 5 instead of 11,
the RAM usage more than doubles, but then he can be guaranteed to pick up
2s+t = 21 (with the default tilesize of 11). One could also increase
sensitivity by tweaking the query with variations and re-searching the
hit-lists.
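
One possible reading of "tweaking the query with variations" is to generate
every single-base substitution of a query and search each variant as an exact
match. The sketch below assumes that interpretation; one_mismatch_variants is
a hypothetical helper, not something BLAT provides.

```python
def one_mismatch_variants(seq, alphabet="ACGT"):
    """Return all sequences that differ from seq by exactly one substitution."""
    seq = seq.upper()
    variants = []
    for i, base in enumerate(seq):
        for alt in alphabet:
            if alt != base:
                # Replace position i with the alternative base.
                variants.append(seq[:i] + alt + seq[i + 1:])
    return variants


variants = one_mismatch_variants("ACGTACGTACGT")
print(len(variants))  # 3 variants per position
```

A 12-mer yields 36 variants, so the number of queries grows by a factor of
3 x length, which is worth weighing against the index settings above.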
And there is a performance factor too. The index span is related to tilesize:
the bigger the tile, the more distinct patterns the genome is fragmented into,
so the index holds more hit-lists of shorter length. That helps performance,
because the tiles in your query then hit a smaller number of shorter
hit-lists.
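
A back-of-the-envelope estimate shows how quickly hit-lists grow as the tile
shrinks. The sketch below assumes a uniformly random genome, so that roughly
genome_size / s tiles are spread over 4^t possible patterns. The function name
and the round genome size of 100 Mb are illustrative assumptions, not
measured values.

```python
def mean_hit_list_length(genome_size, stepsize, tilesize):
    """Approximate average hit-list length for a uniformly random genome."""
    indexed_tiles = genome_size / stepsize   # one tile every `stepsize` bases
    possible_tiles = 4 ** tilesize           # distinct DNA patterns of this length
    return indexed_tiles / possible_tiles


GENOME_SIZE = 100_000_000  # hypothetical round number, for illustration only
for t, s in [(11, 11), (9, 2), (7, 3), (5, 4)]:
    avg = mean_hit_list_length(GENOME_SIZE, s, t)
    print(f"tilesize={t} stepsize={s} mean hit-list length ~ {avg:.1f}")
```

Real genomes are far from uniform, so actual hit-lists for repetitive tiles
will be much longer than this average suggests.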
Keep in mind that there may be a point where you need to search
chromosome-by-chromosome if you make your stepsize and tilesize too small.
This previously-answered mail list question may also be of some help to you:
http://www.cse.ucsc.edu/pipermail/genome/2004-September/005612.html
I hope this is helpful to you. Feel free to write back to the list if these
suggestions don't work for you.
Regards,
----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu
John Major wrote:
> Hello-
>
> I have a large amount of 12-30bp sequences which I'd like to find exact
> (or 1 bp mismatch) alignments to the Drosophila genome. I have a local
> install of BLAT and have played with the parameters, but can't seem to
> return any hits for the shorter sequences (in the 12-15bp range mostly).
> I have run the s/w w/out the 11.ooc file, and with tileSize=8,
> minMatch=2, stepSize=5, but with no luck.
>
> I see in a post from 2002 that BLAT can't handle sequence searches below
> 21bp in length. Is this still true?
> Are there recommended settings to run BLAT on short sequences?
>
> Thanks,
> John
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome