[Genome] BLAT short sequence queries ( 12mer to 30mer )

John Major major at cbio.mskcc.org
Tue Oct 10 08:16:42 PDT 2006


Hi Ann-

Thanks for the suggestions.  I tried those settings and am still 
getting no hits for my shorter sequences.
One example for D. melanogaster is the sequence 'tgtggtgaggaa', which 
does have several exact alignments.  It returns no Blat hits when I 
use the settings suggested below.
Could this be because Blat is screening out repetitive-looking sequences?

John


Ann Zweig wrote:

> Hello John,
>
>     I have consulted one of our Blat experts and he has the following 
> to say regarding your question:
>
> s=stepsize, t=tilesize
>
> min guaranteed hit = 2s + t - 1 = 12
>
> To reach 12 bp, he needs
> 2s + t = 12+1 = 13
> if t=9, then s=2
> if t=7, then s=3
> if t=5, then s=4
> stop here because values of t < s do not work.
>
> This all looks like about 2-3x ram usage and 2x more runtime.  The ram 
> usage-factor is 1/s.  You want this small for less ram usage.  So the 
> bigger s is, the better for ram usage.
>
> But there is a sensitivity factor too.  By going to stepsize 5 instead 
> of 11, the RAM usage more than doubles, but then he can be guaranteed 
> to pick up hits of 2s + t - 1 = 20 bp (2s + t = 21, assuming t=11).  
> One could also increase sensitivity by tweaking the query with 
> variations and re-searching the hit-lists.
>
> And there is a performance factor too.  The index span is related to 
> tilesize: the bigger the tile, the more patterns the genome is 
> fragmented into, so you get more hit-lists of shorter length.  That 
> helps performance, because the tiles in your query then hit a smaller 
> number of shorter hit-lists.
>
>     Keep in mind that there may be a point where you need to search 
> chromosome-by-chromosome if you make your stepsize and tilesize too 
> small.
>
>     This previously-answered mail list question may also be of some 
> help to you: 
> http://www.cse.ucsc.edu/pipermail/genome/2004-September/005612.html
>
>     I hope this is helpful to you.  Feel free to write back to the 
> list if these suggestions don't work for you.
>
> Regards,
>
> ----------
> Ann Zweig
> UCSC Genome Bioinformatics Group
> http://genome.ucsc.edu
>
>
>
> John Major wrote:
>
>> Hello-
>>
>> I have a large number of 12-30 bp sequences for which I'd like to 
>> find exact (or 1 bp mismatch) alignments to the Drosophila genome.  
>> I have a local install of BLAT and have played with the parameters, 
>> but can't seem to get any hits for the shorter sequences (mostly in 
>> the 12-15 bp range).
>> I have run the software without the 11.ooc file, and with tileSize=8, 
>> minMatch=2, stepSize=5, but with no luck.
>> I see in a post from 2002 that BLAT can't handle sequence searches 
>> below 21 bp in length.  Is this still true?
>> Are there recommended settings to run BLAT on short sequences?
>>
>> Thanks,
>> John
>> _______________________________________________
>> Genome maillist  -  Genome at soe.ucsc.edu
>> http://www.soe.ucsc.edu/mailman/listinfo/genome
>
>
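
The arithmetic in the quoted reply (min guaranteed hit = 2s + t - 1, 
t < s not allowed, RAM usage-factor 1/s) can be checked with a short 
script.  This is only a sketch of that arithmetic, not anything taken 
from BLAT itself; the function names are made up for illustration, 
and it does not check whether BLAT will actually accept a given 
tileSize or stepSize.

```python
def min_guaranteed_hit(step_size, tile_size):
    """Shortest match length guaranteed to be found: 2s + t - 1."""
    return 2 * step_size + tile_size - 1


def tile_step_options(target_hit):
    """List (tile_size, step_size) pairs with 2s + t - 1 == target_hit.

    Stops as soon as tile_size < step_size, following the rule in the
    quoted reply that values of t < s do not work.
    """
    options = []
    step = 1
    while True:
        tile = target_hit + 1 - 2 * step
        if tile < step:
            break
        options.append((tile, step))
        step += 1
    return options


def relative_ram(new_step, old_step):
    """RAM ratio new/old, assuming usage is proportional to 1/s."""
    return old_step / new_step


if __name__ == "__main__":
    for tile, step in tile_step_options(12):
        print(f"tileSize={tile} stepSize={step} "
              f"-> guaranteed hit {min_guaranteed_hit(step, tile)}")
    print(f"stepSize 11 -> 5: about {relative_ram(5, 11):.1f}x RAM")
```

For a 12 bp target this lists the three pairs given in the reply 
(t=9/s=2, t=7/s=3, t=5/s=4) plus t=11/s=1, which also satisfies the 
formula but has the largest RAM factor.  By the same formula, 
tileSize=8 with stepSize=5 only guarantees hits of 17 bp, which would 
be consistent with the 12-15 bp queries returning nothing.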


