[Genome] BLATing probe sequences against human genome

Brooke Rhead rhead at soe.ucsc.edu
Wed Aug 29 18:51:28 PDT 2007


Hello Tim,

You can actually let the -t and -q parameters both default to "dna". 
(Apparently there is no real difference between -q=dna and -q=rna. 
The choice only matters when you are using translated BLAT, where 
-q=dnax and -q=rnax give different results.)

Since the probes are only 25 bases long, which is near the lower limit 
of the query length for which BLAT can detect matches, you will want to 
use parameters that maximize BLAT's sensitivity.

One of our developers has come up with a guide for standalone blat and 
gfServer/gfClient to maximize sensitivity for short sequences:

----
If a tile is not marked as overused, then here is the formula for the 
length of the shortest exact match that is guaranteed to be found:

2 * stepSize + tileSize - 1

With stepSize=5 and tileSize=11 you can find things
as short as 2*5+11-1 = 20 bp.
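
As an illustration that is not part of the original guide, the formula can be checked with a small Python sketch. The function names are hypothetical, and the allowed ranges are simply the ones stated in this guide (stepSize from 1 to tileSize, tileSize from 6 to 15):

```python
def shortest_guaranteed_match(step_size: int, tile_size: int) -> int:
    """Return 2 * stepSize + tileSize - 1, the formula quoted in the guide."""
    # Ranges are taken from the guide's text, not from the BLAT source code.
    if not 6 <= tile_size <= 15:
        raise ValueError("tileSize must be between 6 and 15")
    if not 1 <= step_size <= tile_size:
        raise ValueError("stepSize must be between 1 and tileSize")
    return 2 * step_size + tile_size - 1


def settings_for_query(query_len: int) -> list[tuple[int, int]]:
    """List (stepSize, tileSize) pairs whose guaranteed length fits the query."""
    pairs = []
    for tile_size in range(6, 16):
        for step_size in range(1, tile_size + 1):
            if shortest_guaranteed_match(step_size, tile_size) <= query_len:
                pairs.append((step_size, tile_size))
    return pairs


if __name__ == "__main__":
    # The guide's own example: stepSize=5, tileSize=11.
    print(shortest_guaranteed_match(5, 11))
    # Settings that would still guarantee a hit for a 25-base probe.
    print(settings_for_query(25))
```

For 25-base probes, any pair returned by settings_for_query(25) satisfies the formula. Smaller values trade speed and memory for sensitivity.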

- stepSize can be from 1 to tileSize.

- tileSize can be from 6 to 15.

- Do not use -fastMap.

- Do not use masking command-line options.

- Use a large value for repMatch, e.g. -repMatch=1000000, to reduce the 
chance of a tile being marked as overused.

- Do not use a .ooc file.

Note that these changes will make BLAT more sensitive, but also make it 
slower and increase memory used. You can do one chromosome at a time to 
reduce memory requirements if needed.

-minScore will not actually go below 1 or above about qSize/2. 
Therefore, use either the pslReps or the pslCDnaFilter program, both 
available in the Genome Browser source code, to filter for the 
size, score, coverage, and quality desired.
----
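
To make the guide above concrete, here is a minimal Python sketch that assembles a standalone blat command line following those recommendations. It is an illustration only: the helper name and the file names (chr1.fa, probes.fa, probes_chr1.psl) are hypothetical, and it assumes a blat executable is on your PATH. It leaves -t and -q at their defaults and passes no -fastMap, masking, or -ooc options.

```python
import shlex


def build_blat_command(database, query, output,
                       step_size=5, tile_size=11, rep_match=1000000):
    """Build an argument list for standalone blat tuned for short queries."""
    # Ranges as stated in the guide above.
    if not 6 <= tile_size <= 15:
        raise ValueError("tileSize must be between 6 and 15")
    if not 1 <= step_size <= tile_size:
        raise ValueError("stepSize must be between 1 and tileSize")
    return [
        "blat",
        f"-stepSize={step_size}",
        f"-tileSize={tile_size}",
        f"-repMatch={rep_match}",  # large value so tiles are not marked overused
        database,                  # e.g. one chromosome at a time to save memory
        query,
        output,
    ]


if __name__ == "__main__":
    # Hypothetical file names; substitute your own.
    cmd = build_blat_command("chr1.fa", "probes.fa", "probes_chr1.psl")
    print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) to run the alignment. The resulting PSL file would then be filtered with pslReps or pslCDnaFilter as described above.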

I hope this is helpful.  Please feel free to contact us again if you 
have further questions.

--
Brooke Rhead
UCSC Genome Bioinformatics Group



Tim Gernat wrote:
> Hi!
> 
> I want to use BLAT to align probe sequences from an Affymetrix 
> expression array to the human genome. The goal is to find out which 
> probes potentially detect RNA originating from multiple genes. Each 
> of the probes is described in a file that looks like this ...
> 
> Probe Set Name,Probe X,Probe Y,Probe Interrogation Position,Probe 
> Sequence,Target Strandedness
> 1007_s_at,416,177,3330,CACCCAGCTGGTCCTGTGGATGGGA,Antisense
> 1007_s_at,569,289,3443,GCCCCACTGGACAACACTGATTCCT,Antisense
> ...
> 
> ... and the field "Target Strandedness" always says "Antisense".
> 
> My question is, how do I set the BLAT parameters -t and -q in order 
> to find all genomic loci from which RNA could be produced that can be 
> detected by one of the probes? Currently I set the parameters 
> to -t=dna and -q=rna, but I'm new to BLAT and not sure whether this is right.
> 
> Thanks in advance for your help!
> 
> Regards,
> Tim
> 
> 
> Tim Gernat
> Genome Biology Group, CMLS
> Lawrence Livermore National Laboratory
> 7000 East Avenue, L-441
> Livermore, CA 94550 
> _______________________________________________
> Genome maillist  -  Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

