[Genome] BLATing probe sequences against human genome
Brooke Rhead
rhead at soe.ucsc.edu
Wed Aug 29 18:51:28 PDT 2007
Hello Tim,
You can actually let the -t and -q parameters both default to "dna".
(Apparently there is no real difference between -q=dna and -q=rna.
The distinction only matters when you are using translated BLAT, so
-q=dnax vs. -q=rnax will give different results.)
Since the probes are only 25 bases long, which is near the lower limit
of the query length at which BLAT can detect matches, you will want to
use parameters that maximize BLAT's sensitivity.
One of our developers has come up with a guide for standalone blat and
gfServer/gfClient to maximize sensitivity for short sequences:
----
If a tile is not marked as overused, then here is the formula for the
shortest guaranteed exact match:
    2 * stepSize + tileSize - 1
With stepSize=5 and tileSize=11, you can find exact matches
as short as 2*5+11-1 = 20 bp.
- stepSize can be from 1 to tileSize.
- tileSize can be from 6 to 15.
- Do not use -fastMap.
- Do not use masking commandline options.
- Use a large value for repMatch, e.g. -repMatch=1000000, to reduce the
chance of a tile being marked as overused.
- Do not use a .ooc file.
Note that these changes will make BLAT more sensitive, but they will
also make it slower and increase its memory use. You can run one
chromosome at a time to reduce memory requirements if needed.
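
To make the formula and the parameter ranges above concrete, here is a
minimal Python sketch. It is only an illustration, not part of BLAT: the
function names are made up for this example, and it assumes the formula
holds exactly as stated (no tile marked as overused). It lists the
stepSize/tileSize combinations whose guaranteed match length fits within
a 25-base probe.

```python
def shortest_guaranteed_match(step_size, tile_size):
    """Shortest exact match guaranteed to be found, per the formula above.

    Assumes no tile is marked as overused. The range checks mirror the
    limits given in the guide (tileSize 6..15, stepSize 1..tileSize).
    """
    if not 6 <= tile_size <= 15:
        raise ValueError("tileSize must be between 6 and 15")
    if not 1 <= step_size <= tile_size:
        raise ValueError("stepSize must be between 1 and tileSize")
    return 2 * step_size + tile_size - 1


def settings_for_probe(probe_length):
    """Return (stepSize, tileSize, guaranteed_length) tuples that fit.

    Hypothetical helper: a combination "fits" when its guaranteed match
    length does not exceed the probe length.
    """
    fits = []
    for tile_size in range(6, 16):
        for step_size in range(1, tile_size + 1):
            guaranteed = shortest_guaranteed_match(step_size, tile_size)
            if guaranteed <= probe_length:
                fits.append((step_size, tile_size, guaranteed))
    return fits


if __name__ == "__main__":
    # The worked example from the guide: 2*5 + 11 - 1
    print(shortest_guaranteed_match(5, 11))
    for step_size, tile_size, guaranteed in settings_for_probe(25):
        print(f"-stepSize={step_size} -tileSize={tile_size} -> {guaranteed} bp")
```

Which of the fitting combinations is practical depends on the speed and
memory trade-off described above, which this sketch does not model.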
-minScore will not actually go below 1 or above about qSize/2.
Therefore, use either the pslReps or the pslCDnaFilter program,
available in the Genome Browser source code, to filter for the
size, score, coverage, and quality desired.
----
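
To illustrate the kind of post-filtering the guide refers to, here is a
minimal Python sketch. It is not pslReps or pslCDnaFilter and does not
reproduce their logic. It only reads PSL lines and keeps alignments
where the matched bases cover a chosen fraction of the query. It assumes
the standard 21-column, tab-separated PSL layout; the function name, the
coverage definition, and the sample rows are hypothetical.

```python
def filter_psl(lines, min_coverage=1.0):
    """Yield (qName, tName, strand, tStart, tEnd) for well-covered queries.

    Coverage is taken here as (matches + repMatches) / qSize, which is a
    simplification chosen for this sketch.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip header and blank lines: data rows have 21 columns and
        # begin with an integer match count.
        if len(fields) != 21 or not fields[0].isdigit():
            continue
        matches = int(fields[0])
        rep_matches = int(fields[2])
        strand = fields[8]
        q_name = fields[9]
        q_size = int(fields[10])
        t_name = fields[13]
        t_start = int(fields[15])
        t_end = int(fields[16])
        if q_size > 0 and (matches + rep_matches) / q_size >= min_coverage:
            yield q_name, t_name, strand, t_start, t_end


if __name__ == "__main__":
    # Made-up PSL rows for illustration only (not real alignments).
    full = "\t".join(["25", "0", "0", "0", "0", "0", "0", "0", "+",
                      "probe1", "25", "0", "25", "chr1", "1000000",
                      "100", "125", "1", "25,", "0,", "100,"])
    partial = "\t".join(["20", "0", "0", "0", "0", "0", "0", "0", "-",
                         "probe2", "25", "0", "20", "chr2", "1000000",
                         "500", "520", "1", "20,", "5,", "500,"])
    for hit in filter_psl([full, partial], min_coverage=1.0):
        print(hit)
```

A probe that passes the filter at more than one genomic location is a
candidate for the multiple-gene detection described in the original
question.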
I hope this is helpful. Please feel free to contact us again if you
have further questions.
--
Brooke Rhead
UCSC Genome Bioinformatics Group
Tim Gernat wrote:
> Hi!
>
> I want to use BLAT to align probe sequences from an Affymetrix
> expression array to the human genome. The goal is to find out which
> probes potentially detect RNA originating from multiple genes. Each
> of the probes is described in a file that looks like this ...
>
> Probe Set Name,Probe X,Probe Y,Probe Interrogation Position,Probe
> Sequence,Target Strandedness
> 1007_s_at,416,177,3330,CACCCAGCTGGTCCTGTGGATGGGA,Antisense
> 1007_s_at,569,289,3443,GCCCCACTGGACAACACTGATTCCT,Antisense
> ...
>
> ... and the field "Target Strandedness" always says "Antisense".
>
> My question is, how do I set the BLAT parameters -t and -q in order
> to find all genomic loci from which RNA could be produced that can be
> detected by one of the probes? Currently I set the parameters
> to -t=dna and -q=rna, but I'm new to BLAT and not sure whether this is right.
>
> Thanks in advance for your help!
>
> Regards,
> Tim
>
>
> Tim Gernat
> Genome Biology Group, CMLS
> Lawrence Livermore National Laboratory
> 7000 East Avenue, L-441
> Livermore, CA 94550