[Genome] aligning short sequences with BLAT
Hiram Clawson
hiram at soe.ucsc.edu
Tue Jan 29 09:25:05 PST 2008
Good Morning Robert:
Since you are working with such small sequences, you might want to
do a statistical analysis instead of trying to find matches.
For example, with sequences of length 11, any given 11-mer would be
found about 715 times by chance in a random genome of length
3 billion (3 billion / 4^11).
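
To make that arithmetic easy to repeat for other lengths, here is a
minimal Python sketch. The function name expected_hits is only
illustrative, and it assumes a uniformly random genome on a single
strand, as in the estimate above.

```python
def expected_hits(genome_length: int, k: int) -> float:
    """Expected chance occurrences of one specific k-mer in a random genome.

    Assumes all four bases are equally likely and independent, and
    approximates the number of windows by the genome length.
    """
    return genome_length / 4 ** k


if __name__ == "__main__":
    # 3 billion bases, 11-mers: about 715 chance occurrences
    print(round(expected_hits(3_000_000_000, 11)))
    # Longer queries become rare quickly
    for k in (11, 13, 16):
        print(k, expected_hits(3_000_000_000, k))
```
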
Of course, 11-mers are not random in a genome, so you might want
to characterize all possible 11-mers in a genome by counting
up a histogram of them all, then compare that histogram with
your query 11-mers to see how common they would be in the genome.
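
A rough sketch of that counting step in Python might look like the
following. The names count_kmers and occurrence_histogram are
hypothetical, the sequence is assumed to be already loaded as a
string, and only one strand is counted. Pure Python like this would
be slow on a whole genome, so treat it as an illustration of the idea
rather than a production tool.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every k-mer in the sequence, skipping windows with non-ACGT bases."""
    seq = sequence.upper()
    valid = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= valid:
            counts[kmer] += 1
    return counts


def occurrence_histogram(counts: Counter) -> Counter:
    """Map 'number of occurrences' to 'how many distinct k-mers occur that often'."""
    return Counter(counts.values())


if __name__ == "__main__":
    target = "ACGTACGTNACGT"      # toy target; a real one would be a chromosome
    counts = count_kmers(target, 4)
    print(counts["ACGT"])         # how common one query k-mer is in the target
    print(occurrence_histogram(counts))
```

Looking up each query in the counts table then tells you how common
that query is in the target, without running any alignment.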
If you are considering mismatches, that makes your query sequences
even more common in your target sequence.
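
To put a number on that: a k-mer has sum over i = 0..d of
C(k, i) * 3^i distinct sequences within d mismatches of it, so under
the same random-genome assumption the expected hit count grows by
that factor. A minimal Python sketch, with illustrative function
names and substitutions only (no insertions or deletions):

```python
from math import comb


def neighborhood_size(k: int, max_mismatches: int) -> int:
    """Number of distinct k-mers within max_mismatches substitutions of a given k-mer."""
    return sum(comb(k, i) * 3 ** i for i in range(max_mismatches + 1))


def expected_hits_with_mismatches(genome_length: int, k: int, max_mismatches: int) -> float:
    """Expected chance hits in a uniformly random genome, allowing mismatches."""
    return genome_length / 4 ** k * neighborhood_size(k, max_mismatches)


if __name__ == "__main__":
    print(neighborhood_size(11, 1))   # 34
    print(neighborhood_size(11, 2))   # 529
    print(expected_hits_with_mismatches(3_000_000_000, 11, 1))
```

For an 11-mer in a random 3 billion base genome, allowing a single
mismatch takes the chance hit count from about 715 to roughly 24,000.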
There is a simple search tool in the kent source tree:
findMotif
that can find exact matches for sequences of length 4 to 16 bases.
It is a simple moving window of the query sequence over the target.
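
The moving-window idea itself is easy to sketch in Python. This is
not the findMotif code, just an illustration of the approach under
simple assumptions: the function name find_exact is made up, the
target is a plain string, and only the forward strand is scanned.

```python
from typing import List


def find_exact(query: str, target: str) -> List[int]:
    """Return 0-based start positions where query matches target exactly.

    Slides a window the length of the query along the target and
    compares at each offset. Overlapping matches are all reported.
    """
    q = query.upper()
    t = target.upper()
    width = len(q)
    hits = []
    for start in range(len(t) - width + 1):
        if t[start:start + width] == q:
            hits.append(start)
    return hits


if __name__ == "__main__":
    print(find_exact("ACG", "ACGTACGACG"))   # [0, 4, 7]
```
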
--Hiram
Robert Hunter wrote:
> At this point, I am starting to look into other methods of producing
> very large numbers of alignments between very small targets and even
> smaller queries. There is much literature on the subject. Would you
> have any suggestions of where to start?