[Genome] aligning short sequences with BLAT

Hiram Clawson hiram at soe.ucsc.edu
Tue Jan 29 09:25:05 PST 2008


Good Morning Robert:

Since you are working with such small sequences, you might want to
do a statistical analysis instead of trying to find matches.

For example, with sequences of length 11, any given 11-mer would be
expected to occur by chance about 715 times in a random genome of
3 billion bases (3 billion / 4^11 = 3,000,000,000 / 4,194,304, or about 715).
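The arithmetic above can be checked with a few lines of Python. This is a
minimal sketch that assumes a uniformly random sequence and counts one strand
only; `expected_hits` is a hypothetical helper name, and the k-1 positions
lost at the end of the sequence are ignored.

```python
def expected_hits(genome_length: int, k: int) -> float:
    """Expected occurrences of one specific k-mer in a uniformly random
    sequence of the given length (one strand, edge effects ignored)."""
    return genome_length / 4 ** k

# 3,000,000,000 / 4,194,304 is about 715.26
print(round(expected_hits(3_000_000_000, 11)))  # 715

# How the expectation falls off as k grows from 4 to 16
for k in range(4, 17):
    print(k, expected_hits(3_000_000_000, k))
```

Counting both strands would roughly double each figure.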

Of course, 11-mers are not randomly distributed in a real genome, so
you might want to characterize all possible 11-mers in the genome by
building a histogram of their counts, then compare that histogram with
your query 11-mers to see how common each one actually is.
If you are allowing mismatches, your query sequences will match the
target sequence even more often.
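The histogram approach might be sketched as follows. This is an illustration
under assumptions: the target is a plain Python string (a real genome would be
read from FASTA and, at 3 billion bases, would call for a more compact counter
such as an array indexed by 2-bit-encoded k-mers), only the forward strand is
counted, and windows containing N are skipped. `kmer_histogram` and
`neighborhood_size` are hypothetical names, and the target string is made-up
toy data. The second function quantifies the mismatch remark: allowing up to
d mismatches, each k-base query stands in for the sum over i = 0..d of
C(k, i) * 3^i distinct k-mers.

```python
from collections import Counter
from math import comb

def kmer_histogram(seq: str, k: int) -> Counter:
    """Count every k-mer in seq (forward strand only), skipping windows with N."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" not in kmer:
            counts[kmer] += 1
    return counts

def neighborhood_size(k: int, d: int) -> int:
    """Number of distinct k-mers within Hamming distance d of a given k-mer."""
    return sum(comb(k, i) * 3 ** i for i in range(d + 1))

# Toy target; 4-mers keep the example readable (the same code works for k = 11)
target = "ACGTACGTTTACGTACGA"
hist = kmer_histogram(target, 4)
for query in ["ACGT", "TTAC", "GGGG"]:
    print(query, hist[query])  # Counter returns 0 for k-mers never seen

print(neighborhood_size(11, 1))  # 34
```

For 11-mers with one mismatch allowed, the neighborhood holds 34 sequences, so
the random-genome expectation rises from about 715 to roughly 34 x 715, or
about 24,300 hits per query.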

There is a simple search tool in the kent source tree:
	findMotif
that can find exact matches for sequences of length 4 to 16 bases.
It simply slides a query-length window along the target sequence.
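A minimal Python sketch of the moving-window idea follows. It is not
findMotif's source, only an illustration of the technique: each base is packed
into 2 bits so the window can be updated with a shift and a mask. Packing a
16-base window into 32 bits is one plausible reason for a 16-base ceiling, but
that is an assumption here, not something confirmed about findMotif. `pack`
and `find_exact` are hypothetical names; only the forward strand is searched,
and any non-ACGT base (such as N) resets the window.

```python
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def pack(seq: str) -> int:
    """Pack an ACGT string into an integer, 2 bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | ENCODE[base]
    return value

def find_exact(query: str, target: str) -> list[int]:
    """Return 0-based start positions of exact forward-strand matches."""
    query, target = query.upper(), target.upper()
    k = len(query)
    if not 4 <= k <= 16:
        raise ValueError("query must be 4 to 16 bases")
    want = pack(query)
    mask = (1 << (2 * k)) - 1  # keep only the last k bases in the window
    window = 0
    valid = 0  # consecutive ACGT bases ending at the current position
    hits = []
    for i, base in enumerate(target):
        code = ENCODE.get(base)
        if code is None:  # N or another ambiguity code breaks the window
            window, valid = 0, 0
            continue
        window = ((window << 2) | code) & mask
        valid += 1
        if valid >= k and window == want:
            hits.append(i - k + 1)
    return hits

print(find_exact("ACGT", "ACGTACGTTTACGTACGA"))  # [0, 4, 10]
```

Each target base is examined once, so the scan is linear in the target length
regardless of how many windows match.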

--Hiram

Robert Hunter wrote:
> At this point, I am starting to look into other methods of producing
> very large numbers of alignments between very small targets and even
> smaller queries.  There is much literature on the subject.  Would you
> have any suggestions of where to start?

