[Genome] question about BLAT
Brooke Rhead
rhead at soe.ucsc.edu
Fri Mar 9 14:58:10 PST 2007
Hello Jasmin,
This is an interesting question, as the score for each sequence would be
exactly the same, as calculated by the formula listed on our FAQ page.
---------------------------------------------------------------
The score calculation is generated by the following function:
int pslScore(const struct psl *psl)
/* Return score for psl. */
{
    int sizeMul = pslIsProtein(psl) ? 3 : 1;
    return sizeMul * (psl->match + (psl->repMatch >> 1)) -
           sizeMul * psl->misMatch - psl->qNumInsert - psl->tNumInsert;
}
---------------------------------------------------------------
This is basically (#matches + #repMatches/2) - #misMatches - #query gaps -
#target gaps, with the match and mismatch terms multiplied by 3 for
protein alignments. That is, the score is calculated by comparing matches
to mismatches and penalizing for gaps on either side of the alignment
(although it is a very small penalty: one point per gap, regardless of
the gap's length). The query size does not affect the score.
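As a rough illustration of why the query size drops out, here is a
minimal standalone sketch of the same arithmetic. The struct miniPsl and
the function miniPslScore are hypothetical stand-ins written for this
message (they are not the actual struct psl or pslScore), and the isProtein
flag is an assumed replacement for the pslIsProtein() check.

```c
/* Hypothetical, cut-down stand-in for struct psl: only the counts the
 * score formula reads, plus qSize to show that it is never used. */
struct miniPsl {
    unsigned match;       /* matching bases outside repeats */
    unsigned misMatch;    /* mismatching bases */
    unsigned repMatch;    /* matching bases inside repeats */
    unsigned qNumInsert;  /* number of gaps in the query */
    unsigned tNumInsert;  /* number of gaps in the target */
    unsigned qSize;       /* query length; not part of the score */
    int isProtein;        /* assumed flag replacing pslIsProtein() */
};

/* Same arithmetic as the function quoted above. */
int miniPslScore(const struct miniPsl *psl)
{
    int sizeMul = psl->isProtein ? 3 : 1;
    return sizeMul * (int)(psl->match + (psl->repMatch >> 1))
         - sizeMul * (int)psl->misMatch
         - (int)psl->qNumInsert
         - (int)psl->tNumInsert;
}
```

For example, with 23 matches, 2 mismatches and no gaps (made-up counts,
chosen only to reproduce a score of 21), miniPslScore returns 21 whether
qSize is 90 or 45.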
Interestingly, if we try trimming some of the bases off the left side of
sequence 1, sometimes we get an alignment in the chr10 location, with a
score of 21 (as would be expected).
So, something mysterious is happening that sometimes prevents the second
alignment for sequence 1 (the poorer one, at the chr10 location) from
showing up.
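If you would like to repeat the trimming experiment yourself, something
along these lines would generate the left-trimmed copies of sequence 1 as
FASTA records to submit to BLAT. This is only a sketch: the helper names
leftTrim and writeTrimmedFasta and the record naming scheme are made up
for this example.

```c
#include <stdio.h>
#include <string.h>

/* Sequence 1 from the original question (90 bases). */
static const char *seq1 =
    "TTTTAAGGTGTAATGTGATATGCTGCAGTTAAGGCACCGTGGTACAGTGAATGAAAGATATGG"
    "TGATTCTGAGAAGAGAATCAGAGAAGC";

/* Return what is left of seq after dropping `trim` bases from the left.
 * Trimming more than the sequence length yields an empty string. */
const char *leftTrim(const char *seq, size_t trim)
{
    size_t len = strlen(seq);
    return seq + (trim < len ? trim : len);
}

/* Write one FASTA record per trim amount: 0, step, 2*step, ... up to
 * maxTrim, stopping before the sequence would become empty. */
void writeTrimmedFasta(FILE *out, const char *name, const char *seq,
                       size_t step, size_t maxTrim)
{
    size_t len = strlen(seq);
    size_t trim;

    if (step == 0)
        return;
    for (trim = 0; trim <= maxTrim && trim < len; trim += step)
        fprintf(out, ">%s_trim%lu\n%s\n", name, (unsigned long)trim,
                leftTrim(seq, trim));
}
```

Calling writeTrimmedFasta(stdout, "seq1", seq1, 5, 45) would print ten
records, the last of which (trimmed by 45) is identical to sequence 2.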
One of the developers here is looking into why this is happening. I
will let you know what we discover.
--
Brooke Rhead
UCSC Genome Bioinformatics Group
Jasmin Roohi wrote:
> Hello UCSC Genome Browser:
>
> I've got 2 sequences. Sequence 1 is 90bp long. Sequence 2 is 45bp long and
> is identical to the last 45 bases of Sequence 1. Why is it that when I BLAT
> Sequence 1, I only get 1 hit to the genome but when I BLAT Sequence 2, I get
> 2.
>
> Here are the sequences.
>
> Sequence 1:
> TTTTAAGGTGTAATGTGATATGCTGCAGTTAAGGCACCGTGGTACAGTGAATGAAAGATATGGTGATTCTGAGAAGAGAATCAGAGAAGC
>
> Sequence 2:
> AGTGAATGAAAGATATGGTGATTCTGAGAAGAGAATCAGAGAAGC
>
>
> Thanks for your help.
>
> Best,
> Jasmin