[Genome] psl score calculation

Galt Barber galt at soe.ucsc.edu
Thu Feb 8 11:10:59 PST 2007


If the alignment is DNA, pslIsProtein() returns false,
so sizeMul = 1.  (If it were protein, sizeMul = 3.)
Scaling by 3 for protein keeps the scores more
comparable, since there are 3 bases per amino acid
codon.  If you know you are using only DNA, you can
just treat sizeMul as 1, or factor it out, which
leaves:

return psl->match + (psl->repMatch/2)
- psl->misMatch - psl->qNumInsert - psl->tNumInsert;

The repMatch count is included, but each repeat match
is worth only half as much as a regular match that is
not in a repeat-masked area.
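
To make the arithmetic concrete, here is a minimal sketch of the same
calculation.  It is not the UCSC source: struct miniPsl, miniPslScore()
and the isProtein flag are hypothetical stand-ins, and the struct holds
only the five counts the score uses (assumed here to be unsigned).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down record: only the counts the score needs.
 * The real struct psl carries many more fields. */
struct miniPsl {
    unsigned match;       /* matching bases outside repeats */
    unsigned misMatch;    /* mismatching bases */
    unsigned repMatch;    /* matching bases inside repeats */
    unsigned qNumInsert;  /* number of inserts in the query */
    unsigned tNumInsert;  /* number of inserts in the target */
};

/* Score following the quoted formula.  isProtein is an assumed flag
 * standing in for whatever pslIsProtein() would report. */
int miniPslScore(const struct miniPsl *psl, int isProtein)
{
    assert(psl != NULL);
    int sizeMul = isProtein ? 3 : 1;

    /* Repeat matches count half; inserts are never scaled by sizeMul. */
    return sizeMul * (int)(psl->match + (psl->repMatch >> 1))
         - sizeMul * (int)psl->misMatch
         - (int)psl->qNumInsert - (int)psl->tNumInsert;
}
```

With match=100, misMatch=3, repMatch=15, qNumInsert=1 and tNumInsert=2,
the DNA score is 100 + 7 - 3 - 1 - 2 = 101.  The same counts treated as
protein give 3*107 - 3*3 - 1 - 2 = 309.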

The >> operator shifts right by the number of bits specified,
dropping bits off the right (least significant) end.
For a non-negative count, shifting by one bit is the same
as dividing by two and ignoring any remainder:
14 >> 1 == 7 and 15 >> 1 == 7.
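
If you want to convince yourself of that, a small check along these
lines should do.  The function names are made up for illustration, and
the comparison only covers unsigned values, which is all a match count
can be.

```c
/* Halve a non-negative count by shifting one bit to the right;
 * the lowest bit (the remainder) is simply dropped. */
unsigned halveByShift(unsigned n)
{
    return n >> 1;
}

/* Return 1 if shifting and integer division agree for every
 * value from 0 up to (but not including) limit, else 0. */
int shiftMatchesDivision(unsigned limit)
{
    unsigned n;
    for (n = 0; n < limit; n++) {
        if (halveByShift(n) != n / 2)
            return 0;
    }
    return 1;
}
```

Calling shiftMatchesDivision(1000) returns 1, so for match counts you
can read repMatch>>1 and repMatch/2 as the same thing.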

-Galt


On Thu, 8 Feb 2007, Yuval Itan wrote:

> Hello,
>
> I need to calculate psl score for my Blat hits, and I was trying to get
> the logic behind the calculation. I understand that this is the
> function for score calculation:
>
> int pslScore(const struct psl *psl)
> /* Return score for psl. */
> {
> int sizeMul = pslIsProtein(psl) ? 3 : 1;
>
> return sizeMul * (psl->match + ( psl->repMatch>>1)) -
>           sizeMul * psl->misMatch - psl->qNumInsert - psl->tNumInsert;
> }
>
> I was wondering what's the reason for psl->repMatch>>1 , isn't it like
> dividing repMatch by 2?
> Also, is the sizeMul element essential for the calculation if I use DNA
> sequences?
>
> Thank you very much,
>
> Yuval

