[Genome] blat not finding entire protein sequence
Archana Thakkapallayil
archanat at soe.ucsc.edu
Mon May 21 14:43:23 PDT 2007
Hello Marcos,
I have consulted one of our Blat experts and he has the following to say
regarding your question:
BLAT does not attempt to find and align split codons. When an intron
falls inside the Q codon, its bases are divided between the ends of the
two flanking exons and are lost from the alignment. That leaves BLAT
having to find a single "I" codon several thousand bases away, which it
cannot do. This would be difficult for any alignment software, not just
BLAT.
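To make the split-codon point concrete, here is a minimal Python sketch.
The two exon sequences are invented toy data, not the real CR1 exons, and
the translate() helper is a hypothetical illustration rather than
anything BLAT does internally. It shows that translating each exon on its
own drops the codon that straddles the intron, so only a lone "I" is left
on the far side.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate whole codons starting at `frame`; a trailing partial codon is dropped."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


# Toy exons (assumption): coding sequence TGT GAG CAG ATT = C E Q I,
# with the intron falling after the first base of the Q codon (C | AG).
exon1 = "TGTGAGC"
exon2 = "AGATT"

# Bases of exon2 that belong to the codon started in exon1.
carry = (3 - len(exon1) % 3) % 3

left = translate(exon1)                 # split-codon base "C" is dropped
right = translate(exon2, frame=carry)   # split-codon bases "AG" are skipped
spliced = translate(exon1 + exon2)      # intron removed, Q is recovered

print(left, right, spliced)
```

Only the spliced sequence yields the Q; each exon translated by itself
cannot, which is why the final "I" ends up as an isolated one-residue
target.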
One way to help BLAT would be to give it the full protein sequence from
the gene CR1. BLAT does not always require a complete, independent seed
for each exon; it can probably extend through small gaps, but apparently
this intron is too large for that.
Note that even though BLAST is more sensitive, it too has limits for
picking up tiny trailing exons.
For this kind of problem, mRNA would align better because it would not
stop abruptly at the end of the CDS but would continue into the 3' UTR,
which could help pick up the last exon.
I hope this information helps. Please let us know if you have any
further questions.
Regards,
Archana
UCSC Genome Bioinformatics Group
We invite you to give us your feedback on the UCSC Genome Browser
through May 31, 2007: http://www.surveymonkey.com/s.asp?U=881163743177
Marcos H Woehrmann wrote:
> I'm using blat to search for the following protein
> sequence in the March 2006 assembly of the Human Genome:
>
> SCDDFLGQLPHGRVLLPLNLQLGAKVSFVCDEGFRLKGRSASHCVLAGMKALWNSSVPVCEQI
>
> The UCSC genome browser returns a 100% hit but only for
> the first 61 characters:
>
> 182 1 61 63 100.0% 1 ++ 205851912 205854456 2545
>
> However, when I look at the sequence in the browser the
> final two amino acids, Q and I, are there. There is an
> intron located within the Q codon, but there is another
> intron in this sequence which is handled correctly.
>
> Are introns near the end of the query sequence a problem
> for blat?
>
> marcos
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>