[Genome] blat not finding entire protein sequence

Archana Thakkapallayil archanat at soe.ucsc.edu
Mon May 21 14:43:23 PDT 2007


Hello Marcos,

I have consulted one of our Blat experts and he has the following to say 
regarding your question:

BLAT does not attempt to find and align split codons, so the parts of 
the Q codon on either side of the intron are lost at the junction 
between the two exons. That leaves BLAT having to find a single "I" 
codon several thousand bases away, which it cannot do. This would be 
difficult for any alignment software, not just BLAT.
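
As a rough illustration of the split-codon effect, here is a small Python 
sketch. The exon sequences are invented for the example (they are not the 
real CR1 bases), and it assumes the intron falls after the second base of 
the Q codon, which the thread does not state. It only shows why 
translating each exon on its own loses the Q and leaves a lone "I"; it is 
not how BLAT is implemented.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate complete codons only; a trailing 1-2 bases are dropped."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical exons encoding the last four query residues, C E Q I.
# Assumption: the intron splits the Q codon (CAG) after its second base.
exon_a = "TGTGAG" + "CA"   # C, E, then the first two bases of Q
exon_b = "G" + "ATT"       # the last base of Q, then I

spliced = translate(exon_a + exon_b)   # the mature protein: "CEQI"
from_a = translate(exon_a)             # "CE": the partial Q codon is dropped
from_b = translate(exon_b[1:])         # "I": all that is left to seed on

print(spliced, from_a, from_b)
```

Translated independently, the two exons yield "CE" and "I", so the Q 
never appears in either piece and the final exon offers only one residue 
to match.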

One way to help BLAT would be to give it the full protein sequence from 
the gene CR1. BLAT does not always require a complete independent 
seeding for each exon; it can probably extend through small gaps, but 
apparently this intron is too large for that.

Note that even though BLAST is more sensitive, it too has limits for 
picking up tiny trailing exons.

For this kind of problem, mRNA would align better because it would not 
stop abruptly at the end of the CDS but would continue into the 3' UTR, 
which could help pick up the last exon.

I hope this information helps.  Please let us know if you have any 
further questions.

Regards,

Archana
UCSC Genome Bioinformatics Group

Marcos H Woehrmann wrote:
> I'm using blat to search for the following protein 
> sequence in the March 2006 assembly of the Human Genome:
>
> SCDDFLGQLPHGRVLLPLNLQLGAKVSFVCDEGFRLKGRSASHCVLAGMKALWNSSVPVCEQI
>
> The UCSC genome browser returns a 100% hit but only for 
> the first 61 characters:
>
> 182  1 61  63 100.0%  1  ++  205851912 205854456  2545
>
> However, when I look at the sequence in the browser the 
> final two amino acids, Q and I, are there.  There is an 
> intron located within the Q codon, but there is another 
> intron in this sequence which is handled correctly.
>
> Are introns near the end of the query sequence a problem 
> for blat?
>
> marcos