[Genome] 2 questions about BLAT
Rachel Harte
hartera at soe.ucsc.edu
Sat Jun 2 11:31:11 PDT 2007
Hello Christian,
Here are the answers to your questions:
1) There are a couple of programs that you can use to post-process the
BLAT output and select only those alignments that cover the whole length
of the query.
You can obtain these programs by downloading the Genome Browser source
tree:
http://genome.ucsc.edu/FAQ/FAQlicense#license3
It is free for non-profit, academic and personal use.
pslReps and pslCDnaFilter both use -minCover so you could set this to 1.0
to get the whole of the query aligned. pslReps is in the directory
src/hg/pslReps/ and pslCDnaFilter is in src/hg/pslCDnaFilter/.
Is that what you want? This would return only alignments where the whole
of the query is aligned. I wasn't sure if this is what you want or if you
want the percentage ID recalculated to include the whole query sequence.
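If building the source tree is not convenient, a similar full-length check can be approximated directly on the PSL file. The following is only a minimal sketch in Python, not the code used by pslReps or pslCDnaFilter. It assumes the standard 21-column tab-separated PSL layout and defines coverage as the aligned query bases (matches + misMatches + repMatches + nCount) divided by qSize, which may differ from how those programs compute -minCover. The function names are hypothetical.

```python
def query_coverage(psl_line):
    """Fraction of the query placed in aligned blocks for one PSL row.

    Assumed definition (not taken from pslReps/pslCDnaFilter):
    (matches + misMatches + repMatches + nCount) / qSize.
    """
    fields = psl_line.rstrip("\n").split("\t")
    matches, mismatches, rep_matches, n_count = (int(x) for x in fields[0:4])
    q_size = int(fields[10])
    return (matches + mismatches + rep_matches + n_count) / q_size


def full_length_rows(lines, min_cover=1.0):
    """Yield only the PSL data rows whose query coverage is >= min_cover."""
    for line in lines:
        # Data rows start with the numeric 'matches' column; the optional
        # psLayout header lines and blank lines do not.
        if not line[:1].isdigit():
            continue
        if query_coverage(line) >= min_cover:
            yield line
```

Passing an open PSL file to full_length_rows() and writing the yielded lines to a new file gives a PSL containing only the full-length hits under this definition.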
If you would like to recalculate the percent identity in this way, then
you will need to write your own program to do this. Here is a link to
the C code that calculates the percent identity:
http://genome.ucsc.edu/FAQ/FAQblat#blat4
I also have a script in Perl that calculates the percent identity if you
would prefer that.
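As an illustration of what "identity over the whole query" could mean, here is a small Python sketch. It is not the formula from the FAQ link above (that one also penalizes inserts and size differences). It simply assumes that identity over the whole query is (matches + repMatches) / qSize, so unaligned bases at the ends of the query count against the score. The function names are hypothetical.

```python
def aligned_identity(psl_line):
    """Percent identity over the aligned bases only (unaligned ends ignored)."""
    fields = psl_line.rstrip("\n").split("\t")
    matches, mismatches, rep_matches = (int(x) for x in fields[0:3])
    aligned = matches + mismatches + rep_matches
    return 100.0 * (matches + rep_matches) / aligned if aligned else 0.0


def whole_query_identity(psl_line):
    """Percent identity relative to the full query length (qSize).

    Assumed definition for illustration: matching bases divided by qSize,
    so every query base that is not matched lowers the value.
    """
    fields = psl_line.rstrip("\n").split("\t")
    matches, rep_matches = int(fields[0]), int(fields[2])
    q_size = int(fields[10])
    return 100.0 * (matches + rep_matches) / q_size
```

For a 100-base query where only bases 20 to 100 align, all of them matching, aligned_identity() reports 100.0 while whole_query_identity() reports 80.0.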
2) There is a program called pslPretty in the src/hg/pslPretty directory
that will convert the PSL file from BLAT into a human-readable alignment.
This requires fasta, 2bit or nib files which can be obtained through the
downloads server at:
http://hgdownload.cse.ucsc.edu/
You can use the following programs to convert the PSL BLAT output file to
a format which can then be converted to a FASTA file:
src/utils/pslToBed
src/utils/bedToGenePred
src/hg/genePredToMrna
They should be run in this order:
pslToBed -> bedToGenePred -> genePredToMrna
For the last program, you will need to use our databases. You can do this
by using our public MySQL server and by setting up a .hg.conf file in your
home directory with the settings as shown in this FAQ:
http://genome.ucsc.edu/FAQ/FAQdownloads#download29
The result will be the genomic sequence to which your query sequence
aligned, i.e. the target region from the PSL file.
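For a small number of hits where the target sequences are already in memory, the same idea can be sketched in Python without the database. This is only an illustration under stated assumptions, not what genePredToMrna does internally: it assumes an untranslated nucleotide PSL (single-character strand), joins the target bases of each aligned block using the blockSizes and tStarts columns, and reverse-complements the result when the strand is "-". The function names and the FASTA header format are hypothetical, and `targets` is assumed to be a dict mapping target names to sequence strings.

```python
def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T, N; case preserved)."""
    return seq.translate(str.maketrans("ACGTacgtNn", "TGCAtgcaNn"))[::-1]


def aligned_target_fasta(psl_line, targets, width=60):
    """Return a FASTA record with the target bases of all aligned blocks."""
    fields = psl_line.rstrip("\n").split("\t")
    strand = fields[8]
    if strand not in ("+", "-"):
        raise ValueError("only untranslated PSL rows are handled in this sketch")
    t_name, t_start, t_end = fields[13], int(fields[15]), int(fields[16])
    # blockSizes and tStarts are comma-separated lists with a trailing comma.
    block_sizes = [int(x) for x in fields[18].split(",") if x]
    t_starts = [int(x) for x in fields[20].split(",") if x]
    chrom = targets[t_name]
    seq = "".join(chrom[s:s + n] for s, n in zip(t_starts, block_sizes))
    if strand == "-":
        seq = revcomp(seq)  # report the sequence in query orientation
    # Header uses the PSL coordinates as they are (0-based start, open end).
    out = [">%s:%d-%d" % (t_name, t_start, t_end)]
    out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

Replacing the block join with `chrom[t_start:t_end]` would return the contiguous target span, including any gaps between blocks, instead.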
I hope that this helps you. Please let us know if you have further
questions.
Rachel
Rachel Harte
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu
On Thu, 31 May 2007, Christian Arnold wrote:
> Hello,
>
> I am using BLAT for research and I have two questions regarding the
> configuration of BLAT
>
> 1) Is there a way to define that the entered sequence has to be found in
> the whole length? Often, I have alignments where the first X bases are
> ignored because they did not have a significant alignment. But the
> identity score ignores these parts which is not what I want to have. The
> Identity number should include the whole sequence.
>
> 2) Is there an easy way to get the aligned sequence (not the one which I
> typed in) as a fasta file in its complete length?
>
> I read the manual but I didn't find anything, so it would be nice if
> you can help me with these issues...
>
>
> best
> Christian
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>