[Genome] BLAT client/server and parameters

agarwal at cshl.edu
Sat Jul 28 12:35:33 PDT 2007


Hello,

I just have a few general questions about BLAT:

1)  Is it possible to use fastMap, the 11.ooc file, and the masking 
options of a 2bit database file with gfClient, or are these options 
restricted to command-line BLAT?  Are there any speed advantages to 
using nib vs. 2bit databases?

2)  If I use a 2bit database (~800 MB) of the human genome, why does 
command-line BLAT take up >3.1 GB of memory when I send a query of 1000 
sequences, while gfServer only uses ~1 GB?  Also, when I load the 
database from the command line, it reports 33 sequences in the database, 
even though I generated the 2bit file with the faToTwoBit command from 
only my 24 chromosomes in FASTA format.
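
To check the sequence count independently of BLAT, the index at the start 
of the 2bit file can be listed directly. Below is a minimal Python sketch, 
assuming the documented 2bit layout (four 32-bit header fields: signature 
0x1A412743, version, sequence count, reserved; then one index entry per 
sequence consisting of a 1-byte name length, the name, and a 32-bit 
offset). The function name read_twobit_names is my own and is not part of 
any BLAT tool.

```python
import struct

TWOBIT_SIGNATURE = 0x1A412743  # magic number at the start of a 2bit file


def read_twobit_names(path):
    """Return the sequence names listed in the index of a 2bit file."""
    with open(path, "rb") as fh:
        header = fh.read(16)
        if len(header) < 16:
            raise ValueError("file too short to be a 2bit file")
        # The file may be written in either byte order; try both.
        for endian in ("<", ">"):
            signature, _version, count, _reserved = struct.unpack(
                endian + "IIII", header
            )
            if signature == TWOBIT_SIGNATURE:
                break
        else:
            raise ValueError("2bit signature not found")
        names = []
        for _ in range(count):
            name_size = fh.read(1)[0]          # 1-byte name length
            names.append(fh.read(name_size).decode("ascii"))
            fh.read(4)                         # skip the 32-bit data offset
        return names
```

Printing len(read_twobit_names("hg.2bit")) and the names themselves (the 
file name here is a placeholder) should show whether the file really 
holds 24 entries or 33.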

3)  Why are the results different when I output in blast8 vs. psl 
format?  For most input sequences, I get many more hits with blast8.  If 
I change minScore and minIdentity while outputting in blast8 format, 
the output remains exactly the same.  Why is this so?  For blast8 
format, why do I receive hits with ~70% identity when the default 
for minIdentity is 90%?  I also get a lot of spurious hits in which 
<10 bases match at 100% identity.  How do I block out results below a 
certain length?
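
As a stopgap, the blast8 output can be filtered after the fact. Below is 
a minimal Python sketch, assuming the standard 12-column tab-separated 
BLAST tabular layout (column 3 is percent identity, column 4 is alignment 
length). The function name filter_blast8 and the default thresholds are 
placeholders of mine, not BLAT options.

```python
def filter_blast8(lines, min_identity=90.0, min_length=20):
    """Yield blast8 lines that meet both the identity and length cutoffs."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete blast8 record
        identity = float(fields[2])   # percent identity
        length = int(fields[3])       # alignment length
        if identity >= min_identity and length >= min_length:
            yield line
```

Passing an open file handle, e.g. filter_blast8(open("out.blast8")), and 
printing each yielded line would drop the short and low-identity hits, 
but it does not explain why the options are ignored in the first place.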

Your help is much appreciated,
Vikram Agarwal

