[Genome] blat_help

Galt Barber galt at soe.ucsc.edu
Tue May 29 17:57:41 PDT 2007


For some reason, using -fine helps BLAT pick up a bunch of your queries,
e.g.
>1_mismatch_A
aacgtgttttcgggtgtaggaggatcgc
cacgtgtttttgggtgtaggaggaacgg  (genomic, for comparison; 4 mismatches)

blat -fine -stepSize=1 -tileSize=6 -minMatch=1 -minScore=0 -minIdentity=0
db.fa query.fa out.psl

Even with all these settings pushing BLAT to its most sensitive,
it still misses some.

The answer is that you are just going beyond BLAT's sensitivity.

For instance, we do not use blat in our cross-species chains;
we use blastz.  We do use blat for aligning ESTs and mRNA, where you have
fairly long stretches of very high identity.  We can also use it for making
liftOver files between two assemblies of the same species,
e.g. hg17 to hg18.

Many of the examples you give have many mismatches and the
runs of identical bases are very short.  If you need blat
to pick up every example that you give, then it's just not
going to work.  And those sequences will be hard for any
search engine.  Although BLAST is more sensitive, I can't
guarantee that it could find every one of your example queries.
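
To see how short the runs of identical bases are for a given query, here
is a minimal Python sketch. It is not part of BLAT, and the helper name
match_runs is made up for illustration. It assumes the query and the
genomic stretch are already lined up at a fixed offset with no gaps, and
it simply lists the lengths of the exact-match runs between mismatches.

```python
def match_runs(query, target):
    """Compare two equal-length sequences position by position (no gaps).

    Returns (mismatch_count, run_lengths), where run_lengths lists the
    lengths of the maximal stretches of identical bases, left to right.
    """
    if len(query) != len(target):
        raise ValueError("sequences must have the same length")
    mismatches = 0
    runs = []
    current = 0
    for q, t in zip(query.lower(), target.lower()):
        if q == t:
            current += 1
        else:
            mismatches += 1
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return mismatches, runs


query = "aacgtgttttcgggtgtaggaggatcgc"    # 1_mismatch_A
genomic = "cacgtgtttttgggtgtaggaggaacgg"  # genomic stretch it should hit
mismatches, runs = match_runs(query, genomic)
print(mismatches, runs, max(runs))
```

For the 1_mismatch_A pair shown at the top of this message it reports
4 mismatches and exact-match runs of 9, 13 and 2 bases.  Running the same
check on the other queries shows how much exact sequence each one gives
a search engine to work with.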

Good luck!

-Galt

> ---------- Forwarded message ----------
> Date: Tue, 29 May 2007 09:03:07 +0200
> From: Viviana Piccolo <viviana.piccolo at unimi.it>
> To: genome at soe.ucsc.edu, genome at soe.ucsc.edu
> Subject: [Genome] blat_help
>
> Hi!
> I've tried running BLAT with the parameters -tileSize=8 -stepSize=4
> -minMatch=1 -minScore=19, as your developers suggested, but I can't
> find all the hits I want.
> This is my database:
> >db
> catgactattgcactaaaggtgcgcacacgtgtttttgggtgtaggaggaacggacgccacccaccaggtagctatactctcccccaggtattaccattagggagggagaaaaaccatagtcgtaggttgcgtgca
>
> And this is my query:
> >seq_original
> cgtgtttttgggtgtaggagg
> >1_mismatch_A
> aacgtgttttcgggtgtaggaggatcgc
> >1_mismatch_B
> aaaacgtgcttttgggtgtaggaggtaaaa
> >1_mismatch_C
> aaagggggggcgtgattttgggtgtaggaggggtggg
> >2mism_A
> ttaccaggcgtgtatttgggcgtaggacgttttaaccacctggg
> >2mism_B
> aatgtgggggcgtgtttatgggtgtaggaagttttg
> >2mism_far
> aaacgtgtttttgggcgtaggaagttttcaggg
> >3mism
> gcaaaaaaaaacgtgtttatgggtataggaagtttcaaactttttttttttacaaa
> >1_mism_+_1mism_near
> ggggggggggcggtcgtgtttttgggagtcggaggagctatatt
> >1_mism_+_1mism_far_6_nts
> acaccgggggggggcgtgtttttgggagtaggaagttat
> >3mism_attached
> aacaggggggacgtgtttttgaaagtaggaggttatcctt
> >1mism_3mism_attached
> gtgtcccccgtgtctttgggtcccggaggtttt
> >5_mismatch
> atcaaataacgtatctctgggtatgggaggactttt
>
> This is the BLAT output for the parameters -tileSize=8 -stepSize=4
> -minMatch=1 -minScore=19 that you suggested:
>
>
> psLayout version 3
>
> match  mis-   rep.   N's  Q gap  Q gap  T gap  T gap  strand  Q             Q     Q      Q    T           T     T      T    block  blockSizes  qStarts  tStarts
>        match  match       count  bases  count  bases          name          size  start  end  name        size  start  end  count
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
> 21     0      0      0    0      0      0      0      +       seq_original  21    0      21   db_con_gap  136   28     49   1      21,         0,       28,
> 21     1      0      0    0      0      0      0      +       1_mismatch_B  30    3      25   db_con_gap  136   27     49   1      22,         3,       27,
> 20     1      0      0    0      0      0      0      +       1_mismatch_C  37    10     31   db_con_gap  136   28     49   1      21,         10,      28,
>
>
> I'm looking for hits of at least 19 nucleotides, but I've tried lowering
> the score threshold to 15, using the parameters -tileSize=6 -stepSize=5
> -minScore=15 -minIdentity=20 -minMatch=0 -out=pslx -noHead, and I can
> see that I find more hits, but not all the ones I want... and I've seen
> some strange situations. Here is my BLAT output:
>
> psLayout version 3
>
> match  mis-   rep.   N's  Q gap  Q gap  T gap  T gap  strand  Q             Q     Q      Q    T           T     T      T    block  blockSizes  qStarts  tStarts
>        match  match       count  bases  count  bases          name          size  start  end  name        size  start  end  count
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
> 21     0      0      0    0      0      0      0      +       seq_original  21    0      21   db_con_gap  136   28     49   1      21,         0,       28,
> 21     1      0      0    0      0      0      0      +       1_mismatch_B  30    3      25   db_con_gap  136   27     49   1      22,         3,       27,
> 20     1      0      0    0      0      0      0      +       1_mismatch_C  37    10     31   db_con_gap  136   28     49   1      21,         10,      28,
> 18     1      0      0    0      0      0      0      +       2mism_far_B   36    10     29   db_con_gap  136   28     47   1      19,         10,      28,
> 15     0      0      0    0      0      0      0      +       2mism_near_A  31    16     31   db_con_gap  136   34     49   1      15,         16,      34,
> 19     2      0      0    0      0      0      0      +       2mism_near_B  21    0      21   db_con_gap  136   28     49   1      21,         0,       28,
>
> As you can see, the list includes two hits that contain a mismatch
> ("1_mismatch_C" and "1_mismatch_B"), but I am still missing the hit
> "1_mismatch_A", which contains a mismatch too.
> I've considered other combinations. For example, I've taken into
> account the possibility that two or more mismatches could be near
> (distance between mismatches = 1-2 nt in "2mism_near_A",
> "2mism_near_B") or far (distance between mismatches = 5-6 nt in
> "2mism_far_A", "2mism_far_B", "2mism_far_C"), or that the distance
> between mismatches is 0 nt (in "2mism_attached" and
> "3mism_attached").
> I've also checked cases with more than two mismatches (in
> "3mism_far" and "5_mismatch") and mixed cases (in
> "1mism_3mism_attached"). I can't obtain a BLAT output that meets my
> conditions. I miss the hits 2mism_far_A and 2mism_far_C (while I have
> 2mism_far_B... why?), and I miss "2mism_near_B" (while I have
> "2mism_near_A" in the output...). I completely miss all cases with more
> than 2 mismatches ("3mism_far", "5_mismatch"), cases in which the
> mismatches are 0 nt apart (attached mismatches in "2mism_attached",
> "3mism_attached"), and the so-called mixed cases like
> "1mism_3mism_attached".
> I don't know why I have this problem... I don't know why some hits are
> correctly detected while others are still missing. I should use
> BLAT (I hope) for a new genome. I need to be sure that my analysis is OK,
> so as not to miss important relations.
> It may be important to know that, before running BLAT, I used
> faToNib with the -hardMask parameter on the database.
> I've read the previous mails you sent me, and I thought this could
> be a problem related to repMask. I've tried increasing the
> repMask value, for example -repMask= 10000000000, but with no luck (nothing changed!).
> Can you help me, or do you think that BLAT is not able to perform an
> analysis with short sequences? I would like to use BLAT because it is
> really fast, but if the results are not really OK with small
> sequences... I'll have to set this idea aside. Can you help me?
>
> Thanks a lot for your time.
> Regards
>
> Viviana
>
>
> --------------------------------------------------------------------------
> Viviana Piccolo
> Bioinformatics, Evolution And Comparative Genomics group (BEACON)
> Department of Biomolecular Sciences and Biotechnology
> University of Milan
> Via Celoria, 26 - 20133 Milan, Italy
>
> Tel. (+39) 02 50314916
> e-mail:  viviana.piccolo at unimi.it
> http://www.bsb.unimi.it/bioinformatics.htm
> --------------------------------------------------------------------------

