[Genome] blat_help
Galt Barber
galt at soe.ucsc.edu
Tue May 29 17:57:41 PDT 2007
For some reason, using -fine helps BLAT pick up a bunch of your queries,
e.g.
>1_mismatch_A
aacgtgttttcgggtgtaggaggatcgc
cacgtgtttttgggtgtaggaggaacgg (genomic for comparison 4 mismatches)
blat -fine -stepSize=1 -tileSize=6 -minMatch=1 -minScore=0 -minIdentity=0
db.fa query.fa out.psl
Even with all these settings making BLAT as sensitive as it can be,
it still misses some.
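
To make the comparison above concrete, here is a minimal Python sketch
(not part of BLAT; the helper names are made up for illustration) that
lists where the 1_mismatch_A query differs from the genomic stretch and
how long the runs of identical bases between the mismatches are:

```python
def mismatch_positions(query, target):
    """Return 0-based positions where two equal-length sequences differ."""
    if len(query) != len(target):
        raise ValueError("sequences must have equal length")
    return [i for i, (q, t) in enumerate(zip(query, target)) if q != t]


def exact_runs(query, target):
    """Return the lengths of consecutive identical-base runs, left to right."""
    runs, current = [], 0
    for q, t in zip(query, target):
        if q == t:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


# Sequences copied from the example above (ungapped, same length).
query = "aacgtgttttcgggtgtaggaggatcgc"
genomic = "cacgtgtttttgggtgtaggaggaacgg"

print(mismatch_positions(query, genomic))  # [0, 10, 24, 27]
print(exact_runs(query, genomic))          # [9, 13, 2]
```

So the longest stretch of identical bases in this query is only 13 nt.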
The answer is that you are simply going beyond BLAT's sensitivity.
For instance, we do not use blat in our cross-species chains;
we use blastz. We do use blat for aligning ESTs and mRNA, where you have
fairly long stretches of very high identity. We can also use it for making
liftOver files between two assemblies of the same species,
e.g. hg17 to hg18.
Many of the examples you give have many mismatches and the
runs of identical bases are very short. If you need blat
to pick up every example that you give, then it's just not
going to work. And those sequences will be hard for any
search engine. Although BLAST is more sensitive, I can't
guarantee that it could find every one of your example queries.
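
As a rough illustration of why short runs of identical bases matter, the
sketch below models only tile seeding: it indexes target tiles every
stepSize bases and looks up every k-mer of the query. This is a simplified
assumption about how seeding works, not BLAT's actual code. The function
names are hypothetical, and the later stages (extension, scoring,
filtering) are not modeled, so a seed hit here does not mean BLAT would
report an alignment.

```python
def tile_index(target, tile_size, step_size):
    """Map each target tile (k-mer) to the target offsets where it starts."""
    index = {}
    for t_start in range(0, len(target) - tile_size + 1, step_size):
        tile = target[t_start:t_start + tile_size]
        index.setdefault(tile, []).append(t_start)
    return index


def seed_hits(query, index, tile_size):
    """Return (query_start, target_start) pairs for perfect tile matches."""
    hits = []
    for q_start in range(len(query) - tile_size + 1):
        kmer = query[q_start:q_start + tile_size]
        for t_start in index.get(kmer, []):
            hits.append((q_start, t_start))
    return hits


# Database and two queries copied from the forwarded message.
db = ("catgactattgcactaaaggtgcgcacacgtgtttttgggtgtaggaggaacggacgccacccacc"
      "aggtagctatactctcccccaggtattaccattagggagggagaaaaaccatagtcgtaggttgcgtgca")
queries = {
    "seq_original": "cgtgtttttgggtgtaggagg",
    "5_mismatch": "atcaaataacgtatctctgggtatgggaggactttt",
}

index = tile_index(db, tile_size=8, step_size=4)
for name, seq in queries.items():
    print(name, len(seq_hits := seed_hits(seq, index, 8)), seq_hits)
```

A tile can only seed a hit if it falls entirely inside a run of identical
bases, which is why queries with many closely spaced mismatches give the
seeding stage very little to work with.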
Good luck!
-Galt
> ---------- Forwarded message ----------
> Date: Tue, 29 May 2007 09:03:07 +0200
> From: Viviana Piccolo <viviana.piccolo at unimi.it>
> To: genome at soe.ucsc.edu, genome at soe.ucsc.edu
> Subject: [Genome] blat_help
>
> Hi!
> I've tried to run blat with the parameters -tileSize=8
> -stepSize=4 -minMatch=1 -minScore=19, as your developers suggested to me, but I
> can't find all the hits I want.
> This is my database:
> >db
> catgactattgcactaaaggtgcgcacacgtgtttttgggtgtaggaggaacggacgccacccaccaggtagctatactctcccccaggtattaccattagggagggagaaaaaccatagtcgtaggttgcgtgca
>
> And this is my query:
> >seq_original
> cgtgtttttgggtgtaggagg
> >1_mismatch_A
> aacgtgttttcgggtgtaggaggatcgc
> >1_mismatch_B
> aaaacgtgcttttgggtgtaggaggtaaaa
> >1_mismatch_C
> aaagggggggcgtgattttgggtgtaggaggggtggg
> >2mism_A
> ttaccaggcgtgtatttgggcgtaggacgttttaaccacctggg
> >2mism_B
> aatgtgggggcgtgtttatgggtgtaggaagttttg
> >2mism_far
> aaacgtgtttttgggcgtaggaagttttcaggg
> >3mism
> gcaaaaaaaaacgtgtttatgggtataggaagtttcaaactttttttttttacaaa
> >1_mism_+_1mism_near
> ggggggggggcggtcgtgtttttgggagtcggaggagctatatt
> >1_mism_+_1mism_far_6_nts
> acaccgggggggggcgtgtttttgggagtaggaagttat
> >3mism_attached
> aacaggggggacgtgtttttgaaagtaggaggttatcctt
> >1mism_3mism_attached
> gtgtcccccgtgtctttgggtcccggaggtttt
> >5_mismatch
> atcaaataacgtatctctgggtatgggaggactttt
>
> This is the BLAT output for the parameters -tileSize=8 -stepSize=4
> -minMatch=1 -minScore=19 that you suggested to me:
>
>
> psLayout version 3
>
> match  mis-   rep.   N's  Q gap  Q gap  T gap  T gap  strand  Q             Q     Q      Q    T           T     T      T    block  blockSizes  qStarts  tStarts
>        match  match       count  bases  count  bases          name          size  start  end  name        size  start  end  count
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
> 21     0      0      0    0      0      0      0      +       seq_original  21    0      21   db_con_gap  136   28     49   1      21,         0,       28,
> 21     1      0      0    0      0      0      0      +       1_mismatch_B  30    3      25   db_con_gap  136   27     49   1      22,         3,       27,
> 20     1      0      0    0      0      0      0      +       1_mismatch_C  37    10     31   db_con_gap  136   28     49   1      21,         10,      28,
>
>
> I'm looking for hits of at least 19 nucleotides, but I've tried lowering
> minScore to 15, using the parameters -tileSize=6 -stepSize=5
> -minScore=15 -minIdentity=20 -minMatch=0 -out=pslx -noHead, and I can
> see that I find more hits, but not all the ones I want... and I've seen some
> strange situations. I can show you my BLAT output:
>
> psLayout version 3
>
> match  mis-   rep.   N's  Q gap  Q gap  T gap  T gap  strand  Q             Q     Q      Q    T           T     T      T    block  blockSizes  qStarts  tStarts
>        match  match       count  bases  count  bases          name          size  start  end  name        size  start  end  count
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
> 21     0      0      0    0      0      0      0      +       seq_original  21    0      21   db_con_gap  136   28     49   1      21,         0,       28,
> 21     1      0      0    0      0      0      0      +       1_mismatch_B  30    3      25   db_con_gap  136   27     49   1      22,         3,       27,
> 20     1      0      0    0      0      0      0      +       1_mismatch_C  37    10     31   db_con_gap  136   28     49   1      21,         10,      28,
> 18     1      0      0    0      0      0      0      +       2mism_far_B   36    10     29   db_con_gap  136   28     47   1      19,         10,      28,
> 15     0      0      0    0      0      0      0      +       2mism_near_A  31    16     31   db_con_gap  136   34     49   1      15,         16,      34,
> 19     2      0      0    0      0      0      0      +       2mism_near_B  21    0      21   db_con_gap  136   28     49   1      21,         0,       28,
>
> As you see, two hits that contain a mismatch appear in the list
> ("1_mismatch_C" and "1_mismatch_B"), but I am still missing the hit
> "1_mismatch_A", which contains a mismatch too.
> I've considered other combinations. For example, I've taken into
> account the possibility that two or more mismatches could be near
> (distance between mismatches = 1-2 nt in "2mism_near_A",
> "2mism_near_B"), far (distance between mismatches = 5-6 nt in
> "2mism_far_A", "2mism_far_B", "2mism_far_C"), or adjacent (distance
> between mismatches = 0 nt in "2mism_attached" and "3mism_attached").
> I've also checked situations with more than two mismatches (in
> "3mism_far" and "5_mismatch") and mixed situations (in
> "1mism_3mism_attached"). I can't obtain a BLAT output that meets my
> conditions. I miss the hits 2mism_far_A and 2mism_far_C (while I have
> 2mism_far_B... why?), and I miss "2mism_near_B" (while I have
> "2mism_near_A" in the output...). I completely miss all cases with more
> than 2 mismatches ("3mism_far", "5_mismatch"), cases in which mismatches
> are at a distance of 0 nt (attached mismatches in "2mism_attached",
> "3mism_attached"), and the so-called mixed cases like
> "1mism_3mism_attached".
> I don't know why I have this problem... I don't know why some hits are
> correctly detected and others are still missing. I should use
> BLAT (I hope) for a new genome. I need to be sure that my analysis is OK, in
> order not to miss important relations.
> It may be important to know that, before running blat, I used
> faToNib with the -hardMask parameter on the database.
> I've read the previous mails you sent me, and I thought that it could
> be a problem related to repMask. I've tried to increase the
> repMask value, for example -repMask= 10000000000, but with no luck (nothing changed!).
> Can you help me, or do you think that BLAT is not able to perform an
> analysis with short sequences? I would like to use BLAT because it is
> really fast, but if the results are not reliable with small
> sequences... I'll have to set this idea aside. Can you help me?
>
> Thanks a lot for your time.
> Regards
>
> Viviana
>
>
> --------------------------------------------------------------------------
> Viviana Piccolo
> Bioinformatics, Evolution And Comparative Genomics group (BEACON)
> Department of Biomolecular Sciences and Biotechnology
> University of Milan
> Via Celoria, 26 - 20133 Milan, Italy
>
> Tel. (+39) 02 50314916
> e-mail: viviana.piccolo at unimi.it
> http://www.bsb.unimi.it/bioinformatics.htm
> --------------------------------------------------------------------------
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>