[Genome] parameters for blat with 25mers, -fastMap
Galt Barber
galt at soe.ucsc.edu
Mon May 14 10:17:12 PDT 2007
This is a bug in -fastMap. I recommend not using
it for short exact sequences.
-Galt
On Mon, 14 May 2007, Yueming Ding wrote:
> Hi Galt,
>
> Thanks for your help. I am now trying to blat 25mers that come from mouse genomic sequences. I tried using -fastMap since I don't want introns there. The other parameters I used are:
>
> -tileSize=8 -stepSize=5 -noTrimA -fastMap
>
> But I didn't get any output (I blatted 346903 25mers). Could you please tell me why I didn't get any hits when I use the -fastMap parameter? Thanks.
>
> Yueming
>
>
>
>
> -----Original Message-----
> From: Galt Barber [mailto:galt at soe.ucsc.edu]
> Sent: Thursday, May 10, 2007 5:30 PM
> To: Yueming Ding
> Cc: Galt Barber; Kayla Smith; genome at soe.ucsc.edu
> Subject: RE: [Genome] parameters for blat with 25mers
>
>
> Usually, 2*stepSize + (tileSize - 1) is the minimum match size guaranteed to produce a hit.
>
> Even without any .ooc file filtering out overused tiles,
> blat has some built-in filtering of short repetitive
> monomers and dimers.
>
> Not all the alignments will be useful, and some 25mers
> might legitimately map to multiple locations.
> For filtering, use pslReps and pslCDnaFilter.
> You could filter it to only show perfect 25-mer matches
> if that's what you want.
>
> There is also a findMotif program for short exact matches
> but it only finds one motif at a time:
>
> kent/src/utils/findMotif
>
> [hgwdev:findMotif> findMotif
> findMotif - find specified motif in sequence
> usage:
>    findMotif [options] -motif=<acgt...> sequence
> where:
>    sequence is a .fa , .nib or .2bit file or a file which is a list of
>    sequence files.
> options:
>    -motif=<acgt...> - search for this specified motif (case ignored,
>                       [acgt] only)
>    -chr=<chrN>      - process only this one chrN from the sequence
>    -strand=<+|->    - limit to only one strand. Default is both.
>    -bedOutput       - output bed format (this is the default)
>    -wigOutput       - output wiggle data format instead of bed file
>    -verbose=N       - set information level [1-4]
>    NOTE: motif must be longer than 4 characters, less than 17
>    -verbose=4       - will display gaps as bed file data lines to stderr
>
>
> -Galt
>
>
> On Thu, 10 May 2007, Yueming Ding wrote:
>
> > Galt,
> >
> > I am not sure what you mean by dna-space and mrna, but my 25mers are from mouse genomic sequences. I am trying different values of stepSize and tileSize, e.g. stepSize=8 tileSize=8, stepSize=5 tileSize=10, stepSize=5 tileSize=11, or stepSize=5 tileSize=8. Each of these gave me a different number of hits. I don't know which values of stepSize and tileSize are best for my 25mers. Are there any rules for selecting stepSize and tileSize? Thanks.
> >
> > I am using command-line stand-alone:
> > blat database query.fa output.psl -stepSize=5 -tileSize=8
> >
> > Yueming
> >
> >
> >
> > -----Original Message-----
> > From: Galt Barber [mailto:galt at soe.ucsc.edu]
> > Sent: Thursday, May 10, 2007 3:33 PM
> > To: Yueming Ding
> > Cc: Kayla Smith; genome at soe.ucsc.edu
> > Subject: Re: [Genome] parameters for blat with 25mers
> >
> >
> > Are your 25mers from dna-space (intronless) or mrna (may have introns)?
> >
> > If the former "gapless" case, then our hgBlat parameters will work
> > (e.g. stepSize=5 tileSize=11) with 25mers, as long as they do not lie
> > on over-used tiles.
> >
> > If you are using command-line stand-alone blat,
> > what does your command-line look like?
> >
> > -Galt
> >
> >
> > On Thu, 10 May 2007, Yueming Ding wrote:
> >
> > > Hi Kayla,
> > >
> > > I need your help with another problem. I am using regular BLAT with 25mers to
> > > scan the mouse genome (using default parameters). I get hits for only half of
> > > the query sequences. Could you please tell me how to set the parameters
> > > so that I can blat 25mers? Thanks.
> > >
> > > Yueming
> > >
> > > -----Original Message-----
> > > From: Kayla Smith [mailto:kayla at soe.ucsc.edu]
> > > Sent: Tuesday, May 08, 2007 5:16 PM
> > > To: Yueming Ding
> > > Cc: genome at soe.ucsc.edu
> > > Subject: Re: [Genome] blat
> > >
> > >
> > > Yueming,
> > >
> > > I've asked one of our developers about your question and here is what he
> > > has to say:
> > >
> > > Please see "Replicating web-based Blat percent identity and score
> > > calculations" http://genome.ucsc.edu/FAQ/FAQblat.html#blat4
> > > which has all details needed to calculate our hgBlat score.
> > >
> > > Indels and mismatches are not treated the same.
> > > This applies both to how BLAT builds alignments and
> > > to how it calculates the final score.
> > >
> > > BLAT builds the exons as alignments with
> > > matches/mismatches extending from the seed
> > > position until the alignment cannot be extended.
> > >
> > > Then the parts are chained together giving
> > > a final alignment that has exons and introns. All the
> > > details of the score calculation are given above.
> > >
> > > In general huge introns do not carry a huge
> > > penalty. It's not subtracting one for
> > > each base of the intron gap. It actually only
> > > subtracts one for the entire gap or insert.
> > > Also, in general, a mismatch consumes a base from the query
> > > side, whereas a gap does not, e.g.
> > >
> > > mismatch T/C (T in query is consumed)
> > > query:  ACTGACTG
> > > target: ACCGACTG
> > >
> > > gap example (gap on query side, nothing in query is consumed)
> > > query:  ACT---------GACTG
> > > target: ACTCGCCGGCCCGACTG
> > >
> > > Note on repeatMatches:
> > > Despite the documentation, the repeatMatches feature
> > > of the psls is basically not used, so you won't see
> > > anything in that column. Instead, a match
> > > in a repeated area will just be a regular match.
> > >
> > > I hope this information is helpful to you. Please don't hesitate to
> > > contact us again if you require further assistance.
> > >
> > > Kayla Smith
> > > UCSC Genome Bioinformatics Group
> > >
> > >
> > >
> > > Yueming Ding wrote:
> > > > > Hi, is anyone able to tell me how Jim's BLAT handles mismatches and indels? Does BLAT treat mismatches and indels equally (by assigning the same penalty scores)? Thanks.
> > > >
> > > > Yueming Ding
> > > > The Jackson Laboratory
> > > > _______________________________________________
> > > > Genome maillist - Genome at soe.ucsc.edu
> > > > http://www.soe.ucsc.edu/mailman/listinfo/genome
> > >
> > >
> > >
> > >
> >
> >
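As a rough sketch, the rule of thumb quoted in this thread, 2*stepSize + (tileSize - 1), can be worked through for the settings that were tried. The Python below is only an illustration: the function names are made up for this example and are not part of blat. The last entry assumes blat's DNA defaults are tileSize=11 with stepSize defaulting to tileSize; check the usage message of your blat version before relying on that.

```python
def min_guaranteed_hit(step_size, tile_size):
    """Rule of thumb from the thread: 2*stepSize + (tileSize - 1)."""
    return 2 * step_size + (tile_size - 1)


def is_guaranteed(query_len, step_size, tile_size):
    """True if a perfect match of query_len bases meets the rule-of-thumb minimum."""
    return query_len >= min_guaranteed_hit(step_size, tile_size)


# (stepSize, tileSize) pairs mentioned in the thread, plus the assumed
# DNA defaults (tileSize=11, stepSize=tileSize) for comparison.
SETTINGS = [(8, 8), (5, 10), (5, 11), (5, 8), (11, 11)]

if __name__ == "__main__":
    for step, tile in SETTINGS:
        need = min_guaranteed_hit(step, tile)
        verdict = "guaranteed" if is_guaranteed(25, step, tile) else "not guaranteed"
        print(f"stepSize={step} tileSize={tile}: minimum {need}, 25mer {verdict}")
```

Under these assumptions every non-default setting from the thread covers a 25mer, while the assumed defaults would need 32 bases, which would be consistent with a default run finding only some of the 25mers. The rule says nothing about tiles that are dropped as over-used or repetitive.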
>
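The thread suggests pslReps and pslCDnaFilter for filtering, and mentions keeping only perfect 25-mer matches. As a minimal sketch of that last idea, the Python below filters PSL rows directly. It is not how pslReps or pslCDnaFilter work; it is a plain column filter that assumes the standard 21-column tab-separated PSL layout, and the function name perfect_hits is hypothetical.

```python
PSL_FIELDS = 21  # standard PSL rows have 21 tab-separated columns


def perfect_hits(psl_lines, query_len=25):
    """Yield PSL rows (as lists of strings) that are full-length,
    mismatch-free, gap-free matches of a query of length query_len."""
    for line in psl_lines:
        cols = line.rstrip("\n").split("\t")
        # Skip the psLayout header lines and anything that is not a data row.
        if len(cols) != PSL_FIELDS or not cols[0].isdigit():
            continue
        matches = int(cols[0])
        mismatches = int(cols[1])
        rep_matches = int(cols[2])
        q_num_insert = int(cols[4])
        t_num_insert = int(cols[6])
        q_size = int(cols[10])
        if (q_size == query_len
                and matches + rep_matches == q_size
                and mismatches == 0
                and q_num_insert == 0
                and t_num_insert == 0):
            yield cols
```

One way to use it would be to open the blat output file (for example output.psl), pass the file object to perfect_hits, and write the surviving rows back out joined with tabs. A 25mer that legitimately maps to several places will still produce several rows.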
>
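To make the point about mismatches versus gaps concrete, here is a small sketch that scores the two toy alignments from the thread. It assumes the DNA form of the score described in the FAQ linked above, as I understand it: matches plus half of repMatches, minus misMatches, minus the number of inserts on each side. Treat the formula and the function name psl_score as assumptions and check them against the FAQ. The toy alignments are far shorter than anything BLAT would report and serve only as arithmetic.

```python
def psl_score(matches, mismatches, rep_matches, q_num_insert, t_num_insert):
    """Assumed DNA score: each mismatch costs one, and each gap costs one
    in total, no matter how many bases the gap spans."""
    return matches + rep_matches // 2 - mismatches - q_num_insert - t_num_insert


# Mismatch example from the thread: ACTGACTG vs ACCGACTG
# 7 matching bases, 1 mismatch, no gaps.
mismatch_score = psl_score(matches=7, mismatches=1, rep_matches=0,
                           q_num_insert=0, t_num_insert=0)

# Gap example from the thread: ACT---------GACTG vs ACTCGCCGGCCCGACTG
# All 8 query bases match; the 9-base gap counts as a single insert.
gap_score = psl_score(matches=8, mismatches=0, rep_matches=0,
                      q_num_insert=0, t_num_insert=1)

if __name__ == "__main__":
    print("mismatch example score:", mismatch_score)
    print("gap example score:", gap_score)
```

Under this assumed formula the nine-base gap costs the same single point as the one-base mismatch, which matches the statement that only one is subtracted for the entire gap.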