[Genome] parameters for blat with 25mers, -fastMap

Yueming Ding yueming.ding at jax.org
Wed May 16 13:01:35 PDT 2007


I tried with -minScore=20. I still didn't get any output. Thanks.

Yueming

-----Original Message-----
From: Jim Kent [mailto:kent at soe.ucsc.edu] 
Sent: Tuesday, May 15, 2007 8:37 PM
To: Galt Barber
Cc: Yueming Ding; genome at soe.ucsc.edu
Subject: Re: [Genome] parameters for blat with 25mers, -fastMap

I wonder if adding -minScore=20 would help though.....

On May 14, 2007, at 10:17 AM, Galt Barber wrote:

>
> This is a bug in -fastMap.  I recommend not using
> it for short exact sequences.
>
> -Galt
>
>
> On Mon, 14 May 2007, Yueming Ding wrote:
>
>> Hi Galt,
>>
>> Thanks for your help. I am now trying to blat 25mers that come
>> from mouse genomic sequences. I tried using -fastMap since I
>> don't want introns in the alignments. The other parameters I used are:
>>
>> -tileSize=8 -stepSize=5 -noTrimA -fastMap
>>
>> But I didn't get any output (I blatted 346903 25mers). Could
>> you please tell me why I don't get any hits when I use the -fastMap
>> parameter? Thanks.
>>
>> Yueming
>>
>>
>>
>>
>> -----Original Message-----
>> From: Galt Barber [mailto:galt at soe.ucsc.edu]
>> Sent: Thursday, May 10, 2007 5:30 PM
>> To: Yueming Ding
>> Cc: Galt Barber; Kayla Smith; genome at soe.ucsc.edu
>> Subject: RE: [Genome] parameters for blat with 25mers
>>
>>
>> Usually 2*stepSize + (tileSize - 1) == minimum match size that is
>> guaranteed to produce a hit.
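
As a rough check of the rule quoted above, the following Python sketch applies it to the stepSize/tileSize pairs tried in this thread. The function and variable names are made up for illustration, and the rule is only the "usually" guideline stated above, not something verified against the BLAT source.

```python
def min_guaranteed_hit(step_size, tile_size):
    """Rule of thumb from this thread: 2*stepSize + (tileSize - 1)."""
    return 2 * step_size + (tile_size - 1)


# (stepSize, tileSize) pairs mentioned in this thread
PAIRS = [(8, 8), (5, 10), (5, 11), (5, 8)]
QUERY_LEN = 25  # length of the 25mer queries


def summarize(pairs=PAIRS, query_len=QUERY_LEN):
    """Return (stepSize, tileSize, min hit size, fits within query?) rows."""
    rows = []
    for step_size, tile_size in pairs:
        size = min_guaranteed_hit(step_size, tile_size)
        rows.append((step_size, tile_size, size, size <= query_len))
    return rows


if __name__ == "__main__":
    for step_size, tile_size, size, fits in summarize():
        print(f"stepSize={step_size} tileSize={tile_size} "
              f"min guaranteed hit={size} within 25mer={fits}")
```

By this rule every pair listed gives a minimum size of 25 or less, so the rule alone does not explain missing hits; over-used or repetitive tiles may still be filtered out.
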
>>
>> Even without any .ooc file filtering out overused tiles,
>> blat has some built-in filtering of repetitive
>> short monomers and dimers.
>>
>> Not all the alignments will be useful, and some 25mers
>> might legitimately map to multiple locations.
>> For filtering, use pslReps and pslCDnaFilter.
>> You could filter it to only show perfect 25-mer matches
>> if that's what you want.
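
One way to keep only perfect 25-mer matches, sketched in Python rather than with pslReps or pslCDnaFilter. It assumes the standard 21-column tab-separated PSL layout (matches, misMatches, repMatches, nCount, qNumInsert, qBaseInsert, tNumInsert, tBaseInsert, strand, qName, qSize, ...). The function names are hypothetical, and "perfect" is taken here to mean that every query base matches with no mismatches and no inserts on either side.

```python
def is_perfect_match(psl_line, query_len=25):
    """True if a PSL line is a full-length, mismatch-free, gap-free hit."""
    fields = psl_line.rstrip("\n").split("\t")
    if len(fields) < 21:
        return False  # header, blank, or malformed line
    try:
        matches = int(fields[0])
        mismatches = int(fields[1])
        rep_matches = int(fields[2])
        q_num_insert = int(fields[4])
        t_num_insert = int(fields[6])
        q_size = int(fields[10])
    except ValueError:
        return False  # column-name header line
    return (q_size == query_len
            and matches + rep_matches == q_size
            and mismatches == 0
            and q_num_insert == 0
            and t_num_insert == 0)


def filter_perfect(lines, query_len=25):
    """Keep only the PSL lines that are perfect matches of query_len bases."""
    return [line for line in lines if is_perfect_match(line, query_len)]
```

A 25mer may still appear more than once in the filtered result if it maps perfectly to several locations.
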
>>
>> There is also a findMotif program for short exact matches
>> but it only finds one motif at a time:
>>
>> kent/src/utils/findMotif
>>
>> [hgwdev:findMotif> findMotif
>> findMotif - find specified motif in sequence
>> usage:
>>    findMotif [options] -motif=<acgt...> sequence
>> where:
>>    sequence is a .fa , .nib or .2bit file or a file which is a list of
>>    sequence files.
>> options:
>>    -motif=<acgt...> - search for this specified motif (case ignored, [acgt] only)
>>    -chr=<chrN> - process only this one chrN from the sequence
>>    -strand=<+|-> - limit to only one strand.  Default is both.
>>    -bedOutput - output bed format (this is the default)
>>    -wigOutput - output wiggle data format instead of bed file
>>    -verbose=N - set information level [1-4]
>>    NOTE: motif must be longer than 4 characters, less than 17
>>    -verbose=4 - will display gaps as bed file data lines to stderr
>>
>>
>> -Galt
>>
>>
>> On Thu, 10 May 2007, Yueming Ding wrote:
>>
>>> Galt,
>>>
>>> I am not sure what your dna-space and mrna really mean. But
>>> anyway my 25mers are from mouse genomic sequences. I am trying
>>> different numbers of stepSize and tileSize, e.g. stepSize=8
>>> tileSize=8, or stepSize=5 tileSize=10, or stepSize=5 tileSize=11,
>>> stepSize=5 tileSize=8. Each of these gave me a different number
>>> of hits. I don't know which values of stepSize and tileSize are
>>> best for my 25mers. Are there any rules for selecting stepSize
>>> and tileSize? Thanks.
>>>
>>> I am using command-line stand-alone:
>>> Blat database query.fa output.psl stepSize=5 tileSize=8
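
The command above is written without leading dashes on the options, whereas the later usage in this thread writes them as -tileSize=8 -stepSize=5. The Python sketch below only assembles such a command line with dashed options; it assumes blat is installed under the name "blat", and the file names are placeholders, not files from this thread.

```python
def blat_command(database, query, output, step_size=5, tile_size=8):
    """Build an argument list for stand-alone blat with dashed options."""
    return [
        "blat",
        database,
        query,
        output,
        f"-stepSize={step_size}",
        f"-tileSize={tile_size}",
    ]


if __name__ == "__main__":
    # Placeholder file names; print the command instead of running it.
    print(" ".join(blat_command("database.2bit", "query.fa", "output.psl")))
```

The resulting list can be passed to subprocess.run once the placeholder paths are replaced with real files.
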
>>>
>>> Yueming
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Galt Barber [mailto:galt at soe.ucsc.edu]
>>> Sent: Thursday, May 10, 2007 3:33 PM
>>> To: Yueming Ding
>>> Cc: Kayla Smith; genome at soe.ucsc.edu
>>> Subject: Re: [Genome] parameters for blat with 25mers
>>>
>>>
>>> Are your 25mers from dna-space (intronless) or mrna (may have
>>> introns)?
>>>
>>> If the former "gapless" case, then our hgBlat parameters will work
>>> (e.g. stepSize=5 tileSize=11) with 25mers, as long as they do not lie
>>> on over-used tiles.
>>>
>>> If you are using command-line stand-alone blat,
>>> what does your command-line look like?
>>>
>>> -Galt
>>>
>>>
>>> On Thu, 10 May 2007, Yueming Ding wrote:
>>>
>>>> Hi Kayla,
>>>>
>>>> I need your help on another problem. I am using regular BLAT
>>>> with 25mers to scan the mouse genome (using default parameters).
>>>> I get hits for only half of the query sequences. Could you please
>>>> tell me how I can set some parameters so that I can blat with
>>>> 25mers? Thanks.
>>>>
>>>> Yueming
>>>>
>>>> -----Original Message-----
>>>> From: Kayla Smith [mailto:kayla at soe.ucsc.edu]
>>>> Sent: Tuesday, May 08, 2007 5:16 PM
>>>> To: Yueming Ding
>>>> Cc: genome at soe.ucsc.edu
>>>> Subject: Re: [Genome] blat
>>>>
>>>>
>>>> Yueming,
>>>>
>>>> I've asked one of our developers about your question and here is
>>>> what he has to say:
>>>>
>>>> Please see "Replicating web-based Blat percent identity and score
>>>> calculations" http://genome.ucsc.edu/FAQ/FAQblat.html#blat4
>>>> which has all details needed to calculate our hgBlat score.
>>>>
>>>> Indels and mismatches are not treated the same.
>>>> This applies both to how BLAT does alignments and
>>>> to how it calculates the final score.
>>>>
>>>> BLAT builds the exons as alignments with
>>>> matches/mismatches extending from the seed
>>>> position until the alignment cannot be extended.
>>>>
>>>> Then the parts are chained together giving
>>>> a final alignment that has exons and introns.  All the
>>>> details of the score calculation are given above.
>>>>
>>>> In general huge introns do not carry a huge
>>>> penalty.  It's not subtracting one for
>>>> each base of the intron gap. It actually only
>>>> subtracts one for the entire gap or insert.
>>>> Also, in general, a mismatch consumes a base from the query
>>>> side whereas a gap does not, e.g.
>>>>
>>>> mismatch T/C (T in query is consumed)
>>>> query:  ACTGACTG
>>>> target: ACCGACTG
>>>>
>>>> gap example (gap on query side, nothing in query is consumed)
>>>> query:  ACT---------GACTG
>>>> target: ACTCGCCGGCCCGACTG
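
The two examples above can be scored with a small Python sketch. It assumes the nucleotide form of the score described in the FAQ linked above: matches plus half of repMatches, minus mismatches, minus the number of inserts on the query side and on the target side, so that each gap costs one regardless of its length. The function name and argument names are chosen for illustration; check the FAQ for the authoritative formula.

```python
def psl_score(matches, mismatches, rep_matches=0,
              q_num_insert=0, t_num_insert=0):
    """Sketch of a BLAT-style nucleotide score.

    Each insert (gap) subtracts one in total, however many bases it spans.
    """
    return (matches + (rep_matches >> 1)
            - mismatches - q_num_insert - t_num_insert)


# Mismatch example: ACTGACTG vs ACCGACTG -> 7 matches, 1 mismatch
mismatch_score = psl_score(matches=7, mismatches=1)

# Gap example: all 8 query bases match; the target has one 9-base insert
gap_score = psl_score(matches=8, mismatches=0, t_num_insert=1)

if __name__ == "__main__":
    print("mismatch example score:", mismatch_score)
    print("gap example score:", gap_score)
```

Under these assumptions a 9-base gap and a 9000-base gap would receive the same penalty of one.
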
>>>>
>>>> Note on repeatMatches:
>>>> Despite the documentation, the repeatMatches feature
>>>> of the psls is basically not used, so you won't see
>>>> anything in that column.  Instead, a match
>>>> in a repeated area will just be a regular match.
>>>>
>>>> I hope this information is helpful to you.  Please don't
>>>> hesitate to contact us again if you require further assistance.
>>>>
>>>> Kayla Smith
>>>> UCSC Genome Bioinformatics Group
>>>>
>>>>
>>>>
>>>> Yueming Ding wrote:
>>>>> Hi, is anyone able to tell me how Jim's BLAT handles
>>>>> mismatches and indels? Does BLAT treat mismatches and indels
>>>>> equally (by assigning the same penalty scores)? Thanks.
>>>>>
>>>>> Yueming Ding
>>>>> The Jackson Laboratory
>>>>> _______________________________________________
>>>>> Genome maillist  -  Genome at soe.ucsc.edu
>>>>> http://www.soe.ucsc.edu/mailman/listinfo/genome
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>




