[Genome] difference of local Blat and web-Blat

Xianjun Dong Xianjun.Dong at bccs.uib.no
Wed Nov 21 01:35:01 PST 2007


Good evening Hiram :)

The number is 34, the same as on the first line printed when I just type blat on 
the command line.

Which version are you using?
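
In case it helps with the comparison, here is a minimal Python sketch of how I understand the local PSL rows to map onto the columns of the web table quoted below. It assumes that PSL start coordinates are 0-based while the web page shows 1-based starts, and that the web SCORE for a nucleotide alignment is matches plus half the repeat matches, minus mismatches, minus the number of query and target inserts. The names PslHit, parse_psl_line, psl_score, web_row and ASSUMED_WEB_MIN_SCORE are my own, and the cutoff of 20 is only a guess that happens to fit the hits shown here, not a confirmed hgBlat setting.

```python
from typing import NamedTuple

# Hypothetical cutoff: a guess consistent with the hits in this thread,
# not a documented hgBlat setting.
ASSUMED_WEB_MIN_SCORE = 20

# Four of the local PSL rows from this thread, whitespace-separated.
LOCAL_PSL_ROWS = [
    "81 3 0 0 0 0 0 0 + hsa-mir-137 102 8 92 chr24 40293347 28811093 28811177 1 84, 8, 28811093,",
    "83 4 0 0 0 0 1 2 + hsa-mir-137 102 5 92 chr2 54366722 15780389 15780478 2 5,82, 5,10, 15780389,15780396,",
    "16 0 0 0 0 0 0 0 - hsa-mir-137 102 58 74 chr17 52310423 12235065 12235081 1 16, 28, 12235065,",
    "25 0 0 0 0 0 0 0 - hsa-mir-155 131 3 28 chr1 56204684 22923406 22923431 1 25, 103, 22923406,",
]


class PslHit(NamedTuple):
    matches: int
    mismatches: int
    rep_matches: int
    q_num_insert: int
    t_num_insert: int
    strand: str
    q_name: str
    q_start: int
    q_end: int
    t_name: str
    t_start: int
    t_end: int


def parse_psl_line(line: str) -> PslHit:
    """Pick the PSL columns needed for the comparison out of one data row."""
    f = line.split()
    return PslHit(
        matches=int(f[0]), mismatches=int(f[1]), rep_matches=int(f[2]),
        q_num_insert=int(f[4]), t_num_insert=int(f[6]),
        strand=f[8], q_name=f[9], q_start=int(f[11]), q_end=int(f[12]),
        t_name=f[13], t_start=int(f[15]), t_end=int(f[16]),
    )


def psl_score(hit: PslHit) -> int:
    """Score for a nucleotide alignment, under the assumption stated above."""
    return (hit.matches + (hit.rep_matches >> 1) - hit.mismatches
            - hit.q_num_insert - hit.t_num_insert)


def web_row(hit: PslHit) -> tuple:
    """Convert 0-based PSL starts to 1-based starts, as the web table shows."""
    return (hit.q_name, psl_score(hit), hit.q_start + 1, hit.q_end,
            hit.t_name, hit.strand, hit.t_start + 1, hit.t_end,
            hit.t_end - hit.t_start)


if __name__ == "__main__":
    for row in LOCAL_PSL_ROWS:
        hit = parse_psl_line(row)
        if psl_score(hit) >= ASSUMED_WEB_MIN_SCORE:
            print(*web_row(hit), sep="\t")
```

Under these assumptions the local hits that score 78 and 25 line up with rows of the web table, and the hits that appear only in the local output are the ones scoring 16.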

Thanks for your help.

Xianjun

Hiram Clawson wrote:
> Good Evening Xianjun:
>
> Can you verify which version of the blat source you have?
> There is a file in the source tree: blat/version.doc
> which should have some comments at the top of the
> file indicating the changes in each version.  What
> is the first number in that version.doc file ?
>
> --Hiram
>
> Dong Xianjun wrote:
>> Hi Archana,
>>
>> Thanks for your quick response.
>>
>> Thanks also for your opinion on aligning across such an evolutionary 
>> distance, and for the correction about the output format.
>>
>> However, I am still wondering about the difference in my output. Let's 
>> set the evolutionary distance aside for now and treat the query simply 
>> as a query sequence (it does have some reasonable hits on the zebrafish 
>> genome, judging from the web-based Blat output). For the 
>> following query
>>  >hsa-mir-137 MI0000454
>> GGTCCTCTGACTCTCTTCGGTGACGGGTATTCTTGGGTGGATAATACGGATTACGTTGTTATTGCTTAAGAATACGCGTAGTCGAGGAGAGTACCAGCGGCA 
>>
>>  >hsa-mir-155 MI0000681
>> CTGTTAATGCTAATCGTGATAGGGGTTTTTGCCTCCAACTGACTCCTACATATTAGCATTAACAG
>>
>> My local Blat output is
>> [xianjund at shire scripts]$ blat /export/data/goldenpath/danRer5/assembly.2bit test.fa testdr5.psl -stepSize=5 -minScore=0 -minIdentity=0
>> Loaded 1440582308 letters in 5036 sequences
>> Searched 233 bases in 2 sequences
>> [xianjund at shire scripts]$ less testdr5.psl
>> psLayout version 3
>>
>> match   mis-    rep.    N's     Q gap   Q gap   T gap   T gap   strand  Q               Q       Q       Q       T               T               T               T               block   blockSizes      qStarts  tStarts
>>         match   match           count   bases   count   bases           name            size    start   end     name            size            start           end             count
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>> 81      3       0       0       0       0       0       0       +       hsa-mir-137     102     8       92      chr24           40293347        28811093        28811177        1       84,     8,      28811093,
>> 83      4       0       0       0       0       1       2       +       hsa-mir-137     102     5       92      chr2            54366722        15780389        15780478        2       5,82,   5,10,   15780389,15780396,
>> 81      3       0       0       0       0       0       0       -       hsa-mir-137     102     8       92      chr24           40293347        29270666        29270750        1       84,     10,     29270666,
>> 16      0       0       0       0       0       0       0       -       hsa-mir-137     102     58      74      chr17           52310423        12235065        12235081        1       16,     28,     12235065,
>> 16      0       0       0       0       0       0       0       +       hsa-mir-155     131     0       16      chr21           46057314        13200400        13200416        1       16,     0,      13200400,
>> 16      0       0       0       0       0       0       0       +       hsa-mir-155     131     26      42      chr13           53547397        37813080        37813096        1       16,     26,     37813080,
>> 25      0       0       0       0       0       0       0       -       hsa-mir-155     131     3       28      chr1            56204684        22923406        22923431        1       25,     103,    22923406,
>> testdr5.psl (END)
>>
>> There are 4 hits for the first query (named "hsa-mir-137") and 
>> 3 for the second query, whereas the website returns
>>
>>    ACTIONS      QUERY           SCORE START  END QSIZE IDENTITY CHRO STRAND  START    END      SPAN
>> ---------------------------------------------------------------------------------------------------
>> browser 
>> <http://genome.ucsc.edu/cgi-bin/hgTracks?position=chr24:29270667-29270750&db=danRer5&ss=../trash/hgSs/hgSs_genome_332_38c6b0.pslx+../trash/hgSs/hgSs_genome_332_38c6b0.fa&hgsid=99955646> 
>> details 
>> <http://genome.ucsc.edu/cgi-bin/hgc?o=29270666&g=htcUserAli&i=../trash/hgSs/hgSs_genome_332_38c6b0.pslx+..%2Ftrash%2FhgSs%2FhgSs_genome_332_38c6b0.fa+hsa-mir-137&c=chr24&l=29270666&r=29270750&db=danRer5&hgsid=99955646> 
>> hsa-mir-137       78     9    92   102  96.5%    24   -   29270667  29270750     84
>> browser 
>> <http://genome.ucsc.edu/cgi-bin/hgTracks?position=chr24:28811094-28811177&db=danRer5&ss=../trash/hgSs/hgSs_genome_332_38c6b0.pslx+../trash/hgSs/hgSs_genome_332_38c6b0.fa&hgsid=99955646> 
>> details 
>> <http://genome.ucsc.edu/cgi-bin/hgc?o=28811093&g=htcUserAli&i=../trash/hgSs/hgSs_genome_332_38c6b0.pslx+..%2Ftrash%2FhgSs%2FhgSs_genome_332_38c6b0.fa+hsa-mir-137&c=chr24&l=28811093&r=28811177&db=danRer5&hgsid=99955646> 
>> hsa-mir-137       78     9    92   102  96.5%    24   +   28811094  28811177     84
>> browser 
>> <http://genome.ucsc.edu/cgi-bin/hgTracks?position=chr2:15780390-15780478&db=danRer5&ss=../trash/hgSs/hgSs_genome_332_38c6b0.pslx+../trash/hgSs/hgSs_genome_332_38c6b0.fa&hgsid=99955646> 
>> details 
>> <http://genome.ucsc.edu/cgi-bin/hgc?o=15780389&g=htcUserAli&i=../trash/hgSs/hgSs_genome_332_38c6b0.pslx+..%2Ftrash%2FhgSs%2FhgSs_genome_332_38c6b0.fa+hsa-mir-137&c=chr2&l=15780389&r=15780478&db=danRer5&hgsid=99955646> 
>> hsa-mir-137       78     6    92   102  95.5%     2   +   15780390  15780478     89
>> browser 
>> <http://genome.ucsc.edu/cgi-bin/hgTracks?position=chr1:22923407-22923431&db=danRer5&ss=../trash/hgSs/hgSs_genome_332_38c6b0.pslx+../trash/hgSs/hgSs_genome_332_38c6b0.fa&hgsid=99955646> 
>> details 
>> <http://genome.ucsc.edu/cgi-bin/hgc?o=22923406&g=htcUserAli&i=../trash/hgSs/hgSs_genome_332_38c6b0.pslx+..%2Ftrash%2FhgSs%2FhgSs_genome_332_38c6b0.fa+hsa-mir-155&c=chr1&l=22923406&r=22923431&db=danRer5&hgsid=99955646> 
>> hsa-mir-155       25     4    28    65 100.0%     1   -   22923407  22923431     25
>>
>>
>> I am still confused by the difference. Can you paste your output here?
>>
>> Looking forward to your help!
>>
>> Thanks
>>
>> Xianjun
>>
>>
>>
>> Archana Thakkapallayil wrote:
>>> Hello Xianjun,
>>>
>>> One of our developers suggested the following:
>>>
>>> 1. It is NOT recommended to use BLAT across such a large 
>>> evolutionary distance (human micro-rna to zebrafish genome).  Even 
>>> if one were to attempt large distances, protein BLAT would be more 
>>> sensitive -- although you are doing RNA, so it may have different 
>>> constraints.
>>>
>>> blastz from Webb Miller at Pennsylvania State University is what we 
>>> use for cross-species alignments, especially for distances further 
>>> apart than human-mouse.
>>>
>>> 2. If you are going to use blast8 output, you don't have a .psl and 
>>> shouldn't use that extension. Also blast8 is not going to give 
>>> exactly the same output as psl.  For example, psl is a 
>>> linked-features format where the exons have been chained together.  
>>> Blast output is single-exon.
>>>
>>> 3. Using the following command, I was able to reproduce exactly the 
>>> results that hgBlat gives on our site for danRer5 BLAT:
>>>
>>> blat /gbdb/danRer5/danRer5.2bit test.fa testdr5.psl -stepSize=5 
>>> -minScore=0 -minIdentity=0
>>>
>>> This shows that command-line blat and hgBlat are working.
>>>
>>> I hope this information is helpful to you. Please don't hesitate to 
>>> contact us again if you require further assistance.
>>>
>>> Regards,
>>>
>>> Archana
>>> UCSC Genome Bioinformatics Group
>>>
>>>
>>>> Subject:
>>>> difference of local Blat and web-Blat
>>>> From:
>>>> Xianjun Dong <Xianjun.Dong at bccs.uib.no>
>>>> Date:
>>>> Tue, 20 Nov 2007 16:53:37 +0100
>>>> To:
>>>> genome at soe.ucsc.edu
>>>>
>>>>
>>>> Dear Sir/Madam,
>>>>
>>>> I am trying to use local Blat for my miRNA sequences (~70 bp in 
>>>> length), with the following parameters
>>>>
>>>> blat /export/data/goldenpath/danRer5/assembly.2bit test.fa 
>>>> testdr5.psl -out=blast8 -stepSize=5 -minScore=0 -minIdentity=0
>>>>
>>>> as the FAQ page (http://www.genome.ucsc.edu/FAQ/FAQblat.html#blat5) 
>>>> suggests.
>>>>
>>>> But the output is still different from that of the web-based Blat. 
>>>> The two outputs are:
>>>>
>>>> ================= local Blat =================
>>>> hsa-mir-137     chr24   96.43   84      3       0       9       92      35690482        35690399        3.4e-37 153.0
>>>> hsa-mir-137     chr17   100.00  16      0       0       59      74      14974006        14973991        2.1e+00 30.0
>>>>
>>>> ================= web-based Blat =============
>>>>    ACTIONS      QUERY           SCORE START  END QSIZE IDENTITY CHRO STRAND  START    END      SPAN
>>>> ---------------------------------------------------------------------------------------------------
>>>> browser 
>>>> <http://www.genome.ucsc.edu/cgi-bin/hgTracks?position=chr24:29270667-29270750&db=danRer5&ss=../trash/hgSs/hgSs_www_7e6c_301a00.pslx+../trash/hgSs/hgSs_www_7e6c_301a00.fa&hgsid=99930361> 
>>>> details 
>>>> <http://www.genome.ucsc.edu/cgi-bin/hgc?o=29270666&g=htcUserAli&i=../trash/hgSs/hgSs_www_7e6c_301a00.pslx+..%2Ftrash%2FhgSs%2FhgSs_www_7e6c_301a00.fa+hsa-mir-137&c=chr24&l=29270666&r=29270750&db=danRer5&hgsid=99930361> 
>>>> hsa-mir-137       78     9    92   102  96.5%    24   -   29270667  29270750     84
>>>> browser 
>>>> <http://www.genome.ucsc.edu/cgi-bin/hgTracks?position=chr24:28811094-28811177&db=danRer5&ss=../trash/hgSs/hgSs_www_7e6c_301a00.pslx+../trash/hgSs/hgSs_www_7e6c_301a00.fa&hgsid=99930361> 
>>>> details 
>>>> <http://www.genome.ucsc.edu/cgi-bin/hgc?o=28811093&g=htcUserAli&i=../trash/hgSs/hgSs_www_7e6c_301a00.pslx+..%2Ftrash%2FhgSs%2FhgSs_www_7e6c_301a00.fa+hsa-mir-137&c=chr24&l=28811093&r=28811177&db=danRer5&hgsid=99930361> 
>>>> hsa-mir-137       78     9    92   102  96.5%    24   +   28811094  28811177     84
>>>> browser 
>>>> <http://www.genome.ucsc.edu/cgi-bin/hgTracks?position=chr2:15780390-15780478&db=danRer5&ss=../trash/hgSs/hgSs_www_7e6c_301a00.pslx+../trash/hgSs/hgSs_www_7e6c_301a00.fa&hgsid=99930361> 
>>>> details 
>>>> <http://www.genome.ucsc.edu/cgi-bin/hgc?o=15780389&g=htcUserAli&i=../trash/hgSs/hgSs_www_7e6c_301a00.pslx+..%2Ftrash%2FhgSs%2FhgSs_www_7e6c_301a00.fa+hsa-mir-137&c=chr2&l=15780389&r=15780478&db=danRer5&hgsid=99930361> 
>>>> hsa-mir-137       78     6    92   102  95.5%     2   +   15780390  15780478     89
>>>>
>>>> As you can see, the results are quite different, and the web-based 
>>>> one is more reasonable.
>>>> My query sequence is fairly simple:
>>>> >hsa-mir-137 MI0000454
>>>> GGTCCTCTGACTCTCTTCGGTGACGGGTATTCTTGGGTGGATAATACGGATTACGTTGTTATTGCTTAAGAATACGCGTAGTCGAGGAGAGTACCAGCGGCA 
>>>>
>>>>
>>>> The local Blat version is BLAT v. 34.
>>>>
>>>> Could you please have a look? Thanks very much.
>>>>
>>>> Regards,
>>>>
>>>> Xianjun
>>>>
>> _______________________________________________
>> Genome maillist  -  Genome at soe.ucsc.edu
>> http://www.soe.ucsc.edu/mailman/listinfo/genome
>>

-- 
---------------------------
Sterding (Xianjun) Dong
PhD student, Boris Lenhard's group
Bergen Center of Computational Science
Bergen University, Norway
Mobile: 0047-47361688
Telephone: 0047-55276381
Skype: xianjun.dong
