[Genome] difference of local Blat and web-Blat
Xianjun Dong
Xianjun.Dong at bccs.uib.no
Wed Nov 21 01:35:01 PST 2007
Good evening Hiram :)
The number is 34, the same as on the first line printed when I just type
blat on the command line.
Which version are you using?
Thanks for your help
Xianjun
Hiram Clawson wrote:
> Good Evening Xianjun:
>
> Can you verify which version of the blat source you have?
> There is a file in the source tree, blat/version.doc,
> which should have some comments at the top of the
> file indicating the changes in each version. What
> is the first number in that version.doc file?
>
> --Hiram
>
> Dong Xianjun wrote:
>> Hi Archana,
>>
>> Thanks for your quick response,
>>
>> and also for your opinion on aligning across this evolutionary distance
>> and for the correction about the output format.
>>
>> But I am still wondering about the difference in my output. Let's set
>> the evolutionary distance aside for now and treat the query simply as a
>> query sequence (it does have some reasonable hits on the zebrafish
>> genome, judging from the web-based Blat output). For the
>> following queries
>> >hsa-mir-137 MI0000454
>> GGTCCTCTGACTCTCTTCGGTGACGGGTATTCTTGGGTGGATAATACGGATTACGTTGTTATTGCTTAAGAATACGCGTAGTCGAGGAGAGTACCAGCGGCA
>>
>> >hsa-mir-155 MI0000681
>> CTGTTAATGCTAATCGTGATAGGGGTTTTTGCCTCCAACTGACTCCTACATATTAGCATTAACAG
>>
>> My local Blat output is
>> [xianjund at shire scripts]$ blat /export/data/goldenpath/danRer5/assembly.2bit test.fa testdr5.psl -stepSize=5 -minScore=0 -minIdentity=0
>> Loaded 1440582308 letters in 5036 sequences
>> Searched 233 bases in 2 sequences
>> [xianjund at shire scripts]$ less testdr5.psl
>> psLayout version 3
>>
>> match mis-  rep.  N's Q gap Q gap T gap T gap strand Q           Q    Q     Q   T     T        T        T        block blockSizes qStarts tStarts
>>       match match     count bases count bases        name        size start end name  size     start    end      count
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>> 81    3     0     0   0     0     0     0     +      hsa-mir-137 102  8     92  chr24 40293347 28811093 28811177 1     84,        8,      28811093,
>> 83    4     0     0   0     0     1     2     +      hsa-mir-137 102  5     92  chr2  54366722 15780389 15780478 2     5,82,      5,10,   15780389,15780396,
>> 81    3     0     0   0     0     0     0     -      hsa-mir-137 102  8     92  chr24 40293347 29270666 29270750 1     84,        10,     29270666,
>> 16    0     0     0   0     0     0     0     -      hsa-mir-137 102  58    74  chr17 52310423 12235065 12235081 1     16,        28,     12235065,
>> 16    0     0     0   0     0     0     0     +      hsa-mir-155 131  0     16  chr21 46057314 13200400 13200416 1     16,        0,      13200400,
>> 16    0     0     0   0     0     0     0     +      hsa-mir-155 131  26    42  chr13 53547397 37813080 37813096 1     16,        26,     37813080,
>> 25    0     0     0   0     0     0     0     -      hsa-mir-155 131  3     28  chr1  56204684 22923406 22923431 1     25,        103,    22923406,
>> testdr5.psl (END)
>>
>> There are 4 hits for the first query (named "hsa-mir-137") and
>> 3 for the second query, while the website returns
>>
>> ACTIONS QUERY SCORE START END QSIZE IDENTITY CHRO STRAND START END SPAN
>> ---------------------------------------------------------------------------------------------------
>>
>> browser
>> <http://genome.ucsc.edu/cgi-bin/hgTracks?position=chr24:29270667-29270750&db=danRer5&ss=../trash/hgSs/hgSs_genome_332_38c6b0.pslx+../trash/hgSs/hgSs_genome_332_38c6b0.fa&hgsid=99955646>
>> details
>> <http://genome.ucsc.edu/cgi-bin/hgc?o=29270666&g=htcUserAli&i=../trash/hgSs/hgSs_genome_332_38c6b0.pslx+..%2Ftrash%2FhgSs%2FhgSs_genome_332_38c6b0.fa+hsa-mir-137&c=chr24&l=29270666&r=29270750&db=danRer5&hgsid=99955646>
>> hsa-mir-137 78 9 92 102 96.5% 24 - 29270667 29270750 84
>> browser
>> <http://genome.ucsc.edu/cgi-bin/hgTracks?position=chr24:28811094-28811177&db=danRer5&ss=../trash/hgSs/hgSs_genome_332_38c6b0.pslx+../trash/hgSs/hgSs_genome_332_38c6b0.fa&hgsid=99955646>
>> details
>> <http://genome.ucsc.edu/cgi-bin/hgc?o=28811093&g=htcUserAli&i=../trash/hgSs/hgSs_genome_332_38c6b0.pslx+..%2Ftrash%2FhgSs%2FhgSs_genome_332_38c6b0.fa+hsa-mir-137&c=chr24&l=28811093&r=28811177&db=danRer5&hgsid=99955646>
>> hsa-mir-137 78 9 92 102 96.5% 24 + 28811094 28811177 84
>> browser
>> <http://genome.ucsc.edu/cgi-bin/hgTracks?position=chr2:15780390-15780478&db=danRer5&ss=../trash/hgSs/hgSs_genome_332_38c6b0.pslx+../trash/hgSs/hgSs_genome_332_38c6b0.fa&hgsid=99955646>
>> details
>> <http://genome.ucsc.edu/cgi-bin/hgc?o=15780389&g=htcUserAli&i=../trash/hgSs/hgSs_genome_332_38c6b0.pslx+..%2Ftrash%2FhgSs%2FhgSs_genome_332_38c6b0.fa+hsa-mir-137&c=chr2&l=15780389&r=15780478&db=danRer5&hgsid=99955646>
>> hsa-mir-137 78 6 92 102 95.5% 2 + 15780390 15780478 89
>> browser
>> <http://genome.ucsc.edu/cgi-bin/hgTracks?position=chr1:22923407-22923431&db=danRer5&ss=../trash/hgSs/hgSs_genome_332_38c6b0.pslx+../trash/hgSs/hgSs_genome_332_38c6b0.fa&hgsid=99955646>
>> details
>> <http://genome.ucsc.edu/cgi-bin/hgc?o=22923406&g=htcUserAli&i=../trash/hgSs/hgSs_genome_332_38c6b0.pslx+..%2Ftrash%2FhgSs%2FhgSs_genome_332_38c6b0.fa+hsa-mir-155&c=chr1&l=22923406&r=22923431&db=danRer5&hgsid=99955646>
>> hsa-mir-155 25 4 28 65 100.0% 1 - 22923407 22923431 25
>>
>>
>> I am still confused by the difference. Could you paste your output here?
>>
>> Looking forward to your help!
>>
>> Thanks
>>
>> Xianjun
>>
>>
>>
>> Archana Thakkapallayil wrote:
>>> Hello Xianjun,
>>>
>>> One of our developers suggested the following:
>>>
>>> 1. It is NOT recommended to use BLAT across such a large
>>> evolutionary distance (human micro-rna to zebrafish genome). Even
>>> if one were to attempt large distances, protein BLAT would be more
>>> sensitive -- although you are doing RNA, so it may have different
>>> constraints.
>>>
>>> blastz from Webb Miller at Pennsylvania State University is what we
>>> use for cross-species alignments, especially for distances further
>>> apart than human-mouse.
>>>
>>> 2. If you are going to use blast8 output, you don't have a .psl and
>>> shouldn't use that extension. Also blast8 is not going to give
>>> exactly the same output as psl. For example, psl is a
>>> linked-features format where the exons have been chained together.
>>> Blast output is single-exon.
>>>
>>> 3. Using the following command, I was able to exactly reproduce the
>>> same results that hgBlat gives on our site for danRer5 BLAT :
>>>
>>> blat /gbdb/danRer5/danRer5.2bit test.fa testdr5.psl -stepSize=5
>>> -minScore=0 -minIdentity=0
>>>
>>> This shows that command-line blat and hgBlat are working.
>>>
>>> I hope this information is helpful to you. Please don't hesitate to
>>> contact us again if you require further assistance.
>>>
>>> Regards,
>>>
>>> Archana
>>> UCSC Genome Bioinformatics Group
>>>
>>>
>>>> Subject:
>>>> difference of local Blat and web-Blat
>>>> From:
>>>> Xianjun Dong <Xianjun.Dong at bccs.uib.no>
>>>> Date:
>>>> Tue, 20 Nov 2007 16:53:37 +0100
>>>> To:
>>>> genome at soe.ucsc.edu
>>>>
>>>>
>>>> Dear Sir/Madam,
>>>>
>>>> I am trying to use local Blat for my miRNA sequences (~70 bp in
>>>> length), with the following parameters
>>>>
>>>> blat /export/data/goldenpath/danRer5/assembly.2bit test.fa
>>>> testdr5.psl -out=blast8 -stepSize=5 -minScore=0 -minIdentity=0
>>>>
>>>> as the FAQ page (http://www.genome.ucsc.edu/FAQ/FAQblat.html#blat5)
>>>> suggests.
>>>>
>>>> But the output is still different from that of the web-based Blat.
>>>> The two outputs are
>>>>
>>>> ================= local Blat =================
>>>> hsa-mir-137 chr24 96.43  84 3 0 9  92 35690482 35690399 3.4e-37 153.0
>>>> hsa-mir-137 chr17 100.00 16 0 0 59 74 14974006 14973991 2.1e+00 30.0
>>>>
>>>> ================= web-based Blat =============
>>>> ACTIONS QUERY SCORE START END QSIZE IDENTITY CHRO STRAND START END SPAN
>>>> ---------------------------------------------------------------------------------------------------
>>>>
>>>> browser
>>>> <http://www.genome.ucsc.edu/cgi-bin/hgTracks?position=chr24:29270667-29270750&db=danRer5&ss=../trash/hgSs/hgSs_www_7e6c_301a00.pslx+../trash/hgSs/hgSs_www_7e6c_301a00.fa&hgsid=99930361>
>>>> details
>>>> <http://www.genome.ucsc.edu/cgi-bin/hgc?o=29270666&g=htcUserAli&i=../trash/hgSs/hgSs_www_7e6c_301a00.pslx+..%2Ftrash%2FhgSs%2FhgSs_www_7e6c_301a00.fa+hsa-mir-137&c=chr24&l=29270666&r=29270750&db=danRer5&hgsid=99930361>
>>>> hsa-mir-137 78 9 92 102 96.5% 24 - 29270667 29270750 84
>>>> browser
>>>> <http://www.genome.ucsc.edu/cgi-bin/hgTracks?position=chr24:28811094-28811177&db=danRer5&ss=../trash/hgSs/hgSs_www_7e6c_301a00.pslx+../trash/hgSs/hgSs_www_7e6c_301a00.fa&hgsid=99930361>
>>>> details
>>>> <http://www.genome.ucsc.edu/cgi-bin/hgc?o=28811093&g=htcUserAli&i=../trash/hgSs/hgSs_www_7e6c_301a00.pslx+..%2Ftrash%2FhgSs%2FhgSs_www_7e6c_301a00.fa+hsa-mir-137&c=chr24&l=28811093&r=28811177&db=danRer5&hgsid=99930361>
>>>> hsa-mir-137 78 9 92 102 96.5% 24 + 28811094 28811177 84
>>>> browser
>>>> <http://www.genome.ucsc.edu/cgi-bin/hgTracks?position=chr2:15780390-15780478&db=danRer5&ss=../trash/hgSs/hgSs_www_7e6c_301a00.pslx+../trash/hgSs/hgSs_www_7e6c_301a00.fa&hgsid=99930361>
>>>> details
>>>> <http://www.genome.ucsc.edu/cgi-bin/hgc?o=15780389&g=htcUserAli&i=../trash/hgSs/hgSs_www_7e6c_301a00.pslx+..%2Ftrash%2FhgSs%2FhgSs_www_7e6c_301a00.fa+hsa-mir-137&c=chr2&l=15780389&r=15780478&db=danRer5&hgsid=99930361>
>>>> hsa-mir-137 78 6 92 102 95.5% 2 + 15780390 15780478 89
>>>>
>>>> As you can see, the results are quite different, and the web-based
>>>> one is more reasonable.
>>>> My query sequence is fairly simple:
>>>> >hsa-mir-137 MI0000454
>>>> GGTCCTCTGACTCTCTTCGGTGACGGGTATTCTTGGGTGGATAATACGGATTACGTTGTTATTGCTTAAGAATACGCGTAGTCGAGGAGAGTACCAGCGGCA
>>>>
>>>>
>>>> The local Blat version is BLAT v. 34.
>>>>
>>>> Could you please take a look? Thanks very much.
>>>>
>>>> Regards,
>>>>
>>>> Xianjun
>>>>
>> _______________________________________________
>> Genome maillist - Genome at soe.ucsc.edu
>> http://www.soe.ucsc.edu/mailman/listinfo/genome
>>
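For anyone comparing the local PSL output quoted above with the web page rows by hand, the following is a minimal sketch in Python, not part of blat itself. The names PslHit and parse_psl_line are hypothetical. Two things in it are assumptions inferred only from the numbers in this thread: the score formula (matches plus half the repeat matches, minus mismatches and the two gap counts), which reproduces the SCORE column shown on the web page, and the +1 shift on start positions, which turns the PSL starts into the START values the web page displays.

```python
from typing import NamedTuple


class PslHit(NamedTuple):
    query: str
    target: str
    strand: str
    score: int
    q_start: int  # shifted by +1 to match the web page display
    q_end: int
    t_start: int  # shifted by +1 to match the web page display
    t_end: int


def parse_psl_line(line: str) -> PslHit:
    """Parse one whitespace-separated PSL data line (21 columns)."""
    f = line.split()
    if len(f) != 21:
        raise ValueError(f"expected 21 PSL columns, got {len(f)}")
    match, mismatch, rep_match = int(f[0]), int(f[1]), int(f[2])
    q_gap_count, t_gap_count = int(f[4]), int(f[6])
    # Assumed score formula; it agrees with the web SCORE values in this thread.
    score = match + rep_match // 2 - mismatch - q_gap_count - t_gap_count
    return PslHit(
        query=f[9], target=f[13], strand=f[8], score=score,
        q_start=int(f[11]) + 1, q_end=int(f[12]),
        t_start=int(f[15]) + 1, t_end=int(f[16]),
    )


# The seven data lines from the local testdr5.psl quoted in this thread.
LOCAL_PSL = """\
81 3 0 0 0 0 0 0 + hsa-mir-137 102 8 92 chr24 40293347 28811093 28811177 1 84, 8, 28811093,
83 4 0 0 0 0 1 2 + hsa-mir-137 102 5 92 chr2 54366722 15780389 15780478 2 5,82, 5,10, 15780389,15780396,
81 3 0 0 0 0 0 0 - hsa-mir-137 102 8 92 chr24 40293347 29270666 29270750 1 84, 10, 29270666,
16 0 0 0 0 0 0 0 - hsa-mir-137 102 58 74 chr17 52310423 12235065 12235081 1 16, 28, 12235065,
16 0 0 0 0 0 0 0 + hsa-mir-155 131 0 16 chr21 46057314 13200400 13200416 1 16, 0, 13200400,
16 0 0 0 0 0 0 0 + hsa-mir-155 131 26 42 chr13 53547397 37813080 37813096 1 16, 26, 37813080,
25 0 0 0 0 0 0 0 - hsa-mir-155 131 3 28 chr1 56204684 22923406 22923431 1 25, 103, 22923406,
"""

hits = [parse_psl_line(line) for line in LOCAL_PSL.splitlines()]
for h in hits:
    print(h.query, h.score, h.q_start, h.q_end,
          h.target, h.strand, h.t_start, h.t_end)
```

Under these assumptions, the chr24, chr2 and chr1 hits from the local run line up with the four rows on the web page (same score, start and end), and the remaining three local hits all score 16.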
--
---------------------------
Sterding (Xianjun) Dong
PhD student, Boris Lenhard's group
Bergen Center of Computational Science
Bergen University, Norway
Mobile: 0047-47361688
Telephone: 0047-55276381
Skype: xianjun.dong