[Genome] BLAT gives wrong tStarts for translated searches.
Galt Barber
galt at soe.ucsc.edu
Mon May 7 11:21:56 PDT 2007
Hi, Kevin
See the PSL format description at
http://hgwdev.cse.ucsc.edu/goldenPath/help/hgTracksHelp.html#PSL
Read through it, paying particular attention to the bits about
"negative strand" and "minus strand".
Here are a couple of quotes that may be useful:
"Be aware that the coordinates for a negative strand in a PSL line are
handled in a special way. In the qStart and qEnd fields, the coordinates
indicate the position where the query matches from the point of view of
the forward strand, even when the match is on the reverse strand. However,
in the qStarts list, the coordinates are reversed."
"Essentially, the minus strand blockSizes and qStarts are what you would
get if you reverse-complemented the query. However, the qStart and qEnd
are not reversed. To convert one to the other:
qStart = qSize - revQEnd
qEnd = qSize - revQStart
"
Good Luck!
-Galt
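
P.S. Applied to the example quoted below, the same arithmetic appears to
explain the tStarts. The strand is "+-", so the second character marks the
target as minus strand, and tSize - 3099 = 225935 - 3099 = 222836, which is
the reported T end. The Python sketch below is an illustration only, not part
of BLAT or the kent source tree. The function name forward_blocks is made up
for this example, and it assumes blockSizes are in nucleotides, as the numbers
in this particular output suggest.

```python
def forward_blocks(seq_size, rev_starts, block_sizes):
    """Map reverse-strand block starts to forward-strand (start, end) pairs.

    Hypothetical helper: applies start = size - revEnd, end = size - revStart
    to each block, where revEnd = revStart + blockSize.
    """
    return [
        (seq_size - (rev_start + size), seq_size - rev_start)
        for rev_start, size in zip(rev_starts, block_sizes)
    ]


# Values copied from the PSL record in the quoted message.
t_size = 225935
t_starts = [3099, 3330, 3585, 4389]
block_sizes = [180, 228, 36, 120]

blocks = forward_blocks(t_size, t_starts, block_sizes)
t_start = min(start for start, _ in blocks)
t_end = max(end for _, end in blocks)

for start, end in blocks:
    print(start, end)
print("T start:", t_start, "T end:", t_end)
```

With these inputs the first block maps to 222656-222836 and the last to
221426-221546, so the overall span agrees with the T start and T end in the
record.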
On Fri, 4 May 2007, Kevin M. Carr wrote:
> Hello all,
>
> I am attempting to align RefSeq mRNA sequences from related species to an
> in-house draft genome assembly. I am doing the alignment in protein space
> with the -t=dnax and -q=rnax options. While the results for alignments to the +
> strand appear normal, I am getting incorrect (or at least confusing) output
> for alignments to the - strand of the target. More specifically, the
> individual tStarts locations do not fall within the T start and T end
> positions given. Here is an example:
>
> psLayout version 3
>
> match:        442
> mis-match:    122
> rep. match:   0
> N's:          0
> Q gap count:  3
> Q gap bases:  951
> T gap count:  3
> T gap bases:  846
> strand:       +-
> Q name:       SINFRUT00000182842
> Q size:       1599
> Q start:      78
> Q end:        1593
> T name:       stig_26
> T size:       225935
> T start:      221426
> T end:        222836
> block count:  4
> blockSizes:   180,228,36,120,
> qStarts:      78,288,552,1473,
> tStarts:      3099,3330,3585,4389,
>
> As can be seen, the T start is 221426 and the T end is 222836, but the tStarts
> are given as 3099,3330,3585,4389.
>
> It seems that every alignment to the - strand is incorrect while every one
> to the + strand seems fine. I have run this using command line BLAT (32X1)
> and gfServer/gfClient (33X7) (gfServer started with -trans and gfClient run
> with -q=rnax) and the results are the same either way.
>
> Am I missing something?
>
> Insights and suggestions greatly appreciated.
>
> Kevin M. Carr
>
> **************************
> Bioinformatics Specialist
> Research Technology
> Support Facility
> 202-D Biochemistry Bldg.
> Michigan State University
> East Lansing, MI 48824
>
> Ph: (517) 353-6794
> Fax:(517) 353-8638
> **************************