[Genome] xenoRNA download from Orangutan database
Galt Barber
galt at soe.ucsc.edu
Tue May 13 12:39:34 PDT 2008
Adding a little to this thread...
The psl strands are reported as
-
+
++
+-
-+
--
- is just short-hand for -+
+ is just short-hand for ++
The first column is the query strand,
the second column is the target strand.
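
As a small illustration of the short-hand, here is a minimal Python
sketch that expands a psl strand field into its query and target parts.
The function name split_strand is made up for this example; it is not
part of blat or the kent tools.

```python
def split_strand(strand):
    """Return (query_strand, target_strand) for a psl strand field.

    Hypothetical helper for illustration only.
    """
    if strand not in {"+", "-", "++", "+-", "-+", "--"}:
        raise ValueError(f"unexpected psl strand field: {strand!r}")
    # A single character gives the query strand; the target is implicitly '+'.
    if len(strand) == 1:
        return strand, "+"
    # Two characters: first is the query strand, second is the target strand.
    return strand[0], strand[1]


print(split_strand("-"))   # ('-', '+')
print(split_strand("+-"))  # ('+', '-')
```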
When blat does a regular mRNA alignment
to DNA, it only keeps tiles of the positive
strand, so it runs the query, and then
it reverse-complements the query and runs
it again. Therefore you only ever see
+ and - (really ++ and -+),
never +- or -- (unless some other processing
has been applied afterwards). For a - hit, the query
was reverse-complemented automatically and
searched against the positive tiles.
For nucleotide search, this is mathematically
equivalent to leaving the query alone and
searching neg strand tiles.
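
The equivalence can be checked on a toy case. The sketch below uses
plain exact string matching, not blat's tiling, and the two sequences
are made up for the example. It only shows that the two search
directions find the same place once the negative-strand offset is
converted back to positive-strand coordinates.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a nucleotide string."""
    return seq.translate(_COMPLEMENT)[::-1]


target = "GGATTACACCCC"  # made-up positive-strand target
query = "TGTAAT"         # made-up query that matches the negative strand

# (a) Reverse-complement the query, search the positive strand.
hit_a = target.find(reverse_complement(query))

# (b) Leave the query alone, search the negative strand.
hit_b = reverse_complement(target).find(query)

# Convert the negative-strand offset to a positive-strand start.
hit_b_positive = len(target) - (hit_b + len(query))

print(hit_a, hit_b, hit_b_positive)  # 2 4 2
```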
There is code to handle a flip-over.
If you want to reverse a psl's strands, you
flip both strands' signs to their opposites,
e.g. you can flip -+ to +-, etc. BLAST, by the
way, will do this automatically so that
mRNA queries come out +- instead of -+. BLAT does not.
Note that a flip-over will also reverse
the order of the blocks.
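
A flip-over can be sketched in Python roughly as follows. This is only
an illustration of the idea for nucleotide psls, not the actual code in
the source tree. The dict layout and the name flip_psl are assumptions
made for the example; the keys follow the psl field names.

```python
def flip_psl(psl):
    """Return a copy of a nucleotide psl (as a dict) with both strands flipped.

    Hypothetical helper; assumes target and query block sizes are equal,
    i.e. not a translated alignment.
    """
    flip = {"+": "-", "-": "+"}
    strand = psl["strand"]
    if len(strand) == 1:
        strand += "+"  # expand the short-hand, e.g. '-' means '-+'
    sizes = psl["blockSizes"]
    # Re-express each block start on the opposite strand, then reverse the
    # lists so the blocks are again in ascending order on the new strands.
    q_starts = [psl["qSize"] - (s + n) for s, n in zip(psl["qStarts"], sizes)]
    t_starts = [psl["tSize"] - (s + n) for s, n in zip(psl["tStarts"], sizes)]
    flipped = dict(psl)
    flipped["strand"] = flip[strand[0]] + flip[strand[1]]
    flipped["blockSizes"] = sizes[::-1]
    flipped["qStarts"] = q_starts[::-1]
    flipped["tStarts"] = t_starts[::-1]
    return flipped


# Made-up example values.
example = {
    "strand": "-+", "qSize": 100, "tSize": 1000,
    "blockSizes": [10, 20], "qStarts": [0, 30], "tStarts": [500, 600],
}
flipped = flip_psl(example)
print(flipped["strand"], flipped["qStarts"], flipped["tStarts"])
# +- [50, 90] [380, 490]
```

Flipping twice gives back the original record, which is a handy sanity
check.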
In translated blat psls,
you can see some +- and -- alignments,
because blat has to store
amino acid tiles explicitly for both strands.
It does not need to reverse-complement the query unless
you are uncertain whether your query
is on the coding strand and therefore ask blat to do
the query rev-comp too. So translated output would be only
++ and +- unless you tell it to do the revcomp too.
In either case, for any - strand, whether query or target,
the xStart and xEnd fields (qStart/qEnd, tStart/tEnd) are always
given in positive strand coordinates.
But the xStarts lists (qStarts, tStarts) are left in negative strand coordinates.
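
To compare the block starts with xStart and xEnd, convert them to
positive strand coordinates first: a block that starts at s with length
n on the negative strand of a sequence of length size covers
size - (s + n) to size - s on the positive strand. Here is a minimal
Python sketch; the function name and the numbers are made up for the
example.

```python
def blocks_on_positive_strand(strand_char, size, starts, block_sizes):
    """Return (start, end) block pairs in positive-strand coordinates.

    Hypothetical helper. strand_char is '+' or '-' for the sequence in
    question (query or target); size is qSize or tSize.
    """
    if strand_char == "+":
        return [(s, s + n) for s, n in zip(starts, block_sizes)]
    # Negative strand: mirror each block about the sequence length,
    # then sort so the blocks ascend on the positive strand.
    return sorted((size - (s + n), size - s) for s, n in zip(starts, block_sizes))


# Made-up target on the negative strand: tSize=1000, tStart=660, tEnd=900.
blocks = blocks_on_positive_strand("-", 1000, [100, 300], [50, 40])
print(blocks)  # [(660, 700), (850, 900)]
```

After the conversion, the first block starts at tStart and the last
block ends at tEnd, so the blocks fall inside the reported range.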
For some processes, we may use some other alignment program
and then convert the output to psl format for
further processing.
-Galt
On Tue, 13 May 2008, Brooke Rhead wrote:
> Hi Jason,
>
> The xenoRNA track is in PSL format, described here:
>
> http://genome.ucsc.edu/FAQ/FAQformat#format2
>
> Within the description on this page is this crucial bit of information:
>
> ~~~
> Be aware that the coordinates for a negative strand in a PSL line are
> handled in a special way. In the qStart and qEnd fields, the coordinates
> indicate the position where the query matches from the point of view of
> the forward strand, even when the match is on the reverse strand.
> However, in the qStarts list, the coordinates are reversed.
> ~~~
>
> Please feel free to write back to genome at soe.ucsc.edu if you have
> further questions.
>
> --
> Brooke Rhead
> UCSC Genome Bioinformatics Group
>
>
> Blythe, Jason wrote:
>> Hello-
>>
>> I have a question about a few fields that are seen in the Orangutan
>> xenoRNA track download from the browser. The fields in question are
>> tStart, tEnd, and tStarts. I have only checked this for a few of our
>> results, but if one lies on the negative strand ('+-') the tStarts
>> listed do not fall within the parameters of tStart and tEnd (Ex. BC042829
>> chr7:84009419-84017370). However the ones listed on the positive strand
>> ('++') are all good. Is there something that I am missing if there is an
>> RNA that is on the negative strand that I should be accounting for? Thanks.
>>
>>
>>
>> jason
>>
>>
>>
>> _______________________________________________
>> Genome maillist - Genome at soe.ucsc.edu
>> http://www.soe.ucsc.edu/mailman/listinfo/genome