[Genome] BLAT Table Function
Ann Zweig
ann at soe.ucsc.edu
Fri May 18 11:25:10 PDT 2007
Hello Carsten,
Because of the way our Table Browser performs intersections, it is not able to
maintain this identifier. Although we are not experts in using Galaxy, we are
quite sure that it is possible to do this using their tools. By the way, Galaxy
maintains a user mailing list at: galaxy-user at bx.psu.edu.
My best guess is that the Galaxy tools can't do what you want because of the
structure of the input data. Perhaps if you take the output of the UCSC Browser,
which is in the format 'chrX:nnn-mmm id', and convert it to BED format
(chrX nnn mmm id), it will work there.
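
The conversion suggested above could be sketched in Python roughly as follows. This is only a minimal illustration, not a UCSC or Galaxy tool: the function name `region_to_bed` and the `one_based` flag are hypothetical. The flag reflects an assumption that the position string uses 1-based start coordinates while BED starts are 0-based; set it to False to copy nnn and mmm through unchanged, as written in the suggestion above.

```python
import re

# Matches a region line such as "chrX:151,283,001-151,290,000 ID001".
# The trailing name is optional; commas in the coordinates are allowed.
REGION_RE = re.compile(r"^([^:\s]+):([\d,]+)-([\d,]+)(?:\s+(\S.*))?$")


def region_to_bed(line, one_based=True):
    """Turn 'chrX:nnn-mmm id' into a tab-separated BED line.

    one_based is an assumption of this sketch: when True, the start is
    shifted down by one to give a 0-based, half-open BED interval.
    """
    match = REGION_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a region line: {line!r}")
    chrom, start, end, name = match.groups()
    start = int(start.replace(",", ""))
    end = int(end.replace(",", ""))
    if one_based:
        start -= 1
    fields = [chrom, str(start), str(end)]
    if name:
        fields.append(name)
    return "\t".join(fields)


if __name__ == "__main__":
    print(region_to_bed("chrX:151,283,001-151,290,000 ID001"))
```

Applied to every line of the region list, this gives a four-column BED file in which the fourth column carries the ID, so that an interval join has a name field to report.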
Best of luck to you. Please feel free to write back to the list if you have
other Genome Browser questions.
Regards,
----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu
We invite you to give us your feedback on the UCSC Genome Browser
through May 31, 2007: http://www.surveymonkey.com/s.asp?U=881163743177
Please feel free to search the Genome mailing list archives by visiting
our home page, clicking on "Contact Us", then typing a word or phrase
into the search box. On that same page
(http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome
mailing list.
Carsten Lederer wrote:
> Dear Archana.
>
> Thank you for your swift reply, and for pointing me to Galaxy and its additional functions. Unfortunately, Galaxy does not solve the problem we have, and the shortcoming I am pointing out is so minor that all the functionality is already there in the original BLAT Table Browser:
> We have genome positions that correspond to integration events in the genome. We have given those integration events (and hence positions) ID numbers. We submit these ID numbers together with the regions in the format suggested on the BLAT Browser site:
>
> chrX:151,283,001-151,290,000 optionalRegionName (=ID)
>
> All that is missing is that the ID is then ALSO DISPLAYED in the output table for all the genes found within the corresponding region. This would allow unambiguous association of the original genome position with all corresponding gene hits. Like so:
>
> YourID #hg18.knownGene.name hg18.knownGene.chrom hg18.knownGene.strand hg18.knownGene.txStart hg18.knownGene.txEnd
> ID001 uc001aoh.1 chr1 + 6767970 6854685
> ID001 uc001aoi.1 chr1 + 6767970 7752351
> ID002 uc001beg.1 chr1 - 21140324 21250074
> ID002 uc001bec.1 chr1 - 21005560 21310463
> ID002 uc001beh.1 chr1 - 21140324 21367119
> ID002 uc001bed.1 chr1 - 21005560 21375927
> ID002 uc001bee.1 chr1 - 21005560 21375927
> ID002 uc001bef.1 chr1 - 21005560 21375927
> ID003 uc001bgo.1 chr1 - 23508862 23510973
> ID003 uc001bgp.1 chr1 - 23508862 23517777
>
> Currently the first column is missing, making annotation (using the txStart and txEnd information) cumbersome and possibly error-prone for positions very close to one another.
> The problem would be encountered by anybody using the BLAT Table functions for annotation of genome positions by batch retrieval. If including the ID in the output is not desirable as a default, maybe a checkbox could be included on the submission page ([] include optionalRegionName in output). Naturally, if there is already such a checkbox I would feel rather sheepish, but would be grateful for a pointer.
>
> Looking forward to your feedback and thank you for your time,
> Carsten
>
> Carsten Lederer, Ph.D.
> --------------------------
> Department of Molecular Genetics
> Cyprus Institute of Neurology and Genetics
> 6 International Airport Avenue
> P.O. Box 23462
> 1683 Nicosia
> Cyprus (Europe)
> --------------------------
> Phone +357-22-392657
> Fax +357-22-392615
> --------------------------
> Carsten at inbox.com
> Lederer at cing.ac.cy
>
>
>> -----Original Message-----
>> From: archanat at soe.ucsc.edu
>> Sent: Fri, 11 May 2007 13:00:37 -0700
>> To: carsten at inbox.com
>> Subject: Re: [Genome] BLAT Table Function
>>
>> Hello Carsten,
>>
>> I assume you are trying to use the intersection tool in the Table
>> Browser.
>>
>> However, the Table Browser does not have a "join" function that will
>> pair the name and position of your element with a corresponding
>> element in another track. The results only give the items and their
>> positions from the first table that you selected (not the one that you
>> selected to intersect with) that intersect with the second table, so you
>> are unable to get both sets of information in the output.
>>
>> There is another tool run by Penn State that works in conjunction with
>> the UCSC Genome Browser that can do joins. It is called "Galaxy", and
>> it is located here:
>>
>> http://main.g2.bx.psu.edu/
>>
>> Also, we provide a link to Galaxy from our Table Browser: check the
>> box 'Send output to Galaxy'. This displays the results of the query
>> in Galaxy, a framework for interactive genome analysis.
>>
>> Galaxy has a tool to "Join the intervals of two queries side-by-side"
>> that you can use. The "join intervals" function is under the "Operate on
>> genomic intervals" section on the left-hand side.
>>
>> I hope this information is helpful to you. If this does not answer your
>> question, please don't hesitate to write back to the list.
>>
>> Regards,
>>
>> Archana
>> UCSC Genome Bioinformatics Group
>>
>>
>> Carsten Lederer wrote:
>>> Dear UCSC Genome Team.
>>>
>>> We find the BLAT table function tremendously powerful in retrieving
>>> information on genes adjacent to our viral vector integrations in the
>>> human genome. On the downside, when using the position of our
>>> integration (i.e. a 60 kb range centred around that position), we cannot
>>> submit additional identifiers to then correlate the output with the
>>> information we already have for our integration sites.
>>> For position information it is already possible to submit a fourth
>>> field, but that field is then not displayed in the output table.
>>> Ideally, an input format like "chr7:73739410-73799410 Site0001" would
>>> result in an output like "Site0001 uc003uat.1 ..." that would allow us
>>> to use e.g. the Excel Vlookup function for annotation. At present we use
>>> the positional information on the chromosome to match BLAT Table outputs
>>> with our own data, which is somewhat awkward (for your info, we sort
>>> positions in ascending order and use Vlookup with the range_lookup set
>>> to "True") and leads to errors if two integrations are close to one
>>> another.
>>>
>>> Is there a function or check box I have overlooked? Otherwise, would it
>>> be possible to offer the option of displaying the fourth (ID) field in
>>> the output table? Or what other solution to our problem (more elegant
>>> than our current work-around) would you suggest?
>>>
>>> I would be grateful for your feedback or input. Keep up the good work,
>>> Carsten Lederer
>>>
>>> Carsten Lederer, PhD
>>> --------------------
>>> Telethon Institute of Gene Therapy
>>> Fondazione San Raffaele del Monte Tabor
>>> Via Olgettina 58
>>> 20132 Milan
>>> Italy
>>> --------------------
>>> Phone 0039-02-26434707
>>> Fax 0039-02-26434668
>>> --------------------
>>> Carsten at inbox.com
>>> C.Lederer at hsr.it
>>>
>>> _______________________________________________
>>> Genome maillist - Genome at soe.ucsc.edu
>>> http://www.soe.ucsc.edu/mailman/listinfo/genome
>>>