[Genome] multiple identifiers using table browser intersection

Brooke Rhead rhead at soe.ucsc.edu
Mon Jul 2 14:34:25 PDT 2007


Hi Archie,

The problem is with column 9 of the BED file, itemRgb.  The BED parser 
had trouble with the comma-separated "r,g,b" format.  We have fixed this 
problem with the BED parser (in the source file hg/lib/bed.c).  If you 
update your source tree to the latest source and recompile your 
libraries and the overlapSelect program, your files in their current 
state should work with overlapSelect.
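The Python sketch below shows what accepting both itemRgb forms involves. It is not the code in hg/lib/bed.c. The function name parse_item_rgb is hypothetical, and packing the colour as (r << 16) | (g << 8) | b is an assumption made only for this example. With this approach, a value such as "2,148,141" (the one quoted in the error message) is read as a colour triple, and a plain integer such as 0 still works.

```python
def parse_item_rgb(field: str) -> int:
    """Hypothetical helper: parse a BED itemRgb field (column 9).

    Accepts either a plain unsigned integer such as "0" or a
    comma-separated "r,g,b" triple such as "2,148,141". A triple is
    packed as (r << 16) | (g << 8) | b, an encoding assumed for this sketch.
    """
    parts = field.split(",")
    if len(parts) == 1:
        value = int(parts[0])  # raises ValueError on non-numeric text
        if value < 0:
            raise ValueError(f"invalid unsigned number: {field!r}")
        return value
    if len(parts) != 3:
        raise ValueError(f"itemRgb must be N or r,g,b: {field!r}")
    r, g, b = (int(p) for p in parts)
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError(f"colour channel out of range: {field!r}")
    return (r << 16) | (g << 8) | b
```

For example, parse_item_rgb("2,148,141") returns 169101 and parse_item_rgb("0") returns 0. A parser that only tries the single-integer branch would reject the triple, which matches the kind of "invalid unsigned number" error reported in this thread.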

Alternatively, a quick work-around for this problem (with the old 
source) is to turn the r,g,b values in column 9 into zeros before 
processing.  This awk command will do the trick:

    awk 'BEGIN{OFS=FS="\t"} {$9=0; print $0}' file1.bed > file1.fixed.bed

Let us know if you have further problems or questions.

--
Brooke Rhead
UCSC Genome Bioinformatics Group



-------- Original Message --------
Subject: RE: [Genome] multiple identifiers using table browser intersection
Date: Fri, 29 Jun 2007 16:18:37 -0700
From: Russell, Archie <archie_russell at merck.com>
To: Brooke Rhead <rhead at soe.ucsc.edu>


Thanks a lot, Brooke.

Here are snippets of the two files.


-----Original Message-----
From: Brooke Rhead [mailto:rhead at soe.ucsc.edu]
Sent: Friday, June 29, 2007 12:32 PM
To: Russell, Archie
Cc: genome at soe.ucsc.edu
Subject: Re: [Genome] multiple identifiers using table browser intersection

Hi Archie,

It sounds like there might be a format issue with the BED file you are
using.  For instance, does it by any chance have a 'bin' column (it
should not)?  Alternatively, is it space-separated rather than
tab-separated?  The overlapSelect program only works with tab-separated
BEDs.
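You can catch both of these format problems (a leading 'bin' column, or spaces instead of tabs) before running overlapSelect. Below is a minimal Python sketch that rests on a few assumptions. The helper name check_bed_lines is hypothetical. Track, browser and '#' lines are skipped. A line is flagged as having a bin column when an all-digit first field is followed by a non-numeric chromosome name such as chr1. This is a heuristic, not a full BED validator.

```python
from typing import Iterable, List


def check_bed_lines(lines: Iterable[str]) -> List[str]:
    """Hypothetical pre-flight check for a BED file.

    Returns warnings for lines that look space-separated (overlapSelect
    expects tabs) or that appear to start with a 'bin' column.
    """
    warnings = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Skip blank lines and header-style lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        tab_fields = line.split("\t")
        # Fewer than 3 tab-separated fields but 3+ whitespace-separated
        # fields suggests spaces were used as the separator.
        if len(tab_fields) < 3 and len(line.split()) >= 3:
            warnings.append(f"line {lineno}: looks space-separated, not tab-separated")
            continue
        # Heuristic: a bin column is an integer in front of a non-numeric
        # chromosome name, e.g. "585\tchr1\t1000\t2000".
        if len(tab_fields) >= 4 and tab_fields[0].isdigit() and not tab_fields[1].isdigit():
            warnings.append(f"line {lineno}: first column looks like a 'bin' column")
    return warnings
```

For a clean, tab-separated BED file, check_bed_lines(open("file1.bed")) should return an empty list.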

If you'd like, you can send me your file, or a sample of it.  The
developer who wrote overlapSelect has offered to take a look at it.  (No
need to copy the list if you send an attachment -- the mail list program
strips attachments.)

--
Brooke Rhead
UCSC Genome Bioinformatics Group



Russell, Archie wrote:
>  
> Hey Brooke
>  
> Thanks a lot for the pointers.   It looks like overlapSelect with 
> mergeOutput (or maybe idOutput) is probably what I need.
>  
> But I am having a problem with overlapSelect:
>  
>  
> % overlapSelect -selectFmt=bed -inFmt=bed \
>     /info/genome/Projects/649/dog/ucsc_browser/boundary.bed \
>     /info/genome/Projects/649/dog/ucsc_browser/ncrna.bed out.bed
>  
> gives the error
>  
> invalid unsigned number: "2,148,141"   (2,148,141 are block sizes)
> When I specify -selectCoordCols=0,1,2 -inCoordCols=0,1,2 things seem to
> work, but then I don't think I'm getting exon-level overlaps.
>  
> Can you tell me what I should change?
>  
> Thanks,
> Archie
>  
>  
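The underlying goal here is to get pairs of identifiers whose exons overlap. A small Python sketch can illustrate the idea. It compares the BED blocks (columns 10 to 12: blockCount, blockSizes, blockStarts) rather than only chrom, chromStart and chromEnd, which are the only columns that -selectCoordCols=0,1,2 -inCoordCols=0,1,2 point at. The sketch makes these assumptions:

- lines are tab-separated and have at least 12 BED columns;
- blockStarts are relative to chromStart;
- strand is ignored.

The helper names are hypothetical, and this brute-force loop is not how overlapSelect is implemented.

```python
from typing import Iterable, List, Set, Tuple

Block = Tuple[str, int, int]  # (chrom, start, end), 0-based, half-open


def bed12_blocks(line: str) -> Tuple[str, List[Block]]:
    """Return (name, exon blocks) for one tab-separated line with >= 12 BED columns."""
    f = line.rstrip("\n").split("\t")
    chrom, chrom_start, name = f[0], int(f[1]), f[3]
    # Column 9 (itemRgb, f[8]) is never parsed here.
    sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    starts = [int(x) for x in f[11].rstrip(",").split(",")]
    # blockStarts are offsets from chromStart.
    return name, [(chrom, chrom_start + s, chrom_start + s + n)
                  for s, n in zip(starts, sizes)]


def _data_lines(lines: Iterable[str]) -> List[str]:
    """Drop blank, comment, track and browser lines."""
    return [ln for ln in lines
            if ln.strip() and not ln.startswith(("#", "track", "browser"))]


def overlapping_name_pairs(select_lines: Iterable[str],
                           in_lines: Iterable[str]) -> Set[Tuple[str, str]]:
    """(in_name, select_name) pairs where some block of one overlaps some
    block of the other by at least one base. Brute force O(n*m); strand ignored."""
    select = [bed12_blocks(ln) for ln in _data_lines(select_lines)]
    pairs: Set[Tuple[str, str]] = set()
    for ln in _data_lines(in_lines):
        in_name, in_blocks = bed12_blocks(ln)
        for sel_name, sel_blocks in select:
            if any(a[0] == b[0] and a[1] < b[2] and b[1] < a[2]
                   for a in in_blocks for b in sel_blocks):
                pairs.add((in_name, sel_name))
    return pairs
```

Consider a two-exon gene spanning 100-500, with exons at 100-200 and 400-500. A transcript at 200-400 overlaps the gene's chromStart/chromEnd span, but it is not reported, because it lies entirely in the intron. A transcript at 450-460 is reported.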
> Brooke Rhead rhead at soe.ucsc.edu
> Thu Jun 28 12:47:24 PDT 2007
>  
> Hello Archie,
> 
> We have not added any new functionality to the Table Browser that will
> join two tables on an identifier field.  However, there is a Galaxy tool
> that will do this (at http://main.g2.bx.psu.edu/).  On the Galaxy page,
> under the heading "Filter, Sort, Join, Compare, Subtract", there is a
> tool to "Join two Queries side by side on a specified field".
> 
> There is also a user-developed script on genomewiki with a similar
> function, called bedOverlapName (which in turn calls the kent source
> tool overlapSelect).  It is located here:
> 
> http://genomewiki.cse.ucsc.edu/index.php/BedOverlapName
> 
> I hope this information is helpful.
> 
> --
> Brooke Rhead
> UCSC Genome Bioinformatics Group
> 
> 
> Russell, Archie wrote:
> > Hi,
> >
> > I want to do an intersection of two bed tracks and get a file that gives
> > me pairs of identifiers (e.g. accessions) that overlap.  Is it possible
> > to do this in the table browser?  I know this question has come up
> > before and I think the answer was that this wasn't possible at the time,
> > but maybe things have changed since then.
> >
> > Thanks,
> > Archie
> >
> > Archie Russell
> > Rosetta Inpharmatics
> > 206-802-6312
> 
>
> ------------------------------------------------------------------------------
> Notice: This e-mail message, together with any attachments, contains
> information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station,
> New Jersey, USA 08889), and/or its affiliates (which may be known
> outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD
> and in Japan, as Banyu - direct contact information for affiliates is
> available at http://www.merck.com/contact/contacts.html) that may be
> confidential, proprietary copyrighted and/or legally privileged. It is
> intended solely for the use of the individual or entity named on this
> message. If you are not the intended recipient, and have received this
> message in error, please notify us immediately by reply e-mail and then
> delete it from your system.
>
> ------------------------------------------------------------------------------
> 


