[Genome] Subtracting BED lists

Brooke Rhead rhead at soe.ucsc.edu
Thu Jul 5 14:17:19 PDT 2007


Hi Jeffrey,

There is another tool in the Kent source called 'featureBits' that can 
do this.  There are many options for featureBits -- here is the usage 
statement:

-----
featureBits - Correlate tables via bitmap projections.
usage:
   featureBits database table(s)
This will return the number of bits in all the tables anded together
Pipe warning:  output goes to stderr.
Options:
   -bed=output.bed   Put intersection into bed format. Can use stdout.
   -fa=output.fa     Put sequence in intersection into .fa file
   -faMerge          For fa output merge overlapping features.
   -minSize=N        Minimum size to output (default 1)
   -chrom=chrN       Restrict to one chromosome
   -chromSize=sizefile      read chrom sizes from file instead of database.
   -or               Or tables together instead of anding them
   -not              Output negation of resulting bit set.
   -countGaps        Count gaps in denominator
   -noRandom         Don't include _random (or Un) chromosomes
   -noHap            Don't include _hap chromosomes
   -dots=N           Output dot every N chroms (scaffolds) processed
   -minFeatureSize=n Don't include bits of the track that are smaller than
                     minFeatureSize, useful for differentiating between
                     alignment gaps and introns.
   -bin=output.bin   Put bin counts in output file
   -binSize=N        Bin size for generating counts in bin file (default 500000)
   -binOverlap=N     Bin overlap for generating counts in bin file (default 250000)
   -bedRegionIn=input.bed   Read in a bed file for bin counts in specific regions and write to bedRegionsOut
   -bedRegionOut=output.bed Write a bed file of bin counts in specific regions from bedRegionIn
   -enrichment       Calculates coverage and enrichment assuming first table
                     is reference gene track and second track something else
   '-where=some sql pattern'  restrict to features matching some sql pattern
You can include a '!' before a table name to negate it.
Some table names can be followed by modifiers such as:
    :exon:N          Break into exons and add N to each end of each exon
    :cds             Break into coding exons
    :intron:N        Break into introns, remove N from each end
    :utr5, :utr3     Break into 5' or 3' UTRs
    :upstream:N      Consider the region of N bases before region
    :end:N           Consider the region of N bases after region
    :score:N         Consider records with score >= N
    :upstreamAll:N   Like upstream, but doesn't filter out genes that
                     have txStart==cdsStart or txEnd==cdsEnd
    :endAll:N        Like end, but doesn't filter out genes that
                     have txStart==cdsStart or txEnd==cdsEnd
The tables can be bed, psl, or chain files, or a directory full of
such files as well as actual database tables.  To count the bits
used in dir/chrN_something*.bed you'd do:
   featureBits database dir/_something.bed
-----

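Note that the -not option negates the result of the whole intersection. 
To negate a single table instead, put a '!' before its name, as 
described in the usage statement above.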
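
To make the "bitmap projection" idea concrete, here is a minimal Python sketch of what anding two tables, and anding the first with the negation of the second, means on a single chromosome. It is only an illustration under stated assumptions: the function names, the chromosome size, and the coordinates are made up, and this is not how featureBits itself is implemented.

```python
# Sketch of a bitmap projection: one boolean per base on a single chromosome.
# All names and coordinates are hypothetical; this is not featureBits code.

def to_bitmap(intervals, chrom_size):
    """Mark every base covered by half-open BED intervals [start, end)."""
    bits = [False] * chrom_size
    for start, end in intervals:
        for pos in range(start, min(end, chrom_size)):
            bits[pos] = True
    return bits

def to_intervals(bits):
    """Collapse runs of set bits back into half-open intervals."""
    intervals = []
    run_start = None
    for pos, bit in enumerate(bits):
        if bit and run_start is None:
            run_start = pos
        elif not bit and run_start is not None:
            intervals.append((run_start, pos))
            run_start = None
    if run_start is not None:
        intervals.append((run_start, len(bits)))
    return intervals

chrom_size = 30
first = [(2, 10), (15, 25)]
second = [(5, 8), (20, 30)]

a = to_bitmap(first, chrom_size)
b = to_bitmap(second, chrom_size)

# Default behaviour described in the usage: the tables are anded together.
anded = [x and y for x, y in zip(a, b)]
# Second table negated before anding, in the spirit of the '!' prefix.
only_first = [x and not y for x, y in zip(a, b)]

print(sum(anded), to_intervals(anded))
print(sum(only_first), to_intervals(only_first))
```

The second result is the subtraction asked about: bases covered by the first list and not by the second.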

Alternatively, you can use the online Table Browser tool to get the list 
of locations that belong only to the first list.  To do this, first 
upload your two BEDs as custom tracks.  Then go to the Table Browser and 
select one of the BED tracks.  From here you can proceed in a couple of 
different ways.  One way is to intersect the two lists, then intersect 
the first list with the complement of that intersection.  For instance, 
with the first BED selected, hit the "intersection: create" button and 
choose the second BED track.  Save this intersection as a third custom 
track.  Then go back to the Table Browser, select the first BED again, 
and hit "intersection: create" again.  This time, choose your new custom 
track, and check the box to "Complement [your custom track] before 
intersection/union".  This intersection should contain the regions from 
the first list that are not in the intersection of the two lists, that 
is, the regions found only in the first list.
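
The two-step recipe above can be sketched in Python on plain interval lists, to show why intersecting the first list with the complement of the shared regions leaves only what is unique to the first list. This is a rough illustration, not Table Browser code: the function names, the chromosome size, and the coordinates are all assumptions made for the example, and it handles a single chromosome only.

```python
# Sketch of the two-step Table Browser recipe on one chromosome.
# Intervals are half-open (start, end) pairs; all names are hypothetical.

def merge(intervals):
    """Sort intervals and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def intersect(xs, ys):
    """Regions covered by both lists."""
    xs, ys = merge(xs), merge(ys)
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        start = max(xs[i][0], ys[j][0])
        end = min(xs[i][1], ys[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if xs[i][1] < ys[j][1]:
            i += 1
        else:
            j += 1
    return out

def complement(xs, chrom_size):
    """Regions of [0, chrom_size) not covered by the list."""
    out, pos = [], 0
    for start, end in merge(xs):
        if pos < start:
            out.append((pos, start))
        pos = max(pos, end)
    if pos < chrom_size:
        out.append((pos, chrom_size))
    return out

chrom_size = 100
first = [(10, 30), (50, 70)]
second = [(20, 40), (60, 65)]

# Step 1: intersect the two lists (the "third custom track").
shared = intersect(first, second)
# Step 2: intersect the first list with the complement of step 1.
only_first = intersect(first, complement(shared, chrom_size))

print(shared)
print(only_first)
```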

I hope this information helps.  Please let us know if we can clarify any 
of the above.

--
Brooke Rhead
UCSC Genome Bioinformatics Group



Jeffrey Rosenfeld wrote:
> Do you have a downloadable program that will subtract two lists of BED 
> points?  I have found bedIntersect in the Kent source files and it has a 
> similar function, but I could not find a program that does the reverse. 
> For two given BED files, I want a list of genomic locations that are 
> only in one of them.
>
> Thank You,
>
> Jeffrey Rosenfeld
> Cold Spring Harbor Lab

