[Genome] Subtracting BED lists
Brooke Rhead
rhead at soe.ucsc.edu
Thu Jul 5 14:17:19 PDT 2007
Hi Jeffrey,
There is another tool in the Kent source called 'featureBits' that can
do this. There are many options for featureBits -- here is the usage
statement:
-----
featureBits - Correlate tables via bitmap projections.
usage:
featureBits database table(s)
This will return the number of bits in all the tables anded together
Pipe warning: output goes to stderr.
Options:
-bed=output.bed Put intersection into bed format. Can use stdout.
-fa=output.fa Put sequence in intersection into .fa file
-faMerge For fa output merge overlapping features.
-minSize=N Minimum size to output (default 1)
-chrom=chrN Restrict to one chromosome
-chromSize=sizefile read chrom sizes from file instead of database.
-or Or tables together instead of anding them
-not Output negation of resulting bit set.
-countGaps Count gaps in denominator
-noRandom Don't include _random (or Un) chromosomes
-noHap Don't include _hap chromosomes
-dots=N Output dot every N chroms (scaffolds) processed
-minFeatureSize=n Don't include bits of the track that are smaller than
    minFeatureSize, useful for differentiating between
    alignment gaps and introns.
-bin=output.bin Put bin counts in output file
-binSize=N Bin size for generating counts in bin file (default 500000)
-binOverlap=N Bin overlap for generating counts in bin file (default 250000)
-bedRegionIn=input.bed Read in a bed file for bin counts in
    specific regions and write to bedRegionsOut
-bedRegionOut=output.bed Write a bed file of bin counts in specific
    regions from bedRegionIn
-enrichment Calculates coverage and enrichment assuming first table
    is reference gene track and second track something else
'-where=some sql pattern' restrict to features matching some sql pattern
You can include a '!' before a table name to negate it.
Some table names can be followed by modifiers such as:
:exon:N Break into exons and add N to each end of each exon
:cds Break into coding exons
:intron:N Break into introns, remove N from each end
:utr5, :utr3 Break into 5' or 3' UTRs
:upstream:N Consider the region of N bases before region
:end:N Consider the region of N bases after region
:score:N Consider records with score >= N
:upstreamAll:N Like upstream, but doesn't filter out genes that
    have txStart==cdsStart or txEnd==cdsEnd
:endAll:N Like end, but doesn't filter out genes that
    have txStart==cdsStart or txEnd==cdsEnd
The tables can be bed, psl, or chain files, or a directory full of
such files as well as actual database tables. To count the bits
used in dir/chrN_something*.bed you'd do:
featureBits database dir/_something.bed
-----
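Note that the -not option negates the whole resulting bit set (for example, the result of an intersection), while a '!' before a table name negates only that table. So a command along the lines of

featureBits database first.bed '!second.bed' -bed=output.bed

(where first.bed and second.bed stand for your two files) should write the regions that are in the first list but not in the second to output.bed. The quotes keep the shell from interpreting the '!'.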
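To illustrate what "correlate tables via bitmap projections" and negating a table mean, here is a minimal Python sketch. It is not featureBits' own code: it assumes 0-based, half-open BED coordinates, a hypothetical `chrom_sizes` dict of chromosome lengths, and it uses one byte per base, so it is only suitable for small toy inputs (featureBits itself works on real bit arrays).

```python
"""Toy bitmap projection: bases covered by list A but not by list B.

Assumptions (not from featureBits): 0-based half-open BED coordinates,
a hypothetical chrom_sizes dict, one byte per base, first three columns only.
"""
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]


def parse_bed(lines: Iterable[str]) -> List[Interval]:
    """Read chrom, start, end from BED lines, skipping blank/header lines."""
    out = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        out.append((chrom, int(start), int(end)))
    return out


def project(intervals: Iterable[Interval],
            chrom_sizes: Dict[str, int]) -> Dict[str, bytearray]:
    """Set one byte per base covered by any interval (an 'or' of features)."""
    bits = {c: bytearray(n) for c, n in chrom_sizes.items()}
    for chrom, start, end in intervals:
        row = bits[chrom]
        end = min(end, len(row))  # clip so the bytearray never grows
        if start < end:
            row[start:end] = b"\x01" * (end - start)
    return bits


def and_not(a: Dict[str, bytearray],
            b: Dict[str, bytearray]) -> Dict[str, bytearray]:
    """Bases set in a and not in b, like anding a with a negated table."""
    return {c: bytearray(x & (1 - y) for x, y in zip(a[c], b[c])) for c in a}


def to_bed(bits: Dict[str, bytearray]) -> List[Interval]:
    """Merge runs of set bases back into BED-style intervals."""
    out = []
    for chrom in sorted(bits):
        row, start = bits[chrom], None
        for i, v in enumerate(row):
            if v and start is None:
                start = i
            elif not v and start is not None:
                out.append((chrom, start, i))
                start = None
        if start is not None:
            out.append((chrom, start, len(row)))
    return out


sizes = {"chr1": 100}
a = parse_bed(["chr1\t10\t50", "chr1\t70\t90"])
b = parse_bed(["chr1\t20\t30", "chr1\t80\t95"])
diff = and_not(project(a, sizes), project(b, sizes))
print(to_bed(diff))  # [('chr1', 10, 20), ('chr1', 30, 50), ('chr1', 70, 80)]
print(sum(map(sum, diff.values())))  # 40 bases, the kind of count featureBits reports
```

Summing the set bases corresponds to the "number of bits" count, and turning runs back into intervals corresponds to the -bed output.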
Alternatively, you can use the online Table Browser to get the list of locations that belong only to the first list. First, upload your two BED files as custom tracks. Then go to the Table Browser and select one of the BED tracks. From there you can proceed in a couple of different ways. One way is to intersect the two lists, and then intersect the first list with the complement of that intersection:

1. With the first BED track selected, click the "intersection: create" button and choose the second BED track. Save this intersection as a third custom track.
2. Go back to the Table Browser, select the first BED track again, and click "intersection: create" again. This time, choose your new custom track, and also check the box "Complement [your custom track] before intersection/union".

The result should contain the regions of the first list that are not in the intersection of the two lists, i.e. the regions that are only in the first list.
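The two-step Table Browser recipe above is plain interval arithmetic: first compute A ∩ B, then A ∩ complement(A ∩ B). The Python sketch below mirrors those two steps on a single chromosome. It is an illustration only, not how the Table Browser is implemented; it assumes 0-based, half-open coordinates, and `chrom_size` is a hypothetical parameter needed to take a complement.

```python
"""Interval sketch of the two-step recipe on one chromosome.

Step 1: both = A intersect B.  Step 2: A intersect complement(both).
Assumes 0-based, half-open spans; chrom_size is a hypothetical input.
"""
from typing import List, Tuple

Span = Tuple[int, int]


def merge(spans: List[Span]) -> List[Span]:
    """Sort spans and merge any that overlap or touch."""
    out: List[Span] = []
    for s, e in sorted(spans):
        if out and s <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], e))
        else:
            out.append((s, e))
    return out


def intersect(a: List[Span], b: List[Span]) -> List[Span]:
    """Base-level intersection of two span lists via a two-pointer sweep."""
    a, b = merge(a), merge(b)
    out: List[Span] = []
    i = j = 0
    while i < len(a) and j < len(b):
        s, e = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if s < e:
            out.append((s, e))
        # advance whichever span ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def complement(spans: List[Span], chrom_size: int) -> List[Span]:
    """Parts of [0, chrom_size) not covered by any span."""
    out: List[Span] = []
    pos = 0
    for s, e in merge(spans):
        if pos < s:
            out.append((pos, s))
        pos = max(pos, e)
    if pos < chrom_size:
        out.append((pos, chrom_size))
    return out


def only_in_first(a: List[Span], b: List[Span], chrom_size: int) -> List[Span]:
    """Step 1: A and B.  Step 2: A and NOT(step 1)."""
    both = intersect(a, b)
    return intersect(a, complement(both, chrom_size))


a = [(10, 50), (70, 90)]
b = [(20, 30), (80, 95)]
print(only_in_first(a, b, 100))  # [(10, 20), (30, 50), (70, 80)]
```

Since A ∩ complement(A ∩ B) equals A ∩ complement(B), intersecting the first list directly with the complemented second list gives the same answer in one step; the two-step version simply follows the clicks described above. Calling only_in_first(b, a, ...) gives the regions that are only in the second list.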
I hope this information helps. Please let us know if we can clarify any
of the above.
--
Brooke Rhead
UCSC Genome Bioinformatics Group
Jeffrey Rosenfeld wrote:
> Do you have a downloadable program that will subtract two lists of BED
> points. I have found bedIntersect in the Kent source files and it has a
> similar function, but I could not find a program that does the reverse.
> For two given BED files, I want a list of genomic locations that are
> only in one of them.
>
> Thank You,
>
> Jeffrey Rosenfeld
> Cold Spring Harbor Lab
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>