[Genome] WIG format question

Hiram Clawson hiram at soe.ucsc.edu
Mon Oct 15 10:02:04 PDT 2007


Good Morning Edward:

Thank you for the suggested extension to the wiggle format.
I would like to make the variable width format more efficient.

Currently, the only way to do variable width data is to
use the bed-like four column format:
chrom chromStart chromEnd value

but this has the known limitation of being inefficient
compared to the variableStep and fixedStep input formats.
Internally this four column format is converted to
variableStep format with a span of one base, with a data point
specified for each base in the chromEnd-chromStart range.  This can
add up to a lot of data points very rapidly.  We have a limit
of 300,000,000 input data points on wiggle custom tracks.
This four column format could therefore barely cover chrom 1
and not much more.
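
To make the arithmetic concrete, here is a minimal Python sketch
that counts how many single-base data points a four column file
would expand to and compares the total with the limit above.  It
assumes bed-style half-open coordinates, so each line contributes
chromEnd - chromStart points.  The function name and the example
lines are made up for illustration; this is not our loader code.

```python
# Sketch: count the single-base data points that four-column
# (chrom chromStart chromEnd value) input would expand to.
# Assumes bed-style half-open coordinates, so one line covers
# chromEnd - chromStart bases. The function name is hypothetical.

DATA_POINT_LIMIT = 300_000_000  # custom track limit quoted above


def count_expanded_points(lines):
    """Return the total number of per-base data points for the input lines."""
    total = 0
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments, and track/browser lines.
        if (len(fields) != 4
                or fields[0] in ("track", "browser")
                or fields[0].startswith("#")):
            continue
        chrom_start, chrom_end = int(fields[1]), int(fields[2])
        total += chrom_end - chrom_start
    return total


# Illustrative input only, not real data.
example = [
    "chr1 0 1000 0.5",
    "chr1 1000 250000 1.25",
    "chr1 250000 2000000 0.0",
]
points = count_expanded_points(example)
status = "over limit" if points > DATA_POINT_LIMIT else "within limit"
print(points, "data points;", status)
```

Three input lines already stand for two million data points here,
which is why wide blocks run into the limit so quickly.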

Also, please do not use the "span" argument to specify
variable width data.  That will not work as you think it
might.  A complete set of input data points should all
have the same "span".  You will see inconsistent results if
you mix data with different spans together in the same
data set.
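
One way to catch this before uploading is to scan the
declaration lines and collect the span values they use.  The
Python sketch below is only an illustration: the function name
and the sample lines are hypothetical, and it relies on a
variableStep or fixedStep line without span= using the default
span of 1.

```python
import re

# Sketch: collect the span values declared in variableStep/fixedStep
# wiggle text so a mixed-span data set can be caught before upload.
# A declaration line without span= uses the default span of 1.
# The function name is hypothetical.

SPAN_RE = re.compile(r"\bspan=(\d+)")


def declared_spans(lines):
    """Return the set of span values used by the declaration lines."""
    spans = set()
    for line in lines:
        if line.startswith(("variableStep", "fixedStep")):
            match = SPAN_RE.search(line)
            spans.add(int(match.group(1)) if match else 1)
    return spans


# Illustrative input only, not real data.
wiggle_text = [
    "variableStep chrom=chr1 span=25",
    "1001 0.5",
    "1101 0.7",
    "variableStep chrom=chr1 span=100",
    "2001 1.2",
    "fixedStep chrom=chr2 start=1 step=10",
    "3.4",
]
spans = declared_spans(wiggle_text)
if len(spans) > 1:
    print("mixed spans found:", sorted(spans))
```

If more than one value comes back, split the data into one
track per span rather than loading it as a single data set.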

--Hiram


Oakeley, Edward wrote:
> Hi there,
> 
> I have numerical data that I wish to plot as a histogram rather than an
> intensity. The best format for this seems to be the WIG format but
> unfortunately my data contains blocks of variable width. You can do this
> with the SPAN parameter but this means that I need to make a separate
> track for each datapoint so my WIG files become huge as each value now
> has two lines of "header" information. Can I request that the format be
> extended so that instead of being 
> 
> Start signal
> 
> It becomes:
> 
> Start signal [span]
> 
> Where span is optional (that way it won't break existing files) 
> 
> Thanks 
> 
> Edward Oakeley

