[Genome] FW: custom track 100 MB size limit? (resend after partial bounce)
Hiram Clawson
hiram at soe.ucsc.edu
Fri Apr 20 15:57:57 PDT 2007
Good Afternoon Gordon:
The practical limits on custom track data size are set by a variety
of factors. Data bandwidth between your submitting site and the
genome browser servers is most likely the critical limiting factor.
The only actual programmed limit in our software at this end is
a limit of 300,000,000 data points in a single wiggle track submission.
And in experiments here today I cannot even reach that limit due
to bandwidth constraints and processing time on our local area network between our
servers. The Apache CGI binary time-limit of five minutes prevents
such large files from completing their transfer.
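As a rough illustration of how bandwidth interacts with the five-minute limit, here is a minimal Python sketch. The link speeds are hypothetical values chosen for illustration, not measurements, and the estimate counts transfer time only, so real loads (which also need processing time) will fare worse.

```python
# Rough feasibility check: does a transfer fit inside a fixed time limit?
TIME_LIMIT_SECONDS = 5 * 60  # the five-minute CGI limit mentioned above


def transfer_seconds(size_megabytes: float, megabits_per_second: float) -> float:
    """Seconds needed to move size_megabytes over a link of the given speed."""
    size_megabits = size_megabytes * 8
    return size_megabits / megabits_per_second


def fits_time_limit(size_megabytes: float, megabits_per_second: float,
                    limit_seconds: float = TIME_LIMIT_SECONDS) -> bool:
    """True if the transfer alone (ignoring load/processing time) fits."""
    return transfer_seconds(size_megabytes, megabits_per_second) <= limit_seconds


if __name__ == "__main__":
    # Hypothetical link speeds in Mbit/s, for illustration only.
    for speed in (1, 5, 100):
        secs = transfer_seconds(100, speed)
        print(f"100 MB at {speed} Mbit/s: {secs:.0f} s, "
              f"fits={fits_time_limit(100, speed)}")
```

Under these assumptions a 100 MB file needs 800 seconds at 1 Mbit/s, which is well past the limit before any processing starts.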
What you can do is try packaging your data sets into per-chrom
tracks. And then, to submit multiple per-chrom files in a single
custom track submission, create a small file with URL references
to each of your data files. For example, this two-line file will submit
two data sets in one submission, multiWiggles.txt:
http://genome-test.cse.ucsc.edu/~hiram/ctDb/chr21.phastCons17.ct.gz
http://genome-test.cse.ucsc.edu/~hiram/ctDb/chr22.phastCons17.ct.gz
What you submit to make this happen is the URL to this file, for example:
http://genome-test.cse.ucsc.edu/~hiram/ctDb/multiWiggles.txt
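If you have many per-chrom files, the URL list can be generated rather than typed. A minimal Python sketch follows; the function name write_url_list is hypothetical, and the URLs are simply the two example files above. You would still need to place the resulting file on a web server of your own and submit its URL.

```python
from pathlib import Path


def write_url_list(urls, out_path):
    """Write one data-file URL per line, the layout shown for multiWiggles.txt."""
    text = "\n".join(urls) + "\n"
    Path(out_path).write_text(text)
    return text


if __name__ == "__main__":
    # Example URLs taken from the message above.
    base = "http://genome-test.cse.ucsc.edu/~hiram/ctDb"
    urls = [f"{base}/chr{n}.phastCons17.ct.gz" for n in (21, 22)]
    print(write_url_list(urls, "multiWiggles.txt"), end="")
```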
Each line of that file is processed by the custom track business and each data
set is loaded into a separate track. When I try this here locally, these
two "small" data sets, 32 MB compressed, 382 MB uncompressed, 67 million lines,
take about 3 minutes to transfer and load into the browser. This is essentially
our practical limit here in house. At a remote site with less bandwidth, the limits
would be even smaller.
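The per-chrom packaging suggested above could be scripted along these lines. This is only a sketch under stated assumptions: the input is a wiggle file using fixedStep or variableStep declaration lines that carry a chrom= field, the function name split_wig_by_chrom and the output naming are hypothetical, and any lines that precede the first declaration (other than a track line) are dropped. Wiggle data in BED-style columns is not handled.

```python
import gzip
import re

CHROM_RE = re.compile(r"\bchrom=(\S+)")


def split_wig_by_chrom(wig_path, out_prefix):
    """Split a fixedStep/variableStep wiggle file into one gzip file per chrom.

    Returns a dict mapping chrom name to the output path written.
    """
    outputs = {}       # chrom -> open gzip handle
    paths = {}         # chrom -> output file name
    track_line = None  # most recent track line, copied to the top of each output
    current = None     # handle for the chrom currently being written
    opener = gzip.open if wig_path.endswith(".gz") else open
    try:
        with opener(wig_path, "rt") as src:
            for line in src:
                if line.startswith("track"):
                    track_line = line
                    continue
                if line.startswith(("fixedStep", "variableStep")):
                    match = CHROM_RE.search(line)
                    if match is None:
                        raise ValueError(f"no chrom= in declaration: {line!r}")
                    chrom = match.group(1)
                    if chrom not in outputs:
                        paths[chrom] = f"{out_prefix}{chrom}.wig.gz"
                        outputs[chrom] = gzip.open(paths[chrom], "wt")
                        if track_line:
                            outputs[chrom].write(track_line)
                    current = outputs[chrom]
                # Lines seen before the first declaration are skipped.
                if current is not None:
                    current.write(line)
    finally:
        for handle in outputs.values():
            handle.close()
    return paths
```

Each output file repeats the original track line, so you may want to edit the track names afterwards so that the per-chrom tracks can be told apart in the browser.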
You may want to consider a limited installation of a genome browser at your
location to load full-genome data sets. See also:
http://genomewiki.ucsc.edu/index.php/Minimal_Browser_Installation
--Hiram
Gordon Robertson wrote:
> Hello
>
> (I apologize for this resend, but I received a note from your 'bounce' system indicating that it had refused the final paragraph in my original post.)
>
> We have used BED and WIG custom tracks for some time. As we ramp up use of a Solexa sequencer for chromatin immunoprecipitation work in mammals, however, we find that a subset of histone modifications appears to require particularly deep sequencing in order that we find essentially all the peaks in the human genome. From such experiments, WIG files that represent a significance-thresholded genome-wide set of human ChIP peaks can be larger than 100MB, even when we use the most compact WIG format. When we try to load such files, we find that the genome browser has a 100MB custom track file size limit. While we could work with more than one (smaller) WIG file for each such experiment, single files are preferable for practical reasons. We can approximate the information in WIG files with far more compact BED files, using colour and shape to suggest peak geometry; however, we'd prefer to browse WIG files rather than BED.
>
> In this area, pre-publication, manuscript submission/review and post-publication states of a research project will have somewhat different needs. During the first two phases, the WIG files will not be publicly available, and the research group (or reviewer) will want to be able to work flexibly with the files. In a post-publication phase, perhaps the WIG files would be hosted by UCSC and loading size would not be an issue.
>
> Could I ask you to comment on this? Would you be willing to raise the file load size limits above 100 MB, either for all users, or for particular users? For us, the alternative appears to be to install the browser locally, or to work with multiple WIG files for certain ChIP experiments.
>
> Thank you,
>
> G
> ---
> Gordon Robertson
> Gene Regulation Informatics
> Canada's Michael Smith Genome Sciences Centre
> Vancouver BC Canada
> www.bcgsc.ca
> grobertson at bcgsc.ca