[Genome] how many chr location I may have in custom wig file?
Hiram Clawson
hiram at soe.ucsc.edu
Thu Jun 7 20:21:51 PDT 2007
Good Afternoon Irina:
The difficulty is due to the type of input format. The four-column
input format is the most inefficient method of input for the
wiggle tracks. If you convert your data input into either fixedStep
or variableStep format, you will be able to load much more data.
As mentioned on this help page:
http://genome.ucsc.edu/goldenPath/help/wiggle.html
> In many cases this format will be too verbose to conveniently
> specify a large number of data values. Two other methods
> of specifying data were developed to be more space efficient.
We'll try to improve this warning to be much more explicit about the
inefficiency of this four-column data format.
What's happening is that each of your lines is being converted
into (chromEnd - chromStart) data points. There is a limit
of 300,000,000 data points. Each of your input lines specifies
10,000 data points, so your input is limited to approximately
30,000 lines (300,000,000 / 10,000).
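As a rough check of that arithmetic, here is a minimal Python sketch.
The constant and the function name are only illustrative (they are not
part of any browser tool), and the limit is simply the figure quoted
above:

```python
# Data point limit as quoted in this message; treated as a plain constant.
DATA_POINT_LIMIT = 300_000_000


def max_input_lines(chrom_start, chrom_end, limit=DATA_POINT_LIMIT):
    """Return how many four-column lines of this width fit under the limit.

    Each four-column line expands to (chrom_end - chrom_start) data points.
    """
    points_per_line = chrom_end - chrom_start
    if points_per_line <= 0:
        raise ValueError("chrom_end must be greater than chrom_start")
    return limit // points_per_line


# One of the 10,000-base lines from the original question.
print(max_input_lines(104060000, 104070000))  # 30000
```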
Since your data does appear to be regular, it can be fixed step:
fixedStep chrom=chr8 start=104060001 step=10000 span=10000
3
2
1
0
2
-1
... etc ...
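The conversion itself could be sketched in Python along these lines.
This is only an illustration under stated assumptions: the function
name is hypothetical, and it assumes a single chromosome with
equal-width intervals and no gaps (it raises an error otherwise rather
than guessing):

```python
def four_column_to_fixed_step(lines):
    """Convert regular four-column wiggle lines into fixedStep lines.

    Assumes one chromosome, equal-width intervals, and no gaps between
    intervals; raises ValueError if the data is not that regular.
    """
    rows = []
    for line in lines:
        text = line.strip()
        # Skip blank lines, comments, and track/browser lines.
        if not text or text.startswith(("#", "track", "browser")):
            continue
        fields = text.split()
        if len(fields) != 4:
            raise ValueError(f"expected 4 columns: {text!r}")
        chrom, start, end, value = fields
        rows.append((chrom, int(start), int(end), value))
    if not rows:
        raise ValueError("no data lines found")

    first_chrom, first_start, first_end, _ = rows[0]
    width = first_end - first_start
    if width <= 0:
        raise ValueError("chromEnd must be greater than chromStart")

    expected_start = first_start
    for chrom, start, end, _ in rows:
        if chrom != first_chrom or start != expected_start or end - start != width:
            raise ValueError("data is not regular; fixedStep does not apply")
        expected_start = end

    # chromStart is 0-relative; the fixedStep start is 1-relative, hence +1.
    header = (
        f"fixedStep chrom={first_chrom} start={first_start + 1} "
        f"step={width} span={width}"
    )
    return [header] + [value for _, _, _, value in rows]


# Usage with the first lines of the original four-column data.
example = [
    "chr8 104060000 104070000 3",
    "chr8 104070000 104080000 2",
    "chr8 104080000 104090000 1",
]
print("\n".join(four_column_to_fixed_step(example)))
```

Data with gaps or several chromosomes would need a new fixedStep
declaration line for each contiguous run, which this sketch does not
attempt.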
This type of data input consumes only one byte of storage
for each data value, and you can input 300,000,000 data points
of this type, thereby covering 10,000 (span) * 300,000,000
= 3,000,000,000,000 genome bases (roughly 1,000X human genome size).
Note the 1-relative start coordinate for fixedStep and variableStep,
unlike bed chromStart coordinates which are 0-relative.
--Hiram
Khrebtukova, Irina wrote:
> Hi,
>
> am I right that if I'm making custom track type=wiggle_0 in bed format
> with 2 colors (for positive and negative values), like say:
>
> track type=wiggle_0 name="Whatever" visibility=full autoScale=off
> color=200,100,0 altColor=0,100,200
> chr8 104060000 104070000 3
> chr8 104070000 104080000 2
> chr8 104080000 104090000 1
> chr8 104090000 104100000 0
> chr8 104100000 104110000 2
> chr8 104110000 104120000 -1
> chr8 104120000 104130000 3
> chr8 104130000 104140000 3
> chr8 104140000 104150000 3
> chr8 104150000 104160000 2
> chr8 104160000 104170000 2
>
> am I right that in this case I can NOT put more than 25K chromosomal
> locations into one track? (it seems exactly like this...)
>
> though it's a bit strange because I was able so far to put MUCH more
> chromosomal locations in simple BED tracks (just chr start end) as well
> as simple WIG tracks (variableStep but fixed small span).
>
> is this your policy or I'm doing something wrong?
>
> thanks! I appreciate your help and needless to say (I think it's
> obvious) that your browser is the best! (I doubt anyone could argue...)
>
> Irina Khrebtukova, PhD
> Sr. Staff Bioinformatics Scientist
> Illumina Inc.
> 25861 Industrial Blvd.,
> Hayward, CA 94545
> ph: 510-723-9219
> ikhrebtukova at illumina.com