[Genome] CpG Island data source
Rachel Harte
hartera at soe.ucsc.edu
Tue Nov 28 11:50:29 PST 2006
Hello Sally,
To find the description of data for a particular track, either click on
the blue/gray mini-button at the side of the track in the Browser image or
scroll down to the track control for CpG Islands and click on the link
there. The CpG Islands track control is in the "Expression and Regulation"
group below the Browser image.
Here is part of the description for the CpG island data:
CpG islands were predicted by searching the sequence one base at a time,
scoring each dinucleotide (+17 for CG and -1 for others) and identifying
maximally scoring segments. Each segment was then evaluated for the
following criteria:
* GC content of 50% or greater
* length greater than 200 bp
* ratio greater than 0.6 of observed number of CG dinucleotides to the
expected number on the basis of the number of Gs and Cs in the segment
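
To make the scoring step concrete, here is a minimal Python sketch of the
idea described above. It is not the program used to build the track: the
function names are made up for illustration, and it only finds the single
highest-scoring segment (using a standard maximum-subarray scan), whereas
the description speaks of identifying maximally scoring segments in general.

```python
def score_dinucleotides(seq, cg_score=17, other_score=-1):
    """Score each overlapping dinucleotide: +17 for CG, -1 for anything else."""
    seq = seq.upper()
    return [cg_score if seq[i:i + 2] == "CG" else other_score
            for i in range(len(seq) - 1)]


def best_segment(scores):
    """Return (start, end, total) of the highest-scoring contiguous run.

    start and end index dinucleotides, with end exclusive. This is a
    simplification (assumption): only one best run is reported.
    """
    best_total, best_start, best_end = 0, 0, 0
    run_total, run_start = 0, 0
    for i, s in enumerate(scores):
        if run_total <= 0:
            # A non-positive running total cannot help; start a new run here.
            run_total, run_start = s, i
        else:
            run_total += s
        if run_total > best_total:
            best_total, best_start, best_end = run_total, run_start, i + 1
    return best_start, best_end, best_total


seq = "ATATCGCGCGATAT"
start, end, total = best_segment(score_dinucleotides(seq))
# Dinucleotides start..end-1 cover bases start..end inclusive.
print(seq[start:end + 1], total)
```

Each candidate segment found this way would then be checked against the
three criteria listed above.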
The CpG count is the number of CG dinucleotides in the island. The
Percentage CpG is the ratio of CpG nucleotide bases (twice the CpG count)
to the length. The ratio of observed to expected CpG is calculated
according to the formula in Gardiner-Garden and Frommer (1987), cited in the
Reference section below:
Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
where N = length of sequence.
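
As an illustration of the quantities defined above, the following Python
sketch computes them for one segment and applies the three criteria. The
function name and the dictionary keys are hypothetical, chosen only for this
example; the thresholds are the ones quoted in the track description.

```python
def cpg_stats(segment):
    """Compute CpG island statistics for one candidate segment."""
    seq = segment.upper()
    n = len(seq)
    num_c = seq.count("C")
    num_g = seq.count("G")
    num_cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact

    gc_percent = 100.0 * (num_c + num_g) / n if n else 0.0
    # Percentage CpG: bases inside CpG dinucleotides (2 per CpG) over the length.
    cpg_percent = 100.0 * 2 * num_cpg / n if n else 0.0
    # Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
    obs_exp = num_cpg * n / (num_c * num_g) if num_c and num_g else 0.0

    return {
        "length": n,
        "cpg_count": num_cpg,
        "gc_percent": gc_percent,
        "cpg_percent": cpg_percent,
        "obs_exp": obs_exp,
        # GC content >= 50%, length > 200 bp, Obs/Exp ratio > 0.6
        "is_island": gc_percent >= 50.0 and n > 200 and obs_exp > 0.6,
    }


print(cpg_stats("CG" * 101))
```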
Reference:
Gardiner-Garden, M., Frommer, M. CpG islands in vertebrate genomes. J.
Mol. Biol. 196(2), 261-282 (1987).
I hope that this helps you. Please let us know if you have further
questions.
Rachel
Rachel Harte
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu
On Tue, 28 Nov 2006, Sally Gaddis wrote:
> Hi,
>
> I couldn't find a description of the source of the CpG island data on
> your site-- I assume the CpG island locations are predicted sites.
> Can you give me a reference for the prediction algorithm or name of
> the software used to predict the sites?
>
> Thanks,
> Sally
> --
> Sally Gaddis
> Research Programmer
> The University of Texas MD Anderson Cancer Center
> P.O. Box 389 / 1808 Park Road 1C
> Smithville, Texas 78957
> 512-237-9527 (voice) / 512-237-2475 (fax)
> sgaddis at sprd1.mdacc.tmc.edu
> http://spi.mdanderson.org/
> http://sciencepark.mdanderson.org/
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>