[Genome] Hpa II sites count
Hiram Clawson
hiram at soe.ucsc.edu
Wed Feb 13 09:53:12 PST 2008
Good Morning Joseph:
Please see the listing below for your answer.
If you want to produce this information yourself, and perhaps
measure other small motifs, you can download the hg18.2bit file,
which is our compressed format of the sequence:
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/hg18.2bit
And the "kent" source tree via CVS:
http://genome.ucsc.edu/admin/cvs.html
or from the zip file: http://hgdownload.cse.ucsc.edu/admin/jksrc.zip
with the build instructions: http://genome.ucsc.edu/admin/jk-install.html
Then you can build the program in the source tree:
kent/src/utils/findMotif/
and run it on the hg18.2bit file. I had to modify that source
to allow a 4-character search sequence, since by default it only
accepts search sequences of 5 to 16 characters.
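If rebuilding findMotif is not convenient, the same kind of scan can be sketched in a few lines of Python. This is a minimal sketch, not the findMotif code. It assumes the sequence has first been converted to FASTA (for example with the kent tree's twoBitToFa utility), and the function names read_fasta, find_sites, count_sites and bed_rows are hypothetical. Because CCGG is its own reverse complement, scanning one strand finds every Hpa II site exactly once. The scan upper-cases each record so that soft-masked (lowercase) sequence is counted too, and it joins each record's lines before searching so that a site split across a line break is still found.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

MOTIF = "CCGG"  # Hpa II site; palindromic, so scanning one strand suffices


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) for each record in FASTA-formatted lines."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_sites(seq: str, motif: str = MOTIF) -> Iterator[Tuple[int, int]]:
    """Yield 0-based, half-open (start, end) for every motif occurrence."""
    seq, motif = seq.upper(), motif.upper()  # include soft-masked bases
    start = seq.find(motif)
    while start != -1:
        yield start, start + len(motif)
        start = seq.find(motif, start + 1)  # step by 1: overlapping hits are kept


def count_sites(lines: Iterable[str], motif: str = MOTIF) -> Dict[str, int]:
    """Return {sequence name: number of motif occurrences}."""
    return {name: sum(1 for _ in find_sites(seq, motif))
            for name, seq in read_fasta(lines)}


def bed_rows(name: str, seq: str, motif: str = MOTIF) -> Iterator[str]:
    """Format hits like the chrM listing: chrom, start, end, item, score, strand."""
    for i, (start, end) in enumerate(find_sites(seq, motif), 1):
        yield f"{name}\t{start}\t{end}\t{i}\t1000\t+"
```

Usage might look like `with open("hg18.fa") as fh: counts = count_sites(fh)`, then sorting the result by value to get a table in the style of the one below. The fixed score of 1000 and "+" strand in bed_rows only mimic the listing's layout. The sketch holds one chromosome in memory at a time, which for chr1 is a few hundred megabytes of Python string.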
--Hiram
joseph wrote:
> Hi
> Is there a way to get an accurate count of the Hpa II sites (CCGG)
> in the human genome hg18?
> thanks
> Joseph
# count of CCGG  chrom name
23 chrM
15698 chrY
30601 chr21
50106 chr18
57539 chr13
57977 chr22
63814 chr20
70014 chr14
72558 chr15
90237 chrX
98767 chr8
100892 chr16
103249 chr9
103436 chr12
103564 chr4
108817 chr11
110108 chr10
110408 chr6
111044 chr19
112074 chr5
112337 chr17
121254 chr3
126732 chr7
166715 chr2
194234 chr1
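Adding up the table gives a genome-wide total, at least for the 25 sequences listed. Any other hg18 sequences, such as the unplaced `_random` contigs, are not part of this table. The few lines of Python below simply do that addition from the numbers above.

```python
# CCGG (Hpa II) counts per hg18 sequence, copied from the table above
counts = {
    "chrM": 23, "chrY": 15698, "chr21": 30601, "chr18": 50106, "chr13": 57539,
    "chr22": 57977, "chr20": 63814, "chr14": 70014, "chr15": 72558, "chrX": 90237,
    "chr8": 98767, "chr16": 100892, "chr9": 103249, "chr12": 103436, "chr4": 103564,
    "chr11": 108817, "chr10": 110108, "chr6": 110408, "chr19": 111044, "chr5": 112074,
    "chr17": 112337, "chr3": 121254, "chr7": 126732, "chr2": 166715, "chr1": 194234,
}

total = sum(counts.values())
print(f"{total:,} CCGG sites across {len(counts)} sequences")
```

This prints "2,292,198 CCGG sites across 25 sequences".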
Example chrM locations (all 23 chrM sites):
chrM 103 107 1 1000 +
chrM 932 936 2 1000 +
chrM 3078 3082 3 1000 +
chrM 3246 3250 4 1000 +
chrM 4711 4715 5 1000 +
chrM 4846 4850 6 1000 +
chrM 5242 5246 7 1000 +
chrM 5742 5746 8 1000 +
chrM 5766 5770 9 1000 +
chrM 6262 6266 10 1000 +
chrM 6571 6575 11 1000 +
chrM 6688 6692 12 1000 +
chrM 6850 6854 13 1000 +
chrM 7204 7208 14 1000 +
chrM 8112 8116 15 1000 +
chrM 8150 8154 16 1000 +
chrM 9292 9296 17 1000 +
chrM 11688 11692 18 1000 +
chrM 12123 12127 19 1000 +
chrM 13364 13368 20 1000 +
chrM 13712 13716 21 1000 +
chrM 15925 15929 22 1000 +
chrM 16454 16458 23 1000 +
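These rows look like six-column BED records (chrom, start, end, item number, score, strand). Assuming that reading, the start is 0-based and the end exclusive, so each row spans exactly 4 bases and the first site covers bases 104 to 107 in 1-based coordinates. A minimal sketch for tallying such a listing back into per-chromosome counts, with hypothetical function names tally_sites and one_based:

```python
from collections import Counter
from typing import Iterable, Tuple


def tally_sites(lines: Iterable[str], motif_len: int = 4) -> Counter:
    """Count motif sites per chromosome from BED-style rows
    (assumed columns: chrom, 0-based start, exclusive end, name, score, strand)."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#"):
            continue  # skip blank lines and # comments
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if end - start != motif_len:
            raise ValueError(f"unexpected span {start}-{end} on {chrom}")
        counts[chrom] += 1
    return counts


def one_based(start: int, end: int) -> Tuple[int, int]:
    """Convert a 0-based half-open interval to 1-based inclusive coordinates."""
    return start + 1, end
```

Fed the 23 chrM rows above, tally_sites should report 23 for chrM, matching the count in the table.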