[Genome] Hpa II sites count

Hiram Clawson hiram at soe.ucsc.edu
Wed Feb 13 09:53:12 PST 2008


Good Morning Joseph:

Please see the listing below for your answer.

If you want to produce this information yourself, and perhaps
measure other small motifs, you can pick up the hg18.2bit file,
which holds the sequence in our compressed 2bit format:
	http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/hg18.2bit

And the "kent" source tree via CVS:
	http://genome.ucsc.edu/admin/cvs.html
or from the zip file: http://hgdownload.cse.ucsc.edu/admin/jksrc.zip
with the build instructions: http://genome.ucsc.edu/admin/jk-install.html

Then you can build the program in the source tree:
	kent/src/utils/findMotif/

And run that on the hg18.2bit file.  I had to modify that source
to allow a 4-character search sequence, since by default it only
accepts motifs of 5 to 16 characters.
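
For a quick cross-check without modifying findMotif, the same kind of
per-chromosome count can be sketched in Python. This is only an
illustration and not how findMotif itself works. It assumes the .2bit
file has already been converted to FASTA (for example with the
twoBitToFa utility from the same source tree); the function names and
the toy input below are hypothetical. Each sequence is held in memory
as a whole, so a full human chromosome needs a few hundred megabytes.

```python
import io


def count_motif(seq, motif):
    """Count occurrences of motif in seq (overlaps included, case ignored)."""
    seq = seq.upper()      # soft-masked (lowercase) bases still count
    motif = motif.upper()
    count = 0
    pos = seq.find(motif)
    while pos != -1:
        count += 1
        pos = seq.find(motif, pos + 1)  # advance by one to allow overlaps
    return count


def count_motif_in_fasta(handle, motif):
    """Return {sequence name: motif count} for every record in a FASTA stream."""
    counts = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                counts[name] = count_motif("".join(chunks), motif)
            name = line[1:].split()[0]
            chunks = []
        elif line:
            chunks.append(line)
    if name is not None:
        counts[name] = count_motif("".join(chunks), motif)
    return counts


# Toy input standing in for a real FASTA file (hypothetical names and data).
demo = io.StringIO(">chrDemo1\nACCGGTTccgg\nCCGGA\n>chrDemo2\nCCG\nGNNNN\n")
for chrom, n in count_motif_in_fasta(demo, "CCGG").items():
    print(f"{n:8d} {chrom}")
```

Lines are joined before searching, so a site that spans a line break
in the FASTA file is still counted. CCGG is its own reverse
complement, so scanning one strand is enough for this motif; a
non-palindromic motif would need a second pass with its reverse
complement.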

--Hiram
	
joseph wrote:
> Hi
> Is there a way to get an accurate count of the Hpa II sites (CCGG)
> in the human genome hg18?
> thanks
> Joseph

# CCGG count, chrom name
      23 chrM
   15698 chrY
   30601 chr21
   50106 chr18
   57539 chr13
   57977 chr22
   63814 chr20
   70014 chr14
   72558 chr15
   90237 chrX
   98767 chr8
  100892 chr16
  103249 chr9
  103436 chr12
  103564 chr4
  108817 chr11
  110108 chr10
  110408 chr6
  111044 chr19
  112074 chr5
  112337 chr17
  121254 chr3
  126732 chr7
  166715 chr2
  194234 chr1
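
The listing gives per-chromosome counts only. A genome-wide figure for
the sequences shown can be obtained by summing the first column; a
minimal sketch in Python follows, where the parse_counts helper is
hypothetical and the data are copied from the listing above.

```python
LISTING = """\
      23 chrM
   15698 chrY
   30601 chr21
   50106 chr18
   57539 chr13
   57977 chr22
   63814 chr20
   70014 chr14
   72558 chr15
   90237 chrX
   98767 chr8
  100892 chr16
  103249 chr9
  103436 chr12
  103564 chr4
  108817 chr11
  110108 chr10
  110408 chr6
  111044 chr19
  112074 chr5
  112337 chr17
  121254 chr3
  126732 chr7
  166715 chr2
  194234 chr1
"""


def parse_counts(text):
    """Parse 'count chrom' lines into a dict, skipping blanks and # comments."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        n, chrom = line.split()
        counts[chrom] = int(n)
    return counts


counts = parse_counts(LISTING)
total = sum(counts.values())
print(f"{len(counts)} sequences, {total} CCGG sites in total")
```

For the 25 sequences listed this sums to 2,292,198 sites. Any
sequences in hg18 that are not in the listing are not included in
that figure.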

Example chrM locations:
chrM    103     107     1       1000    +
chrM    932     936     2       1000    +
chrM    3078    3082    3       1000    +
chrM    3246    3250    4       1000    +
chrM    4711    4715    5       1000    +
chrM    4846    4850    6       1000    +
chrM    5242    5246    7       1000    +
chrM    5742    5746    8       1000    +
chrM    5766    5770    9       1000    +
chrM    6262    6266    10      1000    +
chrM    6571    6575    11      1000    +
chrM    6688    6692    12      1000    +
chrM    6850    6854    13      1000    +
chrM    7204    7208    14      1000    +
chrM    8112    8116    15      1000    +
chrM    8150    8154    16      1000    +
chrM    9292    9296    17      1000    +
chrM    11688   11692   18      1000    +
chrM    12123   12127   19      1000    +
chrM    13364   13368   20      1000    +
chrM    13712   13716   21      1000    +
chrM    15925   15929   22      1000    +
chrM    16454   16458   23      1000    +
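
The location lines above have the shape of six-column BED records. The
sketch below, in Python, shows how lines of that shape could be
produced for a toy sequence. The column reading is an assumption: a
0-based start and an exclusive end (the usual BED convention, and
consistent with end minus start being 4 here), a running index as the
name, a fixed score of 1000, and the + strand. The function name and
the demo sequence are hypothetical.

```python
def motif_bed_lines(chrom, seq, motif, score=1000):
    """Return BED-style lines for each occurrence of motif on the + strand.

    Assumed columns: chrom, 0-based start, exclusive end, running index,
    score, strand.
    """
    seq = seq.upper()
    motif = motif.upper()
    lines = []
    pos = seq.find(motif)
    while pos != -1:
        index = len(lines) + 1
        end = pos + len(motif)
        lines.append(f"{chrom}\t{pos}\t{end}\t{index}\t{score}\t+")
        pos = seq.find(motif, pos + 1)
    return lines


# Toy sequence, not real chrM data.
for line in motif_bed_lines("chrDemo", "ACCGGTTCCGG", "CCGG"):
    print(line)
```

Under that reading, the first chrM line would describe a site whose
first base is at 1-based position 104.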







