[Genome] GC-rich human rpts

Brooke Rhead rhead at soe.ucsc.edu
Thu Apr 5 16:09:14 PDT 2007


Hello Minou,

We do not have a Genome Browser track that specifically contains GC-rich 
repeat regions.

You could get GC-rich repeat region sequences by (1) creating a BED file 
of GC-rich regions, (2) uploading it as a custom track, and (3) using 
the Table Browser to intersect your custom track with the RepeatMasker 
track and outputting sequence for the regions in the intersection.
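To show what the intersection in step (3) does, here is a minimal Python sketch that runs outside the browser. It is not how the Table Browser implements intersection. It assumes both inputs are lists of (chrom, start, end) tuples in 0-based, half-open BED coordinates, and that the intervals within each list do not overlap one another. `intersect_bed` is a hypothetical helper name.

```python
from typing import List, Tuple

# A BED interval: (chrom, start, end), 0-based, half-open coordinates.
Interval = Tuple[str, int, int]


def intersect_bed(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping portions of intervals in `a` and `b`.

    Hypothetical helper (sort-and-sweep per chromosome). Assumes the
    intervals within each input list do not overlap each other, e.g.
    non-overlapping GC windows vs. repeat annotations.
    """
    out: List[Interval] = []
    shared = sorted({c for c, _, _ in a} & {c for c, _, _ in b})
    for chrom in shared:
        xs = sorted((s, e) for c, s, e in a if c == chrom)
        ys = sorted((s, e) for c, s, e in b if c == chrom)
        i = j = 0
        while i < len(xs) and j < len(ys):
            start = max(xs[i][0], ys[j][0])
            end = min(xs[i][1], ys[j][1])
            if start < end:  # non-empty overlap
                out.append((chrom, start, end))
            # Advance whichever interval ends first.
            if xs[i][1] < ys[j][1]:
                i += 1
            else:
                j += 1
    return out
```

For example, GC-rich windows chr1:0-100 and chr1:200-300 intersected with a repeat at chr1:50-250 give chr1:50-100 and chr1:200-250.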

You might be able to use the GC Percent track instead of creating your 
own custom track.  However, it probably will not work well for your 
purposes, as that track contains GC percent calculated in 5-base 
windows, and an intersection of that track with the RepeatMasker track 
will produce many 5-bp regions.  Instead, you will likely want to 
determine your own GC-rich regions using larger windows.  The Kent source 
tree includes a tool called 'hgGcPercent' that calculates GC percent in 
larger windows.  It uses 20 kb windows by default, but the window size 
can be changed, so you can experiment with different sizes to find one 
that suits your needs.
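To make "GC percent in windows" concrete, here is a minimal Python sketch. It is not the hgGcPercent source, and `gc_windows` is a hypothetical name. Assumptions made here: an overlap of N bases is modeled as windows advancing by (win - N) bases; N bases are left out of the denominator (a choice made for this sketch, not necessarily what hgGcPercent or its -doGaps option does); the last window may be shorter than `win`.

```python
from typing import Iterator, Tuple


def gc_windows(chrom: str, seq: str, win: int = 20000,
               overlap: int = 0) -> Iterator[Tuple[str, int, int, float]]:
    """Yield (chrom, start, end, gc_percent) for fixed-size windows.

    Hypothetical illustration only. Windows start every (win - overlap)
    bases; N bases are excluded from the denominator.
    """
    if win <= 0 or not 0 <= overlap < win:
        raise ValueError("need win > 0 and 0 <= overlap < win")
    step = win - overlap
    seq = seq.upper()
    for start in range(0, len(seq), step):
        end = min(start + win, len(seq))
        window = seq[start:end]
        gc = window.count("G") + window.count("C")
        called = len(window) - window.count("N")  # non-gap bases
        pct = 100.0 * gc / called if called else 0.0
        yield chrom, start, end, pct
        if end == len(seq):  # last (possibly short) window reached
            break
```

For example, `list(gc_windows("chr1", "GGCCAATT", win=4))` yields one window at 100% GC (0-4) and one at 0% (4-8). Larger windows smooth out short GC spikes, which is why the window size is worth experimenting with.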

If you have not already downloaded our source code, info on doing so is 
located here:
http://genome.ucsc.edu/FAQ/FAQlicense.html#license3

Here is the usage statement for the hgGcPercent program:
=====================
hgGcPercent - Calculate GC Percentage in 20kb windows
usage:
    hgGcPercent [options] database nibDir
      nibDir can be a .2bit file, a directory that contains a
      database.2bit file, or a directory that contains *.nib files.
      Loads gcPercent table with counts from sequence.
options:
    -win=<size> - change window size (default 20000)
    -noLoad - do not load mysql table - create bed file
    -file=<filename> - output to <filename> (stdout OK) (implies -noLoad)
    -chr=<chrN> - process only chrN from the nibDir
    -noRandom - ignore random chromosomes from the nibDir
    -noDots - do not display ... progress during processing
    -doGaps - process gaps correctly (default: gaps are not counted as GC)
    -wigOut - output wiggle ascii data ready to pipe to wigEncode
    -overlap=N - overlap windows by N bases (default 0)
    -verbose=N - display details to stderr during processing
    -bedRegionIn=input.bed   Read in a bed file for GC content in specific regions and write to bedRegionsOut
    -bedRegionOut=output.bed Write a bed file of GC content in specific regions from bedRegionIn

example:
   calculate GC percent in 5 base windows using a 2bit nib assembly (dp2):
   hgGcPercent -wigOut -doGaps -file=stdout -win=5 dp2 \
       /cluster/data/dp2 | wigEncode stdin gc5Base.wig gc5Base.wib
=====================

The -noLoad option (implied by -file) writes a BED file instead of 
loading a MySQL table.  You can filter that file for GC-rich regions and 
then use the results to create a custom track.
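A minimal Python sketch of the filtering step. The layout of the -noLoad output is an assumption here: the sketch supposes tab-separated lines whose fifth column holds GC content in parts per thousand (as in the score field of the UCSC gcPercent table). Check a few lines of your own output and adjust `gc_col` and `min_ppt`. The default threshold of 600 (60% GC) is purely illustrative, and `filter_gc_rich` is a hypothetical name.

```python
from typing import Iterable, List


def filter_gc_rich(bed_lines: Iterable[str], min_ppt: float = 600,
                   gc_col: int = 4) -> List[str]:
    """Keep BED lines whose GC value is at least `min_ppt`.

    Assumption: column `gc_col` (0-based) holds GC in parts per thousand.
    Returns BED3 lines (chrom, start, end) ready for a custom track.
    """
    kept = []
    for line in bed_lines:
        line = line.rstrip("\n")
        # Skip blank, comment, and browser/track header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) > gc_col and float(fields[gc_col]) >= min_ppt:
            kept.append("\t".join(fields[:3]))
    return kept
```

Returning only the first three columns keeps the result a plain BED3 file, which is enough for intersection in the Table Browser.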

Once you have the BED file, you can upload it as a custom track 
(instructions here: 
http://genome.ucsc.edu/goldenPath/help/customTrack.html#ADD_CT).
You can then select it in the Table Browser and intersect it with the 
RepeatMasker (rmsk) table.  To get the sequence of the intersected 
regions, choose "output format: sequence".
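A custom track file is plain text: an optional track line followed by BED lines. This minimal Python sketch builds such a file's contents from (chrom, start, end) tuples. The track name "gcRich", its description, and the helper name `custom_track_text` are hypothetical placeholders.

```python
from typing import Iterable, Tuple


def custom_track_text(regions: Iterable[Tuple[str, int, int]],
                      name: str = "gcRich",
                      description: str = "GC-rich regions") -> str:
    """Return custom track text: a track line plus one BED3 line per region.

    The default name/description are hypothetical placeholders.
    """
    lines = [f'track name={name} description="{description}"']
    lines += [f"{chrom}\t{start}\t{end}" for chrom, start, end in regions]
    return "\n".join(lines) + "\n"
```

You could save the result with, for example, `open("gcRich.bed", "w").write(custom_track_text(regions))` and upload that file on the custom track page.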

I hope this information is useful to you.  Let us know if you have 
further questions.

--
Brooke Rhead
UCSC Genome Bioinformatics Group


bina at purdue.edu wrote:
> 
> How can I obtain a copy of the human DNA sequences that are collectively 
> referred to as GC-rich and purine-rich repeats?
> 
> Minou Bina
> _______________________________________________
> Genome maillist  -  Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

