[Genome] GC-rich human rpts
Brooke Rhead
rhead at soe.ucsc.edu
Thu Apr 5 16:09:14 PDT 2007
Hello Minou,
We do not have a Genome Browser track that specifically contains GC-rich
repeat regions.
You could get GC-rich repeat region sequences by (1) creating a BED file
of GC-rich regions, (2) uploading it as a custom track, and (3) using
the Table Browser to intersect your custom track with the RepeatMasker
track and outputting sequence for the regions in the intersection.
You might be able to use the GC Percent track instead of creating your
own custom track. However, it probably will not work well for your
purposes, as that track contains GC percent calculated in 5-base
windows, and an intersection of that track with the RepeatMasker track
will result in a lot of 5-bp regions. Instead, you will likely want to
determine your own GC-rich regions using larger windows. The Kent
source tree includes a tool called 'hgGcPercent' that calculates GC
percent in windows of a chosen size. The program calculates GC percentage in
20kb windows by default, but the window size can be changed, so you can
experiment with different sizes to find something that suits your needs.
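
To get a feel for how window size changes the result before running
hgGcPercent on a whole assembly, here is a minimal Python sketch of a
windowed GC calculation. It is only an illustration, not the
hgGcPercent implementation: the function name and the toy sequence are
made up, and it counts every base (including N) in the denominator,
which may differ from how hgGcPercent treats gaps.

```python
def gc_percent_windows(seq, win=20000):
    """Yield (start, end, gc_percent) for consecutive, non-overlapping windows."""
    seq = seq.upper()
    for start in range(0, len(seq), win):
        window = seq[start:start + win]
        gc = window.count("G") + window.count("C")
        # Simplification: every base in the window, including N, counts
        # toward the denominator. The last window may be shorter than win.
        yield start, start + len(window), 100.0 * gc / len(window)


# Toy sequence (hypothetical) with a small window so the effect is visible.
toy = "GCGCGCGCATATATATGGCC"
for start, end, pct in gc_percent_windows(toy, win=8):
    print(start, end, round(pct, 1))
```

Smaller windows give sharper boundaries but many more regions; larger
windows smooth over short GC-rich stretches.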
If you have not already downloaded our source code, info on doing so is
located here:
http://genome.ucsc.edu/FAQ/FAQlicense.html#license3
Here is the usage statement for the hgGcPercent program:
=====================
hgGcPercent - Calculate GC Percentage in 20kb windows
usage:
hgGcPercent [options] database nibDir
nibDir can be a .2bit file, a directory that contains a
database.2bit file, or a directory that contains *.nib files.
Loads gcPercent table with counts from sequence.
options:
-win=<size> - change windows size (default 20000)
-noLoad - do not load mysql table - create bed file
-file=<filename> - output to <filename> (stdout OK) (implies -noLoad)
-chr=<chrN> - process only chrN from the nibDir
-noRandom - ignore random chromosomes from the nibDir
-noDots - do not display ... progress during processing
-doGaps - process gaps correctly (default: gaps are not counted as GC)
-wigOut - output wiggle ascii data ready to pipe to wigEncode
-overlap=N - overlap windows by N bases (default 0)
-verbose=N - display details to stderr during processing
-bedRegionIn=input.bed Read in a bed file for GC content in
specific regions and write to bedRegionsOut
-bedRegionOut=output.bed Write a bed file of GC content in specific
regions from bedRegionIn
example:
calculate GC percent in 5 base windows using a 2bit nib assembly (dp2):
hgGcPercent -wigOut -doGaps -file=stdout -win=5 dp2 \
/cluster/data/dp2 | wigEncode stdin gc5Base.wig gc5Base.wib
=====================
The -noLoad option will create a BED file instead of loading a MySQL
table. You can filter that file for GC-rich regions and then use the
filtered results to create a custom track.
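
The filtering step could be done with a short Python script along these
lines. This is a sketch under assumptions you should check against your
own output: it assumes the last column of each line holds the GC
content in parts per thousand, and the cutoff of 600 (60% GC), the
function name, and the track name are all arbitrary choices for
illustration.

```python
def filter_gc_rich(lines, min_gc_ppt=600, track_name="gcRich"):
    """Return custom-track lines (BED3) for windows at or above the GC cutoff."""
    out = ['track name=%s description="GC-rich windows"' % track_name]
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and any existing track/browser lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        # Assumption: the last column is GC content in parts per thousand.
        if float(fields[-1]) >= min_gc_ppt:
            out.append("\t".join(fields[:3]))
    return out


# Made-up example lines; real input would come from the hgGcPercent output file.
sample = [
    "chr1\t0\t1000\tchr1.1\t412",
    "chr1\t1000\t2000\tchr1.2\t655",
    "chr1\t2000\t3000\tchr1.3\t600",
]
for row in filter_gc_rich(sample):
    print(row)
```

To run it on a real file, pass an open file handle in place of the
sample list and write the returned lines to a new file for upload.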
Once you have the BED file, you can upload it as a custom track
(instructions here:
http://genome.ucsc.edu/goldenPath/help/customTrack.html#ADD_CT).
You can then select it in the Table Browser and create an intersection
with the RepeatMasker (rmsk) table. To get the sequence of the
intersected regions, simply choose "output format: sequence".
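
The Table Browser does the intersection for you, but if it helps to see
what the operation amounts to, here is a small Python sketch that
intersects two sets of intervals. It is an illustration only, not the
Table Browser's code. The function name and coordinates are made up,
and it assumes the intervals within each input do not overlap one
another on the same chromosome.

```python
from collections import defaultdict


def intersect_intervals(a, b):
    """Return the overlapping parts of two lists of (chrom, start, end) tuples."""
    by_chrom_a, by_chrom_b = defaultdict(list), defaultdict(list)
    for chrom, start, end in a:
        by_chrom_a[chrom].append((start, end))
    for chrom, start, end in b:
        by_chrom_b[chrom].append((start, end))

    out = []
    for chrom in sorted(by_chrom_a):
        xs = sorted(by_chrom_a[chrom])
        ys = sorted(by_chrom_b.get(chrom, []))
        i = j = 0
        # Sweep both sorted lists, keeping the shared part of each pair.
        while i < len(xs) and j < len(ys):
            start = max(xs[i][0], ys[j][0])
            end = min(xs[i][1], ys[j][1])
            if start < end:
                out.append((chrom, start, end))
            # Advance whichever interval ends first.
            if xs[i][1] < ys[j][1]:
                i += 1
            else:
                j += 1
    return out


# Made-up coordinates standing in for GC-rich windows and rmsk items.
gc_rich = [("chr1", 1000, 2000), ("chr1", 5000, 6000)]
repeats = [("chr1", 1500, 1800), ("chr1", 1900, 5200), ("chr2", 0, 100)]
for region in intersect_intervals(gc_rich, repeats):
    print(region)
```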
I hope this information is useful to you. Let us know if you have
further questions.
--
Brooke Rhead
UCSC Genome Bioinformatics Group
bina at purdue.edu wrote:
>
> How can I obtain a copy of the human DNA sequences that are collectively
> referred to as GC-rich and purine-rich repeats?
>
> Minou Bina
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome