[Genome] Determining nucleotide composition in genomic sections
Archana Thakkapallayil
archanat at soe.ucsc.edu
Mon Jun 25 10:25:51 PDT 2007
Hello again Celina,
Here is a suggestion from one of our developers. If you are working from a
BED file, you first need to use the BED coordinates to extract the DNA
sequence for each region,
e.g. by running faFrag (in src/hg/utils/faFrag) on a FASTA file of genomic
sequence:
faFrag - Extract a piece of DNA from a .fa file.
usage:
faFrag in.fa start end out.fa
options:
-mixed - preserve mixed-case in FASTA file
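To make the extraction step concrete, here is a minimal Python sketch of what it amounts to. It is not part of the kent tools, and the function names read_fasta, bed_regions and extract_regions are hypothetical. It assumes the usual BED convention (chromosome, start and end in the first three columns, zero-based and half-open) and that the whole FASTA file fits in memory, which is fine for small test files but not for a full genome, where faFrag or twoBitToFa are the better choice.

```python
import io
from typing import Dict, Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Parse FASTA text into {sequence name: sequence}, keeping case."""
    parts: Dict[str, List[str]] = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record name is the first word after '>'.
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def bed_regions(handle: TextIO) -> Iterator[Tuple[str, int, int]]:
    """Yield (chrom, start, end) from BED lines, skipping header lines."""
    for line in handle:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        yield fields[0], int(fields[1]), int(fields[2])


def extract_regions(genome: Dict[str, str], bed_handle: TextIO) -> List[Tuple[str, str]]:
    """Return one (label, sequence) pair per BED region."""
    out = []
    for chrom, start, end in bed_regions(bed_handle):
        # Zero-based, half-open [start, end) maps directly onto Python slicing.
        out.append((f"{chrom}:{start}-{end}", genome[chrom][start:end]))
    return out


if __name__ == "__main__":
    genome = read_fasta(io.StringIO(">chr1\nACGTACGTNN\nGGCCAATT\n"))
    bed = io.StringIO("chr1\t0\t4\nchr1\t8\t12\n")
    for label, seq in extract_regions(genome, bed):
        print(label, seq)
```

Because the coordinates are half-open, a BED line "chr1 0 4" covers exactly four bases, the same span that Python's seq[0:4] returns.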
Alternatively, you can run twoBitToFa (in src/utils/twoBitToFa) on a .2bit
file to extract FASTA sequence for a region:
twoBitToFa - Convert all or part of .2bit file to fasta
usage:
twoBitToFa input.2bit output.fa
options:
-seq=name - restrict this to just one sequence
-start=X - start at given position in sequence (zero-based)
-end=X - end at given position in sequence (non-inclusive)
-seqList=file - file containing list of the desired sequence names
in the format seqSpec[:start-end], e.g. chr1 or
chr1:0-189
where coordinates are half-open zero-based, i.e. [start,end)
-noMask - convert sequence to all upper case
Sequence and range may also be specified as part of the input file name
using the syntax:
/path/input.2bit:name
or
/path/input.2bit:name:start-end
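Since -seqList uses the same half-open, zero-based convention as BED, the BED start and end columns can be copied straight into seqSpec:start-end entries. The small Python helper below (hypothetical, not a kent utility) sketches that conversion; it assumes BED input and skips blank, comment, track and browser lines.

```python
from typing import Iterable, List


def bed_to_seqlist(bed_lines: Iterable[str]) -> List[str]:
    """Convert BED records into twoBitToFa -seqList entries (name:start-end).

    BED and -seqList both use zero-based, half-open coordinates, so the
    start and end values are carried over unchanged.
    """
    entries = []
    for line in bed_lines:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        entries.append(f"{chrom}:{start}-{end}")
    return entries


if __name__ == "__main__":
    bed = ["track name=example\n", "chr1\t0\t189\tfeatureA\n", "chr2\t1000\t1500\n"]
    print("\n".join(bed_to_seqlist(bed)))
```

Writing the returned lines to a file (for example a hypothetical regions.txt) should then allow a single call along the lines of `twoBitToFa -seqList=regions.txt genome.2bit regions.fa`, where genome.2bit and regions.fa are placeholder names.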
Then the resulting FASTA file can be used as input to faCount in
src/utils/faCount to get the base counts.
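For a sense of what the counting involves, or if you want percentages rather than raw counts, the Python sketch below computes per-sequence length, A, C, G, T, N and CpG counts, loosely modelled on the statistics faCount reports. It is an illustration, not faCount's code: it ignores case (so soft-masked bases are counted), lumps every non-ACGT character into N, and computes percentages over A+C+G+T only, which is one choice among several.

```python
from collections import Counter
from typing import Dict


def base_composition(seq: str) -> Dict[str, int]:
    """Count A, C, G, T, other (N) and CpG dinucleotides in one sequence."""
    s = seq.upper()  # treat soft-masked lower-case bases like upper case
    counts = Counter(s)
    result: Dict[str, int] = {"len": len(s)}
    for base in "ACGT":
        result[base] = counts[base]  # Counter returns 0 for absent bases
    # Anything that is not A/C/G/T (N, IUPAC codes, ...) is counted as N.
    result["N"] = len(s) - sum(result[b] for b in "ACGT")
    # "CG" cannot overlap itself, so str.count gives the exact CpG count.
    result["cpg"] = s.count("CG")
    return result


def percentages(comp: Dict[str, int]) -> Dict[str, float]:
    """Percentage of each base over A+C+G+T (N excluded from the denominator)."""
    acgt = sum(comp[b] for b in "ACGT")
    if acgt == 0:
        return {b: 0.0 for b in "ACGT"}
    return {b: 100.0 * comp[b] / acgt for b in "ACGT"}


if __name__ == "__main__":
    comp = base_composition("ACGTacgtNNGGCC")
    print(comp)
    print({b: round(p, 1) for b, p in percentages(comp).items()})
```

Summing the per-region dictionaries gives totals for a whole BED file, and running the same function over every chromosome gives genome-wide totals.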
I hope this information helps you. Please let us know if you have
further questions.
Regards,
Archana
UCSC Genome Bioinformatics Group
Archana Thakkapallayil wrote:
> Hello Celina,
>
> There is a program available in the Genome Browser source code called
> 'faCount' that you can use to count base statistics and CpGs in FA files.
>
> The source code is freely available for academic, non-profit and
> personal use from here:
>
> http://genome.ucsc.edu/FAQ/FAQlicense#license3
>
> faCount is in the directory: src/utils/faCount/.
>
> You can obtain the source tree either via CVS:
> http://genome.ucsc.edu/admin/cvs.html
> or a zip file:
> http://hgdownload.cse.ucsc.edu/admin/jksrc.zip
>
> Please note the build instructions:
> http://genome.ucsc.edu/admin/jk-install.html
>
> All of the kent utilities print their usage message and command-line
> options when run with no arguments.
>
> I hope this is helpful to you. Please don't hesitate to contact us
> again if you require further assistance.
>
> Regards,
>
> Archana
> UCSC Genome Bioinformatics Group
>
> Montemayor, Celina wrote:
>
>> Hi,
>>
>> I was wondering if there's a way of determining the amount (or percentage) of As, Ts, Cs, and Gs within particular genomic sections (i.e., within the regions in one of my BED files), or at least of determining the total amount of As, Ts, Cs, and Gs in a specific genome?
>>
>> Thank you very much,
>>
>> Celina Montemayor
>> Graduate Student, Pereira lab
>> Baylor College of Medicine
>> Houston, Texas
>> _______________________________________________
>> Genome maillist - Genome at soe.ucsc.edu
>> http://www.soe.ucsc.edu/mailman/listinfo/genome
>>
>>
>