[Genome] Determining nucleotide composition in genomic sections

Archana Thakkapallayil archanat at soe.ucsc.edu
Mon Jun 25 10:25:51 PDT 2007


Hello again Celina,

Here is a suggestion from one of our developers. If you are using a BED 
file, you would first need to use the BED file coordinates to get the DNA 
sequence for each region,
e.g. using faFrag in src/hg/utils/faFrag on a FASTA file of genomic 
sequence:

faFrag - Extract a piece of DNA from a .fa file.
usage:
   faFrag in.fa start end out.fa
options:
   -mixed - preserve mixed-case in FASTA file

Or you can use twoBitToFa in src/utils/twoBitToFa on a 2bit file to
extract a FASTA file for a region:

twoBitToFa - Convert all or part of .2bit file to fasta
usage:
   twoBitToFa input.2bit output.fa
options:
   -seq=name - restrict this to just one sequence
   -start=X  - start at given position in sequence (zero-based)
   -end=X - end at given position in sequence (non-inclusive)
   -seqList=file - file containing list of the desired sequence names
                   in the format seqSpec[:start-end], e.g. chr1 or chr1:0-189
                   where coordinates are half-open zero-based, i.e. [start,end)
   -noMask - convert sequence to all upper case

Sequence and range may also be specified as part of the input file name 
using the syntax:
      /path/input.2bit:name
   or
      /path/input.2bit:name:start-end
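
Since BED coordinates are also zero-based and half-open, the first three
BED columns can be written directly in the seqSpec[:start-end] form that
-seqList expects. The following is only a minimal Python sketch of that
conversion, not part of the kent source tree. The function names and the
file names (genome.2bit, regions.txt, regions.fa) are hypothetical, and the
command is only built here, not run.

```python
def bed_to_seq_specs(bed_lines):
    """Turn BED lines into seqSpec strings of the form name:start-end."""
    specs = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and optional BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        # Only the first three BED columns (chrom, start, end) are needed.
        chrom, start, end = line.split()[:3]
        specs.append(f"{chrom}:{int(start)}-{int(end)}")
    return specs


def two_bit_to_fa_command(two_bit, seq_list_file, out_fa):
    """Build (but do not run) a twoBitToFa command line using -seqList."""
    return ["twoBitToFa", f"-seqList={seq_list_file}", two_bit, out_fa]


if __name__ == "__main__":
    # Hypothetical BED content, for illustration only.
    bed = ["track name=example", "chr1\t0\t189\tregionA", "chr2\t1000\t2000"]
    print("\n".join(bed_to_seq_specs(bed)))
    print(" ".join(two_bit_to_fa_command("genome.2bit", "regions.txt", "regions.fa")))
```

The printed specs would be saved one per line to the file passed as
-seqList, so that all regions are extracted in one twoBitToFa run.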

Then the resulting FASTA file can be used as input to faCount in
src/utils/faCount to get the base counts.
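
If you would like to double-check the numbers, or turn the counts into
percentages, the same tally can be sketched in a few lines of Python. This
is not how faCount is implemented and it does not reproduce faCount's
output format; it is just an illustration, with hypothetical function names,
that counts bases per FASTA record and reports A, C, G and T as a
percentage of the A+C+G+T total (N and other characters are left out of
that total, which is an assumption you may want to change).

```python
from collections import Counter


def count_bases(fasta_lines):
    """Return {record name: Counter of upper-cased characters} for a FASTA."""
    counts = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the record name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            counts[name] = Counter()
        elif name is not None:
            # Soft-masked (lower-case) bases are counted with upper-case ones.
            counts[name].update(line.upper())
    return counts


def percentages(counter):
    """Percent of A, C, G, T relative to the A+C+G+T total (N excluded)."""
    total = sum(counter[base] for base in "ACGT")
    if total == 0:
        return {base: 0.0 for base in "ACGT"}
    return {base: 100.0 * counter[base] / total for base in "ACGT"}


if __name__ == "__main__":
    # Hypothetical FASTA content, for illustration only.
    fasta = [">chr1:0-12", "ACGTacgt", "NNAA"]
    for record, counter in count_bases(fasta).items():
        print(record, dict(counter), percentages(counter))
```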

I hope this information helps you. Please let us know if you have 
further questions.

Regards,

Archana
UCSC Genome Bioinformatics Group

Archana Thakkapallayil wrote:
> Hello Celina,
>
> There is a program available in the Genome Browser source code called 
> 'faCount' that you can use to count base statistics and CpGs in FA files. 
>
> The source code is freely available for academic, non-profit and 
> personal use from here:
>
> http://genome.ucsc.edu/FAQ/FAQlicense#license3
>
> faCount is in the directory: src/utils/faCount/.
>
> You can obtain the source tree either via CVS:
>     http://genome.ucsc.edu/admin/cvs.html
> or a zip file:
>     http://hgdownload.cse.ucsc.edu/admin/jksrc.zip
>
> Please note the build instructions:
>     http://genome.ucsc.edu/admin/jk-install.html
>
> All of the kent utilities print their usage message and command line 
> options when run with no arguments.
>
> I hope this is helpful to you.  Please don't hesitate to contact us 
> again if you require further assistance.
>
> Regards,
>
> Archana
> UCSC Genome Bioinformatics Group
>
> Montemayor, Celina wrote:
>   
>> Hi,
>>  
>> I was wondering if there's a way of determining the amount (or percentage) of As, Ts, Cs, and Gs within particular genomic sections (i.e., within one of my BED files)? Or at least determine the total amount of As, Ts, Cs, and Gs in a specific genome?
>>  
>> Thank you very much,
>>  
>> Celina Montemayor
>> Graduate Student, Pereira lab
>> Baylor College of Medicine
>> Houston, Texas
>> _______________________________________________
>> Genome maillist  -  Genome at soe.ucsc.edu
>> http://www.soe.ucsc.edu/mailman/listinfo/genome
>>   
>>     


