[Genome] is there a fast BED2seq tool?

Rachel Harte hartera at soe.ucsc.edu
Tue Nov 21 12:45:43 PST 2006


Stein,
I could either make a 2bit file and put it there for download, or you could
download the nib files from /gbdb/dm2/nib
and then use the nibFrag program from src/utils/nibFrag:
nibFrag [options] file.nib start end strand out.fa
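
As a rough sketch of how the nibFrag usage above could be driven from a BED file, the function below prints one nibFrag command per region. It assumes the nib files are named <chrom>.nib and sit in the current directory, takes every region from the plus strand, and uses an illustrative <chrom>_<start>_<end>.fa output name; none of these choices come from the nibFrag documentation, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: emit one nibFrag command per BED line.
# Assumptions (hypothetical): nib files are named <chrom>.nib in the
# current directory, all regions are extracted from the plus strand,
# and the output file naming scheme is purely illustrative.
bed_to_nibfrag() {
    awk '
        # Skip short lines, comments, and track/browser header lines.
        NF < 3 || /^#/ || $1 == "track" || $1 == "browser" { next }
        # nibFrag [options] file.nib start end strand out.fa
        { printf "nibFrag %s.nib %d %d + %s_%d_%d.fa\n", $1, $2, $3, $1, $2, $3 }
    ' "$@"
}

# Example use:
#   bed_to_nibfrag file.bed > nibFrag.sh
#   sh nibFrag.sh
```

The generated nibFrag.sh can be inspected before it is run, which makes it easy to check the file names and coordinates.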

Alternatively, you can download the FASTA files from:
http://hgdownload.cse.ucsc.edu/goldenPath/dm2/bigZips/chromFa.zip

and use the faFrag program in:
src/utils/faFrag/
usage:
   faFrag in.fa start end out.fa
options:
   -mixed - preserve mixed-case in FASTA file

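You could use awk to create a script file with one faFrag command per BED
line. faFrag takes the output file as its last argument, so the command needs
one; the chrom_start_end.fa names below are only an example:

awk '{printf "faFrag %s.fa %d %d %s_%d_%d.fa\n", $1, $2, $3, $1, $2, $3}' file.bed > faFrag.sh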

Then you can run this file as a script.
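
A slightly more defensive sketch of the same idea, wrapped in a shell function: it skips header and comment lines in the BED file and gives each region its own output file. The function name, the assumption that the chromosome FASTA files are called <chrom>.fa in the current directory, and the <chrom>_<start>_<end>.fa output naming are all illustrative rather than anything prescribed by faFrag.

```shell
#!/bin/sh
# Sketch only: emit one faFrag command per BED line.
# Assumptions (hypothetical): per-chromosome FASTA files are named
# <chrom>.fa in the current directory, and the output file naming
# scheme is purely illustrative.
bed_to_fafrag() {
    awk '
        # Skip short lines, comments, and track/browser header lines.
        NF < 3 || /^#/ || $1 == "track" || $1 == "browser" { next }
        # faFrag in.fa start end out.fa
        { printf "faFrag %s.fa %d %d %s_%d_%d.fa\n", $1, $2, $3, $1, $2, $3 }
    ' "$@"
}

# Example use:
#   bed_to_fafrag file.bed > faFrag.sh
#   sh faFrag.sh
```

Because every region gets its own output name, running the generated script does not overwrite earlier results.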
I hope that this helps.

Rachel

Rachel Harte
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu


On Tue, 21 Nov 2006, Stein Aerts wrote:

> Hi Rachel,
> Thank you for the quick reply - this indeed seems to be the solution for me.
>
> I could find a hg18.2bit file in hg18/nib but I particularly need the fly
> species, and I couldn't find any 2bit file in
> ftp://hgdownload.cse.ucsc.edu/gbdb/dm2/nib. Could it be somewhere else
> perhaps?
>
> Many thanks again for all the help,
> Stein Aerts
>
>
>
>
> Rachel Harte wrote:
>
>  Hello Stein,
>
> One way that this can be done is by using an awk script to get the
> coordinates from the BED file and then piping this to the twoBitToFa
> program:
>
> awk '{printf "%s:%d-%d\n", $1, $2, $3}' file.bed \
>         | twoBitToFa hg18.2bit output.fa -seqList=stdin
>
> The twoBitToFa program is in the Browser source code which can be
> downloaded from here:
>
> http://genome.ucsc.edu/FAQ/FAQlicense#license3
>
> The twoBitToFa program is in the directory:
> src/utils/twoBitToFa/
>
> The 2bit files can be downloaded from:
> ftp://hgdownload.cse.ucsc.edu/gbdb/<db>/nib/<db>.2bit
> where <db> should be replaced with the name of the assembly e.g. hg18, mm8
> etc.
>
> I hope that this helps you. Please let us know if you have further
> questions.
>
> Rachel
>
> Rachel Harte
> UCSC Genome Bioinformatics Group
> http://genome.ucsc.edu
>
>
>
>
>  Subject: is there a fast BED2seq tool?
> From: Stein Aerts <stein.aerts at med.kuleuven.be>
> Date: Tue, 21 Nov 2006 11:33:54 +0100
> To: genome at soe.ucsc.edu
>
> Hi,
>
> I was wondering whether there is a fast program that retrieves fasta
> subsequences from local chromosome sequence files using BED as input.
> I'm currently using $chromseq->subseq($location) in bioperl, but this
> becomes too slow for large BED files, much slower in fact than exporting
> the sequences in the Table Browser, after having added the BED as user
> track. Perhaps the C program behind this functionality in the Table
> Browser is available to run locally (i.e., when the chromosome sequences
> are also local)?
>
> Thanks in advance,
> Best regards,
> Stein Aerts
>
> --
> Stein Aerts, PhD
> Laboratory of Neurogenetics
> Department of Molecular and Developmental Genetics
> VIB - University of Leuven
> Herestraat 49, bus 602
> 3000 Leuven, Belgium
> Tel. +32(16)330132
> Fax. +32(16)345992
> http://www.med.kuleuven.ac.be/cme-mg/lng/

