[Genome] is there a fast BED2seq tool?
Rachel Harte
hartera at soe.ucsc.edu
Tue Nov 21 10:56:53 PST 2006
Hello Stein,
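One way to do this is to use awk to turn each line of the BED file into a
chrom:start-end position and pipe that list to the twoBitToFa program,
which reads it from standard input via -seqList=stdin. BED start
coordinates are zero-based and end coordinates are exclusive, which is the
same convention twoBitToFa uses for -seqList, so no adjustment is needed: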
awk '{printf "%s:%d-%d\n", $1, $2, $3}' file.bed \
| twoBitToFa hg18.2bit output.fa -seqList=stdin
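If the BED file was prepared for upload as a custom track, it may start with "track" or "browser" lines (and possibly "#" comments), which the one-line awk command above would pass to twoBitToFa as bogus positions. The following is a minimal sketch of a more defensive conversion step, assuming a POSIX sh and awk. The function name bed_to_seqlist is hypothetical, chosen for illustration. The function produces the same chrom:start-end output as the command above.

```shell
#!/bin/sh
# bed_to_seqlist: hypothetical helper name, used here for illustration only.
# Reads BED on stdin and prints one chrom:start-end position per feature.
# BED coordinates are zero-based, half-open, the same convention that
# twoBitToFa's -seqList uses, so the numbers are passed through unchanged.
bed_to_seqlist() {
    awk '
        $1 == "track" || $1 == "browser" { next }  # custom-track header lines
        /^#/ || NF < 3                   { next }  # comments, blank or short lines
        { printf "%s:%d-%d\n", $1, $2, $3 }        # chrom:start-end
    '
}
```

It can be used in place of the bare awk command, for example:
bed_to_seqlist < file.bed | twoBitToFa hg18.2bit output.fa -seqList=stdin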
twoBitToFa is part of the Genome Browser source code, which can be
downloaded from here:
http://genome.ucsc.edu/FAQ/FAQlicense#license3
Its source is in the directory:
src/utils/twoBitToFa/
The 2bit files can be downloaded from:
ftp://hgdownload.cse.ucsc.edu/gbdb/<db>/nib/<db>.2bit
where <db> is the name of the assembly, e.g. hg18 or mm8.
I hope that this helps you. Please let us know if you have further
questions.
Rachel
Rachel Harte
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu
> Subject: is there a fast BED2seq tool?
> From: Stein Aerts <stein.aerts at med.kuleuven.be>
> Date: Tue, 21 Nov 2006 11:33:54 +0100
> To: genome at soe.ucsc.edu
>
> Hi,
>
> I was wondering whether there is a fast program that retrieves fasta
> subsequences from local chromosome sequence files using BED as input.
> I'm currently using $chromseq->subseq($location) in bioperl, but this
> becomes too slow for large BED files, much slower in fact than exporting
> the sequences in the Table Browser, after having added the BED as a user
> track. Perhaps the C program behind this functionality in the Table
> Browser is available to run locally (i.e., when the chromosome sequences
> are also local)?
>
> Thanks in advance,
> Best regards,
> Stein Aerts
>
> --
> Stein Aerts, PhD
> Laboratory of Neurogenetics
> Department of Molecular and Developmental Genetics
> VIB - University of Leuven
> Herestraat 49, bus 602
> 3000 Leuven, Belgium
> Tel. +32(16)330132
> Fax. +32(16)345992
> http://www.med.kuleuven.ac.be/cme-mg/lng/