[Genome] High Volume Traffic
Hiram Clawson
hiram at soe.ucsc.edu
Fri Jan 25 10:27:56 PST 2008
Good Morning Jairav:
There are several ways to fetch DNA from the genome browser
without causing excess load.
If you load your sequence positions as a custom track in the genome
browser, you can then ask the "Tables" browser to return those
locations as fasta sequence. Ask for the output as a gzip file and
you will not have to parse any HTML at all to get your result.
You can fetch quite a bit of sequence in this manner with no
special load on the system. The answer will be immediately
available. See also:
http://genome.ucsc.edu/goldenPath/help/customTrack.html
and http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html
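As a rough sketch of the first step, the positions can be written out as a BED-style custom track and then pasted or uploaded on the custom track page. The function name, the track name "myRegions", the description text, and the example coordinates are all made up for illustration. The sketch assumes your positions are already in BED convention (zero-based start, end not included); adjust if yours are one-based.

```python
def make_custom_track(positions, name="myRegions"):
    """Return custom track text for an iterable of (chrom, start, end).

    Coordinates are assumed to be BED style: zero-based start,
    end not included. The track name is a hypothetical placeholder.
    """
    lines = ['track name=%s description="Regions to fetch as fasta"' % name]
    for chrom, start, end in positions:
        if not (0 <= start < end):
            raise ValueError("bad interval: %s %s %s" % (chrom, start, end))
        # One BED line per region: chrom, start, end (tab separated)
        lines.append("%s\t%d\t%d" % (chrom, start, end))
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file, for example `open("regions.bed", "w").write(make_custom_track(my_positions))`, gives something that can be loaded as a custom track.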
If you have the kent source tree and its utilities, you can
freely extract DNA sequences from the hg17.2bit file with
the twoBitToFa command. It takes a -seqList=file argument
which can specify a set of locations to extract sequence for.
See also: http://genome.ucsc.edu/admin/cvs.html
or: http://hgdownload.cse.ucsc.edu/admin/jksrc.zip
and: http://genome.ucsc.edu/admin/jk-install.html
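A minimal sketch of driving twoBitToFa from a script follows. It assumes twoBitToFa is on your PATH, that hg17.2bit is in the current directory, and that each line of the -seqList file has the form name:start-end with a zero-based start and the end not included; check the usage message printed by running twoBitToFa with no arguments to confirm the format for your build. The function names and file names are hypothetical.

```python
import subprocess


def format_seq_list(positions):
    """Return seqList file contents, one 'chrom:start-end' per line.

    Assumes zero-based starts with the end not included; verify
    against the twoBitToFa usage message before relying on this.
    """
    return "".join("%s:%d-%d\n" % (c, s, e) for c, s, e in positions)


def build_command(two_bit, seq_list, out_fa):
    """Build the argument list: twoBitToFa -seqList=file in.2bit out.fa"""
    return ["twoBitToFa", "-seqList=%s" % seq_list, two_bit, out_fa]


def extract(positions, two_bit="hg17.2bit",
            seq_list="regions.txt", out_fa="regions.fa"):
    """Write the seqList file, then run twoBitToFa on it."""
    with open(seq_list, "w") as fh:
        fh.write(format_seq_list(positions))
    # check=True raises CalledProcessError if twoBitToFa exits non-zero
    subprocess.run(build_command(two_bit, seq_list, out_fa), check=True)
```

Calling `extract(my_positions)` would then leave the sequences in regions.fa, all read from the local 2bit file with no requests to the web server.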
--Hiram
jairav at northwestern.edu wrote:
> I am a grad student. I'm trying to access many (on the order of 1000) DNA sequences (<2.5kb)
> from HG17, with a perl script that accesses the genome browser website and parses html.
>
> The server imposes a delay because of the volume of traffic:
>
> -Is there some way to get around this?
> -If not, is it ok that I keep doing this? I plan to run this on data from a number of cell lines
> and I can keep running it overnight, that is no problem.
> -Can I access the DNA sequences in some other way, to avoid the high volume on the server?
>
>
> Thank you,
>
> Jairav Desai
> Graduate Student
> Children's Memorial Research Center
> Northwestern University