[Genome] High Volume Traffic

Hiram Clawson hiram at soe.ucsc.edu
Fri Jan 25 10:27:56 PST 2008


Good Morning Jairav:

There are several ways to fetch DNA from the genome browser
without causing excess load.

If you load your sequence positions as a custom track in the genome
browser, you can then ask the Table Browser ("Tables") to return those
locations as FASTA sequence.  Ask for gzip-compressed file output and
you will not have to parse any HTML at all to get your result.
You can fetch quite a bit of sequence in this manner with no
special load on the system.  The answer will be immediately
available.  See also:
	http://genome.ucsc.edu/goldenPath/help/customTrack.html
and	http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html
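
For illustration, here is a minimal sketch of building custom track text
from a list of positions.  It assumes the positions are already in BED
coordinates (zero-based start, end not included).  The function name,
track name and description are made up for this example.

```python
def bed_custom_track(regions, name="myRegions"):
    """Return custom track text: one track line, then one BED line per region.

    regions is an iterable of (chrom, start, end) tuples.  Assumption:
    coordinates are BED style (zero-based start, end not included).
    """
    lines = [f'track name="{name}" description="Regions to fetch DNA for"']
    for chrom, start, end in regions:
        if not 0 <= start < end:
            raise ValueError(f"bad region: {chrom}:{start}-{end}")
        # BED fields are separated by whitespace; a tab is used here.
        lines.append(f"{chrom}\t{start}\t{end}")
    return "\n".join(lines) + "\n"
```

The returned text can be pasted or uploaded on the custom track page,
after which the track can be selected in the Table Browser.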

If you have the kent source tree and its utilities, you can
freely extract DNA sequences from the hg17.2bit file with
the twoBitToFa command.  It takes a -seqList=file argument,
where the file lists the locations to extract sequence for.
See also: http://genome.ucsc.edu/admin/cvs.html
or: http://hgdownload.cse.ucsc.edu/admin/jksrc.zip
and: http://genome.ucsc.edu/admin/jk-install.html
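
A rough sketch of driving twoBitToFa from a script follows.  It assumes
the -seqList file holds one chrom:start-end entry per line with a
zero-based start, which is how I read the twoBitToFa usage message;
please check that against your copy.  The helper names and file names
are hypothetical.

```python
import subprocess


def write_seq_list(regions, path):
    """Write one 'chrom:start-end' line per region to path.

    Assumption: twoBitToFa reads -seqList entries in this form, with a
    zero-based start and an end that is not included.
    """
    with open(path, "w") as fh:
        for chrom, start, end in regions:
            fh.write(f"{chrom}:{start}-{end}\n")


def two_bit_to_fa_command(two_bit, seq_list, out_fa):
    """Build the argument list: twoBitToFa -seqList=file input.2bit output.fa"""
    return ["twoBitToFa", f"-seqList={seq_list}", two_bit, out_fa]


def run_extraction(regions, two_bit="hg17.2bit",
                   seq_list="regions.txt", out_fa="regions.fa"):
    """Write the list file, then run twoBitToFa (must be on your PATH)."""
    write_seq_list(regions, seq_list)
    subprocess.run(two_bit_to_fa_command(two_bit, seq_list, out_fa), check=True)
```

Calling run_extraction([("chr1", 1000, 3500)]) would write regions.txt
and ask twoBitToFa to put the sequence in regions.fa.  All of this runs
on your own machine, so it places no load on the genome browser.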
	
--Hiram

jairav at northwestern.edu wrote:
> I am a grad student.  I'm trying to access many (on the order of 1000) DNA sequences (<2.5kb)
> from HG17, with a Perl script that accesses the genome browser website and parses HTML.
> 
> The server imposes a delay because of the volume of traffic:
> 
> -Is there some way to get around this?
> -If not, is it ok that I keep doing this? I plan to run this on data from a number of cell lines
> and I can keep running it overnight, that is no problem.
> -Can I access the DNA sequences in some other way, to avoid the high volume on the server?
> 
> 
> Thank you,
> 
> Jairav Desai
> Graduate Student
> Children's Memorial Research Center
> Northwestern University 

