[Genome] Obtaining repeatmasked UCSC sequence via Bio::Das?

Rachel Harte hartera at soe.ucsc.edu
Fri Mar 30 15:36:11 PDT 2007


Ian,

Unfortunately, we don't support obtaining repeat-masked sequence through
DAS. However, you can get sequence through the Browser by clicking the DNA
link on the top blue bar while viewing the Genome Browser image. You can
also go to the Table Browser (Tables link on the top blue bar) to query our
databases and then select sequence as the output format. In both cases, you
can request that the sequence be repeat-masked.

If you would like to use Perl to get the sequence, you could look at the
URLs that are constructed when you fetch sequence with either of these two
methods. Remove the hgsid parameter from these URLs: it is just a session ID
that keeps track of the tracks and settings you are using. For example,
here is a URL that was generated when fetching DNA through the DNA link:

http://genome.ucsc.edu/cgi-bin/hgc?g=htcGetDna2&table=&i=mixed&o=151073053&l=151073053&r=151383976&getDnaPos=chrX%3A151%2C073%2C054-151%2C383%2C976&db=hg18&hgSeq.cdsExon=1&hgSeq.padding5=0&hgSeq.padding3=0&hgSeq.casing=upper&hgSeq.maskRepeats=on&boolshad.hgSeq.maskRepeats=1&hgSeq.repMasking=lower&boolshad.hgSeq.revComp=1&submit=get+DNA

This URL is for position chrX:151073054-151383976 on the hg18 assembly.
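The URL above sets hgSeq.repMasking=lower together with hgSeq.casing=upper. With those settings, repeats should come back in lower case and the rest in upper case. That is soft-masked sequence. If hard-masked sequence is wanted, one option is to convert the soft-masked output locally by replacing lower-case bases with N.

The sketch below is in Python for illustration, and the same logic carries over to Perl. It makes several assumptions:

- The sequence has already been extracted from the returned page as plain FASTA text.
- Only the bases a, c, g, t and n occur in lower case.
- hard_mask_fasta is a hypothetical helper name.

Setting hgSeq.repMasking to N may also give hard-masked output directly. That is an assumption worth checking against the options page.

```python
import re

# Lower-case nucleotide codes mark soft-masked (repeat) bases.
SOFT_MASKED = re.compile(r"[acgtn]")


def hard_mask_fasta(fasta_text: str) -> str:
    """Convert soft-masked FASTA text to hard-masked FASTA text.

    Header lines (starting with '>') are left untouched; in sequence
    lines every lower-case base is replaced with 'N'.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(line)  # keep the header exactly as returned
        else:
            out.append(SOFT_MASKED.sub("N", line))
    return "\n".join(out)


if __name__ == "__main__":
    print(hard_mask_fasta(">chrX\nACGTacgtNNac\nGGtt"))
```

Keeping the conversion local means a single soft-masked download can serve both needs, which also helps stay within the request limits.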

Some of these parameters are redundant; for example, o and l have the same
value. getDnaPos holds the position with the colon and commas URL-encoded
(%3A and %2C). You may need to experiment with some of the other parameter
settings in the URL to fine-tune your query. Some settings may not appear in
the URL at all; you can get a complete list of your current settings by
examining the contents of your "cart" at
http://genome.ucsc.edu/cgi-bin/cartDump.
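Since the request is an ordinary GET URL, it can be assembled programmatically for any region. The sketch below is in Python for illustration, and the same approach works in Perl.

Some details are assumptions on my part:

- build_get_dna_url is a hypothetical name.
- The parameter names and values are copied from the example URL.
- o and l appear to hold the 0-based start, one less than the displayed start, as the example shows.
- hgsid is deliberately left out.

```python
from urllib.parse import urlencode

BASE_URL = "http://genome.ucsc.edu/cgi-bin/hgc"


def build_get_dna_url(db: str, chrom: str, start: int, end: int,
                      rep_masking: str = "lower") -> str:
    """Build an hgc 'htcGetDna2' URL for chrom:start-end (1-based, inclusive).

    Parameters mirror the example URL; rep_masking is passed through as
    hgSeq.repMasking ("lower" in the example, i.e. soft-masked repeats).
    """
    params = [
        ("g", "htcGetDna2"),
        ("table", ""),
        ("i", "mixed"),
        ("o", start - 1),  # assumed 0-based start, as in the example
        ("l", start - 1),  # redundant with o
        ("r", end),
        # urlencode escapes ':' as %3A and ',' as %2C, matching the example
        ("getDnaPos", f"{chrom}:{start:,}-{end:,}"),
        ("db", db),
        ("hgSeq.cdsExon", 1),
        ("hgSeq.padding5", 0),
        ("hgSeq.padding3", 0),
        ("hgSeq.casing", "upper"),
        ("hgSeq.maskRepeats", "on"),
        ("boolshad.hgSeq.maskRepeats", 1),
        ("hgSeq.repMasking", rep_masking),
        ("boolshad.hgSeq.revComp", 1),
        ("submit", "get DNA"),  # encoded as get+DNA
    ]
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_get_dna_url("hg18", "chrX", 151073054, 151383976))
```

For the example position, this reproduces the URL above. The result can then be fetched with urllib.request.urlopen, subject to the request limits mentioned below.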

One example of using Perl to fetch data from our website via a URL is the
BlatBot.pl script, which can be downloaded from our wiki page:

http://genomewiki.ucsc.edu/index.php/Blat_Scripts

If you use a programmatic method to retrieve the data, please keep in mind
that program-driven use of the Genome Browser is limited to a maximum of
one hit every 15 seconds and no more than 5,000 hits per day.
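A script can respect these limits by pausing before each request. The following is a minimal Python sketch, not an official client. PoliteThrottle is a hypothetical name, and the code makes a simplifying assumption: it treats "per day" as a 24-hour window starting at the first request in that window. The server's own accounting may differ.

```python
import time


class PoliteThrottle:
    """Enforce a minimum gap between requests and a cap per 24-hour window."""

    def __init__(self, min_interval=15.0, daily_limit=5000,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between hits
        self.daily_limit = daily_limit    # hits allowed per window
        self.clock = clock                # injectable for testing
        self.sleep = sleep
        self.last = None                  # time of the previous hit
        self.window_start = None          # start of the current 24 h window
        self.count = 0                    # hits in the current window

    def wait(self):
        """Block until the next request is allowed, then record it."""
        now = self.clock()
        # Open a new 24-hour window if none exists or the old one expired.
        if self.window_start is None or now - self.window_start >= 86400:
            self.window_start = now
            self.count = 0
        if self.count >= self.daily_limit:
            raise RuntimeError("daily request limit reached; try again later")
        if self.last is not None:
            delay = self.min_interval - (now - self.last)
            if delay > 0:
                self.sleep(delay)
        self.last = self.clock()
        self.count += 1
```

Call throttle.wait() immediately before each request. Raising an error at the daily cap, rather than sleeping until the window resets, leaves the decision about how to resume to the calling script.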

I hope that this helps you. Please let us know if you have further
questions.

Rachel


Rachel Harte
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu


On Fri, 30 Mar 2007, Ian Donaldson wrote:

> Please can you tell me whether it is possible to obtain repeatmasked (hard or
> soft) sequence using the Bio::Das perl interface with UCSC?  For example I use
> the following to obtain raw sequence:
>
> use Bio::Das;
> my $das = Bio::Das->new(-source => 'http://genome.cse.ucsc.edu/cgi-bin/das/',
> -dsn=>'hg17');
>
> Many thanks,
>
> Ian
> --
> Faculty of Life Sciences
> Michael Smith Building (B.1078)
> Oxford Road
> Manchester
> M13 9PT
> TEL: 0161-275-5980
> FAX: 0161-275-5082
>
>
>
>
> _______________________________________________
> Genome maillist  -  Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>

