[Genome] mRNA to genomic sequence

Donna Karolchik donnak at soe.ucsc.edu
Tue Nov 7 13:58:04 PST 2006


Hi David,

Are you familiar with the Table Browser's batch upload feature? This may give
you what you want. To retrieve the genomic sequence for several RefSeq mRNAs:

1. List all the mRNA accession IDs in a file, one per line.
2. In the Table Browser, choose your clade, genome, assembly, group, and other
options.
3. Set region to "genome".
4. Click the "upload list" button and upload your list.
5. Select the "sequence" output option, then configure your FASTA output as
desired.

You could then write a script to parse out the information you want from this
output.
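
As a rough illustration of such a script, here is a minimal Python sketch that
reads FASTA text and collects each record's sequence under its header line. It
assumes only the general FASTA layout (a ">" header line followed by sequence
lines). The record names and sequences below are made up, and the exact header
format you get will depend on how you configure the Table Browser output.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (without the leading '>') to its full sequence."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Made-up input for illustration only; real output will look different.
example = """>example_record_1
ACGTacgt
GGCC
>example_record_2
ttaa
""".splitlines()

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```

To use it on a saved result, you would pass an open file instead of the example
lines, e.g. parse_fasta(open("output.fa")), where "output.fa" is whatever name
you saved the Table Browser output under.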

Alternatively, you should be able to retrieve your information by removing the
hgsid parameter from your URL and adding "position=chrN:start-end", where the
position corresponds to the genomic position of your mRNA. You may have to
tinker with some of the other parameter settings in the URL to fine-tune your
query. You can get a complete list of your parameter settings by examining the
contents of your "cart" via the URL http://genome.ucsc.edu/cgi-bin/cartDump.

If you use a programmatic method to retrieve the data, keep in mind that
program-driven use of the Genome Browser is limited to a maximum of one hit
every 15 seconds and no more than 5,000 hits per day.
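
The URL edit and the usage limits described above could be sketched in Python
roughly as follows. This is only an illustration under stated assumptions: the
example.org URL and its "db" parameter are placeholders (copy the real URL from
your own browser session), with_position and Throttle are hypothetical helper
names, and treating "per day" as a 24-hour window starting at the first hit is
my assumption. The sketch makes no network requests itself.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

MIN_INTERVAL_SECONDS = 15  # one hit every 15 seconds
MAX_HITS_PER_DAY = 5000    # no more than 5,000 hits per day


def with_position(url: str, chrom: str, start: int, end: int) -> str:
    """Drop hgsid (and any old position) and add position=chrom:start-end."""
    parts = urlsplit(url)
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k not in ("hgsid", "position")]
    params.append(("position", f"{chrom}:{start}-{end}"))
    # safe=":" keeps the colon in chrN:start-end readable in the query string.
    return urlunsplit(parts._replace(query=urlencode(params, safe=":")))


class Throttle:
    """Space out hits and stop once the daily allowance is used up."""

    def __init__(self, min_interval=MIN_INTERVAL_SECONDS,
                 max_per_day=MAX_HITS_PER_DAY,
                 clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._max_per_day = max_per_day
        self._clock = clock
        self._sleep = sleep
        self._last = None       # time of the most recent hit
        self._day_start = None  # start of the current 24-hour window (assumed)
        self._count = 0         # hits made in the current window

    def wait(self) -> None:
        """Call once before every request."""
        now = self._clock()
        if self._day_start is None or now - self._day_start >= 86400:
            self._day_start = now
            self._count = 0
        if self._count >= self._max_per_day:
            raise RuntimeError("daily hit limit reached; try again tomorrow")
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
        self._count += 1


# Placeholder URL, not a real Table Browser query.
url = with_position("http://example.org/cgi-bin/tables?hgsid=12345&db=someDb",
                    "chr1", 1000, 2000)
print(url)
```

In a real script you would create one Throttle, call its wait() method before
each request, and then fetch the rewritten URL with whatever HTTP client you
prefer.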

-Donna
-----------------------------------
Donna Karolchik
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu


----- Original Message ----- 
From: "Lomelin, David" <david.lomelin at ucsf.edu>
To: <genome at soe.ucsc.edu>
Sent: Friday, November 03, 2006 8:57 PM
Subject: [Genome] mRNA to genomic sequence


> Hi, I'm a student at UCSF working in Neil Risch's lab.  I'm interested in
> retrieving the genomic sequence for a given mRNA refseq sequence so that I
> could take a look at the intronic regions.  I saw that your Table Browser
> allows me this option exactly as I need; however, I'm interested in obtaining
> this information programmatically so that I could look at multiple regions in a
> quick and automated way.  I tried to retrieve the data by copying and pasting
> the url and having a program retrieve the results, but apparently, the url
> does not contain a refseq parameter that allows me to fetch sequences
> dynamically.  Rather, it seems (I'm guessing) that the url contains a process
> id that contains the refseq somewhere locally on the UCSC site which prevents
> me from fetching sequences on the fly.  Is there any way for me to access your
> data in a more programmatic fashion?  Thank you.
>
> --David