[Genome] Inquiring about more efficient ways to access your data

Robert Kuhn kuhn at soe.ucsc.edu
Sat Dec 29 13:57:22 PST 2007


Siddarth,

Thanks for contacting us.

The most straightforward way to get the sequences of coding 
genes is via the Table Browser (see link in top blue bar).  
Assuming you are using the latest human assembly, the Table 
Browser defaults to the UCSC Genes track.  You may use another 
track, such as RefSeq, as the basis for your download.

I suggest you retrieve the results one chrom at a time, by
selecting the "position" button and typing, e.g., "chrY" in
the adjacent textbox.

then:

output format:  sequence
get output

select "genomic", submit

on the next page you may select Retrieval Region Options
to suit your needs.

The genes are on both strands.  See the table schema for 
the knownGene table for details; the strand field gives the
orientation of each gene.
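
If you end up working from the table coordinates rather than from
the sequence output, the strand handling might look roughly like the
Python sketch below.  It assumes the usual knownGene layout (0-based,
half-open coordinates; exonStarts and exonEnds stored as
comma-separated lists with a trailing comma).  The function names and
the toy values are made up for illustration and are not part of any
UCSC tool.

```python
# Sketch only: helper names and test values are hypothetical.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_int_list(field):
    """Turn a field such as '100,250,' into [100, 250]."""
    return [int(x) for x in field.split(",") if x]


def reverse_complement(seq):
    """Reverse-complement a sequence, preserving upper/lower case."""
    return seq.translate(_COMPLEMENT)[::-1]


def coding_blocks(exon_starts, exon_ends, cds_start, cds_end):
    """Clip each exon to the CDS range (0-based, half-open)."""
    blocks = []
    for start, end in zip(exon_starts, exon_ends):
        s, e = max(start, cds_start), min(end, cds_end)
        if s < e:  # drop exons that are entirely untranslated
            blocks.append((s, e))
    return blocks


def coding_sequence(chrom_seq, strand, exon_starts, exon_ends,
                    cds_start, cds_end):
    """Join coding blocks in genomic order, then orient by strand."""
    blocks = coding_blocks(exon_starts, exon_ends, cds_start, cds_end)
    seq = "".join(chrom_seq[s:e] for s, e in blocks)
    # Genes on the minus strand read 5'->3' on the reverse complement.
    return reverse_complement(seq) if strand == "-" else seq
```

Because the case of each base is preserved, repeat-masked (lowercase)
bases remain recognizable after the sequence is reoriented.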

If you wish to restrict your search to protein-coding genes,
use the filter button before you perform the sequence retrieval.
Link to the kgTxInfo table and set "category" to "coding".
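
In SQL terms, the filter amounts to a join between the two tables.
The Python sketch below runs that join on a tiny in-memory SQLite
database so the logic can be tried offline.  Everything in it is an
assumption for illustration: the tables are cut down to a few
columns, the rows are invented, and joining on the "name" column is
my guess at how the tables relate, so check the table schema before
relying on it.

```python
import sqlite3


def demo_connection():
    """Build toy stand-ins for knownGene and kgTxInfo (invented rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE knownGene (name TEXT, chrom TEXT, strand TEXT,
                                txStart INTEGER, txEnd INTEGER);
        CREATE TABLE kgTxInfo (name TEXT, category TEXT);
        INSERT INTO knownGene VALUES
            ('tx1', 'chrY', '+', 100, 900),
            ('tx2', 'chrY', '-', 2000, 2600),
            ('tx3', 'chr1', '+', 50, 400);
        INSERT INTO kgTxInfo VALUES
            ('tx1', 'coding'), ('tx2', 'noncoding'), ('tx3', 'coding');
    """)
    return conn


def coding_genes_on_chrom(conn, chrom):
    """Return coding transcripts on one chromosome, ordered by start."""
    query = """
        SELECT g.name, g.chrom, g.strand, g.txStart, g.txEnd
        FROM knownGene AS g
        JOIN kgTxInfo AS i ON i.name = g.name  -- join key is an assumption
        WHERE i.category = 'coding' AND g.chrom = ?
        ORDER BY g.txStart
    """
    return conn.execute(query, (chrom,)).fetchall()
```

Calling coding_genes_on_chrom(demo_connection(), "chrY") on the toy
data keeps only the transcript marked "coding" on chrY, which mirrors
the one-chromosome-at-a-time retrieval suggested above.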

If this is not enough information, please feel free to contact
the list again.

best wishes,

			--b0b kuhn
			ucsc genome bioinformatics group


> From genome-bounces at soe.ucsc.edu  Sat Dec 29 11:00:02 2007
> To: genome at cse.ucsc.edu
> Subject: [Genome] Inquiring about more efficient ways to access your data
> 
> 
> Dear Sir/Madam:
> 
> I am Siddarth Gautham, a graduate student at Arizona State University. I
> apologize for causing high traffic on your network. I need your help in
> learning more efficient ways of accessing the UCSC data.
> 
> My research interest is collecting all the coding sequences in the human
> genome (which you mention in CAPITALIZED LETTERS in ucsc genome browser)
> with their homologs.
> 
> How I proceeded with the project:
> I used your chromFaMasked.zip folder to get each chromosome file. I found
> the positions of Non-repetitive regions in each chromosome and developed
> query strings to interact with the UCSC genome browser. (See Attachments to
> get an example of query strings of the chromosome Y).
> 
> Since the non-repetitive query strings are huge in number for each
> chromosome, around 0.3 million for each chromosome, they cause heavy
> traffic for you. Using these positions of non-repetitive regions, I need
> to develop query strings of coding regions (exons). This is the stage I
> am in right now.
> 
> What is the most effective way to get the coding regions in the
> non-repetitive regions of the human genome? Also, since the UCSC genome
> browser shows only the coding regions in the forward (5' - 3') strand and
> the genes could be present on either strand, how do I identify whether
> the coding region represents a gene in the forward or reverse direction?
> And how do I align the homologs accordingly?
> 
> Thanks
> siddarth
> 

