[Genome] Batch retrieval using gene symbols

Archana Thakkapallayil archanat at soe.ucsc.edu
Fri Dec 8 09:29:50 PST 2006


Hello Jacob,

The geneSymbol field is located in the kgXref table under the Known 
Genes track. Unfortunately, you cannot paste/upload gene symbols as 
identifiers to search the kgXref table. The paste/upload option only 
matches identifiers in the primary field of the table, which in this 
case is the kgID (the accession). Also, you can't get the sequence 
information directly from this table.

However, you can get the information that you are looking for by 
retrieving information from the kgXref and the knownGene tables using 
the Table Browser. This is a two-step process. First, use the kgXref 
table to find the known gene IDs corresponding to your gene symbols. 
Then use your list of known gene IDs to extract the promoter/upstream 
regions that you are interested in, using the knownGene table.

For the first step, make the following selections in the Table Browser:

clade: vertebrate
genome: human
assembly: Mar. 2006
group: Genes and Gene Prediction Tracks
track: Known Genes
table: kgXref
click on "filter: create" button and then paste a white-space separated 
list of your gene symbols into the geneSymbol text box and then click 
"submit"

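If your gene symbols are in a spreadsheet column or a comma-separated 
file, a small script can turn them into the white-space separated list 
that the filter box expects. This is only a sketch: the function name 
format_symbols is made up for illustration, and treating commas as 
separators is an assumption about how your list is stored.

```python
import re


def format_symbols(text):
    """Return unique gene symbols from `text` as one space-separated string.

    Assumption: symbols are separated by whitespace and/or commas.
    Order of first appearance is preserved.
    """
    tokens = [t for t in re.split(r"[\s,]+", text) if t]
    unique = list(dict.fromkeys(tokens))  # drops duplicates, keeps order
    return " ".join(unique)


if __name__ == "__main__":
    # SYMBOL_A etc. are placeholders, not real gene symbols.
    print(format_symbols("SYMBOL_A\nSYMBOL_B, SYMBOL_A\tSYMBOL_C"))
```
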
Then choose "output format: selected fields from primary and related 
tables" and hit "get output"

On the Select Fields page, check the kgID field from the kgXref table 
and then hit "get output".

This gives you the list of known gene IDs corresponding to your gene 
symbols.
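
With ~300 symbols it can be useful to check which ones returned no 
known gene ID. The sketch below assumes that you also checked the 
geneSymbol field on the Select Fields page, so that the output is 
tab-separated with kgID in the first column and geneSymbol in the 
second, and that header lines start with "#". The function names and 
the IDs in the example are placeholders for illustration only.

```python
from collections import defaultdict


def parse_kgxref_output(text):
    """Map each gene symbol to the list of kgIDs reported for it.

    Assumed layout (not verified against real output): tab-separated,
    column 1 = kgID, column 2 = geneSymbol, header lines start with '#'.
    """
    ids_by_symbol = defaultdict(list)
    for line in text.splitlines():
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not a kgID/geneSymbol row
        kg_id, symbol = fields[0], fields[1]
        ids_by_symbol[symbol].append(kg_id)
    return dict(ids_by_symbol)


def missing_symbols(requested, ids_by_symbol):
    """Return the requested symbols that have no kgID in the output."""
    return [s for s in requested if s not in ids_by_symbol]


if __name__ == "__main__":
    # Placeholder rows; real output will contain actual kgIDs and symbols.
    example = "#kgID\tgeneSymbol\nID_1\tSYMBOL_A\nID_2\tSYMBOL_A\nID_3\tSYMBOL_B\n"
    mapping = parse_kgxref_output(example)
    print(mapping)
    print(missing_symbols(["SYMBOL_A", "SYMBOL_C"], mapping))
```

Note that one symbol can map to more than one known gene ID, which is 
why the values are lists.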

For the second step, go back to the Table Browser page and choose 
"table: knownGene" and "region: genome". Then paste your list of known 
gene IDs in the paste/upload list box and hit the "submit" button.

Then choose "output format: sequence" and hit "get output".

Select "genomic" and hit "submit".

Under "Sequence Retrieval Region Options" check only the box 
"Promoter/Upstream by -- bases". Enter the number of upstream bases 
that you want in the text box there and then hit "get sequence".
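
To make clear what "upstream" means for genes on either strand, here 
is a small sketch of the interval arithmetic. It is my own 
illustration, not the Table Browser's code. It assumes 0-based, 
half-open coordinates with txStart < txEnd as stored in the knownGene 
table; the function name upstream_interval is hypothetical.

```python
def upstream_interval(tx_start, tx_end, strand, n_bases):
    """Return (start, end) of the n_bases upstream of a transcript.

    Coordinates are 0-based, half-open. On the '+' strand upstream lies
    before tx_start; on the '-' strand transcription begins at tx_end,
    so upstream lies after tx_end on the chromosome.
    """
    if n_bases < 0:
        raise ValueError("n_bases must be non-negative")
    if strand == "+":
        # Clip at the start of the chromosome.
        return (max(0, tx_start - n_bases), tx_start)
    if strand == "-":
        # Not clipped at the chromosome end: its length is not known here.
        return (tx_end, tx_end + n_bases)
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    print(upstream_interval(1000, 5000, "+", 300))
    print(upstream_interval(1000, 5000, "-", 300))
```

For a minus-strand gene the sequence for that interval would also need 
to be reverse-complemented to read 5' to 3' relative to the gene.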

If you are looking for the actual promoter regions for your genes, we 
do have a few tracks on hg18 that may help: the 'FirstEF' track, the 
'TFBS Conserved' track and the 'Reg Potential 7 species' track. The 
FirstEF track predicts first exons, promoters and CpG windows. The 
'Reg Potential 7 species' track predicts regulatory regions. The TFBS 
Conserved track contains the location and score of transcription 
factor binding sites conserved in the human/mouse/rat alignment.

I hope that this helps you. Please let us know if you have further 
questions.

Regards,

Archana
UCSC Genome Bioinformatics Group


Jacob Brown wrote:
> Hello,
>
> I am just trying to retrieve promoter regions for ~300 genes using  
> the table browser.  I am pasting a list of gene symbols in the  
> identifier box and using the known genes track.  When I ask for  
> sequence in output it does not find any hits.  Suggestions?
>
> Many thanks,
>
> Jacob Brown
> National Eye Institute
> _______________________________________________
> Genome maillist  -  Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>   


