[Genome] gene ontology information from UCSC table browser

Ann Zweig ann at soe.ucsc.edu
Wed Dec 19 10:11:42 PST 2007


Hello Les,

	You can create the table you are looking for by using the Table Browser 
tool on our website ('Tables' from the top blue navigation bar).  You 
will be joining several tables together to get the information you want. 
  You may want to read the following section of the Table Browser User's 
Guide before you get started:

http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#SelectedFields

	After you set the output format to 'selected fields from primary and 
related tables', you will need to know about the following relationships 
between the databases and tables you will be joining.

Databases:
hg18 = the database that contains tables for the latest human assembly
go = the database that contains all of the Gene Ontology tables

Tables:
hg18.kgXref.refseq = the field in the kgXref table (part of the UCSC 
Genes track) that contains the RefSeq identifiers you are looking for.

Relationships between database tables:
hg18.kgXref.spDisplayID <--> go.goaPart.dbObjectSymbol
go.goaPart.goId <--> go.term.acc
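
The two relationships above amount to a three-table join.  Here is a 
minimal sketch of that join in Python, using an in-memory SQLite database 
as a stand-in for the real hg18 and go databases.  The table and field 
names come from the relationships above; the spDisplayID value 
PLACEHOLDER_HUMAN and the two-row sample are assumptions made up for 
illustration, and SQLite has no database prefix, so hg18.kgXref is written 
as plain kgXref.

```python
import sqlite3


def build_toy_db():
    """Create a tiny in-memory stand-in for hg18.kgXref, go.goaPart and go.term."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE kgXref  (refseq TEXT, spDisplayID TEXT);
        CREATE TABLE goaPart (dbObjectSymbol TEXT, goId TEXT);
        CREATE TABLE term    (acc TEXT, name TEXT, term_type TEXT);
        """
    )
    # PLACEHOLDER_HUMAN is a made-up identifier, not a real spDisplayID.
    conn.execute("INSERT INTO kgXref VALUES ('NM_000808', 'PLACEHOLDER_HUMAN')")
    conn.executemany(
        "INSERT INTO goaPart VALUES (?, ?)",
        [("PLACEHOLDER_HUMAN", "GO:0004890"), ("PLACEHOLDER_HUMAN", "GO:0005887")],
    )
    conn.executemany(
        "INSERT INTO term VALUES (?, ?, ?)",
        [
            ("GO:0004890", "GABA-A receptor activity", "molecular_function"),
            ("GO:0005887", "integral to plasma membrane", "cellular_component"),
        ],
    )
    return conn


# kgXref.spDisplayID <--> goaPart.dbObjectSymbol, goaPart.goId <--> term.acc
JOIN_SQL = """
SELECT x.refseq, t.name, t.term_type, t.acc
FROM kgXref AS x
JOIN goaPart AS g ON x.spDisplayID = g.dbObjectSymbol
JOIN term    AS t ON g.goId = t.acc
WHERE t.term_type IN
      ('molecular_function', 'cellular_component', 'biological_process')
ORDER BY t.acc
"""


def go_annotations(conn):
    """Return (refseq, GO name, GO term type, GO accession) rows."""
    return conn.execute(JOIN_SQL).fetchall()


if __name__ == "__main__":
    for row in go_annotations(build_toy_db()):
        print("\t".join(row))
```

The Table Browser performs the equivalent join for you once the related 
tables are selected; the sketch is only meant to show how the fields line 
up.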


	The term_type field of the go.term table identifies which type each GO 
term is.  In your case, I imagine you are most interested in the 
following term types:

molecular_function
cellular_component
biological_process


	When I used the Table Browser tool to find the GO annotations for one 
refSeq Gene (NM_000808), I got the following output:

hg18.kgXref.refseq:
NM_000808

go.term.name:
GABA-A receptor activity,ion channel activity,extracellular ligand-gated ion channel activity,chloride channel activity,integral to plasma membrane,transport,ion transport,chloride transport,gamma-aminobutyric acid signaling pathway,benzodiazepine receptor activity,membrane,integral to membrane,neurotransmitter receptor activity,chloride ion binding,postsynaptic membrane,

go.term.term_type:
molecular_function,molecular_function,molecular_function,molecular_function,cellular_component,biological_process,biological_process,biological_process,biological_process,molecular_function,cellular_component,cellular_component,molecular_function,molecular_function,cellular_component,

go.term.acc:
GO:0004890,GO:0005216,GO:0005230,GO:0005254,GO:0005887,GO:0006810,GO:0006811,GO:0006821,GO:0007214,GO:0008503,GO:0016020,GO:0016021,GO:0030594,GO:0031404,GO:0045211,
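
The three comma-separated lists in the output above are parallel: the 
first name, first term type and first accession all describe the same GO 
term.  To get from this output to a two-column file, one possible approach 
is sketched below in Python.  The function names and the "|" separator are 
my own choices for illustration, not part of the Table Browser.  Note that 
the sketch splits names on commas, so a GO term name that itself contains 
a comma would break the pairing; the length check is there to catch that 
case rather than silently misalign the columns.

```python
def split_list(field):
    """Split a comma-separated field, dropping the empty item after the final comma."""
    return [item.strip() for item in field.split(",") if item.strip()]


def two_column_rows(refseq, names, term_types, accs):
    """Pair up the parallel lists and return (refseq, annotation) tuples."""
    name_list = split_list(names)
    type_list = split_list(term_types)
    acc_list = split_list(accs)
    if not (len(name_list) == len(type_list) == len(acc_list)):
        raise ValueError("field lists have different lengths")
    return [
        (refseq, "|".join((acc, term_type, name)))
        for acc, term_type, name in zip(acc_list, type_list, name_list)
    ]


if __name__ == "__main__":
    # First three terms from the NM_000808 output, shortened for the example.
    rows = two_column_rows(
        "NM_000808",
        "GABA-A receptor activity,ion channel activity,"
        "extracellular ligand-gated ion channel activity,",
        "molecular_function,molecular_function,molecular_function,",
        "GO:0004890,GO:0005216,GO:0005230,",
    )
    for refseq, annotation in rows:
        print(f"{refseq}\t{annotation}")
```

Filtering the rows on the term type before writing them out would give 
one file each for component, function and process.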

	This should be enough to give you a good start.  If you need more or 
less data, you can choose different fields in the appropriate database 
tables.  I hope this information is helpful to you.  Please don't 
hesitate to contact the mailing list again if you require further 
assistance.

Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

Please feel free to search the Genome mailing list archives by visiting 
our home page, clicking on "Contact Us", then typing a word or phrase 
into the search box.  On that same page 
(http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome 
mailing list.



Les Ander wrote:
> Hi,
> 
> I want to get the gene ontology data for all human refseq genes. Ensembl
> has it for uniprot ids but I want to get it for refseq genes. UCSC browser
> seems to have what I want at
> http://hgdownload.cse.ucsc.edu/goldenPath/go/database/association.txt.gz
> however, I don't understand the format.
> I would be very grateful if you can tell me how I can go about
> getting a two column file with refseq gene name and go annotation (for each
> of component, function and process).
> thanks
> les
> _______________________________________________
> Genome maillist  -  Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

