[Genome] gene ontology information from UCSC table browser
Ann Zweig
ann at soe.ucsc.edu
Wed Dec 19 10:11:42 PST 2007
Hello Les,
You can create the table you are looking for by using the Table Browser
tool on our website ('Tables' from the top blue navigation bar). You
will be joining several tables together to get the information you want.
You may want to read the following section of the Table Browser User's
Guide before you get started:
http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#SelectedFields
After you set the output format to 'selected fields from primary and
related tables', you will need to know about the following relationships
between the databases and tables you will be joining.
Databases:
hg18 = the database that contains tables for the latest human assembly
go = the database that contains all of the Gene Ontology tables
Tables:
hg18.kgXref.refseq = the field in the kgXref table (part of the UCSC Genes track)
that contains the RefSeq identifiers you are looking for.
Relationships between database tables:
hg18.kgXref.spDisplayID <--> go.goaPart.dbObjectSymbol
go.goaPart.goId <--> go.term.acc
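The Table Browser performs these joins for you. If you would rather work offline, the same two joins can be written as one SQL query. The sketch below runs that query with Python's built-in sqlite3 module against tiny in-memory stand-ins for the three tables. The table and column names follow the relationships above, but the reduced column sets, the placeholder kgID and spDisplayID values, and the assumption that a gene with no RefSeq ID stores an empty string are illustrative only, not the real schema or data.

```python
import sqlite3

# The two joins described above, written as one SQL query:
#   hg18.kgXref.spDisplayID <--> go.goaPart.dbObjectSymbol
#   go.goaPart.goId         <--> go.term.acc
QUERY = """
    SELECT x.refseq, t.name, t.term_type, t.acc
    FROM hg18.kgXref AS x
    JOIN go.goaPart AS g ON x.spDisplayID = g.dbObjectSymbol
    JOIN go.term AS t ON g.goId = t.acc
    WHERE x.refseq != ''  -- assumption: genes without a RefSeq ID store ''
    ORDER BY x.refseq, t.acc
"""


def build_demo_db():
    """Create tiny in-memory stand-ins for the three tables.

    Only the columns used by the joins are created, and the kgID and
    spDisplayID values are hypothetical placeholders, not real UCSC data.
    """
    conn = sqlite3.connect(":memory:")
    # Attach two more in-memory databases so the names read hg18.* and go.*
    conn.execute("ATTACH DATABASE ':memory:' AS hg18")
    conn.execute("ATTACH DATABASE ':memory:' AS go")
    conn.executescript("""
        CREATE TABLE hg18.kgXref (kgID TEXT, spDisplayID TEXT, refseq TEXT);
        CREATE TABLE go.goaPart (dbObjectSymbol TEXT, goId TEXT);
        CREATE TABLE go.term (acc TEXT, name TEXT, term_type TEXT);
    """)
    conn.executemany("INSERT INTO hg18.kgXref VALUES (?, ?, ?)", [
        ("ucEXAMPLE.1", "EXAMPLE1_HUMAN", "NM_000808"),
        ("ucEXAMPLE.2", "EXAMPLE2_HUMAN", ""),  # gene with no RefSeq ID
    ])
    conn.executemany("INSERT INTO go.goaPart VALUES (?, ?)", [
        ("EXAMPLE1_HUMAN", "GO:0016020"),
        ("EXAMPLE1_HUMAN", "GO:0004890"),
        ("EXAMPLE2_HUMAN", "GO:0006810"),
    ])
    conn.executemany("INSERT INTO go.term VALUES (?, ?, ?)", [
        ("GO:0004890", "GABA-A receptor activity", "molecular_function"),
        ("GO:0006810", "transport", "biological_process"),
        ("GO:0016020", "membrane", "cellular_component"),
    ])
    conn.commit()
    return conn


# Print one tab-separated line per (RefSeq ID, GO term) pair
for row in build_demo_db().execute(QUERY):
    print("\t".join(row))
```

Against the real data, the same SELECT should apply once the downloaded table dumps are loaded, but check each table's actual column order in the table definitions published alongside the dumps before loading them.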
The term_type field of the go.term table records which type of GO term
each entry is. In your case, I imagine you are most interested in the
following term types:
molecular_function
cellular_component
biological_process
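Since the goal is a two-column file for each of these three term types, one way to get there is to split the joined rows on term_type. The sketch below assumes each joined row is a (refseq, name, term_type, acc) tuple, the same four fields used in the example output in this message; the function names split_by_namespace and to_tsv are hypothetical, and rows with any other term_type are simply skipped.

```python
# The three GO term types of interest, as stored in go.term.term_type
NAMESPACES = ("molecular_function", "cellular_component", "biological_process")


def split_by_namespace(rows):
    """Group (refseq, name, term_type, acc) rows into one two-column
    (refseq, acc) list per term type, dropping duplicate pairs."""
    tables = {ns: [] for ns in NAMESPACES}
    seen = set()
    for refseq, _name, term_type, acc in rows:
        if term_type not in tables:
            continue  # skip anything outside the three term types
        key = (refseq, term_type, acc)
        if key in seen:
            continue
        seen.add(key)
        tables[term_type].append((refseq, acc))
    return tables


def to_tsv(pairs):
    """Format (refseq, acc) pairs as tab-separated lines."""
    return "".join(f"{refseq}\t{acc}\n" for refseq, acc in pairs)


# Usage with a few annotations taken from the NM_000808 example
rows = [
    ("NM_000808", "GABA-A receptor activity", "molecular_function", "GO:0004890"),
    ("NM_000808", "membrane", "cellular_component", "GO:0016020"),
    ("NM_000808", "transport", "biological_process", "GO:0006810"),
    ("NM_000808", "transport", "biological_process", "GO:0006810"),  # duplicate
]
for namespace, pairs in split_by_namespace(rows).items():
    print(namespace)
    print(to_tsv(pairs), end="")
```

Writing each list to its own file gives one RefSeq-to-GO file per term type.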
When I used the Table Browser tool to find the GO annotations for one
RefSeq gene (NM_000808), I got the following output:
hg18.kgXref.refseq: NM_000808
go.term.name:       GABA-A receptor activity,ion channel activity,extracellular ligand-gated ion channel activity,chloride channel activity,integral to plasma membrane,transport,ion transport,chloride transport,gamma-aminobutyric acid signaling pathway,benzodiazepine receptor activity,membrane,integral to membrane,neurotransmitter receptor activity,chloride ion binding,postsynaptic membrane,
go.term.term_type:  molecular_function,molecular_function,molecular_function,molecular_function,cellular_component,biological_process,biological_process,biological_process,biological_process,molecular_function,cellular_component,cellular_component,molecular_function,molecular_function,cellular_component,
go.term.acc:        GO:0004890,GO:0005216,GO:0005230,GO:0005254,GO:0005887,GO:0006810,GO:0006811,GO:0006821,GO:0007214,GO:0008503,GO:0016020,GO:0016021,GO:0030594,GO:0031404,GO:0045211,
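In that output each GO field is a comma-separated list ending in a trailing comma, and the i-th entries of the three lists belong together (for example, GO:0004890 is the molecular_function term "GABA-A receptor activity"). A minimal sketch for turning such a row back into one record per GO term follows. It assumes the four field values have already been separated from each other, and the function names are hypothetical.

```python
def split_list(field):
    """Split a comma-separated list field, ignoring the trailing comma."""
    return [item for item in field.split(",") if item]


def parse_row(refseq, names, term_types, accs):
    """Turn one output row into (refseq, name, term_type, acc) tuples."""
    acc_list = split_list(accs)
    type_list = split_list(term_types)
    name_list = split_list(names)
    # A GO term name may itself contain a comma, in which case the name
    # list no longer lines up with the accessions; refuse to guess.
    if not (len(acc_list) == len(type_list) == len(name_list)):
        raise ValueError("list fields differ in length; a term name may contain a comma")
    return [(refseq, n, t, a) for n, t, a in zip(name_list, type_list, acc_list)]


# Usage with the first three entries of the NM_000808 row
for record in parse_row(
    "NM_000808",
    "GABA-A receptor activity,ion channel activity,membrane,",
    "molecular_function,molecular_function,cellular_component,",
    "GO:0004890,GO:0005216,GO:0016020,",
):
    print(record)
```

The length check is a cheap guard: the term_type and acc lists never contain commas within an entry, so a mismatch points at the name list.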
This should be enough to give you a good start. If you need more or
less data, you can choose different fields from the appropriate database
tables. I hope this information is helpful to you. Please don't
hesitate to contact the mailing list again if you require further assistance.
Regards,
----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu
Please feel free to search the Genome mailing list archives by visiting
our home page, clicking on "Contact Us", then typing a word or phrase
into the search box. On that same page
(http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome
mailing list.
Les Ander wrote:
> Hi,
>
> I want to get the Gene Ontology data for all human RefSeq genes. Ensembl
> has it for UniProt IDs, but I want to get it for RefSeq genes. The UCSC
> browser seems to have what I want at
> http://hgdownload.cse.ucsc.edu/goldenPath/go/database/association.txt.gz
> However, I don't understand the format.
> I would be very grateful if you could tell me how I can go about
> getting a two-column file with the RefSeq gene name and GO annotation
> (for each of component, function and process).
> thanks
> les
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome