[Genome] Pfam foreign keys in data tables // mapping protein domains to genomic coordinates

Ann Zweig ann at soe.ucsc.edu
Thu Mar 1 16:00:10 PST 2007


Hello John,

	The missing link is the knownToPfam table in the hg18 database (or 
whichever assembly you are working in).  This table links the knownGene 
table to the Pfam tables.

$db.knownGene.name == knownToPfam.name

knownToPfam:
name		value
NM_001005484	PF00001
BC024295	PF07647

	The $db.knownGene table has fields for chromosomal positions (chrom, 
txStart, txEnd, and the exon starts and ends).

	You will find the domain type in the description field of the pfamDesc 
table in the proteome database: proteome.pfamDesc.description.
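
	As a rough illustration of how these three tables fit together, here is 
a minimal Python sketch that uses an in-memory SQLite database as a stand-in 
for the UCSC MySQL tables.  The column names chrom, txStart, and txEnd in 
knownGene and pfamAC in pfamDesc are assumed from the usual UCSC schema, so 
please check them against the table description pages.  The coordinates and 
descriptions are invented placeholders, not real hg18 data.  Note that this 
join gives the span of the whole transcript that carries the domain, not the 
boundaries of the domain itself.

```python
import sqlite3


def build_demo_db():
    """Create toy stand-ins for knownGene, knownToPfam, and pfamDesc."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE knownGene (name TEXT, chrom TEXT,
                                txStart INTEGER, txEnd INTEGER);
        CREATE TABLE knownToPfam (name TEXT, value TEXT);
        CREATE TABLE pfamDesc (pfamAC TEXT, description TEXT);
    """)
    # Coordinates are made-up placeholders, not real hg18 positions.
    conn.executemany("INSERT INTO knownGene VALUES (?, ?, ?, ?)", [
        ("NM_001005484", "chr1", 1000, 2000),
        ("BC024295", "chr2", 5000, 9000),
    ])
    # These two rows are the knownToPfam example rows from this message.
    conn.executemany("INSERT INTO knownToPfam VALUES (?, ?)", [
        ("NM_001005484", "PF00001"),
        ("BC024295", "PF07647"),
    ])
    # Descriptions are placeholders, not the real Pfam descriptions.
    conn.executemany("INSERT INTO pfamDesc VALUES (?, ?)", [
        ("PF00001", "placeholder description A"),
        ("PF07647", "placeholder description B"),
    ])
    return conn


def domains_with_positions(conn):
    """Join gene positions to Pfam accessions and their descriptions."""
    query = """
        SELECT kg.chrom, kg.txStart, kg.txEnd, k2p.value, pd.description
        FROM knownGene AS kg
        JOIN knownToPfam AS k2p ON kg.name = k2p.name
        JOIN pfamDesc AS pd ON pd.pfamAC = k2p.value
        ORDER BY kg.chrom, kg.txStart
    """
    return conn.execute(query).fetchall()


rows = domains_with_positions(build_demo_db())
for row in rows:
    print("\t".join(str(field) for field in row))
```

	Against the real MySQL tables the same SELECT should carry over, with 
pfamDesc qualified as proteome.pfamDesc and knownGene and knownToPfam taken 
from the assembly database.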

	The pfamXref table maps each Pfam accession to several other ID 
values, such as SwissProt accessions and display IDs.

pfamXref:
pfamAC	swissAC	swissDisplayID
PF00001	O00155	GPR25_HUMAN
PF00001	O00254	PAR3_HUMAN
PF00001	O00270	GPR31_HUMAN
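
	Since one Pfam accession maps to many SwissProt entries, it can help to 
index the pfamXref rows by pfamAC before combining them with other tables.  
A small sketch in Python, using only the three example rows above (the 
function name index_by_pfam is just a hypothetical helper for illustration):

```python
from collections import defaultdict

# The three pfamXref example rows shown in this message:
# (pfamAC, swissAC, swissDisplayID)
PFAM_XREF_ROWS = [
    ("PF00001", "O00155", "GPR25_HUMAN"),
    ("PF00001", "O00254", "PAR3_HUMAN"),
    ("PF00001", "O00270", "GPR31_HUMAN"),
]


def index_by_pfam(rows):
    """Group (swissAC, swissDisplayID) pairs under their Pfam accession."""
    index = defaultdict(list)
    for pfam_ac, swiss_ac, display_id in rows:
        index[pfam_ac].append((swiss_ac, display_id))
    return dict(index)


xref = index_by_pfam(PFAM_XREF_ROWS)
for swiss_ac, display_id in xref["PF00001"]:
    print(swiss_ac, display_id)
```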

	You should be able to put these pieces together to mine for the exact 
information you are looking for.

	If this is not enough detail for you, please do not hesitate to write 
back to the list and ask for more information.

Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu


Please feel free to search the Genome mailing list archives by visiting 
our home page, clicking on "Contact Us", then typing a word or phrase 
into the search box.  On that same page 
(http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome 
mailing list.


John Major wrote:
> Hello-
> 
> I am trying to map the pfam protein domains to genomic coordinates and 
> am having some problems.
> I see that in the proteome tables, there are 2 obvious pfam tables: 
> pfamDesc and pfamXref.
> Neither of these tables appears to be linked to other tables... or at 
> least the table description pages do not offer any information as to 
> which tables these 2 link to.
> Also, I do not see a table which gives the start and end 
> coordinates for the pfam domains (in protein, mrna, or genomic space).
> 
> What I would like to get is a simple table of domain information in 
> genomic coordinate space. Ie:
> GenomeBuildID  Chrm  Start   End     ProteinDomainName  SourceDatabase
> hg18           chr1  100000  100050  Protein-Kinase     pFam
> hg18           chr2  200010  200090  X-binding-site     uniprot
> 
> 
> I would like to get this info for both uniprot and pfam.  The uniprot 
> tables (uniprot.feature and uniprot.description) appear to be linked to 
> kgXref via acc->spid.  And I should be able to derive genomic 
> coordinates for the uniprot features via these tables.
> 
> 
> If you have any advice on an easier way to get this mapping of domains 
> to genomic coordinates, I'd be thrilled to hear it.  Otherwise, could 
> you please advise me on the pfam tables.
> 
> Thanks!
> John Major

