[Genome] Pfam foreign keys in data tables // mapping protein domains to genomic coordinates
Ann Zweig
ann at soe.ucsc.edu
Thu Mar 1 16:00:10 PST 2007
Hello John,
The missing link is the knownToPfam table in the hg18 (or whichever
assembly you are working in) database. This table is the link between
the knownGene table and the Pfam tables.
$db.knownGene.name == knownToPfam.name
knownToPfam:
name          value
NM_001005484  PF00001
BC024295      PF07647
The $db.knownGene table holds the chromosomal positions of each transcript:
chrom, strand, txStart/txEnd, cdsStart/cdsEnd, and the exonStarts/exonEnds lists.
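A minimal sketch of that join, using Python's standard sqlite3 module with in-memory stand-ins for the two tables. The knownToPfam rows are the two shown above; the knownGene chrom/txStart/txEnd values are hypothetical placeholders, not real hg18 coordinates. Against the real MySQL database only the SELECT statement matters.

```python
import sqlite3

# In-memory stand-ins for $db.knownGene and $db.knownToPfam.
# knownToPfam rows are copied from the example above; the knownGene
# coordinates are HYPOTHETICAL placeholders, not real hg18 data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE knownGene (name TEXT, chrom TEXT, txStart INTEGER, txEnd INTEGER);
    CREATE TABLE knownToPfam (name TEXT, value TEXT);
    INSERT INTO knownGene VALUES
        ('NM_001005484', 'chr1', 1000, 2000),
        ('BC024295',     'chr2', 5000, 9000);
    INSERT INTO knownToPfam VALUES
        ('NM_001005484', 'PF00001'),
        ('BC024295',     'PF07647');
""")

# The join key is knownGene.name == knownToPfam.name; value is the Pfam accession.
rows = conn.execute("""
    SELECT kg.name, kg.chrom, kg.txStart, kg.txEnd, kp.value
    FROM knownGene AS kg
    JOIN knownToPfam AS kp ON kg.name = kp.name
    ORDER BY kg.chrom
""").fetchall()

for row in rows:
    print(row)
```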
You will find the domain type in the pfamDesc table, by matching
knownToPfam.value against its Pfam accession (pfamAC) column:
proteome.pfamDesc.description.
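The pfamXref table cross-references each Pfam accession (pfamAC) to the
Swiss-Prot accession (swissAC) and display ID (swissDisplayID) of every
protein that contains that domain:
pfamXref:
pfamAC   swissAC  swissDisplayID
PF00001  O00155   GPR25_HUMAN
PF00001  O00254   PAR3_HUMAN
PF00001  O00270   GPR31_HUMAN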
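Because pfamXref is one-to-many (a single Pfam accession appears once per Swiss-Prot protein carrying the domain), it is often handy to index it in both directions. A small sketch using only the three rows shown above, as plain Python tuples in place of a database query; the variable names are illustrative.

```python
from collections import defaultdict

# Rows copied from the pfamXref excerpt above: (pfamAC, swissAC, swissDisplayID).
pfam_xref_rows = [
    ("PF00001", "O00155", "GPR25_HUMAN"),
    ("PF00001", "O00254", "PAR3_HUMAN"),
    ("PF00001", "O00270", "GPR31_HUMAN"),
]

# Forward index: one Pfam family -> every Swiss-Prot entry that carries it.
swiss_by_pfam = defaultdict(list)
for pfam_ac, swiss_ac, display_id in pfam_xref_rows:
    swiss_by_pfam[pfam_ac].append((swiss_ac, display_id))

# Reverse index: Swiss-Prot accession -> Pfam accessions. Useful when starting
# from a protein (e.g. the Swiss-Prot ID John mentions in kgXref) instead of a domain.
pfam_by_swiss = defaultdict(set)
for pfam_ac, swiss_ac, _ in pfam_xref_rows:
    pfam_by_swiss[swiss_ac].add(pfam_ac)

print(swiss_by_pfam["PF00001"])
print(pfam_by_swiss["O00254"])
```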
You should be able to put these pieces together to mine for the exact
information you are looking for.
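Putting the pieces together might look like the sketch below, which emits rows in the column layout John asked for. It assumes the accession in knownToPfam.value matches a pfamAC column in proteome.pfamDesc (pfamDesc column names other than description are an assumption here), and it uses two attached in-memory SQLite databases named hg18 and proteome to mimic the cross-database names. All coordinates and the description text are placeholders. Note that knownToPfam, as excerpted above, records only which domain a gene contains, so Start and End here are the transcript's txStart/txEnd (as stored, 0-based start), not the domain's own boundaries.

```python
import sqlite3

# Two attached in-memory databases mimic the cross-database layout:
# hg18 (knownGene, knownToPfam) and proteome (pfamDesc).
# Coordinates and description text are HYPOTHETICAL placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS hg18")
conn.execute("ATTACH DATABASE ':memory:' AS proteome")
conn.executescript("""
    CREATE TABLE hg18.knownGene (name TEXT, chrom TEXT, txStart INTEGER, txEnd INTEGER);
    CREATE TABLE hg18.knownToPfam (name TEXT, value TEXT);
    -- ASSUMPTION: pfamDesc is keyed by a pfamAC column.
    CREATE TABLE proteome.pfamDesc (pfamAC TEXT, description TEXT);
    INSERT INTO hg18.knownGene VALUES ('NM_001005484', 'chr1', 1000, 2000);
    INSERT INTO hg18.knownToPfam VALUES ('NM_001005484', 'PF00001');
    INSERT INTO proteome.pfamDesc VALUES ('PF00001', 'placeholder description');
""")

# knownGene -> knownToPfam on name, then knownToPfam.value -> pfamDesc.pfamAC.
rows = conn.execute("""
    SELECT 'hg18', kg.chrom, kg.txStart, kg.txEnd, pd.description, 'Pfam'
    FROM hg18.knownGene AS kg
    JOIN hg18.knownToPfam AS kp ON kp.name = kg.name
    JOIN proteome.pfamDesc AS pd ON pd.pfamAC = kp.value
""").fetchall()

print("GenomeBuildID\tChrm\tStart\tEnd\tProteinDomainName\tSourceDatabase")
for row in rows:
    print("\t".join(str(field) for field in row))
```

Getting true domain boundaries requires residue positions for the domain within the protein, which then have to be mapped through the transcript's coding exons.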
If this is not enough detail for you, please do not hesitate to write
back to the list and ask for more information.
Regards,
----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu
Please feel free to search the Genome mailing list archives by visiting
our home page, clicking on "Contact Us", then typing a word or phrase
into the search box. On that same page
(http://genome.ucsc.edu/contacts.html), you can subscribe to the Genome
mailing list.
John Major wrote:
> Hello-
>
> I am trying to map the pfam protein domains to genomic coordinates and
> am having some problems.
> I see that in the proteome tables, there are 2 obvious pfam tables:
> pfamDesc and pfamXref.
> Neither of these tables appears to be linked to other tables... or at
> least the table description pages do not offer any information as to
> which tables these two link to.
> Also, I do not see a table which gives the start and end
> coordinates for the Pfam domains (in protein, mRNA, or genomic space).
>
> What I would like to get is a simple table of domain information in
> genomic coordinate space, i.e.:
> GenomeBuildID  Chrm  Start   End     ProteinDomainName  SourceDatabase
> hg18           chr1  100000  100050  Protein-Kinase     pFam
> hg18           chr2  200010  200090  X-binding-site     uniprot
>
>
> I would like to get this info for both uniprot and pfam. The uniprot
> tables (uniprot.feature and uniprot.description) appear to be linked to
> kgXref via acc->spid. And I should be able to derive genomic
> coordinates for the uniprot features via these tables.
>
>
> If you have any advice on an easier way to get this mapping of domains
> to genomic coordinates, I'd be thrilled to hear it. Otherwise, could
> you please advise me on the pfam tables.
>
> Thanks!
> John Major
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
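John's UniProt route (uniprot.feature joined through kgXref) yields positions in protein space, so getting chromosome coordinates means walking the coding exons stored in knownGene. The sketch below is one way to do that, under stated assumptions: exonStarts/exonEnds/cdsStart/cdsEnd use UCSC's 0-based, half-open convention, with exon lists stored as comma-separated strings ending in a comma; feature positions are 1-based, inclusive residue numbers; and the protein sequence matches the transcript's translation exactly, which is not guaranteed between UniProt and a given transcript. The function names are hypothetical helpers, not part of any UCSC tool.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open genomic interval [start, end)


def parse_int_list(blob: str) -> List[int]:
    """Parse a comma-separated list such as knownGene.exonStarts ("100,200,")."""
    return [int(x) for x in blob.strip(",").split(",") if x]


def coding_blocks(exon_starts: List[int], exon_ends: List[int],
                  cds_start: int, cds_end: int) -> List[Interval]:
    """Clip each exon to [cds_start, cds_end) and keep the non-empty pieces."""
    blocks = []
    for s, e in zip(exon_starts, exon_ends):
        s, e = max(s, cds_start), min(e, cds_end)
        if s < e:
            blocks.append((s, e))
    return blocks


def protein_to_genome(aa_start: int, aa_end: int, strand: str,
                      exon_starts: List[int], exon_ends: List[int],
                      cds_start: int, cds_end: int) -> List[Interval]:
    """Map residues aa_start..aa_end (1-based, inclusive) to genomic intervals."""
    # Residue r occupies CDS nucleotides [3*(r-1), 3*r), counted from the start codon.
    want_lo, want_hi = 3 * (aa_start - 1), 3 * aa_end
    blocks = coding_blocks(exon_starts, exon_ends, cds_start, cds_end)
    if strand == "-":
        blocks.reverse()  # on the minus strand, translation runs from high to low coordinates
    out: List[Interval] = []
    offset = 0  # CDS offset of the first translated base in the current block
    for s, e in blocks:
        length = e - s
        lo, hi = max(want_lo, offset), min(want_hi, offset + length)
        if lo < hi:
            local_lo, local_hi = lo - offset, hi - offset
            if strand == "+":
                out.append((s + local_lo, s + local_hi))
            else:
                out.append((e - local_hi, e - local_lo))
        offset += length
    return sorted(out)


# HYPOTHETICAL two-exon transcript (not real hg18 data).
starts, ends = parse_int_list("100,200,"), parse_int_list("130,260,")
plus = protein_to_genome(5, 8, "+", starts, ends, cds_start=112, cds_end=254)
minus = protein_to_genome(18, 19, "-", starts, ends, cds_start=112, cds_end=254)
print(plus)   # [(124, 130), (200, 206)]
print(minus)  # [(127, 130), (200, 203)]
```

A domain that spans an intron comes back as several intervals, one per coding exon it touches; add 1 to each start if you want the 1-based coordinates the Genome Browser displays.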