[Genome] Pfam foreign keys in data tables // mapping protein domains to genomic coordinates

Fan Hsu fanhsu at soe.ucsc.edu
Fri Mar 2 10:41:44 PST 2007


Hi John,

Others have asked for the same thing before.

One challenge is choosing a single representative protein
sequence for a domain.  I often find that we have
many candidates to represent a domain.  Do you
have any suggestions on this?

In the meantime, you may want to try
the Superfamily track to see whether
it is helpful.

Fan.
-----Original Message-----
From: genome-bounces at soe.ucsc.edu [mailto:genome-bounces at soe.ucsc.edu] On
Behalf Of John Major
Sent: Friday, March 02, 2007 10:11 AM
To: Ann Zweig
Cc: genome at soe.ucsc.edu
Subject: Re: [Genome] Pfam foreign keys in data tables // mapping
protein domains to genomic coordinates


Hi Ann-

Thank you for the prompt reply.

I can get the genomic positions of known genes, but what I really need 
is the specific genomic coordinates of the domains *within* those genes. 
I think the information you provided will only allow me to determine 
whether a specific gene contains a certain domain.  What I need is more 
detailed still.
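
For reference, the join described in the quoted message below can be 
sketched roughly as follows.  This is only an illustration: it uses an 
in-memory SQLite database instead of the real MySQL tables, the 
coordinates and descriptions are made-up placeholders, and the pfamAC 
column on pfamDesc is an assumption (the thread only confirms 
pfamDesc.description).  In the real setup knownGene and knownToPfam are 
in the assembly database and pfamDesc is in proteome.  Note that the 
result is one row per gene/domain pair with the gene's coordinates, not 
the domain's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE knownGene (name TEXT, chrom TEXT, txStart INTEGER, txEnd INTEGER);
CREATE TABLE knownToPfam (name TEXT, value TEXT);
-- pfamAC as the key column of pfamDesc is an assumption
CREATE TABLE pfamDesc (pfamAC TEXT, description TEXT);
""")

# Gene names and Pfam accessions come from the knownToPfam sample rows;
# the chromosomes and positions are placeholders, not real annotations.
conn.executemany("INSERT INTO knownGene VALUES (?, ?, ?, ?)", [
    ("NM_001005484", "chr1", 1000, 2000),
    ("BC024295", "chr2", 5000, 7000),
])
conn.executemany("INSERT INTO knownToPfam VALUES (?, ?)", [
    ("NM_001005484", "PF00001"),
    ("BC024295", "PF07647"),
])
# Placeholder descriptions, not the real Pfam text.
conn.executemany("INSERT INTO pfamDesc VALUES (?, ?)", [
    ("PF00001", "placeholder description A"),
    ("PF07647", "placeholder description B"),
])

# knownGene.name == knownToPfam.name, then knownToPfam.value -> pfamDesc
rows = conn.execute("""
    SELECT g.chrom, g.txStart, g.txEnd, d.description, 'Pfam'
    FROM knownGene g
    JOIN knownToPfam k ON k.name = g.name
    JOIN pfamDesc d ON d.pfamAC = k.value
    ORDER BY g.chrom, g.txStart
""").fetchall()

for row in rows:
    print("\t".join(str(field) for field in row))
```

Each output row carries txStart/txEnd for the whole gene, which is why 
this join alone cannot place a domain inside the gene.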

Suppose a hypothetical gene ABCD is a single exon on chrX from 100000 to 
101000.  The gene has a Pfam Tyrosine kinase domain in the middle of it, 
which maps to the genomic coordinates chrX 100250 -> 100550.
I'd like to be able to extract a table that looks like:
Chrm   startPos  endPos    DomainNAME        databaseNAME
chrX   100250    100550    Tyrosine-Kinase   Pfam


What I need to know is the genomic position of each protein domain. 
In practice, I'd like to get the genomic positions of all of the 
protein domains from InterPro and Pfam for hg17 and hg18.
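
To make the request concrete, here is a rough sketch of the mapping I 
have in mind.  It assumes the domain's amino-acid start and end within 
the protein are already known from some source (nothing in the tables 
discussed so far provides them), and that the coding exon blocks are 
given as 0-based, half-open genomic intervals sorted by position.  The 
function name domain_to_genomic and the example numbers are mine, purely 
for illustration.

```python
def domain_to_genomic(cds_blocks, strand, aa_start, aa_end):
    """Map a 1-based, inclusive amino-acid range onto genomic blocks.

    cds_blocks: coding parts of the exons as (start, end) tuples,
                0-based half-open, sorted by ascending genomic start.
    strand:     '+' or '-'.
    Returns a list of (start, end) genomic blocks, 0-based half-open.
    """
    # Amino-acid range -> 0-based half-open nucleotide offsets in the CDS.
    nt_start = (aa_start - 1) * 3
    nt_end = aa_end * 3
    total = sum(end - start for start, end in cds_blocks)
    if not 0 <= nt_start < nt_end <= total:
        raise ValueError("domain range lies outside the coding sequence")

    # On the minus strand the first codon sits at the genomic end,
    # so mirror the offsets before walking the blocks left to right.
    if strand == "-":
        nt_start, nt_end = total - nt_end, total - nt_start

    result = []
    offset = 0  # CDS offset of the first base of the current block
    for start, end in cds_blocks:
        length = end - start
        lo = max(nt_start, offset)
        hi = min(nt_end, offset + length)
        if lo < hi:  # the domain overlaps this block
            result.append((start + lo - offset, start + hi - offset))
        offset += length
    return result


# Single-exon case resembling the ABCD example: coding region
# 100010-100970, domain covering residues 81 through 180.
print(domain_to_genomic([(100010, 100970)], "+", 81, 180))
```

A domain that spans an intron comes back as more than one block, so the 
final table would need one row per block (or a BED-style block list) 
rather than a single start and end.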


Thanks,
John




Ann Zweig wrote:

> Hello John,
>
>     The missing link is the knownToPfam table in the hg18 (or 
> whichever assembly you are working in) database.  This table is the 
> link between the knownGene table and the Pfam tables.
>
> $db.knownGene.name == knownToPfam.name
>
> knownToPfam:
> name        value
> NM_001005484    PF00001
> BC024295    PF07647
>
>     The $db.knownGene table has fields for chromosomal positions.
>
>     You will find the domain type in the pfamDesc table: 
> proteome.pfamDesc.description.
>
>     The pfamXref table includes several types of ID values.
>
> pfamXref:
> pfamAC    swissAC    swissDisplayID
> PF00001    O00155    GPR25_HUMAN
> PF00001    O00254    PAR3_HUMAN
> PF00001    O00270    GPR31_HUMAN
>
>     You should be able to put these pieces together to mine for the 
> exact information you are looking for.
>
>     If this is not enough detail for you, please do not hesitate to 
> write back to the list and ask for more information.
>
> Regards,
>
> ----------
> Ann Zweig
> UCSC Genome Bioinformatics Group
> http://genome.ucsc.edu
>
>
> Please feel free to search the Genome mailing list archives by 
> visiting our home page, clicking on "Contact Us", then typing a word 
> or phrase into the search box.  On that same page 
> (http://genome.ucsc.edu/contacts.html), you can subscribe to the 
> Genome mailing list.
>
>
> John Major wrote:
>
>> Hello-
>>
>> I am trying to map the Pfam protein domains to genomic coordinates 
>> and am having some problems.
>> I see that in the proteome tables, there are 2 obvious Pfam tables: 
>> pfamDesc and pfamXref.
>> Neither of these tables appears to be linked to other tables... or at 
>> least the table description pages do not offer any information as to 
>> which tables these 2 link to.
>> Also, I do not seem to see a table which gives the start and end 
>> coordinates for the Pfam domains (in protein, mRNA, or genomic space).
>>
>> What I would like to get is a simple table of domain information in 
>> genomic coordinate space. Ie:
>> GenomeBuildID  Chrm  Start   End     ProteinDomainName  SourceDatabase
>> hg18           chr1  100000  100050  Protein-Kinase     pFam
>> hg18           chr2  200010  200090  X-binding-site     uniprot
>>
>>
>> I would like to get this info for both uniprot and pfam.  The uniprot 
>> tables (uniprot.feature and uniprot.description) appear to be linked 
>> to kgXref via acc->spid.  And I should be able to derive genomic 
>> coordinates for the uniprot features via these tables.
>>
>>
>> If you have any advice on an easier way to get this mapping of 
>> domains to genomic coordinates, I'd be thrilled to hear it.  
>> Otherwise, could you please advise me on the pfam tables.
>>
>> Thanks!
>> John Major
>> _______________________________________________
>> Genome maillist  -  Genome at soe.ucsc.edu
>> http://www.soe.ucsc.edu/mailman/listinfo/genome
>
>
