[Genome] Pfam foreign keys in data tables // mapping protein domains to genomic coordinates
Fan Hsu
fanhsu at soe.ucsc.edu
Fri Mar 2 10:41:44 PST 2007
Hi John,
Others have asked for this before.
One challenge is choosing a single representative protein
sequence for each domain. I often find that we have
many candidates to represent a domain. Do you
have any suggestions on this?
In the meantime, you may want to try
the Superfamily track to see if
it is helpful.
Fan.
-----Original Message-----
From: genome-bounces at soe.ucsc.edu [mailto:genome-bounces at soe.ucsc.edu] On
Behalf Of John Major
Sent: Friday, March 02, 2007 10:11 AM
To: Ann Zweig
Cc: genome at soe.ucsc.edu
Subject: Re: [Genome] Pfam foreign keys in data tables // mapping
protein domains to genomic coordinates
Hi Ann-
Thank you for the prompt reply.
I can get the genomic positions of known genes, but what I really need
is the specific genomic coordinates of the domains *within* those genes.
I think the information you provided will only allow me to determine
whether a specific gene contains a certain domain. What I need is more
detail still.
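
For reference, the gene-level join described in the quoted reply below (knownGene.name == knownToPfam.name) might be sketched as follows. This is only a rough illustration run against an in-memory SQLite database rather than the real hg18 MySQL tables: the two accession/Pfam pairs are the ones quoted from knownToPfam, but the coordinates are made up and only a few knownGene columns are modeled. It shows why the result is one row per gene, with no position for the domain inside the gene.

```python
import sqlite3

# Toy stand-ins for hg18.knownGene and hg18.knownToPfam (assumption:
# coordinates below are invented; only the name/value pairs come from
# the knownToPfam rows quoted in this thread).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE knownGene (name TEXT, chrom TEXT, strand TEXT,
                        txStart INTEGER, txEnd INTEGER);
CREATE TABLE knownToPfam (name TEXT, value TEXT);
INSERT INTO knownGene VALUES ('NM_001005484', 'chr1', '+', 1000, 2000);
INSERT INTO knownGene VALUES ('BC024295', 'chr2', '-', 5000, 7000);
INSERT INTO knownToPfam VALUES ('NM_001005484', 'PF00001');
INSERT INTO knownToPfam VALUES ('BC024295', 'PF07647');
""")

# Join on the shared gene identifier; the coordinates returned are
# those of the whole transcript, not of the domain.
rows = conn.execute("""
SELECT g.chrom, g.txStart, g.txEnd, p.value
FROM knownGene AS g
JOIN knownToPfam AS p ON g.name = p.name
ORDER BY g.chrom, g.txStart
""").fetchall()

for row in rows:
    print(*row, sep="\t")
```
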
A hypothetical gene ABCD is one exon on chromosomeX from 100000 -> 101000.
The gene has a pfam Tyrosine kinase domain in the middle of it, which
maps to the genomic coordinates chrX 100250 -> 100550.
I'd like to be able to extract a table that looks like:
Chrm  startPos  endPos  DomainNAME       databaseNAME
chrX  100250    100550  Tyrosine-Kinase  pFam
What I need to know is the genomic position of the protein domain.
And in reality, I'd like to get the genomic positions for all of the
protein domains for interpro and pfam for hg17&18.
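
To make the request concrete, here is a minimal sketch of the coordinate arithmetic I have in mind. It assumes the residue range of the domain within the protein is already known from some other source (the tables discussed in this thread do not appear to provide it), and that the exon and CDS coordinates are 0-based, half-open intervals. The function name domain_to_genomic and the CDS bounds and residue numbers in the example are hypothetical, chosen so that the result matches the ABCD example above.

```python
def domain_to_genomic(exons, cds_start, cds_end, aa_start, aa_end, strand="+"):
    """Map protein residues [aa_start, aa_end) (0-based) to genomic blocks.

    exons: list of (start, end) genomic intervals, 0-based half-open,
           sorted by ascending start.
    Returns a list of (start, end) genomic blocks in ascending order;
    a domain that spans an intron yields more than one block.
    """
    # Keep only the coding part of each exon.
    coding = [(max(s, cds_start), min(e, cds_end))
              for s, e in exons
              if min(e, cds_end) > max(s, cds_start)]
    cds_len = sum(e - s for s, e in coding)

    # Domain interval as nucleotide offsets from the 5' end of the CDS.
    nt_start, nt_end = aa_start * 3, aa_end * 3
    if strand == "-":
        # On the minus strand the 5' end sits at the high genomic coordinate,
        # so flip the offsets to count from the low-coordinate end.
        nt_start, nt_end = cds_len - nt_end, cds_len - nt_start

    blocks = []
    offset = 0  # CDS offset of the first base of the current coding block
    for s, e in coding:
        block_len = e - s
        lo = max(nt_start, offset)
        hi = min(nt_end, offset + block_len)
        if lo < hi:
            blocks.append((s + lo - offset, s + hi - offset))
        offset += block_len
    return blocks


# Hypothetical gene ABCD: one exon, CDS bounds and residue range invented.
for start, end in domain_to_genomic([(100000, 101000)], 100010, 100970, 80, 180):
    print("chrX", start, end, "Tyrosine-Kinase", "pFam", sep="\t")
```

The sketch does no validation (for example, that the residue range fits inside the CDS), and real data would need the exonStarts/exonEnds lists parsed from knownGene first.
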
Thanks,
John
Ann Zweig wrote:
> Hello John,
>
> The missing link is the knownToPfam table in the hg18 (or
> whichever assembly you are working in) database. This table is the
> link between the knownGene table and the Pfam tables.
>
> $db.knownGene.name == knownToPfam.name
>
> knownToPfam:
> name value
> NM_001005484 PF00001
> BC024295 PF07647
>
> The $db.knownGene table has fields for chromosomal positions.
>
> You will find the domain type in the pfamDesc table:
> proteome.pfamDesc.description.
>
> The pfamXref table includes several types of ID values.
>
> pfamXref:
> pfamAC swissAC swissDisplayID
> PF00001 O00155 GPR25_HUMAN
> PF00001 O00254 PAR3_HUMAN
> PF00001 O00270 GPR31_HUMAN
>
> You should be able to put these pieces together to mine for the
> exact information you are looking for.
>
> If this is not enough detail for you, please do not hesitate to
> write back to the list and ask for more information.
>
> Regards,
>
> ----------
> Ann Zweig
> UCSC Genome Bioinformatics Group
> http://genome.ucsc.edu
>
>
> Please feel free to search the Genome mailing list archives by
> visiting our home page, clicking on "Contact Us", then typing a word
> or phrase into the search box. On that same page
> (http://genome.ucsc.edu/contacts.html), you can subscribe to the
> Genome mailing list.
>
>
> John Major wrote:
>
>> Hello-
>>
>> I am trying to map the pfam protein domains to genomic coordinates
>> and am having some problems.
>> I see that in the proteome tables, there are 2 obvious pfam tables:
>> pfamDesc and pfamXref.
>> Neither of these tables appears to be linked to other tables... or at
>> least the table description pages do not offer any information as to
>> which tables these 2 link to.
>> Also, I do not seem to see a table which gives the start and end
>> coordinates for the pfam domains (in protein, mRNA, or genomic space).
>>
>> What I would like to get is a simple table of domain information in
>> genomic coordinate space, i.e.:
>> GenomeBuildID  Chrm  Start   End     ProteinDomainName  SourceDatabase
>> hg18           chr1  100000  100050  Protein-Kinase     pFam
>> hg18           chr2  200010  200090  X-binding-site     uniprot
>>
>>
>> I would like to get this info for both uniprot and pfam. The uniprot
>> tables (uniprot.feature and uniprot.description) appear to be linked
>> to kgXref via acc->spid. And I should be able to derive genomic
>> coordinates for the uniprot features via these tables.
>>
>>
>> If you have any advice on an easier way to get this mapping of
>> domains to genomic coordinates, I'd be thrilled to hear it.
>> Otherwise, could you please advise me on the pfam tables.
>>
>> Thanks!
>> John Major
>> _______________________________________________
>> Genome maillist - Genome at soe.ucsc.edu
>> http://www.soe.ucsc.edu/mailman/listinfo/genome
>
>