[Genome] knownToPfam, DNA-coordinates of domains
Anton Kratz
anton.kratz at googlemail.com
Fri Jan 11 02:11:24 PST 2008
Dear UCSC team,
Context of my question: I am trying to get protein domain coordinates in DNA
space for the domains listed in the knownToPfam table.
Basically, for each name-value pair in the knownToPfam table, my program
looks up (in the knownGene table) the protein encoded by that isoform. It
then looks up (in Pfam-A.full, a flat file containing the entire Pfam
database) that domain's sequence and aligns that sequence back onto the
human genome, using BLAT (locally) in translated mode.
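The first lookup step can be sketched roughly as follows. This is a minimal
illustration, not my actual program: it assumes both tables are available as
tab-separated dumps, that knownToPfam has the transcript name in the first
column and the Pfam accession in the second, and that the protein ID sits in
the eleventh column of knownGene. The function names and the column constant
are hypothetical.

```python
# Assumption: 0-based position of the proteinID field in a knownGene dump.
PROTEIN_ID_COLUMN = 10


def load_protein_ids(known_gene_lines):
    """Map each knownGene transcript name (first column) to its protein ID."""
    protein_by_transcript = {}
    for line in known_gene_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > PROTEIN_ID_COLUMN:
            protein_by_transcript[fields[0]] = fields[PROTEIN_ID_COLUMN]
    return protein_by_transcript


def pair_domains_with_proteins(known_to_pfam_lines, protein_by_transcript):
    """Yield (transcript, pfam_accession, protein_id) for each knownToPfam row.

    protein_id is None when the transcript has no knownGene entry.
    """
    for line in known_to_pfam_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        transcript, pfam_acc = fields[0], fields[1]
        yield transcript, pfam_acc, protein_by_transcript.get(transcript)
```

Both functions take any iterable of lines, so an open file handle for each
table dump can be passed in directly.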
My problem is that for 13,636 of the 35,789 name-value pairs in the
knownToPfam table, the protein encoded by the respective isoform does not
appear under the corresponding domain's entry in Pfam-A.full.
Example: according to knownToPfam, NM_015658 contains the domain PF03715.
According to the knownGene table, NM_015658 encodes the protein YU20_HUMAN.
So I am looking for YU20_HUMAN under the entry for PF03715 in Pfam-A.full.
But it is not listed there, and thus not part of the multiple alignment.
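The check described in this example can be sketched roughly as follows. It is
a minimal illustration that assumes Pfam-A.full is in Stockholm format, with
one "#=GF AC" line per family, alignment rows of the form "NAME/start-end"
followed by the aligned sequence, and "//" closing each family. Whether NAME
is a SwissProt ID such as YU20_HUMAN or an accession may depend on the Pfam
release, so that is an assumption as well. The function name is hypothetical.

```python
def proteins_in_family(stockholm_lines, pfam_acc):
    """Return the set of sequence names aligned under one Pfam accession."""
    names = set()
    in_family = False
    for line in stockholm_lines:
        line = line.rstrip("\n")
        if line.startswith("#=GF AC"):
            # Accession lines look like "#=GF AC   PF03715.N"; drop the version.
            parts = line.split()
            acc = parts[2].split(".")[0] if len(parts) >= 3 else ""
            in_family = (acc == pfam_acc)
        elif line.startswith("//"):
            # End of the current family's alignment.
            in_family = False
        elif in_family and line and not line.startswith("#"):
            # Alignment row: "NAME/start-end  ALIGNED-SEQUENCE"; keep only NAME.
            names.add(line.split()[0].split("/")[0])
    return names
```

With the file opened as a line iterator, testing whether "YU20_HUMAN" is in
the set returned for "PF03715" reproduces the lookup that fails above.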
I would be very grateful for any idea of what is going wrong with this
approach.
regards,
Anton
P.S.: Everything I am using is for hg17, because I depend on other mappings
that are available only for hg17 and not for hg18.