[Genome] Unique canonical sequence for each HGNC gene symbol
Amir Karger
akarger at CGR.Harvard.edu
Wed Jun 27 12:55:33 PDT 2007
> From: Kayla Smith [mailto:kayla at soe.ucsc.edu]
>
> I've asked one of our developers about your question, and his advice
> to you is to write some code as follows:
>
> For each HGNC gene symbol do
> /* select symbol from proteome.hgncXref */
> {
>   Query MySQL against the kgXref and knownCanonical tables to get
>   ALL canonical KG IDs whose geneSymbol equals the HGNC gene symbol:
>
>   /* select kgID from kgXref, knownCanonical where
>      transcript = kgID and geneSymbol = HGNC_SYMBOL; */
>
>   Go through the various filtering steps he listed, then keep the
>   top one that passed the filtering.
>
>   Get exon info from the knownGene table for this kgID.
> }
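The quoted loop can be sketched in Python, using the standard-library sqlite3 module as a stand-in for MySQL. This is a minimal sketch under stated assumptions: the tables are tiny in-memory stand-ins with only the columns used here (the real UCSC tables have many more), `passes_filters` is a hypothetical placeholder because the developer's filtering steps are not given, and ordering candidates by kgID is an arbitrary, assumed tie-breaker. The exon parsing assumes knownGene's comma-separated exonStarts/exonEnds lists with a trailing comma.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the UCSC tables named above; the real
# MySQL tables have many more columns. Only the columns used below are modeled.
SCHEMA = """
CREATE TABLE hgncXref (symbol TEXT);
CREATE TABLE kgXref (kgID TEXT, geneSymbol TEXT);
CREATE TABLE knownCanonical (transcript TEXT, chrom TEXT);
CREATE TABLE knownGene (name TEXT, chrom TEXT, exonStarts TEXT, exonEnds TEXT);
"""


def parse_coords(blob):
    """Parse a comma-separated coordinate list such as '100,500,' into ints."""
    return [int(x) for x in blob.split(",") if x]


def passes_filters(kg_id):
    """Hypothetical placeholder for the developer's filtering steps (not given)."""
    return True


def canonical_exons(conn):
    """Map each HGNC symbol to (kgID, chrom, [(exonStart, exonEnd), ...])."""
    result = {}
    symbols = [row[0] for row in conn.execute("SELECT symbol FROM hgncXref")]
    for symbol in symbols:
        # All canonical KG IDs whose kgXref.geneSymbol equals the HGNC symbol.
        # ORDER BY kgID is an arbitrary, assumed tie-breaker.
        candidates = [row[0] for row in conn.execute(
            "SELECT kgID FROM kgXref, knownCanonical "
            "WHERE transcript = kgID AND geneSymbol = ? ORDER BY kgID",
            (symbol,),
        )]
        survivors = [kg for kg in candidates if passes_filters(kg)]
        if not survivors:
            continue  # no canonical transcript for this symbol
        chosen = survivors[0]  # keep the top one that passed the filtering
        chrom, starts, ends = conn.execute(
            "SELECT chrom, exonStarts, exonEnds FROM knownGene WHERE name = ?",
            (chosen,),
        ).fetchone()
        exons = list(zip(parse_coords(starts), parse_coords(ends)))
        result[symbol] = (chosen, chrom, exons)
    return result


# Toy data with made-up IDs and coordinates, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO hgncXref VALUES (?)", [("GENE1",), ("GENE2",)])
conn.executemany("INSERT INTO kgXref VALUES (?, ?)",
                 [("kgB", "GENE1"), ("kgA", "GENE1")])
conn.executemany("INSERT INTO knownCanonical VALUES (?, ?)",
                 [("kgA", "chr4"), ("kgB", "chr10")])
conn.executemany("INSERT INTO knownGene VALUES (?, ?, ?, ?)",
                 [("kgA", "chr4", "100,500,", "200,650,"),
                  ("kgB", "chr10", "10,", "90,")])
result = canonical_exons(conn)
print(result)  # {'GENE1': ('kgA', 'chr4', [(100, 200), (500, 650)])}
```

Because every candidate passes the placeholder filter, the sort order alone decides the winner here, which is exactly the "arbitrarily pick something" behavior; real filters would slot into `passes_filters`.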
OK, so basically the answer is to arbitrarily pick something, which is
what I was going to do anyway.
By the way, for sets of canonical genes that had the same HGNC ID, I
compared the chromosome locations of the genes with the "Core data" file
from HGNC. Below, I list the genes that UCSC annotates on a different
chromosome from the one HGNC gives for that name. For example, UCSC says
DUX4 appears on chr4 and chr10, but HGNC says it's on chr4.
I didn't know whether you were using these gene names to say "This is
the closest homolog, even though it's on the wrong chromosome," or
whether you just didn't check the chromosome number. If the latter, you
might want to add a chromosome check to the knownCanonical pipeline.
Each line gives the UCSC Known Genes ID and the chromosome UCSC places it
on, then the HGNC symbol and the chromosome HGNC gives for it ("ne" means
"not equal").
uc002hhh.1 chr17 ne HGNC's ZNF207 chr6
uc003bav.1 chr22 ne HGNC's NHP2L1 chr12
uc002pey.1 chr19 ne HGNC's GNG8 chr1
uc002cfg.1 chr16 ne HGNC's FAM39DP chr15
uc001lns.1 chr10 ne HGNC's DUX4 chr4
uc001yko.1 chr14 ne HGNC's PPP2R5C chr3
uc001ldw.1 chr10 ne HGNC's FAM45B chrX
uc002lpw.1 chr19 ne HGNC's PRG2 chr11
uc003zfu.1 chr9 ne HGNC's FAM39B chr2
uc001wtp.1 chr14 ne HGNC's DPPA3 chr12
uc004ftz.1 chrY ne HGNC's CD24 chr6
uc003ysf.1 chr8 ne HGNC's POU5F1 chr6
uc001lkm.1 chr10 ne HGNC's TXNL2 chr6
uc003bzp.1 chr3 ne HGNC's SH3BP5 chr1
uc002twf.1 chr2 ne HGNC's PABPCP2 chr14
uc001bof.1 chr1 ne HGNC's WASF2 chrX
uc003meh.1 chr5 ne HGNC's CLTB chr4
uc001qqq.1 chr12 ne HGNC's PTMS chr17
uc001fvw.1 chr1 ne HGNC's SUMO1 chr2
uc001mjp.1 chr11 ne HGNC's CSNK2A1 chr20
uc002rqt.1 chr2 ne HGNC's RPLP0 chr12
uc003mnx.1 chr5 ne HGNC's OR4F16 chr1
uc004cch.1 chr9 ne HGNC's EEF1A1 chr6
uc003dep.1 chr3 ne HGNC's SNHG8 chr4
uc002mst.1 chr19 ne HGNC's ZNF69 chr22
uc002hnh.1 chr17 ne HGNC's LHX1 chr11
uc001jfa.1 chr10 ne HGNC's GDF2 chr11
uc001hpw.1 chr1 ne HGNC's H3F3B chr17
uc002yiv.1 chr21 ne HGNC's BAGE chr13
uc003vrg.1 chr7 ne HGNC's EEF1G chr11
uc002dsa.1 chr16 ne HGNC's SPIN1 chr9
uc001wzx.1 chr14 ne HGNC's PSMC6 chr12
uc003tgn.1 chr7 ne HGNC's CD3G chr11
uc004eqf.1 chrX ne HGNC's SUMO2 chr17
uc001ryg.1 chr12 ne HGNC's BIN2 chr4
uc001gyu.1 chr1 ne HGNC's TMEM183B chr3
uc003zzc.1 chr9 ne HGNC's CLTA chr12
uc002loc.1 chr19 ne HGNC's OR4F17 chr16
uc001vuz.1 chr14 ne HGNC's ACTBL1 chr22
uc001vwc.1 chr14 ne HGNC's ACTBL1 chr22
uc002zdj.1 chr21 ne HGNC's HIST1H2BK chr6
uc004com.1 chr9 ne HGNC's TUBB4Q chr4
uc001xgd.1 chr14 ne HGNC's PPP2R5E chr7
uc003ird.1 chr4 ne HGNC's GK chrX
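The comparison behind this list can be sketched as follows. This is a minimal sketch, not the script actually used: it assumes the HGNC location is a cytogenetic band string such as "4q35" (the real "Core data" column name and format may differ), and `hgnc_chrom` and `mismatches` are hypothetical helper names. Following the description above, it only checks sets of two or more canonical genes that share one HGNC name.

```python
import re
from collections import defaultdict


def hgnc_chrom(location):
    """Turn an assumed cytogenetic location such as '4q35' into 'chr4'."""
    m = re.match(r"(\d+|X|Y)", location)
    return "chr" + m.group(1) if m else None


def mismatches(canonical, hgnc_locations):
    """Report canonical transcripts on a different chromosome than HGNC gives.

    canonical: iterable of (kgID, ucsc_chrom, hgnc_symbol) tuples
    hgnc_locations: dict mapping hgnc_symbol -> HGNC location string
    """
    by_symbol = defaultdict(list)
    for kg_id, chrom, symbol in canonical:
        by_symbol[symbol].append((kg_id, chrom))
    report = []
    for symbol, members in by_symbol.items():
        if len(members) < 2:
            continue  # only sets of canonical genes sharing one HGNC name
        expected = hgnc_chrom(hgnc_locations.get(symbol, ""))
        if expected is None:
            continue  # no usable HGNC location
        for kg_id, chrom in members:
            if chrom != expected:
                report.append(f"{kg_id} {chrom} ne HGNC's {symbol} {expected}")
    return report


# Illustrative input: "kgA" is a made-up ID and "4q35" an assumed location
# string; uc001lns.1 on chr10 is taken from the list above.
canonical = [("kgA", "chr4", "DUX4"), ("uc001lns.1", "chr10", "DUX4")]
report = mismatches(canonical, {"DUX4": "4q35"})
for line in report:
    print(line)  # uc001lns.1 chr10 ne HGNC's DUX4 chr4
```

Grouping by symbol first keeps the check limited to ambiguous names; dropping the `len(members) < 2` test would flag every single-copy gene whose chromosome disagrees as well.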
-Amir