[Genome] questions for knowngene to refseq
mirrian@iis.sinica.edu.tw
Wed Nov 1 02:40:28 PST 2006
Dear Sir,
I am trying to link two kinds of data: RefSeq, downloaded from NCBI,
and Known Genes from UCSC. I have downloaded two tables, kgXref and
knownToRefSeq. Both contain Known Gene and RefSeq information, but
they differ. For example, kgXref contains 32750 records that relate
to RefSeq from NCBI human build 36.1, while knownToRefSeq contains
33961 such records. I'm wondering which table is more accurate and
what causes this difference.
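
One way to see where the two tables disagree is to reduce each to a set of (known gene ID, RefSeq accession) pairs and compare the sets. The following is only a minimal sketch: the column positions (kgID in column 0 and refseq in column 5 of kgXref; known gene ID in column 0 and RefSeq in column 1 of knownToRefSeq) are assumptions to check against the table schema, and the demo rows and IDs are invented placeholders, not real records.

```python
import io


def load_pairs(handle, kg_col, refseq_col):
    """Collect (known gene ID, RefSeq accession) pairs from a tab-separated dump.

    kg_col and refseq_col are zero-based column positions supplied by the
    caller; rows that are too short or have an empty value in either column
    are skipped.
    """
    pairs = set()
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(kg_col, refseq_col):
            continue
        kg_id = fields[kg_col].strip()
        refseq = fields[refseq_col].strip()
        if kg_id and refseq:
            pairs.add((kg_id, refseq))
    return pairs


def compare_pairs(kgxref_pairs, known_to_refseq_pairs):
    """Split the pairs into those shared by both tables and those unique to one."""
    return {
        "shared": kgxref_pairs & known_to_refseq_pairs,
        "only_kgxref": kgxref_pairs - known_to_refseq_pairs,
        "only_knownToRefSeq": known_to_refseq_pairs - kgxref_pairs,
    }


# Invented demo rows (placeholders, not real data). Column layout is assumed.
KGXREF_DEMO = (
    "kgA\tmrna1\t\t\tGENE1\tNM_000001\t\tdesc\n"
    "kgB\tmrna2\t\t\tGENE2\t\t\tdesc\n"  # no RefSeq value, so it is skipped
)
KNOWNTOREFSEQ_DEMO = "kgA\tNM_000001\nkgB\tNM_000002\n"

kgxref_pairs = load_pairs(io.StringIO(KGXREF_DEMO), kg_col=0, refseq_col=5)
k2r_pairs = load_pairs(io.StringIO(KNOWNTOREFSEQ_DEMO), kg_col=0, refseq_col=1)
report = compare_pairs(kgxref_pairs, k2r_pairs)

for name, pairs in report.items():
    print(name, len(pairs))
```

With real dumps, `io.StringIO(...)` would be replaced by `open(path)` on the downloaded files. Listing the pairs found in only one table may hint at why the record counts differ.
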
Furthermore, according to "The UCSC Known Genes", published on
Feb. 24, 2006 (Genome analysis), if an mRNA has multiple proteins,
the best one is chosen in the order PDB, Swiss-Prot, TrEMBL. On the
other hand, if one protein has multiple mRNAs, the best is chosen in
favor of the longer and newer one with fewer mismatches. Does that
mean the Known Genes DB contains only one-to-one relationships between
protein and mRNA? However, looking at the knownToRefSeq table, the
relationship between known gene and RefSeq is not one-to-one. About
22747 records show one RefSeq with multiple known genes. Could you
tell me what causes this?
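
The one-to-many count can be checked by grouping the (known gene ID, RefSeq accession) pairs by RefSeq. This is a rough sketch only; the function name and the demo IDs are hypothetical, and it assumes the pairs have already been read from knownToRefSeq. It reports the number of distinct RefSeq accessions that have several known genes separately from the number of records involved, since those two figures can differ a lot.

```python
from collections import defaultdict


def multiplicity_summary(pairs):
    """Summarize how far (known gene ID, RefSeq) pairs are from one-to-one."""
    kg_by_refseq = defaultdict(set)
    refseq_by_kg = defaultdict(set)
    for kg_id, refseq in pairs:
        kg_by_refseq[refseq].add(kg_id)
        refseq_by_kg[kg_id].add(refseq)

    # RefSeq accessions linked to more than one known gene.
    shared_refseqs = {r: kgs for r, kgs in kg_by_refseq.items() if len(kgs) > 1}
    return {
        "refseqs_with_multiple_known_genes": len(shared_refseqs),
        "records_involved": sum(len(kgs) for kgs in shared_refseqs.values()),
        "known_genes_with_multiple_refseqs": sum(
            1 for refseqs in refseq_by_kg.values() if len(refseqs) > 1
        ),
    }


# Invented demo pairs (placeholders, not real data).
demo_pairs = [("kgA", "NM_1"), ("kgB", "NM_1"), ("kgC", "NM_2")]
summary = multiplicity_summary(demo_pairs)
print(summary)
```

In the demo, one RefSeq accession is shared by two known genes, so two records are involved while only one accession is affected.
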
Thanks for your help. I look forward to your response.
Best Regards,
MengRu