[Genome] Entrez gene ID to refseq ID mapping
Guoliang Xing
gxing at soe.ucsc.edu
Wed May 23 16:43:28 PDT 2007
Hi Rachel,
Thank you for your reply. This is what I want and it's very helpful.
I was a little surprised to find that, for the same human Mar 2006 assembly,
the Genes and Gene Prediction Tracks group -> RefSeq Genes -> refFlat table
returns 25407 rows, but the refLink option returns 203390 rows, about 8
times as many.
On the other hand, the gene2refseq.gz file I downloaded from
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ produces 145360 rows after filtering
to keep only human genes (Taxonomy ID 9606), and many of the gene IDs are
duplicated there.
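One way to see how much of that redundancy is real is to collapse the filtered rows to unique (GeneID, RNA accession) pairs. This is a minimal sketch, assuming the gene2refseq column order starts with tax_id, GeneID, status, RNA_nucleotide_accession.version, that the header line begins with `#`, and that `-` marks an empty field. The function names are hypothetical, not part of any NCBI or UCSC tool.

```python
import gzip
from typing import Iterable, List, Tuple


def unique_gene_rna_pairs(lines: Iterable[str], tax_id: str = "9606") -> List[Tuple[str, str]]:
    """Collapse gene2refseq rows to unique (GeneID, RNA accession) pairs for one taxon.

    Assumed column order: tax_id, GeneID, status, RNA_nucleotide_accession.version, ...
    """
    pairs = set()
    for line in lines:
        if line.startswith("#"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[0] != tax_id:
            continue  # another species, or a malformed line
        gene_id, rna_acc = fields[1], fields[3]
        if rna_acc == "-":
            continue  # this row carries no RNA accession
        # Drop the ".version" suffix so different versions of one accession collapse.
        pairs.add((gene_id, rna_acc.split(".")[0]))
    return sorted(pairs)


def load_human_pairs(path: str = "gene2refseq.gz") -> List[Tuple[str, str]]:
    """Read the gzipped NCBI file and return the human (tax_id 9606) pairs."""
    with gzip.open(path, "rt") as fh:
        return unique_gene_rna_pairs(fh)
```

`len({g for g, _ in load_human_pairs()})` then gives the number of distinct human gene IDs. Stripping the version is a design choice so accessions can be compared regardless of version; keep it if version-level detail matters.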
I guess that the genes downloaded from the UCSC Table Browser using the
refFlat option give a more reasonable snapshot of all coding genes on hg18.
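To attach Entrez Gene IDs to the refFlat rows, the two tables can be joined on the transcript accession: refFlat's `name` column against refLink's `mrnaAcc`, taking the gene ID from refLink's `locusLinkId`. The sketch below assumes both tables were saved from the Table Browser with their `#`-prefixed header line; `read_table_browser_tsv` and `entrez_ids_for_refflat` are hypothetical helper names.

```python
import csv
from typing import Dict, Iterable, List


def read_table_browser_tsv(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse Table Browser output whose first line is a '#'-prefixed header."""
    it = iter(lines)
    header_line = next(it, None)
    if header_line is None:
        return []  # empty file
    header = header_line.lstrip("#").rstrip("\n").split("\t")
    # QUOTE_NONE: the output is plain tab-separated text, not quoted CSV.
    return list(csv.DictReader(it, fieldnames=header, delimiter="\t",
                               quoting=csv.QUOTE_NONE))


def entrez_ids_for_refflat(refflat_rows: Iterable[Dict[str, str]],
                           reflink_rows: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Map each refFlat transcript accession to its Entrez Gene ID via refLink.

    Join key: refFlat.name == refLink.mrnaAcc; the gene ID is refLink.locusLinkId.
    refLink accessions that never appear in refFlat are dropped.
    """
    acc_to_gene = {r["mrnaAcc"]: r["locusLinkId"] for r in reflink_rows}
    return {r["name"]: acc_to_gene[r["name"]]
            for r in refflat_rows if r["name"] in acc_to_gene}
```

Because only accessions present in refFlat survive the join, the result is bounded by refFlat rather than refLink. A transcript aligned to several locations appears once per alignment in refFlat but only once as a key here.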
My key task is to figure out the genome coordinates of each gene and its
exons, and then link each gene ID to existing pathways and/or to the SNPs
associated with it.
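Per-exon coordinates can be read straight from a refFlat row. This is a sketch, assuming the 11-column refFlat layout (geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) and UCSC's 0-based, half-open coordinates; `Transcript` and `parse_refflat_line` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transcript:
    gene_name: str
    accession: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    # (start, end) pairs in UCSC's 0-based, half-open convention
    exons: List[Tuple[int, int]]


def _int_list(text: str) -> List[int]:
    """Parse a comma-separated list such as '100,300,' (note the trailing comma)."""
    return [int(x) for x in text.split(",") if x]


def parse_refflat_line(line: str) -> Transcript:
    """Parse one tab-separated refFlat row (assumed 11-column layout)."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 11:
        raise ValueError(f"expected 11 refFlat columns, got {len(f)}")
    starts, ends = _int_list(f[9]), _int_list(f[10])
    exon_count = int(f[8])
    if len(starts) != exon_count or len(ends) != exon_count:
        raise ValueError("exonCount does not match exonStarts/exonEnds")
    return Transcript(
        gene_name=f[0], accession=f[1], chrom=f[2], strand=f[3],
        tx_start=int(f[4]), tx_end=int(f[5]),
        exons=list(zip(starts, ends)),
    )
```

Skip any `#` header line before calling the parser. To report 1-based inclusive coordinates, add 1 to each start and keep each end as-is.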
If my guess is wrong, please let me know. I understand that gene
annotation is not simple and that there will be some inconsistencies.
Thanks,
Guoliang
On Wed, 23 May 2007, Rachel Harte wrote:
> Hello Guoliang,
>
> There is a table called refLink. You can download it through the Table
> Browser (click on the "Tables" link on the top blue menu bar). Select the
> assembly of interest, the "Genes and Gene Predictions" group and the
> "RefSeq Genes" track. Then you can select the refLink table. The Entrez
> Gene ID is in the locusLinkId column.
> Alternatively, go to the Downloads server:
> http://hgdownload.cse.ucsc.edu/downloads.html
>
> Once you have found the organism and assembly of interest, then click on
> the "Annotation database" link and there you can download the contents of
> any table in the database for that assembly.
>
> I hope that this helps you. For a rapid response in the future, please
> direct questions to our mailing list at: genome at soe.ucsc.edu
>
> Thanks.
>
> Rachel
>
>
> Rachel Harte UCSC Genome
> Bioinformatics Group http://genome.ucsc.edu
>
>
> On Tue, 22 May 2007, Guoliang Xing wrote:
>
> > Hi Rachel,
> >
> > Is there a way to download a mapping table from UCSC between Entrez
> > GeneIDs and RefSeq IDs?
> >
> > I used the UCSC Table Browser (RefSeq Genes track, knownToRefSeq table)
> > and downloaded the data. It has a name field, which is the transcript
> > accession, and a gene name field, but no GeneID.
> >
> > I like all the info this table provides, but I also need the Entrez
> > GeneID (for Human).
> >
> > On the other hand, I parsed NCBI's gene2refseq table for human, but it
> > has too many redundant lines for the same GeneID.
> >
> > Your help will be appreciated.
> >
> > Guoliang
> >
>
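The Downloads-server route gives one file per table. A sketch for reading such a dump, assuming that each table in the annotation database directory arrives as a headerless, tab-separated `<table>.txt.gz` whose column order is given in a matching `<table>.sql` file. `read_table_dump` and `REFFLAT_COLUMNS` are hypothetical names, and the refFlat column list is an assumption to check against refFlat.sql.

```python
import gzip
from typing import Dict, Iterator, Sequence

# Assumed refFlat column order; confirm against refFlat.sql in the same directory.
REFFLAT_COLUMNS = ("geneName", "name", "chrom", "strand", "txStart", "txEnd",
                   "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds")


def read_table_dump(path: str, columns: Sequence[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per row of a headerless, tab-separated dump (.txt or .txt.gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for line_no, line in enumerate(fh, start=1):
            fields = line.rstrip("\n").split("\t")
            if len(fields) != len(columns):
                raise ValueError(
                    f"{path}:{line_no}: expected {len(columns)} columns, got {len(fields)}")
            yield dict(zip(columns, fields))
```

For example, `rows = list(read_table_dump("refFlat.txt.gz", REFFLAT_COLUMNS))` loads the whole table; the strict column-count check fails loudly if the assumed layout does not match the file.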