[Genome] Questions on update, dumps and SNPs of hg18 data subsets ---knownGene*, refGene* and SNP

Fangcheng Gong Fangcheng.Gong at sial.com
Wed Feb 14 15:09:27 PST 2007


Hi Brooke,

Thank you very much for your help. It's quite useful.

Fangcheng



Brooke Rhead <rhead at soe.ucsc.edu> 
02/14/2007 04:45 PM

To
Fangcheng Gong <Fangcheng.Gong at sial.com>
cc
genome at soe.ucsc.edu
Subject
Re: [Genome] Questions on update, dumps and SNPs of hg18 data subsets 
---knownGene*, refGene* and SNP






Hi Fangcheng,

The RefSeq mRNA sequences are stored in a file called "refMrna.fa.gz", 
which is located in the directory below (you can also reach it by 
following the "Full data set" link, rather than the "Annotation 
database" link, on the downloads page):

http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/

This file is automatically updated once a week with new data from 
GenBank.  Note that the underlying data and the RefSeq track are updated 
nightly in the Genome Browser, whereas the downloadable file is updated 
only once per week, so the download file can be slightly out of sync 
with the data in the Genome Browser.
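
One way to see how far the weekly download has drifted from a coordinate 
table is to compare accessions directly.  The following is a minimal 
Python sketch, not a UCSC tool.  It assumes that the accession is the 
first whitespace-delimited token of each FASTA header in refMrna.fa 
(with any ".version" suffix dropped), and that the accession is the 
second tab-separated column of refGene.txt (after a leading "bin" 
column).  The function names are hypothetical, and both assumptions 
should be checked against the actual files.

```python
import gzip


def fasta_accessions(lines):
    """Collect accessions from FASTA header lines (lines starting with '>')."""
    accessions = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                # Assumption: first token is the accession; drop any ".N" version.
                accessions.add(fields[0].split(".")[0])
    return accessions


def refgene_accessions(lines, name_col=1):
    """Collect accessions from a tab-separated refGene-style dump.

    Assumption: the accession is in column index 1 (after a "bin" column).
    """
    accessions = set()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > name_col:
            accessions.add(fields[name_col])
    return accessions


def missing_sequences(refgene_lines, fasta_lines, name_col=1):
    """Return sorted accessions that have coordinates but no sequence."""
    have_coords = refgene_accessions(refgene_lines, name_col)
    have_seq = fasta_accessions(fasta_lines)
    return sorted(have_coords - have_seq)


# Example use with downloaded files (paths are placeholders):
#   with gzip.open("refGene.txt.gz", "rt") as rg, \
#        gzip.open("refMrna.fa.gz", "rt") as fa:
#       print(missing_sequences(rg, fa))
```

An empty result would suggest that every accession in the coordinate 
file has a sequence in the download.  It does not show that the sequence 
versions match, since the coordinate file carries no version information.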

I hope this information helps.  Please let us know if you have further 
questions.

--
Brooke Rhead
UCSC Genome Bioinformatics Group


Fangcheng Gong wrote:
> 
> Hi Brooke,
> 
> Thank you for your suggestion.  I'm actually looking for the RefSeq
> sequences that were used to create the exon coordinates in
> "refGene.txt".  The files "mrnaRefseq.txt.gz" and "refSeqAli.txt.gz"
> don't contain the sequence information.  The "knownGeneMrna.txt" file
> contains sequence information, but is missing sequences for some of the
> accessions in "refGene.txt".  Is it possible to dump the sequence
> information into a file alongside the exon coordinate file
> (refGene.txt)?  Since I'm using the coordinate files from the UCSC
> site, I would prefer to use the sequence files from your site as well,
> so that the sequence data and coordinate data are synchronized.
> Otherwise, I can't be sure that the two datasets (the coordinate data
> from UCSC and the sequence data from NCBI) are synchronized, as there
> is no version information for the accessions in your "knownGene.txt"
> and "refGene.txt".
> 
> Regards,
> Fangcheng
> 
> 
> *Brooke Rhead <rhead at soe.ucsc.edu>*
> 
> 02/14/2007 03:26 PM
> 
> 
> To
>                Fangcheng Gong <Fangcheng.Gong at sial.com>
> cc
>                genome at soe.ucsc.edu
> Subject
>                Re: [Genome] Questions on update, dumps and SNPs of hg18 data subsets
> ---knownGene*, refGene* and SNP
> 
> 
> 
> 
> 
> 
> 
> 
> Hi again Fangcheng,
> 
> One of our developers brought the 'refSeqAli' table to my attention.  It
> is the alignment table that corresponds to the refGene table.  Perhaps
> this is the table you were referring to in your second question.
> 
> --
> Brooke Rhead
> UCSC Genome Bioinformatics Group
> 
> Brooke Rhead wrote:
>  > Hello Fangcheng,
>  >
>  > I see that Heather is addressing your question about SNP data.  I will
>  > address your other two questions.
>  >
>  > 1) The knownGene data does not change over time, unlike some of the
>  > other tables that are updated nightly, such as the GenBank-related
>  > tables.  The knownGene table is created once, when the track is
>  > created.  The data that is available on hgdownload.cse.ucsc.edu is the
>  > same as the data available in the Genome Browser.
>  >
>  > 2) We do not have a table called 'refGeneMrna'.  We do have a table
>  > called 'mrnaRefSeq'; could that be what you are trying to find?  It is
>  > available on hgdownload.cse.ucsc.edu.  If that is not the table you
>  > want to locate, please let us know, and we will try to direct you to
>  > the right information.
>  >
>  > --
>  > Brooke Rhead
>  > UCSC Genome Bioinformatics Group
>  >
>  >
>  >
>  > Fangcheng Gong wrote:
>  >> Dear UCSC colleagues,
>  >>
>  >> 1) The following data files haven't been updated for a long time.
>  >> Could you update them at your earliest convenience?
>  >>
>  >> Directory: ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/
>  >>
>  >> knownGene.txt.gz        (the last update 04/13/2006)
>  >> knownGeneMrna.txt.gz    (the last update 04/13/2006)
>  >>
>  >> 2) Could you upload the dumps for the following files to the same
>  >> directory?
>  >>
>  >> refGeneMrna.sql
>  >> refGeneMrna.txt.gz
>  >> 
>  >> 3) Do you have transcript SNP data for knownGene and refGene?  If
>  >> yes, is it possible to upload their dumps to this FTP site?
>  >>
>  >>
>  >> Thank you very much for your time and effort.  I'm eager to hear
>  >> your reply.
>  >>
>  >> Regards
>  >> Fangcheng Gong
>  >>
>  >> _______________________________________
>  >> Fangcheng Gong, Ph.D.
>  >> Principal R&D Scientist, Bioinformatics
>  >>
>  >> Sigma-Aldrich Corporation
>  >> 2909 Laclede Ave.
>  >> St. Louis, MO 63103
>  >>
>  >> Phone:  314-289-8496 x 4464
>  >>         877-472-2192 x 4464
>  >> Fax:    314-286-7645
>  >> Email:  Fangcheng.Gong at sial.com
>  >> _______________________________________
>  >> _______________________________________________
>  >> Genome maillist  -  Genome at soe.ucsc.edu
>  >> http://www.soe.ucsc.edu/mailman/listinfo/genome
> 


