[Genome] Questions on update, dumps and SNPs of hg18 data subsets ---knownGene*, refGene* and SNP

Fangcheng Gong Fangcheng.Gong at sial.com
Wed Feb 14 13:51:52 PST 2007


Hi Brooke,

Thank you for your suggestion.  I'm actually looking for the RefSeq
sequences that were used to create the exon coordinates in "refGene.txt".
The files "mrnaRefseq.txt.gz" and "refSeqAli.txt.gz" don't contain the
sequence information.  The file "knownGeneMrna.txt" does contain sequences,
but it is missing sequences for some of the accessions in "refGene.txt".
Is it possible to dump the sequence information into a file alongside the
exon coordinate file (refGene.txt)?  Since I'm using the coordinate files
from the UCSC site, I would prefer to use the sequence files from your site
as well, so that the sequence data and coordinate data are synchronized.
Otherwise, I can't be sure that the two datasets (the coordinate data from
UCSC and the sequence data from NCBI) match, because the accessions in your
"knownGene.txt" and "refGene.txt" carry no version information.
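
To make the gap concrete, here is a rough Python sketch of how one might list
the accessions that appear in the coordinate file but have no entry in the
sequence file.  It is only an illustration under assumptions: that both dumps
are tab-separated, that the accession is in the second column of
"refGene.txt" (after a leading bin column) and in the first column of
"knownGeneMrna.txt".  The function names and the default column indexes are
hypothetical and should be checked against the matching .sql files.

```python
import gzip
from typing import Iterable, List, Set


def open_text(path: str):
    """Open a plain or gzip-compressed dump file for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def read_column(lines: Iterable[str], column: int) -> Set[str]:
    """Collect the distinct values found in one tab-separated column."""
    values: Set[str] = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if column < len(fields):
            values.add(fields[column])
    return values


def missing_accessions(
    coord_lines: Iterable[str],
    seq_lines: Iterable[str],
    coord_col: int = 1,  # assumption: accession follows a leading bin column
    seq_col: int = 0,    # assumption: accession is the first column
) -> List[str]:
    """Return accessions present in the coordinate data but not in the sequence data."""
    coord_ids = read_column(coord_lines, coord_col)
    seq_ids = read_column(seq_lines, seq_col)
    return sorted(coord_ids - seq_ids)
```

With local copies of the dumps this could be called as, for example,
missing_accessions(open_text("refGene.txt.gz"), open_text("knownGeneMrna.txt.gz")).
Note that it compares accessions only; it cannot tell whether the sequence
behind an accession is the same version that was aligned.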

Regards,
Fangcheng



Brooke Rhead <rhead at soe.ucsc.edu>
02/14/2007 03:26 PM

To: Fangcheng Gong <Fangcheng.Gong at sial.com>
cc: genome at soe.ucsc.edu
Subject: Re: [Genome] Questions on update, dumps and SNPs of hg18 data
subsets ---knownGene*, refGene* and SNP

Hi again Fangcheng,

One of our developers brought the 'refSeqAli' table to my attention.  It 
is the alignment table that corresponds to the refGene table.  Perhaps 
this is the table you were referring to in your second question.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

Brooke Rhead wrote:
> Hello Fangcheng,
> 
> I see that Heather is addressing your question about SNP data.  I will 
> address your other two questions.
> 
> 1) The knownGene data does not change over time, unlike some of the 
> other tables that are updated nightly, such as the GenBank-related 
> tables.  The knownGene table is created once, when the track is created.
> The data that is available on hgdownload.cse.ucsc.edu is the same as
> the data available in the Genome Browser.
> 
> 2) We do not have a table called 'refGeneMrna'.  We do have a table 
> called 'mrnaRefSeq'; could that be what you are trying to find?  It is 
> available on hgdownload.cse.ucsc.edu.  If that is not the table you want
> to locate, please let us know, and we will try to direct you to the
> right information.
> 
> --
> Brooke Rhead
> UCSC Genome Bioinformatics Group
> 
> 
> 
> Fangcheng Gong wrote:
>> Dear UCSC colleagues,
>>
>> 1) The following data files haven't been updated for a long time.  Could
>> you update them at your earliest convenience?
>>
>> Directory: ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/
>>
>> knownGene.txt.gz        (last updated 04/13/2006)
>> knownGeneMrna.txt.gz    (last updated 04/13/2006)
>>
>> 2) Could you upload the dumps for the following files to the same
>> directory?
>>
>> refGeneMrna.sql
>> refGeneMrna.txt.gz
>> 
>> 3) Do you have transcript SNP data for knownGene and refGene?  If yes,
>> is it possible to upload their dumps to this FTP site?
>>
>>
>> Thank you very much for your time and effort.  I'm eager to hear your 
>> reply.
>>
>> Regards
>> Fangcheng Gong
>>
>> _______________________________________
>> Fangcheng Gong, Ph.D.
>> Principal R&D Scientist, Bioinformatics
>>
>> Sigma-Aldrich Corporation
>> 2909 Laclede Ave.
>> St. Louis, MO 63103
>>
>> Phone:  314-289-8496 x 4464
>>         877-472-2192 x 4464
>> Fax:    314-286-7645
>> Email:  Fangcheng.Gong at sial.com
>> _______________________________________
>> _______________________________________________
>> Genome maillist  -  Genome at soe.ucsc.edu
>> http://www.soe.ucsc.edu/mailman/listinfo/genome


