[Genome] [Genome-announce] New UCSC gene prediction set released

Steve Chervitz Steve_Chervitz at affymetrix.com
Mon Apr 16 14:22:06 PDT 2007


Rachel,

Thanks for the info. A couple of follow-up questions for you:

Are there any plans to contribute UCSC gene sequences to NCBI or another
repository, or will they always be maintained by UCSC?

Also, how frequently will they be released/updated?

Steve

> From: Rachel Harte <hartera at soe.ucsc.edu>
> Date: Thu, 12 Apr 2007 11:47:07 -0700 (PDT)
> To: Steve Chervitz <Steve_Chervitz at affymetrix.com>
> Cc: <genome at soe.ucsc.edu>
> Subject: Re: [Genome] [Genome-announce] New UCSC gene prediction set released
> 
> Hello Steve,
> 
> Thank you for your compliment. In answer to your question, the level of
> RefSeq support was not taken into account when creating the UCSC Genes
> set except that we do not use the predicted RefSeqs whose accessions are
> like XM_xxxxxx. However, we do colour the UCSC Genes in the Browser
> according to level of support which does include the RefSeq support level:
> 
> Here is the colour-coding for the track:
>     * Black -- feature has a corresponding entry in the Protein Data Bank
>       (PDB)
>     * Dark blue -- transcript has been reviewed or validated by either the
>       RefSeq or SwissProt staff
>     * Medium blue -- other RefSeq transcripts
>     * Light blue -- non-RefSeq transcripts
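
For readers who want the colour rules listed above in executable form, here is a minimal Python sketch. The function name, its boolean parameters, and the order in which the rules are checked are assumptions made for illustration. The list does not say how overlapping cases are resolved, and this is not the Browser's actual code.

```python
def ucsc_genes_colour(has_pdb_entry: bool,
                      reviewed_or_validated: bool,
                      is_refseq: bool) -> str:
    """Return the track colour for one transcript (hypothetical helper).

    Assumption: the categories are checked from strongest to weakest
    support, so a transcript with a PDB entry is black even if it is
    also a reviewed RefSeq.
    """
    if has_pdb_entry:
        return "black"        # corresponding entry in the Protein Data Bank
    if reviewed_or_validated:
        return "dark blue"    # reviewed or validated by RefSeq or SwissProt staff
    if is_refseq:
        return "medium blue"  # other RefSeq transcripts
    return "light blue"       # non-RefSeq transcripts
```

With these assumptions, a transcript that is a RefSeq but has not been reviewed or validated would be drawn in medium blue.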
> 
> If you turn on the RefSeq track, the RefSeqs are also coloured according
> to their support levels. From the track description:
> 
> "The color shading indicates the level of review the RefSeq record has
> undergone: predicted (light), provisional (medium), reviewed (dark)."
> 
> This information is found on the description pages which describe the
> methods used to create the tracks. To reach the description, click on the
> blue/gray mini-button at the left side of the track in the Browser image
> or click on the link above the appropriate track control found below the
> Browser image.
> 
> I hope that this helps you. Please let us know if you have further
> questions.
> 
> Rachel
> 
> Rachel Harte
> UCSC Genome Bioinformatics Group
> http://genome.ucsc.edu
> 
> 
> On Wed, 11 Apr 2007, Steve Chervitz wrote:
> 
>> Kudos to the UCSC team on what looks to be a nice new resource.
>> 
>> I have a question regarding the RefSeq evidence used for support. You say
>> that an entry with a RefSeq supporting sequence requires no additional
>> support, but there is a range of different support levels among RefSeq
>> records and I was wondering if your pipeline incorporated them all or
>> whether the pipeline weighted RefSeq records with different status.
>> 
>> The different support levels within RefSeq I'm referring to are: reviewed,
>> validated, provisional, predicted, inferred, model, etc. described here:
>> http://www.ncbi.nlm.nih.gov/RefSeq/key.html#status
>> 
>> Cheers,
>> Steve
>> 
>> 
>>> From: Donna Karolchik <donnak at soe.ucsc.edu>
>>> Date: Fri, 6 Apr 2007 17:14:43 -0700
>>> To: Genome-announce <genome-announce at soe.ucsc.edu>, genome-mirror
>>> <genome-mirror at soe.ucsc.edu>
>>> Subject: [Genome-announce] New UCSC gene prediction set released
>>> 
>>> We are pleased to announce the release of a new gene prediction set, UCSC
>>> Genes, on the latest human Genome Browser (hg18, NCBI Build 36). This
>>> annotation, which includes putative non-coding genes as well as
>>> protein-coding genes and 99.9% of RefSeq genes, is the next generation of
>>> the Known Genes set that UCSC has been providing for several years and
>>> supersedes the existing Known Genes annotation on the hg18 assembly.
>>> 
>>> The UCSC Genes set is a moderately conservative prediction set based on
>>> data from RefSeq, GenBank, and UniProt. Each entry requires the support of
>>> one GenBank RNA sequence plus at least one additional line of evidence,
>>> with the exception of RefSeq RNAs, which require no additional evidence.
>>> Some of the non-coding transcripts in the set may actually code for
>>> protein, but the evidence for the associated protein is weak at best.
>>> Compared to RefSeq, this gene set generally has about 10% more
>>> protein-coding genes, approximately five times as many putative non-coding
>>> genes, and about twice as many splice variants.
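
The evidence requirement described in the paragraph above can be sketched as a small Python predicate. This is only one reading of that sentence, not the UCSC pipeline itself. The function and parameter names are hypothetical, and what counts as an "additional line of evidence" is left to the caller.

```python
def meets_evidence_rule(has_refseq_rna: bool,
                        genbank_rna_count: int,
                        additional_evidence_count: int) -> bool:
    """Check the stated support rule for one candidate entry (hypothetical)."""
    # RefSeq RNAs are the stated exception: no additional evidence is required.
    if has_refseq_rna:
        return True
    # Otherwise: one GenBank RNA sequence plus at least one further line
    # of evidence.
    return genbank_rna_count >= 1 and additional_evidence_count >= 1
```

Under this reading, an entry backed by a single GenBank RNA and nothing else would not qualify, while a RefSeq RNA on its own would.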
>>> 
>>> The UCSC Genes set is produced using a computational pipeline developed at
>>> UCSC by Jim Kent, Chuck Sugnet and Mark Diekhans. For detailed information
>>> about the process used to construct the genes set, see the track
>>> description page. In upcoming months, we plan to release UCSC Genes sets
>>> on other organisms in addition to human.
>>> 
>>> As part of this change, we are now using our own UCSC Genes accession
>>> numbers as the primary key into the underlying knownGene table, rather
>>> than the GenBank mRNA accessions we used in the previous Known Genes
>>> prediction set. Note that this may affect external sites with URLs that
>>> link into our genes track using the older-style accessions.
>>> 
>>> We will continue to provide the older Known Genes track on hg18 under the
>>> name "Old Known Genes". You may find the following tables useful in
>>> referencing the older gene set and converting between the two sets:
>>> 
>>> - knownGeneOld2: new name for table underlying the old Known Genes
>>> (previously called knownGene)
>>> 
>>> - kgXrefOld2: new name for table that contains data for converting old Known
>>> Genes IDs to other IDs (previously called kgXref)
>>> 
>>> - kg2ToKg3: data for converting old Known Genes IDs to the newer UCSC Genes
>>> IDs
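
As an illustration of how a site with links keyed on the old accessions might use a table like kg2ToKg3, here is a minimal Python sketch. It assumes the table has already been loaded as simple old-ID to new-ID pairs. The actual column layout is not given in this message, and the IDs below are placeholders, not real accessions.

```python
from typing import Dict, Optional

# Placeholder rows standing in for kg2ToKg3 (old Known Genes ID -> UCSC Genes ID).
# Real rows would be read from the table itself; these values are invented.
KG2_TO_KG3: Dict[str, str] = {
    "OLD_ID_1": "NEW_ID_1",
    "OLD_ID_2": "NEW_ID_2",
}


def convert_old_id(old_id: str, mapping: Dict[str, str]) -> Optional[str]:
    """Return the new UCSC Genes ID for an old Known Genes ID.

    Returns None when the old ID has no entry in the mapping, so callers
    can decide how to handle links that cannot be converted.
    """
    return mapping.get(old_id)
```

Returning None rather than raising keeps the decision about unconvertible IDs with the caller, which seems the safer default for updating external links in bulk.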
>>> 
>>> We'd like to acknowledge the many people affiliated with the UCSC Genome
>>> Bioinformatics group who worked hard to release this new annotation:
>>> developers Jim Kent, Mark Diekhans, and Fan Hsu (with technical support
>>> from several other engineers in the group); David Haussler; our splendid
>>> QA team -- Archana Thakkapallayil, Ann Zweig, Robert Kuhn, Kayla Smith,
>>> and Brooke Rhead; our build engineer -- Andy Pohl; and our sysadmin group.
>>> We'd also like to thank Chuck Sugnet for his input, the people and
>>> organizations maintaining the RefSeq, UniProt, and GenBank databases, and
>>> the scientists worldwide who have contributed to them. If you have any
>>> questions about this new release, feel free to contact us at
>>> genome at soe.ucsc.edu (general questions) or
>>> genome-mirror at soe.ucsc.edu (mirror-specific questions).
>>> 
>>> -Donna
>>> -----------------------------------
>>> Donna Karolchik
>>> UCSC Genome Bioinformatics Group
>>> http://genome.ucsc.edu
>>> 
>>> _______________________________________________
>>> Genome-announce mailing list
>>> Genome-announce at soe.ucsc.edu
>>> http://www.soe.ucsc.edu/mailman/listinfo/genome-announce
>> 
>> _______________________________________________
>> Genome maillist  -  Genome at soe.ucsc.edu
>> http://www.soe.ucsc.edu/mailman/listinfo/genome
>> 


