[Genome] [Genome-announce] New UCSC gene prediction set released
Steve Chervitz
Steve_Chervitz at affymetrix.com
Wed Apr 11 18:05:44 PDT 2007
Kudos to the UCSC team on what looks to be a nice new resource.
I have a question regarding the RefSeq evidence used for support. You say
that an entry with a RefSeq supporting sequence requires no additional
support, but there is a range of different support levels among RefSeq
records and I was wondering whether your pipeline incorporated them all or
weighted RefSeq records differently according to their status.
The different support levels within RefSeq I'm referring to are: reviewed,
validated, provisional, predicted, inferred, model, etc., described here:
http://www.ncbi.nlm.nih.gov/RefSeq/key.html#status
Cheers,
Steve
> From: Donna Karolchik <donnak at soe.ucsc.edu>
> Date: Fri, 6 Apr 2007 17:14:43 -0700
> To: Genome-announce <genome-announce at soe.ucsc.edu>, genome-mirror
> <genome-mirror at soe.ucsc.edu>
> Subject: [Genome-announce] New UCSC gene prediction set released
>
> We are pleased to announce the release of a new gene prediction set, UCSC
> Genes, on the latest human Genome Browser (hg18, NCBI Build 36). This
> annotation, which includes putative non-coding genes as well as
> protein-coding genes and 99.9% of RefSeq genes, is the next generation of
> the Known Genes set that UCSC has been providing for several years and
> supersedes the existing Known Genes annotation on the hg18 assembly.
>
> The UCSC Genes set is a moderately conservative prediction set based on data
> from RefSeq, GenBank, and UniProt. Each entry requires the support of one
> GenBank RNA sequence plus at least one additional line of evidence, with the
> exception of RefSeq RNAs, which require no additional evidence. Some of the
> non-coding transcripts in the set may actually code for protein, but the
> evidence for the associated protein is weak at best. Compared to RefSeq, this
> gene set generally has about 10% more protein-coding genes, approximately
> five times as many putative non-coding genes, and about twice as many splice
> variants.
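The support rule above can be sketched as a small predicate. This is a minimal illustration under stated assumptions, not the UCSC pipeline: the Candidate record, its field names, and the treatment of "additional evidence" as a plain list of labels are all hypothetical, and the sketch does not model RefSeq status levels (reviewed, provisional, and so on).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Candidate:
    # HYPOTHETICAL record: one candidate transcript and its supporting evidence.
    name: str
    genbank_rnas: list[str] = field(default_factory=list)    # supporting GenBank RNA IDs
    refseq_rnas: list[str] = field(default_factory=list)     # supporting RefSeq RNA IDs
    other_evidence: list[str] = field(default_factory=list)  # any additional line of evidence


def is_supported(c: Candidate) -> bool:
    """Apply the announced rule: a RefSeq RNA alone is sufficient;
    otherwise require one GenBank RNA plus at least one more line of evidence."""
    if c.refseq_rnas:
        return True  # RefSeq RNAs need no additional evidence
    return len(c.genbank_rnas) >= 1 and len(c.other_evidence) >= 1


# Usage with made-up identifiers:
print(is_supported(Candidate("tx1", refseq_rnas=["refseq_A"])))
print(is_supported(Candidate("tx2", genbank_rnas=["gb_B"])))
```

Note that every RefSeq status is treated the same here, which is exactly the point the question above asks the UCSC team to confirm.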
>
> The UCSC Genes set is produced using a computational pipeline developed at
> UCSC by Jim Kent, Chuck Sugnet and Mark Diekhans. For detailed information
> about the process used to construct the gene set, see the track description
> page. In upcoming months, we plan to release UCSC Genes sets for other
> organisms in addition to human.
>
> As part of this change, we are now using our own UCSC Genes accession
> numbers as the primary key into the underlying knownGene table, rather than
> the GenBank mRNA accessions we used in the previous Known Genes prediction
> set. Note that this may affect external sites with URLs that link into our
> genes track using the older-style accessions.
>
> We will continue to provide the older Known Genes track on hg18 under the name
> "Old Known Genes". You may find the following tables useful in referencing the
> older gene set and converting between the two sets:
>
> - knownGeneOld2: new name for table underlying the old Known Genes (previously
> called knownGene)
>
> - kgXrefOld2: new name for table that contains data for converting old Known
> Genes IDs to other IDs (previously called kgXref)
>
> - kg2ToKg3: data for converting old Known Genes IDs to the newer UCSC Genes
> IDs
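Converting a list of old Known Genes IDs with an exported copy of kg2ToKg3 might look like the sketch below. It assumes, hypothetically, that the table has been dumped as tab-separated text whose first column is the old Known Genes ID and whose second column is the new UCSC Genes ID; check the actual kg2ToKg3 schema before relying on this. The function names and sample IDs are made up for illustration.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict


def load_id_map(tsv_text: str) -> dict[str, list[str]]:
    """Build an old-ID -> new-ID(s) map from a tab-separated dump.
    ASSUMPTION: column 1 = old Known Genes ID, column 2 = new UCSC Genes ID."""
    mapping: dict[str, list[str]] = defaultdict(list)
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank or short lines and '#' header lines
        mapping[row[0]].append(row[1])
    return dict(mapping)


def convert(old_ids: list[str], mapping: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return the new IDs for each old ID (an empty list if there is no mapping)."""
    return {old: mapping.get(old, []) for old in old_ids}


# Usage with hypothetical IDs:
dump = "#old\tnew\nOLD_A\tNEW_1\nOLD_B\tNEW_2\nOLD_B\tNEW_3\n"
print(convert(["OLD_A", "OLD_X"], load_id_map(dump)))
```

Values are lists rather than single strings so that the sketch does not break if one old ID turns out to map to more than one new ID.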
>
> We'd like to acknowledge the many people affiliated with the UCSC Genome
> Bioinformatics group who worked hard to release this new annotation:
> developers Jim Kent, Mark Diekhans, and Fan Hsu (with technical support from
> several other engineers in the group); David Haussler; our splendid QA team --
> Archana Thakkapallayil, Ann Zweig, Robert Kuhn, Kayla Smith, and Brooke
> Rhead; our build engineer -- Andy Pohl; and our sysadmin group. We'd also
> like to thank Chuck Sugnet for his input, the people and organizations
> maintaining the RefSeq, UniProt, and GenBank databases, and the scientists
> worldwide who have contributed to them. If you have any questions about this
> new release, feel free to contact us at genome at soe.ucsc.edu (general
> questions) or genome-mirror at soe.ucsc.edu (mirror-specific questions).
>
> -Donna
> -----------------------------------
> Donna Karolchik
> UCSC Genome Bioinformatics Group
> http://genome.ucsc.edu
>
> _______________________________________________
> Genome-announce mailing list
> Genome-announce at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome-announce