[Genome] new identifiers questions
Pauline Fujita
pauline at soe.ucsc.edu
Wed Apr 2 13:28:41 PDT 2008
Hello Juliette,
As a follow-up to your question, we heard back from the developer
responsible for assigning the UCSC gene identifiers, and he had this
additional comment on how the identifiers will change in the future:
In general the gene identifiers will change little. The identifiers are
in the form "accession.version" (e.g. uc001aaa.1). Right now all the
"versions" are "1". In cases where a gene has a little extra sequence
added to the start or end but is otherwise unchanged, the version number
will be incremented in the next build, which we are currently working
on. New splicing variants and new genes will get a new accession. In
some relatively rare cases genes will be dropped, for example when the
GenBank or RefSeq records supporting the gene are withdrawn.
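For a database that keeps identifier history, the "accession.version"
scheme described above could be handled roughly as in the sketch below.
This is only an illustration, not UCSC code: the regular expression is
an assumption inferred from the two example identifiers in this thread
(uc001aaa.1 and uc002ide.1), and the names split_id and compare_builds
are hypothetical.

```python
import re

# Assumed pattern, inferred only from the examples in this thread:
# "uc" + three digits + three lowercase letters, a dot, an integer version.
UCSC_ID = re.compile(r"(uc\d{3}[a-z]{3})\.(\d+)")


def split_id(identifier):
    """Split 'accession.version' into (accession, version as int)."""
    match = UCSC_ID.fullmatch(identifier)
    if match is None:
        raise ValueError("not in accession.version form: %r" % identifier)
    return match.group(1), int(match.group(2))


def compare_builds(old_ids, new_ids):
    """Classify each accession seen in either of two builds.

    Returns a dict mapping accession to one of:
    'unchanged', 'version changed', 'new', 'dropped'.
    """
    old = dict(split_id(i) for i in old_ids)
    new = dict(split_id(i) for i in new_ids)
    changes = {}
    for accession in sorted(old.keys() | new.keys()):
        if accession not in new:
            changes[accession] = "dropped"          # e.g. supporting record withdrawn
        elif accession not in old:
            changes[accession] = "new"              # new gene or splicing variant
        elif new[accession] != old[accession]:
            changes[accession] = "version changed"  # e.g. sequence added at an end
        else:
            changes[accession] = "unchanged"
    return changes
```

Storing the accession and the version in separate columns would let a
history table follow one accession across builds while still recording
which version was current at a given date.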
I hope this information is helpful to you. Please don't hesitate to
contact the mailing list again if you have further questions.
Regards,
Pauline Fujita
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu
Juliette Aury Landas wrote:
> Good morning,
>
> I am an intern at the Institut Curie (Paris, France). I would like to
> ask you a few questions about the UCSC data, especially about the
> new identifiers which recently appeared in files available on the FTP
> site (e.g. uc002ide.1).
> I am in charge of developing a database. Its main goal is to link
> all aliases of different identifiers (genes, transcripts and proteins).
> I also want to keep the history of each identifier. For example, I would
> like to be able to know which transcript names were linked to a gene at
> a specific date. That is why I am interested in these new UCSC
> identifiers:
> How do you create these identifiers?
> What is the cardinality? One UCSC identifier for one gene, or one UCSC
> identifier for one transcript, or one UCSC identifier for one
> annotation? I don't really understand this new nomenclature (numbers
> and letters).
> Is there a link between these identifiers and the NCBI GeneID?
> How does the UCSC identifier change when a Gene Symbol is updated or
> when the annotation changes (for example, TREX1 was annotated as a single
> gene with two non-overlapping coding regions; now the downstream coding
> region is represented by TREX1 and the upstream coding region is
> represented by ATRIP)?
>
> Thanks in advance.
>
> Juliette Aury-Landas
>
>