[Genome] new identifiers questions

Pauline Fujita pauline at soe.ucsc.edu
Wed Apr 2 13:28:41 PDT 2008


Hello Juliette,

As a follow-up to your question, we heard back from the developer
responsible for assigning the UCSC gene identifiers, and he had this
additional comment on how the identifiers will change in the future:

In general the gene identifiers will change little.  The identifiers are
in the form "accession.version" (e.g. uc001aaa.1).  Right now all the
"versions" are "1".  In cases where a gene has a little extra sequence
added to the start or end but is otherwise unchanged, the version number
will be incremented in the next build, which we are currently working
on.  New splicing variants and new genes will get a new accession.  In
some relatively rare cases genes will be dropped, for example when the
GenBank or RefSeq records supporting the gene are withdrawn.
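
For a database that keeps identifier history, the rules above could be
applied roughly as in the following Python sketch. It is only an
illustration under stated assumptions: the function names, the status
labels, and every identifier other than uc001aaa.1 are hypothetical, and
the parser assumes nothing beyond the "accession.version" form described
above.

```python
from typing import Dict, Iterable, NamedTuple


class UcscGeneId(NamedTuple):
    accession: str
    version: int


def parse_id(identifier: str) -> UcscGeneId:
    """Split an identifier such as 'uc001aaa.1' at its last dot."""
    accession, sep, version = identifier.strip().rpartition(".")
    if not sep or not accession or not version.isdigit():
        raise ValueError(f"not in accession.version form: {identifier!r}")
    return UcscGeneId(accession, int(version))


def compare_builds(old_ids: Iterable[str], new_ids: Iterable[str]) -> Dict[str, str]:
    """Classify each accession by how it changed between two builds.

    The status labels are hypothetical names for the cases described
    in the message; they are not UCSC terminology.
    """
    old = {p.accession: p.version for p in map(parse_id, old_ids)}
    new = {p.accession: p.version for p in map(parse_id, new_ids)}
    status = {}
    for acc in sorted(old.keys() | new.keys()):
        if acc not in new:
            status[acc] = "dropped"              # supporting records withdrawn
        elif acc not in old:
            status[acc] = "new"                  # new gene or splicing variant
        elif new[acc] > old[acc]:
            status[acc] = "version incremented"  # sequence extended at an end
        elif new[acc] == old[acc]:
            status[acc] = "unchanged"
        else:
            status[acc] = "version decreased (unexpected)"
    return status


if __name__ == "__main__":
    # Hypothetical identifier lists standing in for two successive builds.
    previous = ["uc001aaa.1", "uc001aab.1", "uc001aac.1"]
    current = ["uc001aaa.1", "uc001aab.2", "uc001aad.1"]
    for accession, state in compare_builds(previous, current).items():
        print(accession, state)
```

Keying the comparison on the accession alone means a version increment
shows up as a change to an existing record rather than as a drop plus a
new entry, which fits the goal of tracking each identifier over time.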

I hope this information is helpful to you. Please don't hesitate to
contact the mailing list again if you have further questions.

Regards,

Pauline Fujita

UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

Juliette Aury Landas wrote:
> Good morning,
>
> I am an intern at the Institut Curie (Paris, France). I would like to
> ask you a few questions about the UCSC data, especially about the
> new identifiers which recently appeared in files available on the FTP
> site (e.g. uc002ide.1).
> I am in charge of developing a database. Its main goal is to link
> all aliases of different identifiers (genes, transcripts and proteins).
> I also want to keep the history of each identifier. For example, I would
> like to be able to know which transcript names were linked to a gene at
> a specific date. That is why I am interested in these new UCSC
> identifiers:
> How do you create these identifiers?
> What is the cardinality? One UCSC identifier for one gene, or one UCSC
> identifier for one transcript, or one UCSC identifier for one
> annotation...? I don't really understand this new nomenclature (numbers
> and letters).
> Is there a link between these identifiers and the NCBI GeneID?
> How does the UCSC identifier change when a gene symbol is updated or
> when the annotation changes (for example, TREX1 was annotated as a single
> gene with two non-overlapping coding regions; now the downstream coding
> region is represented by TREX1 and the upstream coding region is
> represented by ATRIP)?
>
> Thanks in advance.
>
> Juliette Aury-Landas
>