[Genome] GTF Format

Brooke Rhead rhead at soe.ucsc.edu
Thu Oct 18 15:15:12 PDT 2007


Hello Barry,

While we make the knownGene table available in GTF format via the Table 
Browser, we do not currently use the gene_id and transcript_id fields to 
distinguish various transcripts.  We are using GTF format version 2.2, 
which requires both the gene_id and transcript_id fields to be present.  
Because the knownGene table does not store how transcripts group into 
clusters, the Table Browser simply fills both fields with the value of 
knownGene.name.  This is admittedly a little confusing if you are 
expecting to get meaningful information from the gene_id field.
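
To see this behavior in a downloaded file, the attribute column can be 
parsed and the two IDs compared.  The following is only a minimal sketch: 
the helper names parse_attributes and gene_equals_transcript are 
hypothetical, and it assumes tab-separated GTF lines whose ninth column 
holds key "value"; pairs.

```python
import re

# Matches GTF attribute pairs of the form: key "value";
_ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"\s*;')


def parse_attributes(field):
    """Return the key/value pairs found in a GTF attribute (9th) column."""
    return dict(_ATTR_RE.findall(field))


def gene_equals_transcript(gtf_line):
    """Return True when gene_id and transcript_id hold the same value."""
    columns = gtf_line.rstrip("\n").split("\t")
    attrs = parse_attributes(columns[8])
    return attrs["gene_id"] == attrs["transcript_id"]


# One exon line from the hg18 knownGene output quoted in this thread.
line = (
    "chr1\thg18_knownGene\texon\t176330253\t176330452\t0.000000\t+\t.\t"
    'gene_id "uc001glp.1"; transcript_id "uc001glp.1";'
)
print(gene_equals_transcript(line))
```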

However, there is another way to associate transcripts into gene 
clusters using the Table Browser.  The two tables that contain this 
information are 'knownCanonical', which contains a single canonical 
splice variant of each cluster of transcripts, and 'knownIsoforms', 
which contains the names of the remaining transcripts in the cluster. If 
you need help using these tables, please let us know.
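
As a rough illustration of how cluster information could be folded back 
into the GTF output, the sketch below rewrites gene_id from a 
transcript-to-cluster mapping.  It rests on assumptions: that the cluster 
table has been exported as tab-separated text whose first two columns are 
a cluster identifier and a transcript name (check the table schema before 
relying on this), and that the function names, the "cluster_" prefix, and 
the cluster number 101 are made up for the example.

```python
import re

_GENE_ID_RE = re.compile(r'gene_id "[^"]*";')
_TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]*)";')


def load_clusters(rows):
    """Map transcript name -> cluster ID from tab-separated rows.

    Assumes column 1 is the cluster ID and column 2 the transcript name.
    """
    mapping = {}
    for row in rows:
        row = row.strip()
        if not row or row.startswith("#"):
            continue  # skip blank lines and header/comment lines
        cluster_id, transcript = row.split("\t")[:2]
        mapping[transcript] = cluster_id
    return mapping


def regroup(gtf_line, clusters):
    """Replace gene_id with a cluster-based ID; leave unknown lines as-is."""
    match = _TRANSCRIPT_ID_RE.search(gtf_line)
    if match is None or match.group(1) not in clusters:
        return gtf_line
    new_attr = 'gene_id "cluster_%s";' % clusters[match.group(1)]
    return _GENE_ID_RE.sub(lambda _m: new_attr, gtf_line, count=1)


# Hypothetical cluster rows; the cluster number 101 is invented.
clusters = load_clusters([
    "#clusterId\ttranscript",
    "101\tuc001glp.1",
    "101\tuc001glq.1",
])
line = (
    "chr1\thg18_knownGene\texon\t176330253\t176330452\t0.000000\t+\t.\t"
    'gene_id "uc001glq.1"; transcript_id "uc001glq.1";'
)
print(regroup(line, clusters))
```

With a mapping like this, both transcripts from the original question 
would end up sharing one gene_id while keeping their own transcript_id.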

-- 
Brooke Rhead
UCSC Genome Bioinformatics Group


Barry Moore wrote:
> Hi-
> 
> I just downloaded 'UCSC Genes' for the current human build from the  
> Table Browser, and I noticed that the gene_id in the 9th field is  
> always the same as the transcript ID even though the gene may have  
> multiple transcripts.  This makes it impossible to accurately  
> associate transcripts into gene groups.  Is this a "feature" or a  
> "bug"?  Is it possible for me to alter this behavior from the web  
> interface?  For example, downloading UCSC genes in GTF format from  
> region chr1:176330253-176710180 for March 2006 produced the following  
> snippets which are two different transcripts on the same gene, and   
> yet the
> 
> 
> chr1	hg18_knownGene	start_codon	176330305	176330307	0.000000	+	.	gene_id "uc001glp.1"; transcript_id "uc001glp.1";
> chr1	hg18_knownGene	CDS	176330305	176330452	0.000000	+	0	gene_id "uc001glp.1"; transcript_id "uc001glp.1";
> chr1	hg18_knownGene	exon	176330253	176330452	0.000000	+	.	gene_id "uc001glp.1"; transcript_id "uc001glp.1";
> chr1	hg18_knownGene	CDS	176519322	176519449	0.000000	+	2	gene_id "uc001glp.1"; transcript_id "uc001glp.1";
> 
> ...more exons and CDSs and then...
> 
> chr1	hg18_knownGene	exon	176708833	176710180	0.000000	+	.	gene_id "uc001glp.1"; transcript_id "uc001glp.1";
> chr1	hg18_knownGene	start_codon	176330305	176330307	0.000000	+	.	gene_id "uc001glq.1"; transcript_id "uc001glq.1";
> chr1	hg18_knownGene	CDS	176330305	176330452	0.000000	+	0	gene_id "uc001glq.1"; transcript_id "uc001glq.1";
> chr1	hg18_knownGene	exon	176330253	176330452	0.000000	+	.	gene_id "uc001glq.1"; transcript_id "uc001glq.1";

