[Genome] GTF Format

Barry Moore barry.moore at genetics.utah.edu
Thu Oct 18 09:03:35 PDT 2007


Hi-

I just downloaded 'UCSC Genes' for the current human build from the  
Table Browser, and I noticed that the gene_id in the 9th field is  
always the same as the transcript ID even though the gene may have  
multiple transcripts.  This makes it impossible to accurately  
associate transcripts into gene groups.  Is this a "feature" or a
"bug"?  Is it possible for me to alter this behavior from the web
interface?  For example, downloading UCSC Genes in GTF format for
region chr1:176330253-176710180 for March 2006 (hg18) produced the
following snippets.  They are two different transcripts of the same
gene, and yet the gene_id of each is just its own transcript_id:


chr1	hg18_knownGene	start_codon	176330305	176330307	0.000000	+	.	gene_id "uc001glp.1"; transcript_id "uc001glp.1";
chr1	hg18_knownGene	CDS	176330305	176330452	0.000000	+	0	gene_id "uc001glp.1"; transcript_id "uc001glp.1";
chr1	hg18_knownGene	exon	176330253	176330452	0.000000	+	.	gene_id "uc001glp.1"; transcript_id "uc001glp.1";
chr1	hg18_knownGene	CDS	176519322	176519449	0.000000	+	2	gene_id "uc001glp.1"; transcript_id "uc001glp.1";

...more exons and CDSs and then...

chr1	hg18_knownGene	exon	176708833	176710180	0.000000	+	.	gene_id "uc001glp.1"; transcript_id "uc001glp.1";
chr1	hg18_knownGene	start_codon	176330305	176330307	0.000000	+	.	gene_id "uc001glq.1"; transcript_id "uc001glq.1";
chr1	hg18_knownGene	CDS	176330305	176330452	0.000000	+	0	gene_id "uc001glq.1"; transcript_id "uc001glq.1";
chr1	hg18_knownGene	exon	176330253	176330452	0.000000	+	.	gene_id "uc001glq.1"; transcript_id "uc001glq.1";
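The symptom described above can be checked mechanically.  What follows is
only a minimal sketch, assuming each record is a single tab-separated line
whose 9th field holds attributes of the form key "value";.  The function
names (parse_attributes, transcripts_by_gene, gene_id_mirrors_transcript_id)
are hypothetical helpers made up for illustration and are not part of any
UCSC tool.  The two sample records are taken from the snippets above.

```python
import re
from collections import defaultdict

# Matches attribute pairs of the form: key "value";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"\s*;')


def parse_attributes(field):
    """Turn the 9th GTF field into a dict of attribute name -> value."""
    return dict(ATTR_RE.findall(field))


def transcripts_by_gene(lines):
    """Group transcript_id values under the gene_id they are reported with."""
    groups = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t", 8)
        if len(cols) < 9:
            continue  # not a complete 9-column record
        attrs = parse_attributes(cols[8])
        if "gene_id" in attrs and "transcript_id" in attrs:
            groups[attrs["gene_id"]].add(attrs["transcript_id"])
    return dict(groups)


def gene_id_mirrors_transcript_id(groups):
    """True if every gene_id maps only to the identical transcript_id."""
    return bool(groups) and all(ts == {gene} for gene, ts in groups.items())


# One exon record from each of the two transcripts quoted in the post.
sample = [
    'chr1\thg18_knownGene\texon\t176330253\t176330452\t0.000000\t+\t.\t'
    'gene_id "uc001glp.1"; transcript_id "uc001glp.1";',
    'chr1\thg18_knownGene\texon\t176330253\t176330452\t0.000000\t+\t.\t'
    'gene_id "uc001glq.1"; transcript_id "uc001glq.1";',
]

groups = transcripts_by_gene(sample)
print(gene_id_mirrors_transcript_id(groups))  # True
```

If the check returns True for a whole file, the gene_id column carries no
grouping information beyond the transcript ID itself, which is the
situation reported here.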



