[Genome] GTF Format
Barry Moore
barry.moore at genetics.utah.edu
Thu Oct 18 09:03:35 PDT 2007
Hi-
I just downloaded 'UCSC Genes' for the current human build from the
Table Browser, and I noticed that the gene_id in the 9th field is
always the same as the transcript ID even though the gene may have
multiple transcripts. This makes it impossible to accurately
associate transcripts into gene groups. Is this a "feature" or a
"bug"? Is it possible for me to alter this behavior from the web
interface? For example downloading UCSC genes in GTF format from
region chr1:176330253-176710180 for March 2006 produced the following
snippets, which come from two different transcripts of the same gene, and
yet the gene_id values differ:
chr1 hg18_knownGene start_codon 176330305 176330307 0.000000 + . gene_id "uc001glp.1"; transcript_id "uc001glp.1";
chr1 hg18_knownGene CDS 176330305 176330452 0.000000 + 0 gene_id "uc001glp.1"; transcript_id "uc001glp.1";
chr1 hg18_knownGene exon 176330253 176330452 0.000000 + . gene_id "uc001glp.1"; transcript_id "uc001glp.1";
chr1 hg18_knownGene CDS 176519322 176519449 0.000000 + 2 gene_id "uc001glp.1"; transcript_id "uc001glp.1";
...more exons and CDSs and then...
chr1 hg18_knownGene exon 176708833 176710180 0.000000 + . gene_id "uc001glp.1"; transcript_id "uc001glp.1";
chr1 hg18_knownGene start_codon 176330305 176330307 0.000000 + . gene_id "uc001glq.1"; transcript_id "uc001glq.1";
chr1 hg18_knownGene CDS 176330305 176330452 0.000000 + 0 gene_id "uc001glq.1"; transcript_id "uc001glq.1";
chr1 hg18_knownGene exon 176330253 176330452 0.000000 + . gene_id "uc001glq.1"; transcript_id "uc001glq.1";
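
To make the grouping problem concrete, here is a minimal Python sketch of how one might collect transcripts under their gene_id. The function names (parse_attributes, transcripts_by_gene) are my own and purely illustrative, and the sketch assumes the attribute column always uses the key "value"; form shown above. It is not how the Table Browser or any particular tool does it.

```python
import re
from collections import defaultdict

# Matches key "value"; pairs in the GTF attribute column (9th field).
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_attributes(attr_field):
    """Return the attribute column as a dict of key -> value."""
    return dict(ATTR_RE.findall(attr_field))


def transcripts_by_gene(lines):
    """Group transcript IDs under their gene_id (hypothetical helper)."""
    groups = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The first eight fields contain no whitespace, so split at most
        # 8 times and keep the attribute column intact as the 9th field.
        fields = line.split(None, 8)
        if len(fields) < 9:
            continue
        attrs = parse_attributes(fields[8])
        if "gene_id" in attrs and "transcript_id" in attrs:
            groups[attrs["gene_id"]].add(attrs["transcript_id"])
    return dict(groups)


# Two exon records copied from the output above, one per transcript.
sample = [
    'chr1 hg18_knownGene exon 176330253 176330452 0.000000 + . '
    'gene_id "uc001glp.1"; transcript_id "uc001glp.1";',
    'chr1 hg18_knownGene exon 176330253 176330452 0.000000 + . '
    'gene_id "uc001glq.1"; transcript_id "uc001glq.1";',
]

groups = transcripts_by_gene(sample)
for gene_id, transcript_ids in sorted(groups.items()):
    print(gene_id, sorted(transcript_ids))
```

With the records above, every gene_id ends up holding exactly one transcript, so uc001glp.1 and uc001glq.1 land in separate groups even though they overlap the same locus.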