[Genome] GTF Format
Brooke Rhead
rhead at soe.ucsc.edu
Thu Oct 18 15:15:12 PDT 2007
Hello Barry,
While we make the knownGene table available in GTF format via the Table
Browser, we do not currently utilize the gene_id and transcript_id
fields to distinguish various transcripts. We are using GTF format
version 2.2, which requires the presence of both the gene_id and
transcript_id fields. Since we do not store information about
transcripts and clusters of transcripts in the knownGene table, the two
fields are just filled by the Table Browser with the knownGene.name.
This is admittedly a little confusing if you are expecting meaningful
information from the gene_id field.
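To check this in a downloaded file, the following is a minimal Python sketch, not part of the Table Browser. The helper name parse_gtf_attributes is made up for illustration, and the sample record is taken from the message quoted below, assuming tab-separated columns.

```python
import re

# Matches one attribute of the 9th GTF column, e.g.  gene_id "uc001glp.1";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"\s*;')

def parse_gtf_attributes(attr_field):
    """Hypothetical helper: turn the attribute column into a dict."""
    return dict(ATTR_RE.findall(attr_field))

# One exon record from the quoted message, written with tabs between columns.
line = ('chr1\thg18_knownGene\texon\t176330253\t176330452\t0.000000\t+\t.\t'
        'gene_id "uc001glp.1"; transcript_id "uc001glp.1";')

attrs = parse_gtf_attributes(line.split('\t')[8])

# In Table Browser output for knownGene, both fields hold knownGene.name.
same = attrs["gene_id"] == attrs["transcript_id"]
print(attrs, same)
```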
However, there is another way to associate transcripts into gene
clusters using the Table Browser. The two tables that contain this
information are 'knownCanonical', which contains a single canonical
splice variant of each cluster of transcripts, and 'knownIsoforms',
which contains the names of the remaining transcripts in the cluster. If
you need help using these tables, please let us know.
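As a rough illustration of how the cluster information could be applied to a downloaded GTF file, here is a minimal Python sketch. It rests on several assumptions: that knownIsoforms has been exported from the Table Browser as tab-separated text with the cluster ID in the first column and the transcript name in the second, that header lines start with '#', and that a gene_id of the form "cluster_N" is acceptable. The function names, the "cluster_" prefix, and the cluster ID 7 in the demo are invented for the example, not real values.

```python
import csv
import io
import re

def load_cluster_map(isoforms_file):
    """Map transcript name -> cluster ID.
    Assumes two tab-separated columns: cluster ID, transcript name."""
    cluster_of = {}
    for row in csv.reader(isoforms_file, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and header/comment lines
        cluster_of[row[1]] = row[0]
    return cluster_of

GENE_ID_RE = re.compile(r'gene_id "[^"]*";')
TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]*)";')

def regroup_gtf_line(line, cluster_of):
    """Replace gene_id with a cluster-based ID; leave unknown lines alone."""
    match = TRANSCRIPT_ID_RE.search(line)
    if match is None or match.group(1) not in cluster_of:
        return line
    new_gene = 'gene_id "cluster_%s";' % cluster_of[match.group(1)]
    return GENE_ID_RE.sub(lambda _: new_gene, line, count=1)

# Demo with an invented cluster ID (7) for the two transcripts discussed here.
isoforms = io.StringIO("#clusterId\ttranscript\n7\tuc001glp.1\n7\tuc001glq.1\n")
cluster_of = load_cluster_map(isoforms)

gtf_line = ('chr1\thg18_knownGene\texon\t176330253\t176330452\t0.000000\t+\t.\t'
            'gene_id "uc001glq.1"; transcript_id "uc001glq.1";')
regrouped = regroup_gtf_line(gtf_line, cluster_of)
print(regrouped)
```

With a mapping like this, both transcripts end up under the same gene_id while each transcript_id is left unchanged.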
--
Brooke Rhead
UCSC Genome Bioinformatics Group
Barry Moore wrote:
> Hi-
>
> I just downloaded 'UCSC Genes' for the current human build from the
> Table Browser, and I noticed that the gene_id in the 9th field is
> always the same as the transcript ID even though the gene may have
> multiple transcripts. This makes it impossible to accurately
> associate transcripts into gene groups. Is this a "feature" or a
> "bug"? Is it possible for me to alter this behavior from the web
> interface? For example, downloading UCSC genes in GTF format from
> region chr1:176330253-176710180 for March 2006 produced the following
> snippets which are two different transcripts on the same gene, and
> yet the
>
>
> chr1 hg18_knownGene start_codon 176330305 176330307 0.000000 + . gene_id "uc001glp.1"; transcript_id "uc001glp.1";
> chr1 hg18_knownGene CDS 176330305 176330452 0.000000 + 0 gene_id "uc001glp.1"; transcript_id "uc001glp.1";
> chr1 hg18_knownGene exon 176330253 176330452 0.000000 + . gene_id "uc001glp.1"; transcript_id "uc001glp.1";
> chr1 hg18_knownGene CDS 176519322 176519449 0.000000 + 2 gene_id "uc001glp.1"; transcript_id "uc001glp.1";
>
> ...more exons and CDSs and then...
>
> chr1 hg18_knownGene exon 176708833 176710180 0.000000 + . gene_id "uc001glp.1"; transcript_id "uc001glp.1";
> chr1 hg18_knownGene start_codon 176330305 176330307 0.000000 + . gene_id "uc001glq.1"; transcript_id "uc001glq.1";
> chr1 hg18_knownGene CDS 176330305 176330452 0.000000 + 0 gene_id "uc001glq.1"; transcript_id "uc001glq.1";
> chr1 hg18_knownGene exon 176330253 176330452 0.000000 + . gene_id "uc001glq.1"; transcript_id "uc001glq.1";
>
>