[Genome] Gene positions

Rachel Harte hartera at soe.ucsc.edu
Mon Apr 30 13:25:51 PDT 2007


Richard,

This table format is typically used for tracks in the "Genes and Gene
Predictions" group, but it is also used for any other data that is
loaded into the database from a GFF or GTF file. See these links for file
format descriptions:
http://hgw1.cse.ucsc.edu/FAQ/FAQformat#format3
http://hgw1.cse.ucsc.edu/FAQ/FAQformat#format4

Examples of this type of table (genePred format) are found for the Known
Genes or UCSC Genes tracks (knownGene table), RefSeq Genes (refGene),
Ensembl Genes (ensGene), Genscan Genes (genscan), etc.
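As a rough illustration of how a row in one of these tables maps to genomic positions, here is a minimal Python sketch. It assumes tab-separated rows in the ten-column genePred layout (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds), with exon lists stored as comma-separated values that may end in a trailing comma. GenePred and parse_genepred are hypothetical helper names. Some downloaded tables carry an extra leading bin column, which the has_bin flag skips; check the table's .sql file to see whether it applies. Table coordinates use 0-based starts and exclusive ends, so one_based_span() adds 1 to the start to give 1-based, closed coordinates.

```python
from dataclasses import dataclass
from typing import List

# Column names of the basic genePred format, in table order.
GENEPRED_FIELDS = ("name", "chrom", "strand", "txStart", "txEnd",
                   "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds")


@dataclass
class GenePred:
    name: str
    chrom: str
    strand: str
    tx_start: int          # 0-based, inclusive
    tx_end: int            # exclusive (half-open interval)
    cds_start: int
    cds_end: int
    exon_starts: List[int]
    exon_ends: List[int]

    def one_based_span(self):
        """Return (start, end) as 1-based, fully closed coordinates."""
        return self.tx_start + 1, self.tx_end


def parse_int_list(field: str) -> List[int]:
    """Parse a comma-separated list such as '100,200,' (trailing comma allowed)."""
    return [int(x) for x in field.split(",") if x]


def parse_genepred(line: str, has_bin: bool = False) -> GenePred:
    """Parse one tab-separated genePred row (hypothetical helper).

    has_bin: set True for tables whose rows start with a 'bin' column
    (an assumption to verify against the table's .sql file).
    """
    cols = line.rstrip("\n").split("\t")
    if has_bin:
        cols = cols[1:]
    if len(cols) < len(GENEPRED_FIELDS):
        raise ValueError(f"expected at least {len(GENEPRED_FIELDS)} columns, got {len(cols)}")
    starts = parse_int_list(cols[8])
    ends = parse_int_list(cols[9])
    # exonCount must agree with the lengths of both exon lists.
    if not (len(starts) == len(ends) == int(cols[7])):
        raise ValueError("exonCount does not match exonStarts/exonEnds")
    return GenePred(
        name=cols[0], chrom=cols[1], strand=cols[2],
        tx_start=int(cols[3]), tx_end=int(cols[4]),
        cds_start=int(cols[5]), cds_end=int(cols[6]),
        exon_starts=starts, exon_ends=ends,
    )
```

The transcript's genomic start and stop are simply txStart and txEnd; the exon lists are only needed for exon-level positions.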

Some of these are extended genePred tables, which add the following
fields, including the exon frames:

    uint id;                      /* Numeric id of gene annotation,
                                   * zero if not available. */
    string name2;                 /* Secondary name (e.g. name of gene),
                                   * or empty if none, NULL if field not
                                   * requested. */
    enum cdsStatus cdsStartStat;  /* Status of cdsStart annotation */
    enum cdsStatus cdsEndStat;    /* Status of cdsEnd annotation */
    int[exonCount] exonFrames;    /* List of frame for each exon, or -1
                                   * if no frame or not known. NULL if not
                                   * available. */

The status for cdsStartStat and for cdsEndStat can be one of:
 'none','unk','incmpl','cmpl'
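The extended fields can be handled in a similar way. This minimal sketch assumes the five extra columns follow the basic ten in the order listed above (id, name2, cdsStartStat, cdsEndStat, exonFrames), with exonFrames stored as a comma-separated list; parse_extended and coding_exon_count are hypothetical names, and the column order should be confirmed against the table's .sql file.

```python
from dataclasses import dataclass
from typing import List

# The four values allowed for cdsStartStat and cdsEndStat.
CDS_STATUS = {"none", "unk", "incmpl", "cmpl"}


@dataclass
class GenePredExtra:
    gene_id: int            # numeric id, 0 if not available
    name2: str              # secondary name, e.g. the gene name; may be empty
    cds_start_stat: str
    cds_end_stat: str
    exon_frames: List[int]  # one frame per exon: 0, 1, 2, or -1 if none/unknown


def parse_extended(cols: List[str]) -> GenePredExtra:
    """Parse the extended columns that follow the basic ten (hypothetical helper).

    Assumes cols is exactly [id, name2, cdsStartStat, cdsEndStat, exonFrames].
    """
    if len(cols) != 5:
        raise ValueError(f"expected 5 extended columns, got {len(cols)}")
    id_field, name2, start_stat, end_stat, frames = cols
    for stat in (start_stat, end_stat):
        if stat not in CDS_STATUS:
            raise ValueError(f"unknown CDS status {stat!r}")
    exon_frames = [int(x) for x in frames.split(",") if x]
    if any(f not in (-1, 0, 1, 2) for f in exon_frames):
        raise ValueError("exon frames must be -1, 0, 1 or 2")
    return GenePredExtra(int(id_field), name2, start_stat, end_stat, exon_frames)


def coding_exon_count(extra: GenePredExtra) -> int:
    """Count exons that have a known reading frame (frame != -1)."""
    return sum(1 for f in extra.exon_frames if f != -1)
```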

knownGene is a special case that has extra fields: alignID and proteinID.

These tables can be downloaded through our downloads server:
http://hgdownload.cse.ucsc.edu

Click on the organism of interest, locate the appropriate assembly and
then click on the "Annotation database" link. From there, you can download
these tables. Tables may also be downloaded through the Table Browser by
following the "Tables" link on the top blue bar.
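Once a table such as refGene has been downloaded, gene-level start and stop positions on the genome can be derived from the per-transcript rows. A minimal sketch, assuming the refGene.txt.gz rows have a leading bin column followed by the genePred fields, with name2 (the gene name) as the 13th column; table_url, gene_spans and read_gzip_lines are hypothetical helpers, and the URL pattern follows the goldenPath/hg18/database/ directory mentioned in this thread.

```python
import gzip
from typing import BinaryIO, Dict, Iterable, Iterator, Tuple

BASE_URL = "http://hgdownload.cse.ucsc.edu/goldenPath"


def table_url(db: str, table: str) -> str:
    """URL of a table dump in an assembly's database/ directory (assumed pattern)."""
    return f"{BASE_URL}/{db}/database/{table}.txt.gz"


def read_gzip_lines(raw: BinaryIO) -> Iterator[str]:
    """Decode a gzip byte stream (local file or HTTP response) into text lines."""
    with gzip.open(raw, "rt") as text:
        yield from text


def gene_spans(lines: Iterable[str]) -> Dict[Tuple[str, str, str], Tuple[int, int]]:
    """Collapse transcript rows into one span per (gene, chrom, strand).

    Assumes refGene-style rows: bin, name, chrom, strand, txStart, txEnd, ...,
    with name2 at index 12. Coordinates stay 0-based, half-open.
    """
    spans: Dict[Tuple[str, str, str], Tuple[int, int]] = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 13:
            continue  # skip blank or malformed rows
        chrom, strand = cols[2], cols[3]
        tx_start, tx_end = int(cols[4]), int(cols[5])
        key = (cols[12], chrom, strand)
        if key in spans:
            lo, hi = spans[key]
            spans[key] = (min(lo, tx_start), max(hi, tx_end))
        else:
            spans[key] = (tx_start, tx_end)
    return spans

# Example (network access required, not run here):
#   from urllib.request import urlopen
#   with urlopen(table_url("hg18", "refGene")) as resp:
#       spans = gene_spans(read_gzip_lines(resp))
```

Keying on (gene, chrom, strand) keeps a gene name that maps to several chromosomes separate, but transcripts that share a name at distant loci on the same chromosome and strand would be merged into one span.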

I hope that this helps you. Please let us know if you have further
questions.

Rachel

Rachel Harte
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu


On Mon, 30 Apr 2007, Richard J. Feldmann wrote:

> Rachel,
>
> Thanks for responding.
>
> >Are you asking for the start and stop positions of genes relative to the
> >genome or relative to contigs?
>
> I am looking for the position of the genes relative to the genome.
>
> I see this entry
>
> Gene Predictions and RefSeq Genes
> The following definition is used for gene prediction tables. In
> alternative-splicing situations, each transcript has a row in this
> table.
>
> table genePred
> "A gene prediction."
>      (
>      string  name;               "Name of gene"
>      string  chrom;              "Chromosome name"
>      char[1] strand;             "+ or - for strand"
>      uint    txStart;            "Transcription start position"
>      uint    txEnd;              "Transcription end position"
>      uint    cdsStart;           "Coding region start"
>      uint    cdsEnd;             "Coding region end"
>      uint    exonCount;          "Number of exons"
>      uint[exonCount] exonStarts; "Exon start positions"
>      uint[exonCount] exonEnds;   "Exon end positions"
>      )
>
> in
>
> http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/
>
> But when I look in
>
> http://hgdownload.cse.ucsc.edu/goldenPath/hg18/chromosomes/
>
> I don't see anything that uses this format.
>
> Richard.
>
> --------------------
>
> >On Mon, 30 Apr 2007, Richard J. Feldmann wrote:
> >
> >>  I know that this question should have a simple answer but I am
> >>  baffled at the moment - perhaps someone could help me.
> >>
> >>  For the human genome, UCSC has genomically pinned data whereas NCBI
> >>  has contig-based data.
> >>
> >>  I am quite accustomed to using the NCBI (.gbk) and (.ptt) files for
> >>  many genomes.
> >>
> >>  For mouse, where I used RIKEN's Fantom3 transcriptome data, I used the
> >>  genomically pinned data from UCSC.  The RIKEN data must have had the
> >>  gene positions along with their transcript tags.
> >>
> >>  Where at UCSC do I find the start and stop positions of genes?
> >>
> >>  Richard J. Feldmann.
> >>
> >>  --
> >>  -----------------------------------------------------------------
> >>    Richard J. Feldmann                                 (v) 301-926-0921
> >>    Global Determinants, Inc.                         (c) 301-526-8524
> >>    17800 Mill Creek Dr.
> >>    Derwood, Maryland 20855-1019       rjfeldma at globaldeterminants.com
> >>  -----------------------------------------------------------------

