[Genome] knownIsoforms

Archana Thakkapallayil archanat at soe.ucsc.edu
Mon May 7 13:29:56 PDT 2007


Hello Jeff,

I would like to add this to Fan's reply:

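The clustering is done one strand at a time and at the exon level. In a
few cases, two transcripts overlap before splicing (their genomic spans
overlap) but not after splicing (none of their exons overlap), for example
when one transcript lies entirely within an intron of the other. Because
the clustering is in effect done after splicing, such transcripts can end
up in separate clusters.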
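To make the exon-level rule concrete, here is a minimal Python sketch. It assumes each transcript is given as a chromosome, a strand, and a list of 0-based, half-open exon intervals (the genePred exonStarts/exonEnds convention). It is not the txGeneCanonical source: the names Transcript, cluster_transcripts, and pick_representative are hypothetical, overlaps are merged transitively with a union-find, and picking the transcript with the most exonic bases as the representative is only an assumption based on Fan's guess about "the longest one".

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Transcript:
    # Hypothetical record type; exons are 0-based, half-open (start, end)
    # pairs, the same convention as genePred exonStarts/exonEnds.
    name: str
    chrom: str
    strand: str
    exons: list[tuple[int, int]]


def exons_overlap(a: Transcript, b: Transcript) -> bool:
    """True if at least one exon of a shares a base with an exon of b."""
    return any(s1 < e2 and s2 < e1
               for s1, e1 in a.exons
               for s2, e2 in b.exons)


def cluster_transcripts(transcripts: list[Transcript]) -> list[list[Transcript]]:
    """Group transcripts whose exons overlap, on the same chrom and strand.

    A union-find makes the grouping transitive: if A overlaps B and B
    overlaps C, all three share a cluster even when A and C do not touch.
    """
    parent = list(range(len(transcripts)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(transcripts)), 2):
        a, b = transcripts[i], transcripts[j]
        # Strands are never mixed: only same-chrom, same-strand pairs are compared.
        if a.chrom == b.chrom and a.strand == b.strand and exons_overlap(a, b):
            parent[find(i)] = find(j)

    groups: dict[int, list[Transcript]] = {}
    for i, t in enumerate(transcripts):
        groups.setdefault(find(i), []).append(t)
    return list(groups.values())


def exonic_length(t: Transcript) -> int:
    return sum(end - start for start, end in t.exons)


def pick_representative(cluster: list[Transcript]) -> Transcript:
    # Assumption: "longest" is read as most exonic bases; the real rule may differ.
    return max(cluster, key=exonic_length)


tx = [
    Transcript("A", "chr1", "-", [(100, 200), (800, 900)]),
    Transcript("B", "chr1", "-", [(300, 400)]),  # sits inside A's intron
    Transcript("C", "chr1", "+", [(100, 900)]),  # opposite strand
    Transcript("D", "chr1", "-", [(850, 950)]),  # overlaps A's second exon
]
clusters = cluster_transcripts(tx)
print([[t.name for t in c] for c in clusters])  # [['A', 'D'], ['B'], ['C']]
print([pick_representative(c).name for c in clusters])  # ['A', 'B', 'C']
```

Here B lies inside A's intron: A's genomic span (100 to 900) contains B, yet the two share no exonic bases, so they stay in separate clusters, which is the "overlap before splicing but not after" case above. C is kept apart because it is on the other strand. The pairwise comparison is quadratic and meant only for illustration.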

I hope this information is helpful to you. If you have further questions,
please don't hesitate to contact us again.

Regards,

Archana
UCSC Genome Bioinformatics Group


Fan Hsu wrote:
> Hi Jeff,
>
> The knownIsoforms table is generated by the program txGeneCanonical,
> written by Jim Kent. Jim is out of town today, and I'm not sure whether
> he has email access.
>
> My guess is that this program first identifies overlapping genes, then
> finds the longest one and designates it as the representative
> canonical gene of the cluster.
>
> In the meantime, you can find the details of this program by
> downloading our source tree and looking under:
>
> kent/src/hg/txGene/txGeneCanonical
>
> Fan.
> -----Original Message-----
> From: genome-bounces at soe.ucsc.edu [mailto:genome-bounces at soe.ucsc.edu]
> On Behalf Of Jeffrey Rosenfeld
> Sent: Monday, May 07, 2007 10:16 AM
> To: genome at soe.ucsc.edu
> Subject: [Genome] knownIsoforms
>
>
> How is the knownisoforms table constructed? It seems that all
> overlapping genes are clustered together, but there are examples, such
> as at the very beginning of chromosome 1, where a transcript within a
> cluster becomes its own cluster.  When I run the following query on hg18:
>
> select clusterID, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd
> from knownIsoforms, knownGene where transcript = name;
>
> These are the results I get:
>
> | clusterID | name       | chrom | strand | txStart | txEnd  | cdsStart | cdsEnd |
> +-----------+------------+-------+--------+---------+--------+----------+--------+
> |         1 | uc001aaa.1 | chr1  | +      |    1736 |   4121 |     1736 |   1736 |
> |         2 | uc001aab.1 | chr1  | -      |    4558 |  14764 |     4558 |   4558 |
> |         2 | uc001aac.1 | chr1  | -      |    4558 |  19346 |     4558 |   4558 |
> |         2 | uc001aad.1 | chr1  | -      |    4558 |   7231 |     4558 |   7173 |
> |         2 | uc001aae.1 | chr1  | -      |    4558 |   9622 |     4558 |   4558 |
> |         2 | uc001aaf.1 | chr1  | -      |    4832 |  19672 |     4832 |   4832 |
> |         2 | uc001aag.1 | chr1  | -      |    5658 |   7231 |     5658 |   5658 |
> |         2 | uc001aah.1 | chr1  | -      |    6720 |  19346 |     6720 |   6720 |
> |         2 | uc001aai.1 | chr1  | -      |    6720 |   9622 |     6720 |   6720 |
> |         3 | uc001aaj.1 | chr1  | -      |    7777 |  19346 |     7777 |  14749 |
>
>
> Shouldn't cluster 3 be included as part of cluster 2?
>
> Thank You,
>
> Jeffrey Rosenfeld
> _______________________________________________
> Genome maillist  -  Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome

