[Genome] knownIsoforms
Archana Thakkapallayil
archanat at soe.ucsc.edu
Mon May 7 13:29:56 PDT 2007
Hello Jeff,
I would like to add this to Fan's reply:
The clustering is done one strand at a time, at the exon level. In a
few cases, two transcripts overlap before splicing (their genomic spans
overlap) but not after splicing (none of their exons overlap). In
effect, the clustering is done after splicing, so such transcripts end
up in separate clusters.
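
To make the rule above concrete, here is a minimal sketch of strand-wise,
exon-level clustering. It is not the txGeneCanonical code; the function
names, the input layout, and the example transcripts (tA to tD, with
made-up coordinates) are assumptions for illustration only.

```python
from collections import defaultdict


def exons_overlap(exons_a, exons_b):
    """True if any exon of one transcript overlaps any exon of the other.

    Exons are half-open (start, end) intervals.
    """
    return any(a_start < b_end and b_start < a_end
               for a_start, a_end in exons_a
               for b_start, b_end in exons_b)


def cluster_transcripts(transcripts):
    """Group transcripts whose exons overlap on the same chrom and strand.

    transcripts: list of (name, chrom, strand, exons) tuples.
    Returns a sorted list of clusters, each a sorted list of names.
    """
    # Union-find over transcript names, so overlap is applied transitively.
    parent = {name: name for name, _, _, _ in transcripts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Transcripts are only compared within one chromosome and strand.
    by_strand = defaultdict(list)
    for name, chrom, strand, exons in transcripts:
        by_strand[(chrom, strand)].append((name, exons))

    for group in by_strand.values():
        for i, (name_a, exons_a) in enumerate(group):
            for name_b, exons_b in group[i + 1:]:
                if exons_overlap(exons_a, exons_b):
                    parent[find(name_a)] = find(name_b)

    clusters = defaultdict(list)
    for name in parent:
        clusters[find(name)].append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical transcripts (not real hg18 coordinates).
example = [
    ("tA", "chr1", "-", [(100, 200), (500, 600)]),
    ("tB", "chr1", "-", [(150, 250), (550, 700)]),
    ("tC", "chr1", "-", [(300, 400)]),   # lies inside tA's intron
    ("tD", "chr1", "+", [(100, 200)]),   # same position, other strand
]
clusters = cluster_transcripts(example)
print(clusters)
```

In this example tC falls within the genomic span of tA and tB but shares
no exon with either, so it gets its own cluster, and tD is kept apart
because it is on the other strand.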
I hope this information is helpful to you. If you have further questions,
please don't hesitate to contact us again.
Regards,
Archana
UCSC Genome Bioinformatics Group
Fan Hsu wrote:
> Hi Jeff,
>
> The knownIsoforms table is generated by a program, txGeneCanonical,
> written by Jim Kent. Jim is out of town today, and I am not sure whether
> he has email access.
>
> My guess is that this program first identifies overlapping genes, then
> finds the longest one and designates it as the representative
> canonical gene of the cluster.
>
> In the meantime, you can find the details of this program
> by downloading our source tree and looking
> under:
>
> kent/src/hg/txGene/txGeneCanonical
>
> Fan.
> -----Original Message-----
> From: genome-bounces at soe.ucsc.edu [mailto:genome-bounces at soe.ucsc.edu]On
> Behalf Of Jeffrey Rosenfeld
> Sent: Monday, May 07, 2007 10:16 AM
> To: genome at soe.ucsc.edu
> Subject: [Genome] knownIsoforms
>
>
> How is the knownIsoforms table constructed? It seems that all
> overlapping genes are clustered together, but there are examples, such
> as at the very beginning of chromosome 1, where a transcript within a
> cluster becomes its own cluster. When I run the following query on hg18:
>
> select clusterID,name,chrom,strand,txStart,txEnd,cdsStart,cdsEnd from
> knownIsoforms, knownGene where transcript = name;
>
> These are the results I get:
>
> | clusterID | name       | chrom | strand | txStart | txEnd | cdsStart | cdsEnd |
> +-----------+------------+-------+--------+---------+-------+----------+--------+
> |         1 | uc001aaa.1 | chr1  | +      |    1736 |  4121 |     1736 |   1736 |
> |         2 | uc001aab.1 | chr1  | -      |    4558 | 14764 |     4558 |   4558 |
> |         2 | uc001aac.1 | chr1  | -      |    4558 | 19346 |     4558 |   4558 |
> |         2 | uc001aad.1 | chr1  | -      |    4558 |  7231 |     4558 |   7173 |
> |         2 | uc001aae.1 | chr1  | -      |    4558 |  9622 |     4558 |   4558 |
> |         2 | uc001aaf.1 | chr1  | -      |    4832 | 19672 |     4832 |   4832 |
> |         2 | uc001aag.1 | chr1  | -      |    5658 |  7231 |     5658 |   5658 |
> |         2 | uc001aah.1 | chr1  | -      |    6720 | 19346 |     6720 |   6720 |
> |         2 | uc001aai.1 | chr1  | -      |    6720 |  9622 |     6720 |   6720 |
> |         3 | uc001aaj.1 | chr1  | -      |    7777 | 19346 |     7777 |  14749 |
>
>
> Shouldn't cluster 3 be included as part of cluster 2?
>
> Thank You,
>
> Jeffrey Rosenfeld
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>