[Genome] How to get SNPs located between the start and end chromosomal locations of each gene with Entrez GeneID

Brooke Rhead rhead at soe.ucsc.edu
Thu Jan 3 12:33:25 PST 2008


Hello Shouguo,

There are two tables related to UCSC Genes that you can use to get a 
single splice variant for each gene: knownCanonical and knownIsoforms.

The knownIsoforms table contains all of the items in knownGene grouped 
into clusters, and knownCanonical contains a single isoform for each 
cluster, along with its position. (See this previously answered question 
for how these tables are made: 
http://www.soe.ucsc.edu/pipermail/genome/2005-July/008123.html .)
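The lookup described above can be chained in a single query. knownToLocusLink maps each UCSC transcript to its Entrez Gene (LocusLink) ID, knownIsoforms maps the transcript to its cluster, and knownCanonical gives that cluster's single transcript and position. The sketch below is a minimal, self-contained illustration that uses Python's sqlite3 with tiny stand-in tables filled with the values from Shouguo's example. It assumes knownToLocusLink has the columns name (transcript) and value (Entrez ID), and canonical_location is a hypothetical helper name. The SELECT is plain SQL and should also run against the MySQL server, but check the real column names with DESCRIBE first.

```python
import sqlite3

# In-memory stand-ins for three UCSC tables (only the columns used here).
# The knownToLocusLink column names (name, value) are an assumption.
SCHEMA = """
CREATE TABLE knownToLocusLink (name TEXT, value TEXT);
CREATE TABLE knownIsoforms   (clusterId INTEGER, transcript TEXT);
CREATE TABLE knownCanonical  (chrom TEXT, chromStart INTEGER, chromEnd INTEGER,
                              clusterId INTEGER, transcript TEXT, protein TEXT);
"""

# Entrez Gene ID -> its transcripts -> their cluster -> the cluster's
# canonical transcript and position. DISTINCT collapses the duplicate rows
# produced because every transcript of a gene points to the same cluster.
CANONICAL_BY_ENTREZ = """
SELECT DISTINCT l.value, c.chrom, c.chromStart, c.chromEnd, c.transcript
FROM knownToLocusLink AS l
JOIN knownIsoforms   AS i ON i.transcript = l.name
JOIN knownCanonical  AS c ON c.clusterId  = i.clusterId
WHERE l.value = ?
"""


def canonical_location(conn, entrez_id):
    """Return (entrezId, chrom, chromStart, chromEnd, transcript) rows."""
    return conn.execute(CANONICAL_BY_ENTREZ, (entrez_id,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # The eight transcripts from the question, uc001aab.1 .. uc001aai.1.
    transcripts = ["uc001aa%s.1" % c for c in "bcdefghi"]
    conn.executemany("INSERT INTO knownToLocusLink VALUES (?, '375690')",
                     [(t,) for t in transcripts])
    conn.executemany("INSERT INTO knownIsoforms VALUES (2, ?)",
                     [(t,) for t in transcripts])
    conn.execute("INSERT INTO knownCanonical VALUES "
                 "('chr1', 4558, 7231, 2, 'uc001aad.1', 'uc001aad.1')")
    print(canonical_location(conn, "375690"))
    # [('375690', 'chr1', 4558, 7231, 'uc001aad.1')]
```

Because all transcripts of 375690 sit in cluster 2, the join yields eight identical rows and DISTINCT reduces them to one. If a gene's transcripts were split across several clusters, the query would return one row per cluster.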

In your example, all eight transcripts for LocusLink ID 375690 belong to 
cluster 2 in knownIsoforms, and the canonical transcript for that cluster 
is uc001aad.1:

mysql> select * from knownIsoforms where clusterId=2 order by transcript;
+-----------+------------+
| clusterId | transcript |
+-----------+------------+
|         2 | uc001aab.1 |
|         2 | uc001aac.1 |
|         2 | uc001aad.1 |
|         2 | uc001aae.1 |
|         2 | uc001aaf.1 |
|         2 | uc001aag.1 |
|         2 | uc001aah.1 |
|         2 | uc001aai.1 |
+-----------+------------+
8 rows in set (0.01 sec)

mysql> select * from knownCanonical where clusterId=2;
+-------+------------+----------+-----------+------------+------------+
| chrom | chromStart | chromEnd | clusterId | transcript | protein    |
+-------+------------+----------+-----------+------------+------------+
| chr1  |       4558 |     7231 |         2 | uc001aad.1 | uc001aad.1 |
+-------+------------+----------+-----------+------------+------------+
1 row in set (0.01 sec)
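With one position per gene, the remaining step in the original question is the window from 1000 bp upstream to 500 bp downstream, and "upstream" depends on strand. knownCanonical has no strand column, so the sketch below assumes the strand is looked up separately (for example from knownGene; the question lists uc001aad.1 on the minus strand). It also assumes the SNP coordinates come from the same assembly and use the same 0-based, half-open convention as the UCSC tables. Snp, flank_window and snps_near_gene are hypothetical names for illustration, and the SNPs in the demo are made up.

```python
from typing import NamedTuple


class Snp(NamedTuple):
    name: str
    chrom: str
    chromStart: int  # 0-based start, as in UCSC tables
    chromEnd: int    # exclusive end


def flank_window(chrom_start, chrom_end, strand, upstream=1000, downstream=500):
    """Return the half-open [start, end) window covering the gene plus flanks.

    On '+' upstream is toward lower coordinates; on '-' it is toward higher ones.
    """
    if strand == "+":
        start, end = chrom_start - upstream, chrom_end + downstream
    elif strand == "-":
        start, end = chrom_start - downstream, chrom_end + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(0, start), end  # clamp at the start of the chromosome


def snps_near_gene(snps, chrom, chrom_start, chrom_end, strand,
                   upstream=1000, downstream=500):
    """Return the SNPs on `chrom` that overlap the flanked gene window."""
    win_start, win_end = flank_window(chrom_start, chrom_end, strand,
                                      upstream, downstream)
    return [s for s in snps
            if s.chrom == chrom
            and s.chromStart < win_end and s.chromEnd > win_start]


if __name__ == "__main__":
    # Made-up SNPs, only to exercise the window edges.
    snps = [
        Snp("snpA", "chr1", 4000, 4001),  # just below the window: excluded
        Snp("snpB", "chr1", 4100, 4101),  # in the 3' flank: kept
        Snp("snpC", "chr1", 8230, 8231),  # last base of the 5' flank: kept
        Snp("snpD", "chr1", 8231, 8232),  # just past the window: excluded
        Snp("snpE", "chr2", 5000, 5001),  # other chromosome: excluded
    ]
    hits = snps_near_gene(snps, "chr1", 4558, 7231, "-")
    print([s.name for s in hits])  # ['snpB', 'snpC']
```

For uc001aad.1 (chr1:4558-7231, minus strand) the window is [4058, 8231): 500 bp are added at the low-coordinate (3') end and 1000 bp at the high-coordinate (5') end.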


I hope this information helps.  Please let us know if you have any 
further questions.

--
Brooke Rhead
UCSC Genome Bioinformatics Group


Gao, Shouguo wrote:
> Being able to connect to the MySQL server is really useful. I now have a
> large SNP list, and I am looking for the SNPs related to a list of genes
> (Entrez GeneIDs) that lie within the region from 1000 bp upstream to
> 500 bp downstream of each gene. The most important step is finding the
> start and end chromosomal locations of each gene from its Entrez GeneID.
> I used the knownGene and knownToLocusLink tables, but got several
> locations for one Entrez GeneID (LocusLink):
>
> name        locuslink  chrom  strand  start  end
> uc001aab.1  375690     chr1   -       4558   14764
> uc001aac.1  375690     chr1   -       4558   19346
> uc001aad.1  375690     chr1   -       4558   7231
> uc001aae.1  375690     chr1   -       4558   9622
> uc001aaf.1  375690     chr1   -       4832   19672
> uc001aag.1  375690     chr1   -       5658   7231
> uc001aah.1  375690     chr1   -       6720   19346
> uc001aai.1  375690     chr1   -       6720   9622
>
> The reason seems to be that the same Entrez GeneID matches multiple UCSC
> Known Genes (transcripts).
>
> In NCBI, one Entrez GeneID has only one start and end location. Could you
> please tell me how to solve this issue? Is there another table for that?
>
> Thanks
>
> Shouguo
> 
> _______________________________________________
> Genome maillist  -  Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome