[Genome] hg18.snp128, non-synonymous snps

Brooke Rhead rhead at soe.ucsc.edu
Tue May 20 17:14:11 PDT 2008


Hello Christian,

The snp128.func field contains the functional classification from the 
dbSNP file b128_SNPContigLocusId_36_2.bcp.gz (look in the "Data Sources" 
section of the UCSC SNP description page for a link to the dbSNP 
downloads). Many SNPs have more than one function assigned to them; 
these appear in our snp128 table as a comma-separated list, for example:

intron,cds-reference,nonsense

... so you will need to search with LIKE and "%" wildcards (rather than 
"=") to catch all instances of a classification.
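To illustrate why an exact-match comparison such as func='nonsense' misses multi-valued entries, here is a small Python sketch (not UCSC code) that mimics "=" and LIKE '%...%' on a few made-up func strings. The sample rows and the helper names are assumptions for illustration only, and the matching here is case-sensitive, which may differ from your MySQL collation.

```python
# Made-up func values in the comma-separated style of snp128.func.
rows = [
    "intron,cds-reference,nonsense",
    "nonsense",
    "intron",
    "cds-reference,missense",
]

def sql_equals(value, term):
    # Roughly mimics: func = 'term'
    return value == term

def sql_like_contains(value, term):
    # Roughly mimics: func LIKE '%term%' (plain substring match)
    return term in value

def has_class(value, term):
    # Stricter alternative: match one whole comma-separated item.
    return term in value.split(",")

exact = sum(sql_equals(r, "nonsense") for r in rows)
wild = sum(sql_like_contains(r, "nonsense") for r in rows)
whole = sum(has_class(r, "nonsense") for r in rows)
print(exact, wild, whole)
```

On this sample the exact comparison finds only the single-valued row, while the wildcard and whole-item matches also find the multi-valued one.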

Also note that in the snp128 table, SNPs that we refer to as 
non-synonymous in the Genome Browser can have a few different 
classifications in the snp128.func field (from the snp128 Genome Browser 
description page):

"Coding - Non-Synonymous - change in peptide for allele with respect to 
reference assembly (includes coding-nonsynon, nonsense, missense, 
frameshift)"
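If you post-process downloaded rows yourself, the grouping quoted above can be expressed as a small helper. This is only a sketch: the set contents come from the description-page text, but the function name and the idea of splitting on commas client-side are my own assumptions.

```python
# Classes the description page groups under "Coding - Non-Synonymous".
NON_SYNONYMOUS = {"coding-nonsynon", "nonsense", "missense", "frameshift"}

def is_non_synonymous(func):
    """Return True if any comma-separated item in func is a non-synonymous class."""
    return any(item in NON_SYNONYMOUS for item in func.split(","))

print(is_non_synonymous("intron,cds-reference,nonsense"))
print(is_non_synonymous("coding-synon,cds-reference"))
```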

The counts for these alternate names are not zero:

mysql> select count(*) from hg18.snp128 where func like '%nonsense%';
+----------+
| count(*) |
+----------+
|     2562 |
+----------+
1 row in set (9.56 sec)

mysql> select count(*) from hg18.snp128 where func like '%missense%';
+----------+
| count(*) |
+----------+
|    83845 |
+----------+
1 row in set (9.58 sec)

mysql> select count(*) from hg18.snp128 where func like '%frameshift%';
+----------+
| count(*) |
+----------+
|    12651 |
+----------+
1 row in set (9.53 sec)

Confusingly, there are actually no instances of the function 
"coding-nonsynon" in dbSNP build 128 (for human, at least; this 
classification is used in earlier human builds from dbSNP, such as 
build 126, our snp126 table):

mysql> select count(*) from hg18.snp128 where func like '%nonsyn%';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (9.77 sec)

Also note that the nonsense, missense, and frameshift classifications 
are not mutually exclusive, so a combined search like this one returns 
fewer SNPs (98442) than the sum of the three separate counts above 
(2562 + 83845 + 12651 = 99058):

mysql> select count(*) from hg18.snp128 where func like '%nonsense%' or 
func like '%missense%' or func like '%frameshift%';
+----------+
| count(*) |
+----------+
|    98442 |
+----------+
1 row in set (12.17 sec)
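The gap between the combined count and the sum of the separate counts is just double counting of SNPs that carry more than one of these classes. A toy Python sketch, using invented SNP names and func values (none of these are real records), shows the same effect:

```python
# Invented SNP names and func values, for illustration only.
rows = {
    "rsA": "cds-reference,missense",
    "rsB": "cds-reference,nonsense",
    "rsC": "cds-reference,missense,frameshift",
    "rsD": "intron",
}
classes = ["nonsense", "missense", "frameshift"]

# One set of SNP names per class, like running three separate queries.
per_class = {
    c: {name for name, func in rows.items() if c in func.split(",")}
    for c in classes
}

# Adding the separate counts counts rsC twice (missense and frameshift).
sum_of_counts = sum(len(names) for names in per_class.values())

# The OR query corresponds to the union, which counts each SNP once.
combined = len(set().union(*per_class.values()))

print(sum_of_counts, combined)
```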

A search like this lists every combination of functions found in the 
snp128 table, along with the number of SNPs that have it:

mysql> select distinct func, count(*) from snp128 group by func;
+-----------------------------------+----------+
| func                              | count(*) |
+-----------------------------------+----------+
| unknown                           |  7404414 |
| intron                            |  4219494 |
| coding-synon,cds-reference        |    53986 |
| coding-synon,intron,cds-reference |     3015 |
...(continued)
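The grouped query above counts whole combinations. If you would rather have a count per individual class, one option is to split the strings after downloading them. A minimal Python sketch, assuming a handful of made-up func values rather than the real table contents:

```python
from collections import Counter

# Made-up func values; the real table has millions of rows.
funcs = [
    "unknown",
    "intron",
    "intron",
    "coding-synon,cds-reference",
    "coding-synon,intron,cds-reference",
]

# Equivalent in spirit to: select func, count(*) ... group by func
by_combination = Counter(funcs)

# Per-class tally: split each combination into its individual items.
by_class = Counter(item for func in funcs for item in func.split(","))

for name, count in by_class.most_common():
    print(name, count)
```

Note that the per-class totals add up to more than the number of rows, for the same reason the separate LIKE counts do.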

Sometimes a SNP is classified as both "coding-synon" and nonsense, 
missense, or frameshift.  This dbSNP FAQ describes what may be happening 
in these cases:

http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helpsnpfaq.section.Reports.SNP_Class_Definition#Reports.Functional_Class

"Because functional classification is defined by positional and sequence 
parameters, two facts emerge: (a) if a gene has multiple transcripts 
because of alternative splicing, then a variation may have several 
different functional relationships to the gene; and (b) if multiple 
genes are densely packed in a contig region, then a variation at a 
single location in the genome may have multiple, potentially different, 
relationships to its local gene neighbors."

I hope this information is helpful.  If you have further questions, 
please feel free to contact us again at genome at soe.ucsc.edu.

--
Brooke Rhead
UCSC Genome Bioinformatics Group



C.Gilissen at antrg.umcn.nl wrote:
> Hi,
> 
> I was interested in retrieving information about snps from the UCSC
> database. More specifically I wanted to know which snps are synonymous
> and which are not.
> 
> To my surprise I saw:
> 
> select count(*) from hg18.snp128 where func='coding-nonsynon';
> +----------+
> | count(*) |
> +----------+
> |        0 |
> +----------+
> 1 row in set (1 min 0.63 sec)
> 
> (I noticed that for hg17 this query shows there are 206 records?)
> 
> So, do you actually have synonymous/non-synonymous information about
> SNPs, and if not, do you have any plans for the future? Do you know of
> other reliable (and updated) sources to retrieve this information from?
> 
> Kind regards,
> 
> Christian Gilissen
> 

