[Genome] hg18.snp128, non-synonymous snps
Brooke Rhead
rhead at soe.ucsc.edu
Tue May 20 17:14:11 PDT 2008
Hello Christian,
The snp128.func field contains the functional classification from the
dbSNP file b128_SNPContigLocusId_36_2.bcp.gz (look in the "Data Sources"
section of the UCSC SNP description page for a link to the dbSNP
downloads). Many SNPs have more than one function assigned to them,
which appear in our snp128 table separated by commas, for example:
intron,cds-reference,nonsense
... so you will need to use LIKE with "%" wildcards in your search
(rather than "=") to catch all instances of a classification.
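One caveat with wildcards: LIKE '%...%' is a substring match, so it can also hit a longer class name that merely contains the search term. As a rough sketch of exact matching on the comma-separated values, something like the following Python could be used after downloading the table. The function name has_func is a hypothetical helper, not part of any UCSC tool.

```python
def has_func(func_field, name):
    """Return True if `name` is one of the comma-separated classes in func_field."""
    return name in func_field.split(",")


# Example value taken from the message above.
example = "intron,cds-reference,nonsense"

print(has_func(example, "nonsense"))  # exact class match
print(has_func(example, "sense"))     # only a substring, not a class
print("sense" in example)             # what a '%sense%' wildcard would do
```

If you prefer to stay in SQL, MySQL's FIND_IN_SET() function offers a comparable exact match on comma-separated values.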
Also note that in the snp128 table, SNPs that we refer to as
non-synonymous in the Genome Browser can have a few different
classifications in the snp128.func field (from the snp128 Genome Browser
description page):
"Coding - Non-Synonymous - change in peptide for allele with respect to
reference assembly (includes coding-nonsynon, nonsense, missense,
frameshift)"
The counts for these alternate names are not zero:
mysql> select count(*) from hg18.snp128 where func like '%nonsense%';
+----------+
| count(*) |
+----------+
|     2562 |
+----------+
1 row in set (9.56 sec)
mysql> select count(*) from hg18.snp128 where func like '%missense%';
+----------+
| count(*) |
+----------+
|    83845 |
+----------+
1 row in set (9.58 sec)
mysql> select count(*) from hg18.snp128 where func like '%frameshift%';
+----------+
| count(*) |
+----------+
|    12651 |
+----------+
1 row in set (9.53 sec)
Confusingly, there are actually no instances of the function
"coding-nonsynon" in dbSNP build 128 (for human, at least -- this
classification is used in earlier human builds from dbSNP, like snp126):
mysql> select count(*) from hg18.snp128 where func like '%nonsyn%';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (9.77 sec)
Also note that the nonsense, missense, and frameshift classifications
are not mutually exclusive, so a combined search like the one below
turns up fewer SNPs than the sum of the three separate counts above
(2562 + 83845 + 12651 = 99058):
mysql> select count(*) from hg18.snp128 where func like '%nonsense%' or
func like '%missense%' or func like '%frameshift%';
+----------+
| count(*) |
+----------+
|    98442 |
+----------+
1 row in set (12.17 sec)
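To see why the combined count comes out lower, here is a minimal sketch using Python's sqlite3 module with an in-memory toy table. The table name snp, the SNP names, and the func values are all made up for illustration; only the shape of the queries mirrors the ones above.

```python
import sqlite3

# Toy stand-in for hg18.snp128: hypothetical SNP names and func values.
rows = [
    ("rsA", "missense"),
    ("rsB", "nonsense,missense"),
    ("rsC", "frameshift"),
    ("rsD", "coding-synon,missense"),
    ("rsE", "intron"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snp (name TEXT, func TEXT)")
conn.executemany("INSERT INTO snp VALUES (?, ?)", rows)


def count_like(pattern):
    """Count rows whose func matches a single LIKE pattern."""
    cur = conn.execute("SELECT COUNT(*) FROM snp WHERE func LIKE ?", (pattern,))
    return cur.fetchone()[0]


# Sum of the three separate searches: rsB is counted twice.
separate = sum(count_like(p) for p in ("%nonsense%", "%missense%", "%frameshift%"))

# One combined search: each matching row is counted once.
combined = conn.execute(
    "SELECT COUNT(*) FROM snp WHERE func LIKE '%nonsense%' "
    "OR func LIKE '%missense%' OR func LIKE '%frameshift%'"
).fetchone()[0]

print(separate, combined)  # prints "5 4"
```

The difference between the two numbers is the number of extra times multi-class SNPs were counted by the separate searches.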
A search like this will get a list of all of the combinations of
functions found in the snp128 table:
mysql> select distinct func, count(*) from snp128 group by func;
+-----------------------------------+----------+
| func                              | count(*) |
+-----------------------------------+----------+
| unknown                           |  7404414 |
| intron                            |  4219494 |
| coding-synon,cds-reference        |    53986 |
| coding-synon,intron,cds-reference |     3015 |
...(continued)
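If you want a total per individual class rather than per combination, the grouped output can be split on commas and re-tallied. The sketch below does this in Python for only the four rows shown above, so its totals are partial and are not the real per-class counts for snp128. The variable names are illustrative.

```python
from collections import Counter

# Only the four rows shown above; the real result set continues.
combo_counts = {
    "unknown": 7404414,
    "intron": 4219494,
    "coding-synon,cds-reference": 53986,
    "coding-synon,intron,cds-reference": 3015,
}

# Credit each combination's count to every class it contains.
per_class = Counter()
for combo, n in combo_counts.items():
    for cls in combo.split(","):
        per_class[cls] += n

for cls, n in per_class.most_common():
    print(cls, n)
```

Note that a SNP with several classes contributes to several totals, so these per-class totals do not add up to the number of SNPs.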
Sometimes a SNP is classified as both "coding-synon" and nonsense,
missense, or frameshift. This dbSNP FAQ describes what may be happening
in these cases:
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helpsnpfaq.section.Reports.SNP_Class_Definition#Reports.Functional_Class
"Because functional classification is defined by positional and sequence
parameters, two facts emerge: (a) if a gene has multiple transcripts
because of alternative splicing, then a variation may have several
different functional relationships to the gene; and (b) if multiple
genes are densely packed in a contig region, then a variation at a
single location in the genome may have multiple, potentially different,
relationships to its local gene neighbors."
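To pick out the SNPs that carry both kinds of classification, one possible approach is to treat each func value as a set of classes. This is only a sketch: the helper name is_mixed is hypothetical, the first sample string is invented, and the set of non-synonymous class names is taken from the description page text quoted earlier.

```python
# Classes counted as non-synonymous in build 128, per the quoted description.
NONSYN = {"nonsense", "missense", "frameshift"}


def is_mixed(func_field):
    """True if a func value has coding-synon plus a non-synonymous class."""
    classes = set(func_field.split(","))
    return "coding-synon" in classes and bool(classes & NONSYN)


print(is_mixed("coding-synon,missense,cds-reference"))  # invented example
print(is_mixed("coding-synon,cds-reference"))           # combination from the table above
```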
I hope this information is helpful. If you have further questions,
please feel free to contact us again at genome at soe.ucsc.edu.
--
Brooke Rhead
UCSC Genome Bioinformatics Group
C.Gilissen at antrg.umcn.nl wrote:
> Hi,
>
> I was interested in retrieving information about snps from the UCSC
> database. More specifically I wanted to know which snps are synonymous
> and which are not.
>
> To my surprise I saw:
>
> select count(*) from hg18.snp128 where func='coding-nonsynon';
> +----------+
> | count(*) |
> +----------+
> |        0 |
> +----------+
> 1 row in set (1 min 0.63 sec)
>
> (I noticed that for hg17 this query shows there are 206 records?)
>
> So, do you actually have synonymous/non-synonymous information about
> SNPs, and if not, do you have any plans for the future? Do you know of
> other reliable (and updated) sources to retrieve this information from?
>
> Kind regards,
>
> Christian Gilissen
>