[Genome] A question about SNPs

Brooke Rhead rhead at soe.ucsc.edu
Tue May 6 18:05:00 PDT 2008


Hi Shan,

We get all of our SNP data directly from dbSNP, including position 
information.  Sometimes the different pieces of data for a single SNP 
contradict each other, and we try to annotate these cases.  For 
rs4030808 on chr1, I see this information:

dbSNP: rs4030808
Position: chr1:36267-36266
Genomic Size: 0
Observed: C/T

When the start position is one greater than the end position (with a 
genomic size of zero), as in this case, that indicates an insertion 
between the two bases.  But when a SNP is an insertion, we expect to see 
an allele of "-" among the observed alleles from dbSNP.  In this case we 
do not see a "-" (dbSNP just reports C and T).  This is an inconsistency 
in dbSNP's data, which we annotate with these notes (on the SNP details 
page):

Annotations:
All observed alleles are single-base, but the annotation spans 0 bases. 
(UCSC's re-alignment of flanking sequences to the genome may be 
informative -- see below.)
UCSC reference allele does not match any observed allele from dbSNP.
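
As a rough illustration of the check described above, here is a minimal Python sketch. It is not the code UCSC uses to generate these annotations. The record layout, the function names, and the note text are hypothetical, and it assumes table coordinates are stored 0-based, half-open and displayed with 1 added to the start (consistent with o=36266 and t=36266 in the link from the original question).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SnpRecord:
    # Hypothetical record layout, loosely modeled on a SNP table row.
    name: str
    chrom: str
    chrom_start: int  # assumed 0-based, half-open start
    chrom_end: int    # assumed 0-based, half-open end
    observed: str     # observed alleles, e.g. "C/T" or "-/A"


def display_position(rec: SnpRecord) -> str:
    """Format the position the way it is displayed: 1-based start, same end."""
    return f"{rec.chrom}:{rec.chrom_start + 1}-{rec.chrom_end}"


def genomic_size(rec: SnpRecord) -> int:
    """Number of reference bases spanned by the annotation."""
    return rec.chrom_end - rec.chrom_start


def insertion_notes(rec: SnpRecord) -> List[str]:
    """Flag a zero-length span (an insertion) that has no '-' allele."""
    alleles = rec.observed.split("/")
    notes = []
    if genomic_size(rec) == 0 and "-" not in alleles:
        notes.append("span is 0 bases but no '-' among observed alleles")
    return notes


rs4030808 = SnpRecord("rs4030808", "chr1", 36266, 36266, "C/T")
print(display_position(rs4030808))  # chr1:36267-36266
print(genomic_size(rs4030808))      # 0
print(insertion_notes(rs4030808))
```

A zero-length record whose observed alleles include "-" (for example "-/A") would come back with no notes, which is the consistent insertion case.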

In cases like this, it is often helpful to look at the alignments shown 
on our SNP details pages.  We re-align the flanking sequences provided 
by dbSNP to the genomic sequence with the idea that it will be easier to 
see how the two sequences relate.

--
Brooke Rhead
UCSC Genome Bioinformatics Group


Shan Yang wrote:
> Hi,
>
> I used the snp128 data for my research, and I found many cases where the start of a SNP is greater than its end. If it were an indel, that might make some sense; however, a lot of these cases are single-nucleotide changes, like the one shown here:
> http://genome.ucsc.edu/cgi-bin/hgc?hgsid=107011994&o=36266&t=36266&g=snp128&i=rs4030808&c=chr1&l=36264&r=36268&db=hg18&pix=1200
>
> What is the explanation for this?
>
> My guess is that the dbSNP data came from various sources, some of which use 0-based, half-open coordinates and some of which use 1-based, closed coordinates. When you put them on the Genome Browser, you treat them all as 0-based, half-open, so when you convert them to 1-based, closed coordinates, you add 1 to the start coordinate and keep the end coordinate. Thus, if start = end in the original coordinates, they become start = end + 1 in the Genome Browser.
>
> I don't know if this is right, but all the cases I've seen here have the start = end + 1 problem.
>
> Thanks!
>
> Shan

