[Genome] Questions about indexing in the .fa files

Todd Riley triley at ias.edu
Sat Jan 6 15:02:52 PST 2007


Hello,

I have some questions about the indexing in the *.fa files that I 
unfortunately could not answer from the docs. Thank you in advance 
for your help.

1. Is the first nucleotide in a *.fa file index 0 or 1?

2. Next, I am a bit confused about how the indexes in the known genes 
table compare to what queries return.

Take a positive-strand gene like NM_018234 in hg17. The known gene 
table gives this info:

#name    chrom    strand    txStart    txEnd    cdsStart    cdsEnd    exonCount    exonStarts    exonEnds    proteinID    alignID

NM_018234    chr2    +    119697613    119739455    119719302    119737144    6    119697613,119713168,119719294,119721484,119728489,119736862,    119697694,119713236,119719794,119722012,119728654,119739455,    Q86SF6_HUMAN    R18909


However, when I query for exon 1, the result says that it starts at 
index 119697614, not 119697613:

>hg17_refGene_NM_018234_0 range=chr2:119697614-119697694 5'pad=0 3'pad=0 revComp=FALSE strand=+ repeatMasking=none
GAGGAGGAGCCTCGGGCCGAGCCACCGCCTTCGCCGCGGACCTTCAGCTG
CCGCGGTCGCTCCGAGCGGCGGGCCGCAGAG


So my next question is: are the exonStarts and exonEnds supposed to be 
inclusive or exclusive? It looks like the indexing scheme is exonRegion 
= (exonStart, exonEnd], i.e. start exclusive and end inclusive. Is this 
correct? Also, is it different if the gene is on the negative strand?
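
The arithmetic behind that guess can be checked with a minimal Python 
sketch. The numbers and sequence are copied from the table row and the 
query result above; the helper name table_to_query_range is made up for 
illustration and is not part of any UCSC tool. It only shows that exon 1 
is consistent with the (exonStart, exonEnd] reading. It does not confirm 
that this is the intended convention, and it says nothing about the 
negative strand.

```python
# Values copied from the known gene table row for NM_018234 (exon 1).
exon_start = 119697613  # first entry of exonStarts
exon_end = 119697694    # first entry of exonEnds

# Values copied from the query result header (range=chr2:119697614-119697694).
query_start, query_end = 119697614, 119697694

# Sequence returned by the query, joined across its two lines.
seq = (
    "GAGGAGGAGCCTCGGGCCGAGCCACCGCCTTCGCCGCGGACCTTCAGCTG"
    "CCGCGGTCGCTCCGAGCGGCGGGCCGCAGAG"
)


def table_to_query_range(start, end):
    """Hypothetical helper: read the table pair as (start, end].

    Assumption being tested: the table start is exclusive and the table
    end is inclusive, so the first base of the exon is start + 1.
    """
    return start + 1, end


# The guessed conversion reproduces the range reported by the query.
assert table_to_query_range(exon_start, exon_end) == (query_start, query_end)

# The returned sequence length matches both ways of counting.
assert len(seq) == exon_end - exon_start
assert len(seq) == query_end - query_start + 1

print(len(seq))  # prints 81
```

Under this reading, exonEnd - exonStart gives the exon length directly, 
with no +1 correction.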

Thanks,
Todd


