[Genome] Questions about indexing in the .fa files
Todd Riley
triley at ias.edu
Sat Jan 6 15:02:52 PST 2007
Hello,
I have some questions about the indexing in the *.fa files that,
unfortunately, I could not find answered in the docs. Thank you in
advance for your answers.
1. Is the first nucleotide in a *.fa file index 0 or 1?
2. I am also a bit confused about how the indexes in the known genes
table compare to what queries return:
Take a positive-strand gene like NM_018234 in hg17. The known gene
table gives this info:
#name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds proteinID alignID
NM_018234 chr2 + 119697613 119739455 119719302 119737144 6 119697613,119713168,119719294,119721484,119728489,119736862, 119697694,119713236,119719794,119722012,119728654,119739455, Q86SF6_HUMAN R18909
However, when I query for exon 1, I get that it starts at index
119697614, not 119697613:
>hg17_refGene_NM_018234_0 range=chr2:119697614-119697694 5'pad=0 3'pad=0 revComp=FALSE strand=+ repeatMasking=none
GAGGAGGAGCCTCGGGCCGAGCCACCGCCTTCGCCGCGGACCTTCAGCTG
CCGCGGTCGCTCCGAGCGGCGGGCCGCAGAG
So my next question is: are the exonStarts and exonEnds supposed to be
inclusive or exclusive? It looks like the indexing scheme is exonRegion
= (exonStart, exonEnd]. Is this correct? Also, is it different if the
gene is on the negative strand?
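To make the guess concrete, here is a small Python sketch that checks the
(exonStart, exonEnd] reading against the numbers above. The helper names
are hypothetical, and the interval convention is only my assumption, not
something confirmed by the docs. It compares the table values for exon 1
with the range and sequence returned by the query.

```python
# Values copied from the table row and the query output above.
exon_start = 119697613  # first exonStarts entry in the table
exon_end = 119697694    # first exonEnds entry in the table
query_start, query_end = 119697614, 119697694  # range= in the FASTA header
sequence = (
    "GAGGAGGAGCCTCGGGCCGAGCCACCGCCTTCGCCGCGGACCTTCAGCTG"
    "CCGCGGTCGCTCCGAGCGGCGGGCCGCAGAG"
)


def table_to_query_range(start, end):
    """Hypothetical helper: read the table pair as (start, end] and
    return the equivalent inclusive range, as shown in range=."""
    return start + 1, end


def interval_length(start, end):
    """Number of bases covered if the pair is read as (start, end]."""
    return end - start


# Under the assumed convention, the table pair maps onto the query range,
# and the interval length matches the number of bases returned.
matches_range = table_to_query_range(exon_start, exon_end) == (query_start, query_end)
matches_length = interval_length(exon_start, exon_end) == len(sequence)
print(matches_range, matches_length)
```

Both checks agree with the data for this one exon, which is why I suspect
the (exonStart, exonEnd] reading, but a single positive-strand example
does not tell me what happens on the negative strand.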
Thanks,
Todd