[Genome] need your help to update hapmap browser: refGene.txt

Kayla Smith kayla at soe.ucsc.edu
Tue Apr 10 14:07:35 PDT 2007



Marcela,

1.  On the details page for the RefSeq track you'll find the following text:

Methods

RefSeq mRNAs were aligned against the human genome using blat; those
with an alignment of less than 15% were discarded. When a single mRNA
aligned in multiple places, the alignment having the highest base
identity was identified. Only alignments having a base identity level
within 0.1% of the best and at least 96% base identity with the genomic
sequence were kept.
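
A minimal sketch of the identity filter described in that paragraph, assuming each alignment of an mRNA is summarized by one percent-identity number and that "within 0.1%" means 0.1 percentage points. The function name and input format are made up for illustration; this is not the actual pipeline code. The 15% cutoff is left out because the text does not spell out what it measures.

```python
def keep_near_best(identities, min_identity=96.0, window=0.1):
    """Return the percent identities that pass the filter for one mRNA.

    identities: percent base identity of each alignment of the same mRNA
    (hypothetical input format). An alignment is kept if it is at least
    `min_identity` and no more than `window` points below the best one.
    """
    if not identities:
        return []
    best = max(identities)
    return [pid for pid in identities
            if pid >= min_identity and best - pid <= window]


# Made-up identity values for the alignments of a single mRNA.
print(keep_near_best([99.8, 99.75, 98.0, 95.0]))
```

With these made-up values only the two alignments closest to the best one survive; the 98.0 alignment passes the 96% floor but is too far from the best.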

2.  Here's a link to the FAQ that describes the off-by-one difference that
you are seeing (the tables store 0-based start coordinates, while the
browser displays 1-based ones):

http://genome.ucsc.edu/FAQ/FAQtracks#tracks1
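
As a rough illustration of that convention, the sketch below converts between the table form (0-based start, end unchanged) and the position string shown in the browser (1-based start). The function names are hypothetical helpers, not part of any UCSC tool.

```python
def table_to_browser(chrom, start, end):
    """Convert a table interval (0-based start) to a 1-based browser string."""
    return f"{chrom}:{start + 1}-{end}"


def browser_to_table(position):
    """Convert a 'chrom:start-end' browser string to (chrom, start, end)
    with a 0-based start, as stored in the tables."""
    chrom, span = position.split(":")
    start, end = (int(x.replace(",", "")) for x in span.split("-"))
    return chrom, start - 1, end


# txStart/txEnd for NM_000059 as they appear in hg18.refGene
print(table_to_browser("chr13", 31787616, 31871809))
```

Only the start coordinate shifts by one; the end coordinate is the same in both forms.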

3.  From the data you pasted, and from checking the table myself, txEnd
does equal the last item in the exonEnds list, i.e. 31871809.  Perhaps
you are reading from the wrong column?  In what you pasted, the bin
column value (103) is appended to the end of the header row, which may
make the columns appear out of register.
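
One way to avoid reading the wrong column is to pair each value with its header name before comparing anything. The sketch below does that for the BRCA2 row quoted further down; the helper names are made up for illustration, and the row is assumed to be tab-separated as in refGene.txt.

```python
HEADER = ("bin name chrom strand txStart txEnd cdsStart cdsEnd exonCount "
          "exonStarts exonEnds id name2 cdsStartStat cdsEndStat "
          "exonFrames").split()

# The NM_000059 row from hg18 refGene.txt, rebuilt as a tab-separated line.
ROW = "\t".join([
    "103", "NM_000059", "chr13", "+", "31787616", "31871809",
    "31788597", "31870907", "27",
    "31787616,31788558,31791213,31797212,31798237,31798378,31798635,"
    "31801579,31803055,31804408,31808401,31816694,31818963,31826997,"
    "31828564,31829878,31834659,31835315,31842538,31843092,31848806,"
    "31851453,31851886,31852143,31866825,31869034,31870298,",
    "31787804,31788664,31791462,31797321,31798287,31798419,31798750,"
    "31801629,31803167,31805524,31813333,31816790,31819033,31827425,"
    "31828746,31830066,31834830,31835670,31842694,31843237,31848928,"
    "31851652,31852050,31852282,31867070,31869181,31871809,",
    "0", "BRCA2", "cmpl", "cmpl",
    "-1,0,1,1,2,1,0,1,0,1,1,1,1,2,1,0,2,2,0,0,1,0,1,0,1,0,0,",
])


def parse_refgene_line(line):
    """Map each tab-separated field to its column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(HEADER):
        raise ValueError(f"expected {len(HEADER)} columns, got {len(fields)}")
    return dict(zip(HEADER, fields))


def int_list(text):
    """Parse a comma-separated list that ends with a trailing comma."""
    return [int(x) for x in text.split(",") if x]


record = parse_refgene_line(ROW)
exon_ends = int_list(record["exonEnds"])
print(record["txEnd"], exon_ends[-1])
print(int(record["txEnd"]) == exon_ends[-1])
```

The column-count check is there to catch a header and row that have drifted out of register.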

4.  The first exonStart/exonEnd pair would be:

exonStart: 31787616
exonEnd: 31787804

That size is 188 base pairs (exonEnd - exonStart, since the start
coordinates in the table are 0-based).  I got that from taking the first
item in the exonStarts comma-separated list, and the first item in the
exonEnds comma-separated list.  I'm not sure where you got the value of
84,188 bases that you mentioned in your question.  Perhaps you are doing
something different with the data.  This region, shown in the browser as
chr13:31,787,617-31,787,804, is an exon, but it is entirely 5' UTR: it
ends before cdsStart (31788597).
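
As a quick check of the arithmetic, assuming the 0-based start convention from the FAQ in item 2 (so an exon's length is exonEnd - exonStart), the sketch below computes the size of the first exon and tests whether it lies wholly upstream of the coding region. The helper names are hypothetical, and the UTR test as written only applies to plus-strand genes such as this one.

```python
# First items of exonStarts/exonEnds, and cdsStart, from the NM_000059 row.
exon_start, exon_end = 31787616, 31787804
cds_start = 31788597


def exon_length(start, end):
    """Length in bases of an interval with a 0-based start."""
    return end - start


def is_entirely_5prime_utr(exon_end, cds_start):
    """True if a plus-strand exon ends at or before the start of the CDS."""
    return exon_end <= cds_start


print(exon_length(exon_start, exon_end))
print(is_entirely_5prime_utr(exon_end, cds_start))
```

The same subtraction on txStart and txEnd gives the 84,193-base genomic span mentioned in the question.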

I hope this helps explain the data in hg18.refGene for you.  Please 
don't hesitate to contact us again if you have any further questions.

Kayla Smith
UCSC Genome Bioinformatics Group


Marcela K. Tello-Ruiz wrote:
> Dear Colleagues,
> 
> I am trying to understand the data in your refGene.txt files to update the refGenes coordinates in the HapMap browser.
> 
> A few questions follow:
> 
> - Where do you guys get your data for genomic coordinates for refGenes?
> 
> - From random spotting, there seems to be a consistent +1 shift between the txStart coordinates in your Tables and the Genome Browser for refGenes.  What is the rationale for doing this?  Perhaps 0-base vs 1-base Txpt start?
> TxStart+1 seems to be the start coordinate for the gene at NCBI.
> 
> - From a few examples, the txEnd is consistent with the last element in the exonEnds column, but this is not the case for BRCA2 (see record below, taken from refGene.txt, hg18), any particular reason?
> 
> - Also for the BRCA2 gene example (genomic span = 84,193 bases)... Does the first exonStart-exonEnd pair (84,188 bases) represent a real exon? 
> 
> 
> #bin	name	chrom	strand	txStart	txEnd	cdsStart	cdsEnd	exonCount	exonStarts	exonEnds	id	name2	cdsStartStat	cdsEndStat	exonFrames
> 103	NM_000059	chr13	+	31787616	31871809	31788597	31870907	27	31787616,31788558,31791213,31797212,31798237,31798378,31798635,31801579,31803055,31804408,31808401,31816694,31818963,31826997,31828564,31829878,31834659,31835315,31842538,31843092,31848806,31851453,31851886,31852143,31866825,31869034,31870298,	31787804,31788664,31791462,31797321,31798287,31798419,31798750,31801629,31803167,31805524,31813333,31816790,31819033,31827425,31828746,31830066,31834830,31835670,31842694,31843237,31848928,31851652,31852050,31852282,31867070,31869181,31871809,	0	BRCA2	cmpl	cmpl	-1,0,1,1,2,1,0,1,0,1,1,1,1,2,1,0,2,2,0,0,1,0,1,0,1,0,0,
> 
> Thanks a lot for your help, on behalf of HapMap.Org :)
> 
> Marcela
> 
> 
> 
> 


