[Genome] need your help to update hapmap browser: refGene.txt
Kayla Smith
kayla at soe.ucsc.edu
Tue Apr 10 14:07:35 PDT 2007
Marcela,
1. On the details page for the RefSeq Genes track you'll find the following text:
Methods
RefSeq mRNAs were aligned against the human genome using blat; those
with an alignment of less than 15% were discarded. When a single mRNA
aligned in multiple places, the alignment having the highest base
identity was identified. Only alignments having a base identity level
within 0.1% of the best and at least 96% base identity with the genomic
sequence were kept.
2. Here's a link to the FAQ that describes the off-by-one that you are
seeing:
http://genome.ucsc.edu/FAQ/FAQtracks#tracks1
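As a rough illustration of the convention that FAQ describes (table
starts are 0-based, table ends are 1-based, and the browser displays
1-based positions), here is a minimal Python sketch. The function name
table_to_browser is made up for this example and is not part of any
UCSC tool.

```python
def table_to_browser(chrom, start, end):
    """Turn a table interval (0-based start, 1-based end) into the
    1-based position string that the Genome Browser displays."""
    # Only the start shifts by one; the end is unchanged.
    return f"{chrom}:{start + 1:,}-{end:,}"


# txStart and txEnd of NM_000059 (BRCA2) from hg18.refGene
print(table_to_browser("chr13", 31787616, 31871809))
# prints chr13:31,787,617-31,871,809
```

This is why txStart + 1 matches the start coordinate you see at NCBI.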
3. From the data you pasted, and from checking the table myself, txEnd
does equal the last item in the exonEnds list, i.e. 31871809.
Perhaps you are reading from the wrong column? The bin column value
(103) is appended to the end of the header row in what you pasted,
which may make the columns appear out of register.
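One way to avoid reading the wrong column is to parse each row by column
name. The sketch below assumes the file is tab-separated and has the 16
columns listed in the header row you pasted; parse_refgene_line is a
hypothetical helper written for this example, not a UCSC utility.

```python
# Column names copied from the header row quoted in this thread.
COLUMNS = ["bin", "name", "chrom", "strand", "txStart", "txEnd",
           "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds",
           "id", "name2", "cdsStartStat", "cdsEndStat", "exonFrames"]


def parse_refgene_line(line):
    """Split one refGene row (assumed tab-separated) into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = dict(zip(COLUMNS, fields))
    for key in ("bin", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount"):
        row[key] = int(row[key])
    # The list columns end with a trailing comma, so skip empty items.
    for key in ("exonStarts", "exonEnds", "exonFrames"):
        row[key] = [int(x) for x in row[key].split(",") if x]
    return row


# The BRCA2 record from this thread, rebuilt with tab separators.
BRCA2_LINE = "\t".join([
    "103", "NM_000059", "chr13", "+", "31787616", "31871809",
    "31788597", "31870907", "27",
    "31787616,31788558,31791213,31797212,31798237,31798378,31798635,"
    "31801579,31803055,31804408,31808401,31816694,31818963,31826997,"
    "31828564,31829878,31834659,31835315,31842538,31843092,31848806,"
    "31851453,31851886,31852143,31866825,31869034,31870298,",
    "31787804,31788664,31791462,31797321,31798287,31798419,31798750,"
    "31801629,31803167,31805524,31813333,31816790,31819033,31827425,"
    "31828746,31830066,31834830,31835670,31842694,31843237,31848928,"
    "31851652,31852050,31852282,31867070,31869181,31871809,",
    "0", "BRCA2", "cmpl", "cmpl",
    "-1,0,1,1,2,1,0,1,0,1,1,1,1,2,1,0,2,2,0,0,1,0,1,0,1,0,0,",
])

row = parse_refgene_line(BRCA2_LINE)
print(row["txEnd"] == row["exonEnds"][-1])      # True
print(row["txStart"] == row["exonStarts"][0])   # True
print(len(row["exonStarts"]) == row["exonCount"])  # True
```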
4. The first exonStart/exonEnd pair would be:
exonStart: 31787616
exonEnd: 31787804
That exon is 188 bases long (exonEnd minus exonStart, because the start
is 0-based in the table). I got that from taking the first item in
the exonStarts comma-separated list and the first item in the exonEnds
comma-separated list. I'm not sure where you got the value 84,188 bases
that you mentioned in your question. Perhaps you are doing something
different with the data. This region, chr13:31,787,617-31,787,804 in
browser coordinates, is a real exon, but it lies entirely in the 5' UTR
(it ends before cdsStart, 31788597).
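Here is a small Python sketch of that arithmetic, using the 0-based
start convention from the FAQ in item 2. Under that convention the
length is simply exonEnd minus exonStart; reading both numbers as
1-based inclusive positions would give one more. The function names are
invented for this example.

```python
def exon_length(exon_start, exon_end):
    """Length of an exon stored with a 0-based start and 1-based end."""
    return exon_end - exon_start


def is_entirely_utr(exon_start, exon_end, cds_start, cds_end):
    """True if the exon does not overlap the coding region at all."""
    return exon_end <= cds_start or exon_start >= cds_end


# First exon and CDS bounds of NM_000059 (BRCA2) from hg18.refGene
first_len = exon_length(31787616, 31787804)
first_is_utr = is_entirely_utr(31787616, 31787804, 31788597, 31870907)
print(first_len, first_is_utr)  # prints 188 True
```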
I hope this helps explain the data in hg18.refGene for you. Please
don't hesitate to contact us again if you have any further questions.
Kayla Smith
UCSC Genome Bioinformatics Group
Marcela K. Tello-Ruiz wrote:
> Dear Colleagues,
>
> I am trying to understand the data in your refGene.txt files to update the refGenes coordinates in the HapMap browser.
>
> A few questions follow:
>
> - Where do you guys get your data for genomic coordinates for refGenes?
>
> - From random spotting, there seems to be a consistent +1 shift between the txStart coordinates in your Tables and the Genome Browser for refGenes. What is the rationale for doing this? Perhaps 0-base vs 1-base Txpt start?
> TxStart+1 seems to be the start coordinate for the gene at NCBI.
>
> - From a few examples, the txEnd is consistent with the last element in the exonEnds column, but this is not the case for BRCA2 (see record below, taken from refGene.txt, hg18), any particular reason?
>
> - Also for the BRCA2 gene example (genomic span = 84,193 bases)... Does the first exonStart-exonEnd pair (84,188 bases) represent a real exon?
>
>
> #bin name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds id name2 cdsStartStat cdsEndStat exonFrames
> 103 NM_000059 chr13 + 31787616 31871809 31788597 31870907 27 31787616,31788558,31791213,31797212,31798237,31798378,31798635,31801579,31803055,31804408,31808401,31816694,31818963,31826997,31828564,31829878,31834659,31835315,31842538,31843092,31848806,31851453,31851886,31852143,31866825,31869034,31870298, 31787804,31788664,31791462,31797321,31798287,31798419,31798750,31801629,31803167,31805524,31813333,31816790,31819033,31827425,31828746,31830066,31834830,31835670,31842694,31843237,31848928,31851652,31852050,31852282,31867070,31869181,31871809, 0 BRCA2 cmpl cmpl -1,0,1,1,2,1,0,1,0,1,1,1,1,2,1,0,2,2,0,0,1,0,1,0,1,0,0,
>
> Thanks a lot for your help, on behalf of HapMap.Org :)
>
> Marcela