[Genome] multiple entries for a gene in refFlat table
Anil Jegga
Anil.Jegga at cchmc.org
Mon Jun 11 16:35:06 PDT 2007
Hi Archana
This may not be just an issue with BLAT. If you look at this region,
there are several sequencing gaps (see the attached PDF). Most probably
there are errors in the contig assembly in this part of the genome
(with several segments repeated), and that could be leading to these
multiple hits. If you BLAT the mRNA sequence, all the high-scoring hits
are on chr15 (identity ranging from 94.7% to 100%) within
chr15:19,027,714-26,577,158 (about 7.5 Mbp). Or are these "real"
segmental duplications?
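
As a quick sanity check on the "about 7.5 Mbp" figure, here is a minimal Python sketch that computes the size of the region quoted above and tests whether the Genome Browser hits reported further down in this thread fall inside it. The helper names (`parse_region`, `span_bp`, `contains`) are made up for illustration and are not part of any UCSC tool; the sketch assumes browser-style positions are 1-based and inclusive.

```python
import re

def parse_region(region):
    """Parse a browser-style position such as 'chr15:19,027,714-26,577,158'."""
    match = re.fullmatch(r"(\w+):([\d,]+)-([\d,]+)", region)
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {region!r}")
    chrom, start, end = match.groups()
    # Strip the thousands separators before converting to integers.
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))

def span_bp(region):
    """Length in bases, assuming 1-based inclusive coordinates."""
    _, start, end = parse_region(region)
    return end - start + 1

def contains(outer, inner):
    """True if `inner` lies entirely within `outer` on the same sequence."""
    o_chrom, o_start, o_end = parse_region(outer)
    i_chrom, i_start, i_end = parse_region(inner)
    return o_chrom == i_chrom and o_start <= i_start and i_end <= o_end

REGION = "chr15:19,027,714-26,577,158"

# Hit positions copied from the Genome Browser search quoted in this thread.
HITS = [
    "chr15_random:258251-271612",
    "chr15:26563798-26577158",
    "chr15:26297405-26310766",
]

if __name__ == "__main__":
    print(f"{REGION} spans {span_bp(REGION) / 1e6:.2f} Mbp")
    for hit in HITS:
        print(hit, "inside region:", contains(REGION, hit))
```

Under these assumptions the two chr15 placements fall inside the region, while the chr15_random placement is on a separate sequence and so cannot.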
Thanks
Anil
Anil Jegga
Assistant Professor
Department of Pediatrics and Division of Biomedical Informatics
Cincinnati Children's Hospital Medical Center and University of
Cincinnati
Tel: (513)-636-0261
Fax: (513)-636-2056
http://anil.cchmc.org
>>> Archana Thakkapallayil <archanat at soe.ucsc.edu> 06/11/07 7:01 PM >>>
Hello Amit,
When I search for 'GOLGA8G' in the Genome Browser, I get these three
hits:
GOLGA8G at chr15_random:258251-271612 - (NM_001012420) golgi autoantigen, golgin subfamily a, 8G
GOLGA8G at chr15:26563798-26577158 - (NM_001012420) golgi autoantigen, golgin subfamily a, 8G
GOLGA8G at chr15:26297405-26310766 - (NM_001012420) golgi autoantigen, golgin subfamily a, 8G
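
The three hits above line up with the three refFlat rows quoted from the original question below. The start coordinates differ by one because UCSC tables store txStart as a 0-based start, while the browser displays 1-based positions. A minimal Python sketch of that correspondence follows. It assumes only the six whitespace-separated columns quoted in this thread (the real refFlat table has additional columns), and the function names are hypothetical, chosen for illustration.

```python
from collections import defaultdict

# Rows copied from the question quoted in this thread. Only these six columns
# are used here; this is an excerpt, not the full refFlat schema.
REFFLAT_ROWS = """\
GOLGA8G NM_001012420 chr15 + 26297404 26310766
GOLGA8G NM_001012420 chr15 - 26563797 26577158
GOLGA8G NM_001012420 chr15_random + 258250 271612
"""

def parse_rows(text):
    """Turn whitespace-separated rows into dictionaries."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        gene, accession, chrom, strand, tx_start, tx_end = line.split()
        rows.append({
            "geneName": gene,
            "name": accession,
            "chrom": chrom,
            "strand": strand,
            "txStart": int(tx_start),
            "txEnd": int(tx_end),
        })
    return rows

def browser_position(row):
    """Convert a 0-based, half-open table range to a 1-based browser position."""
    # Only the start shifts by one; the end coordinate is unchanged.
    return f"{row['chrom']}:{row['txStart'] + 1}-{row['txEnd']}"

def placements_by_accession(rows):
    """Group browser positions by (geneName, accession)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["geneName"], row["name"])].append(browser_position(row))
    return dict(grouped)

if __name__ == "__main__":
    for key, positions in placements_by_accession(parse_rows(REFFLAT_ROWS)).items():
        if len(positions) > 1:
            print(key, "aligns to", len(positions), "locations:")
            for position in positions:
                print("   ", position)
```

Grouping by accession in this way makes it easy to list every RefSeq that has more than one placement in the table.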
Here is a previously answered mailing list question which is similar to
yours:
http://www.soe.ucsc.edu/pipermail/genome/2007-May/013623.html
Hope this is helpful to you. Please don't hesitate to contact us again
if you require further assistance.
Regards,
Archana
UCSC Genome Bioinformatics Group
Amit U Sinha wrote:
> For human build hg18, the refFlat table has multiple entries for some
> genes. E.g., for gene GOLGA8G (NCBI GeneID: 283768), the following
> entries exist:
>
> geneName  name          chrom         strand  txStart   txEnd
> GOLGA8G   NM_001012420  chr15         +       26297404  26310766
> GOLGA8G   NM_001012420  chr15         -       26563797  26577158
> GOLGA8G   NM_001012420  chr15_random  +       258250    271612
>
> 1. What does chr15_random indicate when its physical location is
> known?
> 2. NCBI website shows only a single transcript; why is the transcript
> NM_001012420 repeated twice, with different start positions?
>
> Thanks,
> -- amit
> ____________________________________________________________________
> Amit U Sinha
> Graduate Student
> Univ of Cincinnati
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>