[Genome] "_dupX" names for transcript ID duplicates on same chromosomes
Robert Kuhn
kuhn at soe.ucsc.edu
Tue Jun 12 08:36:46 PDT 2007
Hello, again, Micha,
As you suspected, the Table Browser generates the transcript_id field
dynamically as a result of your choice of GTF as the output format.
The following page in our FAQ:
http://genome.ucsc.edu/FAQ/FAQformat#format4
defines that field as
"transcript_id value - A globally unique identifier for the predicted
transcript"
so the TB must distinguish the two AK130020 transcripts, which it does
by giving the second one the "_dup1" suffix.
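
To make the naming concrete, here is a minimal Python sketch of one way
an exporter could keep transcript_id unique when an accession aligns
more than once: the first alignment keeps the bare accession and later
ones get "_dup1", "_dup2", and so on. The helper name, the placeholder
accession "ACC_B", and the numbering rule (counting alignments per
accession in output order) are assumptions inferred from the AK130020
example, not a description of the Table Browser's actual code.

```python
from collections import defaultdict


def unique_transcript_ids(accessions):
    """Return one transcript_id per alignment, suffixing repeats with _dupN.

    Hypothetical helper: assumes alignments arrive in output order and that
    the first alignment of an accession keeps the bare accession name.
    """
    seen = defaultdict(int)  # accession -> alignments already named
    ids = []
    for acc in accessions:
        n = seen[acc]
        ids.append(acc if n == 0 else f"{acc}_dup{n}")
        seen[acc] += 1
    return ids


if __name__ == "__main__":
    # Two alignments of AK130020 plus one placeholder accession ("ACC_B").
    print(unique_transcript_ids(["AK130020", "ACC_B", "AK130020"]))
```

Under these assumptions the second AK130020 alignment comes out as
"AK130020_dup1", matching the names in your GTF file.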
If this does not fully resolve the issue, please let us know.
thanks,
--b0b kuhn
> From gmicha at gmail.com Tue Jun 12 02:45:30 2007
> To: Robert Kuhn <kuhn at soe.ucsc.edu>
> CC: genome at soe.ucsc.edu
> Subject: Re: [Genome] "_dupX" names for transcript ID duplicates on same chromosomes
>
> Hi Robert,
>
> The following is a grep of a file I downloaded from the Table Browser
> (mRNA track, hg17, GTF export). Does the Table Browser dynamically
> generate these _dup names for multiple alignments on the same
> chromosome?
>
> Best, micha.
>
> $ grep AK130020 human_hg17_mRNA_fromUCSC.gtf
> chr1 hg17_all_mrna exon 16385779 16385842 0.000000 - . gene_id "AK130020"; transcript_id "AK130020";
> chr1 hg17_all_mrna exon 16452373 16452436 0.000000 - . gene_id "AK130020"; transcript_id "AK130020";
> chr1 hg17_all_mrna exon 16452437 16452447 0.000000 - . gene_id "AK130020"; transcript_id "AK130020";
> chr1 hg17_all_mrna exon 16452451 16452453 0.000000 - . gene_id "AK130020"; transcript_id "AK130020";
> chr1 hg17_all_mrna exon 16452454 16452559 0.000000 - . gene_id "AK130020"; transcript_id "AK130020";
> chr1 hg17_all_mrna exon 17065218 17066019 0.000000 - . gene_id "AK130020"; transcript_id "AK130020";
> chr1 hg17_all_mrna exon 17067288 17067965 0.000000 - . gene_id "AK130020"; transcript_id "AK130020_dup1";
> chr1 hg17_all_mrna exon 17068055 17068101 0.000000 - . gene_id "AK130020"; transcript_id "AK130020_dup1";
> chr1 hg17_all_mrna exon 17068187 17068297 0.000000 - . gene_id "AK130020"; transcript_id "AK130020_dup1";
> chr1 hg17_all_mrna exon 17068821 17068976 0.000000 - . gene_id "AK130020"; transcript_id "AK130020_dup1";
> chr1 hg17_all_mrna exon 17071812 17071943 0.000000 - . gene_id "AK130020"; transcript_id "AK130020_dup1";
> chr1 hg17_all_mrna exon 17072047 17072113 0.000000 - . gene_id "AK130020"; transcript_id "AK130020_dup1";
> chr1 hg17_all_mrna exon 17072201 17072648 0.000000 - . gene_id "AK130020"; transcript_id "AK130020_dup1";
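
Because every alignment gets its own transcript_id, the per-alignment
genomic coordinates can be recovered from the GTF itself by grouping
exon lines on that attribute. A minimal Python sketch, assuming the file
is tab-separated with the standard nine GTF columns; transcript_spans is
a hypothetical helper name, and the sample lines are the first and last
exon of each alignment from the grep above, with tabs restored.

```python
import re

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def transcript_spans(gtf_lines):
    """Group GTF exon lines by transcript_id.

    Returns {transcript_id: (chrom, strand, start, end, exon_count)},
    with start/end taken as written in the file.
    """
    spans = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        tid = dict(ATTR_RE.findall(fields[8]))["transcript_id"]
        if tid in spans:
            c, s, lo, hi, n = spans[tid]
            spans[tid] = (c, s, min(lo, start), max(hi, end), n + 1)
        else:
            spans[tid] = (chrom, strand, start, end, 1)
    return spans


if __name__ == "__main__":
    prefix = "chr1\thg17_all_mrna\texon\t"
    attrs = '\t0.000000\t-\t.\tgene_id "AK130020"; transcript_id "%s";'
    lines = [
        prefix + "16385779\t16385842" + attrs % "AK130020",
        prefix + "17065218\t17066019" + attrs % "AK130020",
        prefix + "17067288\t17067965" + attrs % "AK130020_dup1",
        prefix + "17072201\t17072648" + attrs % "AK130020_dup1",
    ]
    for tid, (chrom, strand, start, end, n) in transcript_spans(lines).items():
        print(f"{tid}\t{chrom}:{start}-{end}\t{strand}\t{n} exons")
```

On these sample lines it reports AK130020 at chr1:16385779-17066019 and
AK130020_dup1 at chr1:17067288-17072648, both on the minus strand; the
per-alignment exon lists are what a CDS-to-genome mapping would need.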
>
>
> Robert Kuhn wrote:
> > Hello, Micha,
> >
> > I'm not sure I understand where you are seeing the "_dup" part of
> > transcript_ids. The chr1_mrna table does have two entries for the
> > accession you mention.
> >
> > One of these is definitely a better alignment than the other. Type
> > AK130020 into the position box of the browser and then click on one of
> > the matches. Then expand your position window to chr1:16513000-17200000.
> > Both transcripts should be highlighted in the Human mRNA track. I
> > know that does not address your bulk processing issues, but I thought
> > I'd mention it.
> >
> > Please let me know where exactly you are seeing the "_dupX"
> > notation and I will try to track it down. By the way, it is
> > not uncommon for mRNAs to align in more than one location; the
> > existence of segmental duplications virtually guarantees it.
> >
> > best wishes,
> >
> > --b0b kuhn
> >
> >
> >> From genome-bounces at soe.ucsc.edu Mon Jun 11 15:48:27 2007
> >> Cc: genome at soe.ucsc.edu, Micha Sammeth <micha at sammeth.net>
> >> Subject: [Genome] "_dupX" names for transcript ID duplicates on same
> >> chromosomes
> >>
> >> Hello again,
> >>
> >> Thank you, Robert, for the explanation of the ESTs; it is clear to me now.
> >>
> >> I have another small problem, hopefully the last one for this project:
> >> when I download the (hg17) mRNA track, I get conveniently unique
> >> transcript_id tags, at least at the chromosome level, which is enough
> >> (e.g., "AK130020" and "AK130020_dup1" on chr1). From which field does
> >> the Table Browser retrieve them? It does not seem to be all_mrna.qName,
> >> nor did I find a field in gbCdnaInfo or other promising tables within
> >> one or two joins of all_mrna. I need these "_dupX" identifiers to map
> >> the CDS info I get from hg17.cds back to genomic coordinates.
> >>
> >> Best thanks, micha.
> >> _______________________________________________
> >> Genome maillist - Genome at soe.ucsc.edu
> >> http://www.soe.ucsc.edu/mailman/listinfo/genome
> >>
> >>
>