[Genome] Table Browser / Why are there LocusLink ID denoted genes on different strands/chromosomes?

Ann Zweig ann at soe.ucsc.edu
Fri Jan 5 16:42:49 PST 2007


Hello Anton,

     You appear to be most of the way there; you just need a little help
understanding the data.  I will address each of your two questions.

1. Gene on different chromosomes:
55344    NM_018390,NM_018390,    chrX,chrY,    +,+,    132991,132991,    160020,160020,

     In this case, it looks like the gene appears on both the X and Y chromosomes,
probably in the section the two chromosomes share.  The Y chromosome in this
assembly contains two pseudoautosomal regions (PARs).  One of them is located at
chrY:1-2692881, and this gene's coordinates (132991-160020) fall inside it.
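
     If you want to flag such entries in your list automatically, a minimal
sketch in Python could look like the following.  The function name is made up
for illustration, and only the chrY PAR quoted above is checked.  It assumes
the Table Browser output uses a 0-based start and an exclusive end, which is
how UCSC tables store coordinates, while chrY:1-2692881 is 1-based.

```python
# PAR coordinates quoted above for chrY in this assembly (1-based, inclusive).
CHRY_PAR_START = 1
CHRY_PAR_END = 2692881


def in_chry_par(chrom, tx_start, tx_end):
    """Return True if an interval lies entirely inside the chrY PAR above.

    Assumption: tx_start is 0-based and tx_end is exclusive, as in UCSC
    tables, so the 1-based PAR start is shifted by one for the comparison.
    """
    if chrom != "chrY":
        return False
    return tx_start >= CHRY_PAR_START - 1 and tx_end <= CHRY_PAR_END


print(in_chry_par("chrY", 132991, 160020))      # LocusLink 55344 -> True
print(in_chry_par("chrY", 14535783, 14536519))  # LocusLink 9084  -> False
```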


2. Gene on different strands:
9084    NM_181880,NM_181880,    chrY,chrY,    -,+,    14535783,14606232,    14536519,14606968,

     It is not uncommon to see a copy of a gene on the opposite strand of the
same chromosome.  This gene happens to be in an inverted-repeat region.  You can
see that by viewing the Segmental Duplication track at this location.
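
     For the list you are building, one way to handle both cases is to merge
transcripts per (LocusLink ID, chromosome, strand) rather than per LocusLink ID
alone, so that a copy on another chromosome or strand stays a separate entry.
This is only a sketch: the function name and the row layout are assumptions,
and two copies on the same chromosome and the same strand would still be merged
into one interval.

```python
def merge_transcripts(rows):
    """Collapse transcript rows into one interval per (id, chrom, strand).

    rows: iterable of (locuslink_id, chrom, strand, tx_start, tx_end).
    Each group keeps the minimum start and the maximum end of its rows.
    """
    merged = {}
    for gene_id, chrom, strand, start, end in rows:
        key = (gene_id, chrom, strand)
        if key in merged:
            old_start, old_end = merged[key]
            merged[key] = (min(old_start, start), max(old_end, end))
        else:
            merged[key] = (start, end)
    return merged


# Rows taken from the examples in your message.
rows = [
    ("83259", "chrY", "+", 4911627, 5016846),
    ("83259", "chrY", "+", 4967491, 5016846),
    ("83259", "chrY", "+", 4967491, 5653623),
    ("9084", "chrY", "-", 14535783, 14536519),
    ("9084", "chrY", "+", 14606232, 14606968),
]

for key, span in sorted(merge_transcripts(rows).items()):
    print(*key, *span, sep="\t")
```

     With these rows, 83259 collapses to a single chrY + entry spanning
4911627 to 5653623, while 9084 keeps one entry per strand.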

     I hope this helps you understand the underlying data.

Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu



Anton Kratz wrote:
> Hi,
> 
> for a bioinformatics project I am working on I want to make a list of all
> genes (*) in the human genome (hg17, May 2004, NCBI Build 35) with these
> entries:
> 1. some unique number identifying the gene
> 2. strand
> 3. chromosome
> 4. start
> 5. end
> 
> As a first step I got a list through the Table Browser and it has these
> entries:
> 
> LocusLink ID, Known Gene ID, chromosome, strand, start, end
> 
> (LocusLink is included b/c later I want to access via LocusLink not Known
> Gene).
> 
> This list has around 34000 entries (lines) in total and 17000 unique
> LocusLink IDs, b/c many LocusLink IDs occur multiple times, and it looks
> like this:
> 
> #hg17.knownToLocusLink.value    hg17.knownGene.name    hg17.knownGene.chrom    hg17.knownGene.strand    hg17.knownGene.txStart    hg17.knownGene.txEnd
> 
> [...]
> 
> 83259    NM_032971    chrY    +    4911627    5016846
> 83259    NM_032972    chrY    +    4967491    5016846
> 83259    NM_032973    chrY    +    4967491    5653623
> 
> [...]
> 
> This is almost what I want, b/c I can collapse multiple entries such as
> the example above into a new entry like this:
> 83259    chrY    +    4911627    5653623
> 
> And this would be my "gene". 4911627 b/c it is the minimum in this example
> and 5653623 b/c it is the maximum.
> 
> But I have difficulties understanding many of the entries in the UCSC Known
> Genes list I got through the Table Browser. For example:
> 
> 1. Gene on different chromosomes:
> 55344    NM_018390,NM_018390,    chrX,chrY,    +,+,    132991,132991,    160020,160020,
> 
> 2. Gene on different strands:
> 9084    NM_181880,NM_181880,    chrY,chrY,    -,+,    14535783,14606232,    14536519,14606968,
> 
> Why are there LocusLink ID denoted genes on different strands/chromosomes?
> 
> Best,
> Anton
> 
> 
> (*) When I use the term "gene", I do not mean gene in a "true" biological
> sense; I do not want to distinguish between alternative splice variants etc.
> For my program, a gene needs to be something which has exactly one
> start position and one end position, and occurs only once in the genome. Yes, I
> admit that's an oversimplification.

