[Genome] Table Browser / Why are there genes with the same LocusLink ID on different strands/chromosomes?
Anton Kratz
anton.kratz at googlemail.com
Fri Jan 5 00:23:02 PST 2007
Hi,
For a bioinformatics project I am working on, I want to make a list of all
genes (*) in the human genome (hg17, May 2004, NCBI Build 35) with the
following fields:
1. some unique number identifying the gene
2. strand
3. chromosome
4. start
5. end
As a first step, I got a list through the Table Browser with these
columns:
LocusLink ID, Known Gene ID, chromosome, strand, start, end
(LocusLink is included because later I want to access the genes via
LocusLink, not Known Gene.)
This list has about 34,000 entries (lines) in total but only 17,000 unique
LocusLink IDs, because many LocusLink IDs occur multiple times. It looks
like this:
#hg17.knownToLocusLink.value hg17.knownGene.name hg17.knownGene.chrom hg17.knownGene.strand hg17.knownGene.txStart hg17.knownGene.txEnd
[...]
83259 NM_032971 chrY + 4911627 5016846
83259 NM_032972 chrY + 4967491 5016846
83259 NM_032973 chrY + 4967491 5653623
[...]
This is almost what I want, because I can merge multiple entries such as the
example above into a single new entry like this:
83259 chrY + 4911627 5653623
And this would be my "gene": 4911627 because it is the minimum txStart in
this example, and 5653623 because it is the maximum txEnd.
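The merging step described above (minimum txStart and maximum txEnd per LocusLink ID) could be sketched in Python roughly as follows. This is a minimal sketch, assuming the Table Browser output was saved as whitespace-separated columns in the order shown; the names `Row`, `parse_rows` and `merge_by_locuslink` are hypothetical. It deliberately refuses to merge rows whose chromosome or strand differ, because a single start/end span makes no sense for them.

```python
from typing import NamedTuple


class Row(NamedTuple):
    locuslink: str   # hg17.knownToLocusLink.value
    transcript: str  # hg17.knownGene.name
    chrom: str
    strand: str
    tx_start: int
    tx_end: int


def parse_rows(text: str) -> list[Row]:
    """Parse whitespace-separated Table Browser output, skipping the '#' header."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ll, name, chrom, strand, start, end = line.split()
        rows.append(Row(ll, name, chrom, strand, int(start), int(end)))
    return rows


def merge_by_locuslink(rows: list[Row]) -> dict[str, tuple[str, str, int, int]]:
    """Collapse all transcripts sharing a LocusLink ID into one
    (chrom, strand, min txStart, max txEnd) span.
    Raises ValueError if an ID's rows disagree on chromosome or strand."""
    spans: dict[str, tuple[str, str, int, int]] = {}
    for r in rows:
        if r.locuslink not in spans:
            spans[r.locuslink] = (r.chrom, r.strand, r.tx_start, r.tx_end)
            continue
        chrom, strand, start, end = spans[r.locuslink]
        if (chrom, strand) != (r.chrom, r.strand):
            raise ValueError(
                f"LocusLink {r.locuslink} spans {chrom}{strand} and {r.chrom}{r.strand}"
            )
        spans[r.locuslink] = (chrom, strand, min(start, r.tx_start), max(end, r.tx_end))
    return spans


example = """83259 NM_032971 chrY + 4911627 5016846
83259 NM_032972 chrY + 4967491 5016846
83259 NM_032973 chrY + 4967491 5653623"""

print(merge_by_locuslink(parse_rows(example)))
# {'83259': ('chrY', '+', 4911627, 5653623)}
```

Raising an error instead of silently keeping the first chromosome or strand makes entries like the ones below visible rather than quietly producing a wrong span.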
But I have difficulty understanding many of the entries in the UCSC Known
Genes list I got through the Table Browser. For example:
1. Gene on different chromosomes:
55344 NM_018390,NM_018390, chrX,chrY, +,+, 132991,132991, 160020,160020,
2. Gene on different strands:
9084 NM_181880,NM_181880, chrY,chrY, -,+, 14535783,14606232, 14536519,14606968,
Why are there genes with the same LocusLink ID on different strands or
chromosomes?
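To see how many IDs are affected, entries like these could be listed for manual inspection before merging. A minimal sketch, assuming each collapsed entry above is expanded back into one row per transcript (so 55344 becomes a chrX row and a chrY row, reading the comma-separated columns pairwise); `find_conflicts` is a hypothetical helper, not part of the Table Browser.

```python
from collections import defaultdict


def find_conflicts(rows):
    """Return {locuslink_id: (set_of_chroms, set_of_strands)} for IDs whose
    rows do not all share one chromosome and one strand."""
    chroms = defaultdict(set)
    strands = defaultdict(set)
    for ll, _name, chrom, strand, _start, _end in rows:
        chroms[ll].add(chrom)
        strands[ll].add(strand)
    return {
        ll: (chroms[ll], strands[ll])
        for ll in chroms
        if len(chroms[ll]) > 1 or len(strands[ll]) > 1
    }


# The two examples above, one row per transcript, plus an unproblematic ID.
rows = [
    ("55344", "NM_018390", "chrX", "+", 132991, 160020),
    ("55344", "NM_018390", "chrY", "+", 132991, 160020),
    ("9084", "NM_181880", "chrY", "-", 14535783, 14536519),
    ("9084", "NM_181880", "chrY", "+", 14606232, 14606968),
    ("83259", "NM_032971", "chrY", "+", 4911627, 5016846),
]

for ll, (c, s) in sorted(find_conflicts(rows).items()):
    print(ll, sorted(c), sorted(s))
# 55344 ['chrX', 'chrY'] ['+']
# 9084 ['chrY'] ['+', '-']
```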
Best,
Anton
(*) When I use the term "gene", I do not mean a gene in the "true"
biological sense; I do not want to distinguish between alternative splice
variants, etc. For my program, a gene needs to be something that has exactly
one start position and one end position, and occurs only once in the
genome. Yes, I admit that's an oversimplification.