[Genome] Promoters? Match to RefSeq (insignificant???)

Kosi Gramatikoff kosi at burnham.org
Wed Jan 3 09:11:34 PST 2007


In one of your README files at:
ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/
 
I read this:
    "Sequences 1000 bases upstream of annotated
    transcription start of RefSeq genes.  This includes only the
    cases where the transcription start (TSS) is annotated separately
    from the coding region start.  Note that upstream files are
    generated only when an assembly is released. Therefore, the
    data may be slightly out of synch with the RefSeq data in
    assemblies that are incrementally updated nightly."


1. I would like to know how the TSSs were annotated. What principle was
used to identify each TSS?

2. Do you have references describing the presumed principle of TSS
identification? RefSeq does not contain that...

3. What do you mean by "slightly out of synch with the RefSeq data"?
If the TSSs are annotated separately from the start of the coding region,
how could the two ever be in sync? The TSS is upstream of the ATG codon
(see the observation below as evidence).

The above questions are driven by a specific observation.
I compared all RefSeq 5'UTRs (~26,000) with your upstream2000 set
(~23,000 sequences) and found that only 244 RefSeq 5'UTRs are contained
in full. If 244 upstream sequences (promoters) extend all the way down to
the ATG, why do the rest of the promoters not do the same?
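
To make the comparison concrete, here is a minimal sketch of that kind of
containment check. It assumes both sets have already been loaded into
dictionaries keyed by RefSeq accession; the function name and the toy
accessions and sequences are made up for illustration and are not real
records.

```python
def count_contained_utrs(utrs, upstreams):
    """Count 5'UTRs that occur in full inside the upstream sequence
    stored under the same accession (case-insensitive substring test)."""
    hits = 0
    for accession, utr in utrs.items():
        upstream = upstreams.get(accession)
        if upstream is None:
            # No upstream sequence for this accession; nothing to compare.
            continue
        if utr.upper() in upstream.upper():
            hits += 1
    return hits


# Toy data (hypothetical accessions and sequences).
toy_utrs = {"NM_A": "ACGT", "NM_B": "GGGG", "NM_C": "TTTT"}
toy_upstreams = {"NM_A": "ttacgtcc", "NM_B": "ACGTACGT"}

contained = count_contained_utrs(toy_utrs, toy_upstreams)
print(contained, "of", len(toy_utrs), "5'UTRs contained in full")
```

A plain substring test like this only counts exact, complete matches, so a
5'UTR that overlaps the upstream sequence partially is not counted.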

How exactly was each promoter (upstream region) matched to a unique
RefSeq entry?

Is there information on how many nucleotides lie between the end of your
upstream sequence and the ATG?
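
As an illustration of the number I am after: if the upstream sequence ends
immediately before the annotated TSS (my reading of the README text, which
is an assumption), the gap would be the genomic distance from the
transcription start to the coding start. A minimal sketch, assuming
genePred-style coordinates (txStart, txEnd, cdsStart, cdsEnd, strand) for
each gene; the function name and the example numbers are hypothetical.

```python
def tss_to_cds_distance(strand, tx_start, tx_end, cds_start, cds_end):
    """Genomic distance in bases from the annotated TSS to the start of
    the coding region. Any introns inside the 5'UTR are included, so this
    can exceed the length of the spliced 5'UTR."""
    if strand == "+":
        # Forward strand: transcription and coding both start at the low end.
        return cds_start - tx_start
    if strand == "-":
        # Reverse strand: transcription and coding both start at the high end.
        return tx_end - cds_end
    raise ValueError("strand must be '+' or '-'")


# Hypothetical coordinates, not a real gene.
print(tss_to_cds_distance("+", 1000, 5000, 1200, 4700))
print(tss_to_cds_distance("-", 1000, 5000, 1200, 4700))
```

A distance of zero would correspond to a gene whose TSS is not annotated
separately from the coding start.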


Please let me know,
Kosi Gramatikoff 
Burnham Institute for Medical Research
La Jolla, CA

PS. If there is a person I could contact directly (contact info, phone),
please let me know. I would like to discuss the above issues in a live
conversation.



