[Genome] knownToU133Plus2.txt data

Rachel Harte hartera at soe.ucsc.edu
Thu May 17 11:20:50 PDT 2007


Hello Cheng Huang,

The reason that not all probesets are represented is that this table shows
the best-overlapping probeset consensus sequence for each accession in
the knownGene table (the UCSC Genes track for hg18, Known Genes for other
assemblies). Both the knownGene transcript and the Affymetrix U133 sequence
must align to the same strand, and the best overlap is defined at the level
of the aligned blocks, i.e., the probeset whose alignment shares the greatest
number of bases with the knownGene alignment, counting only bases that fall
within aligned blocks.
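The best-overlap rule above can be sketched in Python. This is an illustration under stated assumptions, not the code the UCSC pipeline actually runs: each alignment is assumed to be a chromosome, a strand, and a list of non-overlapping, half-open (start, end) genomic blocks, as in a PSL or genePred record, and the names `Alignment`, `block_overlap`, and `best_probeset` are hypothetical. How the real pipeline breaks ties is not stated here; the sketch simply keeps the first candidate seen.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical record for one sequence's alignment to the genome."""
    name: str
    chrom: str
    strand: str                     # "+" or "-"
    blocks: list[tuple[int, int]]   # half-open [start, end) aligned blocks


def block_overlap(a: Alignment, b: Alignment) -> int:
    """Count genomic bases shared by the aligned blocks of a and b.

    Alignments on different chromosomes or strands never overlap.
    Gaps between blocks (e.g. introns) contribute nothing.
    """
    if a.chrom != b.chrom or a.strand != b.strand:
        return 0
    total = 0
    for a_start, a_end in a.blocks:
        for b_start, b_end in b.blocks:
            total += max(0, min(a_end, b_end) - max(a_start, b_start))
    return total


def best_probeset(gene: Alignment, probesets: list[Alignment]) -> str | None:
    """Return the probeset with the most overlapping aligned bases, or None.

    Assumption: ties keep the first candidate encountered.
    """
    best_name, best_bases = None, 0
    for probeset in probesets:
        bases = block_overlap(gene, probeset)
        if bases > best_bases:
            best_name, best_bases = probeset.name, bases
    return best_name
```

For example, a gene with blocks (100, 200) and (300, 400) shares 50 bases with a probeset aligned to (150, 250) but only 40 with one aligned to (180, 320), even though the second covers more of the gene's genomic span, because bases in the gap between blocks are not counted.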

Not all probesets are represented because more than one Affymetrix U133
sequence may overlap a given knownGene, and only the best-overlapping one is
in the table. Also, not all of the Affymetrix probeset sequences may align
to the genome under the criteria used for the BLAT alignments.
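One way to see this in the downloaded file is to compare the distinct probesets it contains with the full list of probesets on the chip. The sketch below assumes the file is tab-separated with the knownGene ID in the first column and the probeset ID in the second (check your copy; Table Browser output may start with a `#` header line), and `read_pairs` and `unrepresented` are hypothetical helper names. The full chip list would have to come from Affymetrix's own annotation files.

```python
from __future__ import annotations

from collections.abc import Iterable


def read_pairs(lines: Iterable[str]) -> list[tuple[str, str]]:
    """Parse (knownGene ID, probeset ID) pairs from tab-separated lines.

    Blank lines and '#' header lines are skipped; each remaining line is
    assumed to have at least two tab-separated columns.
    """
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        pairs.append((fields[0], fields[1]))
    return pairs


def unrepresented(pairs: list[tuple[str, str]],
                  chip_probesets: Iterable[str]) -> set[str]:
    """Probesets on the chip that were never chosen as any gene's best overlap."""
    chosen = {probeset for _, probeset in pairs}
    return set(chip_probesets) - chosen
```

With the real data you would pass an open file handle for knownToU133Plus2.txt to `read_pairs`. Because each knownGene keeps only one probeset, and several genes can share the same one, the number of distinct probesets can never exceed the number of knownGene entries, and every probeset that is never anyone's best overlap ends up in the `unrepresented` set.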

I can easily generate a table with all overlapping probeset sequences for
each knownGene if that would be useful to you. Please let me know if you
would like that.

I hope this helps you. Please let us know if you have further questions.

Rachel

Rachel Harte
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu


On Thu, 17 May 2007, Huang, Cheng-Cheng  wrote:

> Hi,
>
> I downloaded the "knownToU133Plus2.txt" data from your web site. The number
> of unique probe sets in this data file is 23,902. According to Affymetrix,
> about 54,000 probe sets are printed on this chip. Could you please explain
> the difference?
>
> Thanks
>
> Cheng Huang
>
> _______________________________________________
> Genome maillist  -  Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>

