Organisational
Class: TTh 2-3:45, Baskin Engineering 165
Office hours: Mo,We 10-11, 331 Baskin Engineering
Prerequisite: CMPS 201, or concurrent enrollment in CMPS 201, or my consent
Inportant web pages
The web page for the Computation Learning Theory conference COLT
(In particular check out the link ``COLT Resources'' in the left column)
Web page on Kernal Machines
Links were you can find data
UCI
DELVE
Gunnar's benchmarks
Open
problems
----------------------------------------------------------
Summary of lectures
Lecture 1
General introduction to Machine Learning,
Halfing Algorithm,
on-line mistake bounds, VC dimension
An implementation of the mindreading machine
by Rob Schapire:
Simply execute /cse/classes/cmps242/Fall02/race
Choose O for ODD if the computer is to guess your bit
Your bit and score is on the left
Can you beat it without a hard coin?
Can you beat it with a program?
More later about the algorithm he used
Accompaning paper that introduces the on-line model and the
Winnow algorithm ( paper
)
Lecture 2
Designing algorithms for
a drug discovery problem based on the
Halfing Algorithm
( related paper),
Learning disjunctions: VC dimensions,
various algorithms, a reduction to monotone
disjunctions
Lecture 3:
Mistake bounds for the Winnow and
Weighted Majority algorithm ( paper
)
Lecture 4:
How to prove relative loss bounds in the expert's framework (paper)
Recent paper on shifting within a small
pool of experts
(paper),
(talk)
which introduces the mixing update
Caching application
(talk)
(paper)
Lecture 5:
Mini Review: Adversary arguments, counting;
Bayes rule, independence, Bayes algorithm and how it fits into
the expert framework
Lecture 6:
Kraft's inequality. Upper and lower bound
on expected code length. Huffman codes. Entropy and relative entropy
Motivation of Bayes rule and the loss
update for the expert setting using a relative
entropy to the prior
Motivation of the Fixed Share to Uniform Past update
using the squared Euclidean distance
An application to predicting disk idle
times
(paper)
BIG sample (thanks Ryan) of idle times
for developing rules of thumb
(cello0 data)
Intro to Homework 2 - The disk spin down
problem
Homework 2
Cello 1 Data
and Good Parameter Values
Lecture 7:
Derivation of updates
Gradient Descent versus the Exponentiated Gradient Algorithm ( paper)
Lecture 8:
Background on boosting
Ada-boost
(Tutorial)
(Confidence-Rated
Predictions Paper)
(
Example report on disk-spindown hw)
Lecture 9:
Boosting ubalanced data sets
- Use constant hypothesis
- Use different costs for pos. and neg. examples
Adaboost as approximate entropy projection
(paper)
Corrective and Totally Corrective algorithms
Active Learning using Boosting
Lecture 10:
Optimization Theory
- Lagrangians
- Duality
Classical text:
David G. Luenberger: Linear and Non-linear
Programming
Text with a short summary on Optimization
Theory:
Cristianini and John Shawe-Taylor:
Suport Vector Machines
Food for thought:
Last year's projects
Lecture 11
Expansion into feature space
Examples of Kernels
- Why are Kernels efficient?
- Mercer's Theorem
Examples for computing duals
Lecture 12
Solution to Homework 1
( paper containing solution to last problem))
Discussion answers to Homework 2
( directory with all reports))
Finish examples for computing duals from last lecture
Answer question re Homework 3
Lecture 13
Support Vector Machines
- Conversion to a quadradic optimization
problem
- The dual optimization problem
- The non-seperable case
Lecture 14
Voted Perceptron)
versus SVM
Discussion of projects
- One page summary due next Tu
- Updated writeup of ideas
(Hw 4)
Inference:
- Maximum Likelihood (ML)
- Maximum A Posteriori (MAP)
- Posterior Mean estimate (PME)
EM algorithm for ML with hidden variables
- Motivation of update in KW framework
- Example applications
- Nice tutorial
on EM for HMMs and Gaussian Mixtures
Lecture 15
Framework for deriving updates
Implicit updates
Lecture 16
- Computing derivatives of likelihoods for HMMs
- Batch and on-line entropic updates for HMMs
- Computing usage via dynamic programming
(See also tutorial on EM)
- Paper on entropic updates for
Gaussian Mixtures
- Application to speech
- Viterbi versus Bush-derby
Short one-problem homework due next Th Homework 5
Lecture 17
Vicinity Lemmas
Kernels defined by digraphs
- Special case of series parallel digraphs
Predicting as well as the best pruning
(trees)
(graphs)
Using kernels with multiplicative updates
(paper)
Lecture 18
Geometric versus informations theoretic
updates
(slides)
- Differential update
- Canonical descrete update as Euler
discretization
- Euler versus forward Euler discretization
- Bregman divergences and their properties
- Motivation of updates using Bregman
divergences
- Matching loss functions as Bregman
divergences
- Hinge loss for motivating updates for linear threshold
functions
- Bregman projection and the Pythagorean
Theorem
- The Kernel trick and the Hadamard counter
example
- Bregman divergences as relative entropies
between exponential distributions
- Duality for Bregman divergences
- Gametheoretic motivation of on-line
algorithms
- The Last-step Minimax Algorithm
Lecture 19
Generalization
- Bounds on expected loss on last example
- PAC bounds for the finite case
Lecture 20
Sample complexity bounds for the PAC model i.t.o.
- VC dimension
- Size of compression scheme
Proof of the bound for compression schemes
Sauer's Lemma
The open problem
Overview of the whole class
----------------------------------------------------------
Back to the CE / CS Class
Home Pages
Back to the CE / CS Home Page
Last modified Wednesday, 05-Apr-2000 23:13:17 PDT
Open problems