CMPS 242 Home Page
Machine Learning
Fall 2002

Manfred K. Warmuth




Projects and posters

Organisational

Class: TTh 2-3:45, Baskin Engineering 165
Office hours: Mo,We 10-11, 331 Baskin Engineering
Prerequisite: CMPS 201, or concurrent enrollment in CMPS 201, or my consent

Inportant web pages

The web page for the Computation Learning Theory conference COLT
(In particular check out the link ``COLT Resources'' in the left column)

Web page on Kernal Machines
 

Links were you can find data

 UCI
 DELVE
 Gunnar's benchmarks
Open problems
----------------------------------------------------------

Summary of lectures
 

Lecture 1
General introduction to Machine Learning,
Halfing Algorithm, on-line mistake bounds, VC dimension

An implementation of the mindreading machine by Rob Schapire:
Simply execute /cse/classes/cmps242/Fall02/race
Choose O for ODD if the computer is to guess your bit
Your bit and score is on the left
Can you beat it without a hard coin?
Can you beat it with a program?
More later about the algorithm he used

Accompaning paper that introduces the on-line model and the
Winnow algorithm ( paper )

Lecture 2
Designing algorithms for a drug discovery problem based on the Halfing Algorithm ( related paper), Learning disjunctions: VC dimensions, various algorithms, a reduction to monotone disjunctions

Homework 1

Lecture 3:
Mistake bounds for the Winnow and
Weighted Majority algorithm ( paper )

Lecture 4:
How to prove relative loss bounds in the expert's framework (paper)
Recent paper on shifting within a small pool of experts (paper), (talk) which introduces the mixing update
Caching application (talk) (paper)

Lecture 5:
Mini Review: Adversary arguments, counting;
Bayes rule, independence, Bayes algorithm and how it fits into
the expert framework

Lecture 6:
Kraft's inequality. Upper and lower bound
on expected code length. Huffman codes. Entropy and relative entropy
Motivation of Bayes rule and the loss update for the expert setting using a relative
entropy to the prior
Motivation of the Fixed Share to Uniform Past update using the squared Euclidean distance

An application to predicting disk idle times (paper)
BIG sample (thanks Ryan) of idle times for developing rules of thumb (cello0 data)
Intro to Homework 2 - The disk spin down problem Homework 2
Cello 1 Data and Good Parameter Values

Lecture 7:
Derivation of updates
Gradient Descent versus the Exponentiated Gradient Algorithm ( paper)

Lecture 8:
 Background on boosting
 Ada-boost (Tutorial)  (Confidence-Rated Predictions Paper)
( Example report on disk-spindown hw)

Lecture 9:
Boosting ubalanced data sets
- Use constant hypothesis
- Use different costs for pos. and neg. examples
Adaboost as approximate entropy projection (paper)
Corrective and Totally Corrective algorithms
Active Learning using Boosting

Homework 3

Lecture 10:
Optimization Theory
- Lagrangians
- Duality

Classical text:
David G. Luenberger: Linear and Non-linear Programming

Text with a short summary on Optimization Theory:
Cristianini and John Shawe-Taylor: Suport Vector Machines

Food for thought: Last year's projects

Lecture 11
Expansion into feature space
Examples of Kernels
- Why are Kernels efficient?
- Mercer's Theorem
Examples for computing duals

Lecture 12
Solution to Homework 1 ( paper containing solution to last problem))
Discussion answers to Homework 2 ( directory with all reports))
Finish examples for computing duals from last lecture
Answer question re Homework 3

Lecture 13
Support Vector Machines
- Conversion to a quadradic optimization problem
- The dual optimization problem
- The non-seperable case

Lecture 14
Voted Perceptron) versus SVM
Discussion of projects
- One page summary due next Tu
- Updated writeup of ideas (Hw 4)
Inference:
- Maximum Likelihood (ML)
- Maximum A Posteriori (MAP)
- Posterior Mean estimate (PME)
EM algorithm for ML with hidden variables
- Motivation of update in KW framework
- Example applications
- Nice tutorial on EM for HMMs and Gaussian Mixtures

Lecture 15
Framework for deriving updates
Implicit updates

Lecture 16
- Computing derivatives of likelihoods for HMMs
- Batch and on-line entropic updates for HMMs
- Computing usage via dynamic programming (See also tutorial on EM)
- Paper on entropic updates for Gaussian Mixtures

- Application to speech
- Viterbi versus Bush-derby

Short one-problem homework due next Th Homework 5

Lecture 17
Vicinity Lemmas
Kernels defined by digraphs
- Special case of series parallel digraphs
Predicting as well as the best pruning (trees) (graphs)

Using kernels with multiplicative updates (paper)

Lecture 18
Geometric versus informations theoretic updates (slides)
- Differential update
- Canonical descrete update as Euler discretization
- Euler versus forward Euler discretization
- Bregman divergences and their properties
- Motivation of updates using Bregman divergences
- Matching loss functions as Bregman divergences
- Hinge loss for motivating updates for linear threshold functions
- Bregman projection and the Pythagorean Theorem
- The Kernel trick and the Hadamard counter example
- Bregman divergences as relative entropies between exponential distributions
- Duality for Bregman divergences
- Gametheoretic motivation of on-line algorithms
- The Last-step Minimax Algorithm

Lecture 19
Generalization
- Bounds on expected loss on last example
- PAC bounds for the finite case

Lecture 20
Sample complexity bounds for the PAC model i.t.o.
- VC dimension
- Size of compression scheme
Proof of the bound for compression schemes
Sauer's Lemma
The open problem

Overview of the whole class







----------------------------------------------------------
Back to the CE / CS Class Home Pages
Back to the CE / CS Home Page
Last modified Wednesday, 05-Apr-2000 23:13:17 PDT
Open problems