Class: TTh 4-5:45, Oakes 106
Office hours: Mo,We 10-11, E2-357
Prerequisite: CMPS 242 (Machine Learning) or a graduate class in Bayesian statistics
Textbook by Nir Friedman and Daphna Koller
Summary of lectures
1: General introduction
Derivation of Bayes' rule
Random vars
Iterative application of Bayes' rule
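Iterative application of Bayes' rule means that the posterior after each observation becomes the prior for the next one. The sketch below illustrates this with a made-up example: a coin whose bias is one of three hypothetical values, under a uniform prior. The names `bayes_update` and `coin_likelihood` are illustrative, not from the course. Exact fractions are used so that the sequential result can be compared directly with the batch posterior.

```python
from fractions import Fraction

def bayes_update(prior, likelihood, x):
    """One application of Bayes' rule: P(h | x) = P(x | h) P(h) / sum_h' P(x | h') P(h')."""
    unnorm = {h: p * likelihood(x, h) for h, p in prior.items()}
    evidence = sum(unnorm.values())  # P(x), the normalizer
    return {h: w / evidence for h, w in unnorm.items()}

def coin_likelihood(x, h):
    """Hypothetical model: h is the coin's probability of heads, x = 1 means heads."""
    return h if x == 1 else 1 - h

# Illustrative data: three candidate biases, uniform prior, tosses H, H, T, H.
posterior = {Fraction(1, 4): Fraction(1, 3),
             Fraction(1, 2): Fraction(1, 3),
             Fraction(3, 4): Fraction(1, 3)}
for x in [1, 1, 0, 1]:
    # The posterior after each toss serves as the prior for the next toss.
    posterior = bayes_update(posterior, coin_likelihood, x)
```

The final posterior is 3/46, 8/23, 27/46 for the biases 1/4, 1/2, 3/4. That is exactly the batch posterior, which is proportional to h^3 (1 - h).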
2: Independence and chain rule for random vars
Generalizations to continuous densities
Expectations and variances
Entropy, conditional entropy, mutual information, relative entropy
Bayesian network representation
naive Bayes
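The information quantities from lecture 2 can be computed directly from a small joint table. The following is a minimal sketch for two discrete variables, assuming the joint is given as a nested list `joint[x][y]` and that logs are base 2, so results are in bits. It uses the identities H(Y|X) = H(X,Y) - H(X) and I(X;Y) = D(p(x,y) || p(x)p(y)).

```python
import math

def entropy(p):
    """H = -sum p log2 p, with the convention 0 log 0 = 0."""
    return -sum(px * math.log2(px) for px in p if px > 0)

def relative_entropy(p, q):
    """D(p || q) = sum p log2(p / q); assumes q > 0 wherever p > 0."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

def marginals(joint):
    """Row sums give p(x), column sums give p(y)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return px, py

def conditional_entropy(joint):
    """H(Y | X) = H(X, Y) - H(X)."""
    px, _ = marginals(joint)
    return entropy([v for row in joint for v in row]) - entropy(px)

def mutual_information(joint):
    """I(X; Y) = D(p(x, y) || p(x) p(y)), both flattened in row-major order."""
    px, py = marginals(joint)
    flat = [v for row in joint for v in row]
    prod = [a * b for a in px for b in py]
    return relative_entropy(flat, prod)

joint = [[0.4, 0.1],
         [0.1, 0.4]]  # made-up joint distribution of two correlated bits
```

For this table, I(X;Y) equals H(Y) - H(Y|X), which is about 0.278 bits. For any product distribution, the mutual information is 0.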
3: Factorization from a DAG
Causal reasoning
Local Markov assumptions
I-maps, d-separation
Tutorial by Friedman
Homework 1, Due Tu 1-24 in class
Solutions for homework 1
Correction to Part 2 of Exercise 3.3:
- replace ``burglary'' by ``earthquake'' in line 2
- replace ``P(b^1|c^1,e^1)'' by ``P(b^1|a^1,e^1)'' in line 4
- ignore ending sentence in lines 4 and 5
4: More on d-separation
I-maps, minimal I-maps, P-maps
Example that directed graphs can't represent
5: Good representations for local CPDs
Trees, rule systems, contextual independence
6: Contextual independence
Generalized linear models
Noisy-OR and softmax
Exponential family of distributions
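Noisy-OR and softmax are two compact parameterizations of a CPD. The following is a minimal sketch, assuming binary parents for noisy-OR, with one activation probability `lam[i]` per parent and an optional leak probability. For softmax, it assumes real-valued parent features `x` and one hypothetical weight vector per child value. The function names are illustrative only.

```python
import math

def noisy_or(x, lam, leak=0.0):
    """P(Y = 1 | x) for a noisy-OR CPD: each active parent i independently fails
    to turn Y on with probability 1 - lam[i]; `leak` is the probability that Y
    turns on even when no parent is active."""
    p_off = 1.0 - leak
    for xi, li in zip(x, lam):
        if xi:
            p_off *= 1.0 - li
    return 1.0 - p_off

def softmax_cpd(x, weights):
    """P(Y = k | x) proportional to exp(w_k . x) (multinomial logistic CPD)."""
    scores = [sum(w * xi for w, xi in zip(wk, x)) for wk in weights]
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```

With `lam = [0.8, 0.6]` and both parents on, `noisy_or` gives 1 - 0.2 * 0.4 = 0.92. The number of parameters grows linearly in the number of parents, not exponentially as in a full table.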
7: Undirected graphical models
Independencies based on graph separation
Factorization, minimal I-maps, P-maps
From Bayesian Nets to Markov nets
Homework 2, Due Tu 2-7 in class
Solutions for homework 2
8: From Markov nets to belief nets
Chordal graphs
A blend between max and sum with ring operations
Application to speech
9: Sum-product variable elimination algorithm
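Sum-product variable elimination repeatedly multiplies the factors that mention a variable and then sums that variable out. The following is a minimal sketch under simplifying assumptions: all variables are binary, factors are plain dictionaries, and the elimination order is given. The chain A -> B -> C and its CPD numbers are made up for illustration.

```python
from itertools import product

# A factor is a pair (scope, table): scope is a tuple of variable names and
# table maps a tuple of values (one per scope variable, in order) to a number.
# To keep the sketch short, every variable is assumed to be binary.

def multiply(f, g):
    """Factor product over the union of the two scopes."""
    (fs, ft), (gs, gt) = f, g
    scope = fs + tuple(v for v in gs if v not in fs)
    table = {}
    for vals in product((0, 1), repeat=len(scope)):
        a = dict(zip(scope, vals))
        table[vals] = ft[tuple(a[v] for v in fs)] * gt[tuple(a[v] for v in gs)]
    return scope, table

def sum_out(f, var):
    """Marginalize `var` out of factor f."""
    fs, ft = f
    i = fs.index(var)
    table = {}
    for vals, p in ft.items():
        key = vals[:i] + vals[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return fs[:i] + fs[i + 1:], table

def variable_elimination(factors, order):
    """For each variable in `order`, multiply the factors that mention it and
    sum it out; return the product of the remaining factors."""
    factors = list(factors)
    for var in order:
        involved = [f for f in factors if var in f[0]]
        if not involved:
            continue
        factors = [f for f in factors if var not in f[0]]
        psi = involved[0]
        for f in involved[1:]:
            psi = multiply(psi, f)
        factors.append(sum_out(psi, var))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# Hypothetical chain A -> B -> C with made-up CPDs; query P(C).
p_a = (("A",), {(0,): 0.6, (1,): 0.4})
p_b_given_a = (("B", "A"), {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.8})
p_c_given_b = (("C", "B"), {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.1, (1, 1): 0.9})
scope, p_c = variable_elimination([p_a, p_b_given_a, p_c_given_b], order=["A", "B"])
```

Eliminating A first gives P(B = 1) = 0.38. Eliminating B then gives P(C = 1) = 0.62 * 0.3 + 0.38 * 0.9 = 0.528. At no point is the full joint over A, B, C formed.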
10: Clique-tree sum-product and belief propagation algs
Homework 3, Due Tu 2-21 in class
Solutions for homework 3
11: Discussion of possible projects
Construction of clique trees
Sampling: Hoeffding and Chernoff bounds
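A typical use of the Hoeffding bound in sampling-based inference is choosing how many samples are needed. For n i.i.d. samples in [0, 1], P(|sample mean - true mean| >= eps) <= 2 exp(-2 n eps^2). Solving for n gives the sketch below. The function name is hypothetical.

```python
import math

def hoeffding_sample_size(eps, delta):
    """Smallest n with 2 * exp(-2 * n * eps**2) <= delta: enough i.i.d. samples in
    [0, 1] for the sample mean to be within eps of the true mean with
    probability at least 1 - delta (additive Hoeffding bound)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

n = hoeffding_sample_size(eps=0.01, delta=0.05)
```

The sample size grows only logarithmically in 1/delta but quadratically in 1/eps. This is why estimating very small probabilities to good relative accuracy calls for multiplicative (Chernoff-style) bounds instead.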
12: Markov chains
Gibbs sampling
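Gibbs sampling runs a Markov chain that resamples one variable at a time from its conditional given all the others. Here is a minimal sketch for two binary variables with a made-up joint table, assuming a fixed seed, a fixed burn-in, and a plain average of the post-burn-in samples. The function name and parameters are illustrative.

```python
import random

def gibbs_marginal_x1(joint, n_samples, burn_in=1000, seed=0):
    """Estimate P(X1 = 1) for two binary variables with joint[x1][x2] = P(x1, x2)
    by Gibbs sampling: alternately resample each variable from its conditional
    given the current value of the other, then average X1 after burn-in."""
    rng = random.Random(seed)
    x1, x2 = 0, 0
    hits = 0
    for t in range(burn_in + n_samples):
        p1 = joint[1][x2] / (joint[0][x2] + joint[1][x2])   # P(X1 = 1 | x2)
        x1 = 1 if rng.random() < p1 else 0
        p2 = joint[x1][1] / (joint[x1][0] + joint[x1][1])   # P(X2 = 1 | x1)
        x2 = 1 if rng.random() < p2 else 0
        if t >= burn_in:
            hits += x1
    return hits / n_samples

joint = [[0.3, 0.1],
         [0.2, 0.4]]        # made-up joint; exact P(X1 = 1) = 0.2 + 0.4 = 0.6
estimate = gibbs_marginal_x1(joint, n_samples=50_000)
```

Only the conditionals are ever evaluated, and these require no global normalizer. In a large network, the conditional of one variable depends only on its Markov blanket.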
13: Reversible Markov chains
Metropolis-Hastings alg. tutorial
Inference as optimizing an energy functional
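The Metropolis-Hastings construction yields a reversible chain whose stationary distribution is the target. The following is a minimal sketch that builds the exact transition matrix rather than sampling from it. It assumes a finite state space, an unnormalized integer-weighted target, and a symmetric proposal that is uniform over the other states, so the acceptance probability reduces to min(1, pi(y)/pi(x)). The function name is hypothetical.

```python
from fractions import Fraction

def mh_transition_matrix(weights):
    """Metropolis-Hastings kernel T[x][y] for a target pi proportional to the
    positive integer `weights`, using a uniform proposal over the other states."""
    k = len(weights)
    q = Fraction(1, k - 1)                     # symmetric proposal probability
    T = [[Fraction(0)] * k for _ in range(k)]
    for x in range(k):
        for y in range(k):
            if y != x:
                accept = min(Fraction(1), Fraction(weights[y], weights[x]))
                T[x][y] = q * accept
        T[x][x] = 1 - sum(T[x])                # rejected proposals stay at x
    return T

weights = [1, 2, 3, 4]                          # made-up unnormalized target
T = mh_transition_matrix(weights)
pi = [Fraction(w, sum(weights)) for w in weights]
```

Detailed balance holds because pi(x) T[x][y] = q * min(pi(x), pi(y)), which is symmetric in x and y. Summing detailed balance over x shows that pi is stationary. Note that the normalizer sum(weights) is never needed to build T.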
14: Relative entropy both ways via the problem of mixing two distributions
Visualizations of the relative entropy as a function of both arguments
Deriving Bayes rule by maximizing an energy functional
Matrix generalizations and apps to graphics
Alexa's original paper Erratum
Deriving the message passing algorithm via optimization
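One common way to phrase "Bayes' rule as optimizing an energy functional" is the following. Minimize F(q) = D(q || prior) - E_q[log P(x | h)] over distributions q on the hypotheses; maximizing -F is the same thing. The lecture's exact functional and sign convention may differ. Since F(q) = D(q || posterior) - log P(x), the minimizer is the Bayes posterior and the minimum value is -log P(x). The sketch checks this numerically with made-up numbers; the helper names are hypothetical.

```python
import math
import random

def free_energy(q, prior, lik):
    """F(q) = sum_h q(h) [log(q(h) / P(h)) - log P(x | h)]: relative entropy to
    the prior minus expected log-likelihood (natural logs, 0 log 0 = 0)."""
    return sum(qh * (math.log(qh / ph) - math.log(lh))
               for qh, ph, lh in zip(q, prior, lik) if qh > 0)

def bayes_posterior(prior, lik):
    """Return the posterior P(h | x) and the evidence P(x)."""
    z = sum(p * l for p, l in zip(prior, lik))
    return [p * l / z for p, l in zip(prior, lik)], z

# Illustrative numbers: three hypotheses and one observation x.
prior = [0.5, 0.3, 0.2]
lik = [0.1, 0.4, 0.8]                      # P(x | h) for each hypothesis
post, evidence = bayes_posterior(prior, lik)

# Smallest F found among random distributions q; it should never beat the posterior.
rng = random.Random(0)
best_random = math.inf
for _ in range(1000):
    w = [rng.random() + 1e-9 for _ in prior]
    q = [wi / sum(w) for wi in w]
    best_random = min(best_random, free_energy(q, prior, lik))
```

The same variational view underlies the message-passing, GBP, and mean-field derivations. Those methods restrict the family of q so that minimizing F becomes tractable, at the cost of exactness.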
15: Cluster graphs and GBP on such graphs
Bethe approximation
16: Deriving the algs again
GBP in practice
Mean field approximation
17: Learning graphical models
MLE - decomposition - multinomial example
Bayesian learning - pseudo counts
Alg. in Sec. 4 has lower worst-case regret than pseudocount algs
Expected regret bounds for the Krichevsky-Trofimov pseudocount in Sec. 4
Homework 4, Due Tu 3-14 in class
Solutions for homework 4
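Pseudocount estimators for a multinomial differ only in the count added to each outcome. A symmetric Dirichlet(alpha) prior gives the predictive probability (n_i + alpha) / (n + K alpha). Setting alpha = 1 gives Laplace's rule, alpha = 1/2 gives Krichevsky-Trofimov (KT), and the limit alpha = 0 gives the MLE. The following is a minimal sketch with exact fractions. The function names are illustrative.

```python
from fractions import Fraction

def predictive(counts, alpha):
    """P(next symbol = i | counts) = (n_i + alpha) / (n + K * alpha).
    alpha = 0 is the MLE (undefined before any data), alpha = 1 is Laplace's
    add-one rule, alpha = 1/2 is the Krichevsky-Trofimov (KT) estimator."""
    alpha = Fraction(alpha)
    n, k = sum(counts), len(counts)
    return [(c + alpha) / (n + k * alpha) for c in counts]

def sequence_probability(seq, k, alpha):
    """Probability a pseudocount estimator assigns to a whole sequence when each
    symbol is predicted from the counts of the symbols seen before it."""
    counts = [0] * k
    prob = Fraction(1)
    for s in seq:
        prob *= predictive(counts, alpha)[s]
        counts[s] += 1
    return prob
```

For counts (3, 1), the three estimators predict 3/4, 2/3, and 7/10 for the first symbol (MLE, Laplace, KT). The KT probability of the sequence 0, 0, 1, 0 is 1/2 * 3/4 * 1/6 * 5/8 = 5/128. Regret compares -log of such a sequence probability with that of the best fixed multinomial in hindsight.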
18: Wrap up Bayesian learning
Learning structure
19: Hidden variables and training with EM
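EM alternates between computing the posterior over hidden variables (E-step) and refitting parameters to the resulting expected counts (M-step). The following is a minimal sketch for a made-up two-coin mixture, in which the hidden variable is which coin produced each trial of 10 tosses. The names, data, and starting values are all hypothetical. The log-likelihood history should never decrease, which is the basic guarantee of EM.

```python
import math

def coin_mixture_loglik(data, pi, thetas):
    """Log-likelihood of (heads, tosses) trials under a two-coin mixture; the
    binomial coefficients are dropped since they do not depend on the parameters."""
    total = 0.0
    for h, n in data:
        mix = (pi * thetas[0] ** h * (1 - thetas[0]) ** (n - h)
               + (1 - pi) * thetas[1] ** h * (1 - thetas[1]) ** (n - h))
        total += math.log(mix)
    return total

def em_two_coins(data, pi, thetas, iters=20):
    """EM for the mixing weight pi = P(coin A) and the two heads probabilities."""
    history = [coin_mixture_loglik(data, pi, thetas)]
    for _ in range(iters):
        # E-step: responsibility r = P(coin A | trial) under the current parameters.
        resp = []
        for h, n in data:
            a = pi * thetas[0] ** h * (1 - thetas[0]) ** (n - h)
            b = (1 - pi) * thetas[1] ** h * (1 - thetas[1]) ** (n - h)
            resp.append(a / (a + b))
        # M-step: maximum-likelihood estimates from the expected (fractional) counts.
        heads_a = sum(r * h for r, (h, _) in zip(resp, data))
        tosses_a = sum(r * n for r, (_, n) in zip(resp, data))
        heads_b = sum((1 - r) * h for r, (h, _) in zip(resp, data))
        tosses_b = sum((1 - r) * n for r, (_, n) in zip(resp, data))
        pi = sum(resp) / len(data)
        thetas = (heads_a / tosses_a, heads_b / tosses_b)
        history.append(coin_mixture_loglik(data, pi, thetas))
    return pi, thetas, history

# Illustrative data: number of heads in each of five trials of 10 tosses.
data = [(5, 10), (9, 10), (8, 10), (4, 10), (7, 10)]
pi, thetas, history = em_two_coins(data, pi=0.5, thetas=(0.6, 0.5))
```

The starting values must break the symmetry between the two coins. With identical initial biases, the responsibilities would all equal pi and the two coins would never separate.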
20: TotalBoost
Boosting as a game
Paper with Pythagorean Thm in appendix
21: A Bayes rule for positive definite matrices
Talk Paper