CS 59000 Statistical Machine Learning, Lecture 3. Yuan (Alan) Qi (alanqi@cs.purdue.edu). Sept. 2008
Review: Bayes’ Theorem posterior ∝ likelihood × prior, i.e. $p(\mathbf{w} \mid \mathcal{D}) \propto p(\mathcal{D} \mid \mathbf{w})\, p(\mathbf{w})$.
Maximum Likelihood for Regression Determine $\mathbf{w}_{\mathrm{ML}}$ by minimizing the sum-of-squares error $E(\mathbf{w}) = \tfrac{1}{2}\sum_{n=1}^{N} \{y(x_n, \mathbf{w}) - t_n\}^2$.
MAP: A Step towards Bayes Determine $\mathbf{w}_{\mathrm{MAP}}$ by minimizing the regularized sum-of-squares error $\widetilde{E}(\mathbf{w}) = E(\mathbf{w}) + \tfrac{\lambda}{2}\,\|\mathbf{w}\|^2$, as sketched below.
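As an illustration of the two criteria above, here is a minimal numpy sketch (not from the slides) that fits a polynomial by plain least squares (the ML solution) and by ridge-regularized least squares (the MAP-style solution); the degree and the regularization weight `lam` are illustrative choices.

```python
import numpy as np

# Toy data: noisy samples of sin(2*pi*x), as in the curve-fitting example.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(x.shape)

# Polynomial design matrix (degree 9).
Phi = np.vander(x, N=10, increasing=True)

# Maximum likelihood: minimize the sum-of-squares error E(w).
w_ml = np.linalg.lstsq(Phi, t, rcond=None)[0]

# MAP-style: minimize E(w) + (lam/2) * ||w||^2  (ridge solution).
lam = 1e-3
w_map = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ t)

print("ML weight norm :", np.linalg.norm(w_ml))
print("MAP weight norm:", np.linalg.norm(w_map))  # typically much smaller
```

The regularizer shrinks the weights, which is the step toward the Bayesian treatment discussed in the lecture.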
Decision Theory Inference step: determine either the joint density $p(\mathbf{x}, t)$ or the conditional $p(t \mid \mathbf{x})$. Decision step: for a given $\mathbf{x}$, determine the optimal $t$.
Minimum Expected Loss Regions $\mathcal{R}_j$ are chosen to minimize the expected loss $\mathbb{E}[L] = \sum_k \sum_j \int_{\mathcal{R}_j} L_{kj}\, p(\mathbf{x}, \mathcal{C}_k)\, \mathrm{d}\mathbf{x}$.
The Squared Loss Function Minimize the expected squared loss $\mathbb{E}[L] = \iint \{y(\mathbf{x}) - t\}^2\, p(\mathbf{x}, t)\, \mathrm{d}\mathbf{x}\, \mathrm{d}t$; the optimal prediction is the conditional mean $y(\mathbf{x}) = \mathbb{E}[t \mid \mathbf{x}]$.
(Differential) Entropy For discrete random variables: $H[x] = -\sum_x p(x) \ln p(x)$. For continuous random variables (differential entropy): $H[x] = -\int p(x) \ln p(x)\, \mathrm{d}x$. Entropy is an important quantity in coding theory, statistical physics, and machine learning; it measures randomness.
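A quick numerical check of the two definitions (a sketch, not part of the slides): discrete entropy of a small distribution, and the differential entropy of a univariate Gaussian using the closed form $\tfrac{1}{2}\ln(2\pi e\sigma^2)$.

```python
import numpy as np

def discrete_entropy(p):
    """H[x] = -sum_x p(x) ln p(x), ignoring zero-probability states."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Uniform distribution over 4 states has the maximum entropy ln(4).
print(discrete_entropy([0.25, 0.25, 0.25, 0.25]))  # ~1.386 = ln 4
print(discrete_entropy([0.7, 0.1, 0.1, 0.1]))      # smaller: less random

# Differential entropy of N(0, sigma^2) in closed form.
sigma = 2.0
print(0.5 * np.log(2 * np.pi * np.e * sigma**2))
```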
Parametric Distributions Basic building blocks: $p(x \mid \theta)$. Need to determine $\theta$ given $\{x_1, \dots, x_N\}$. Representation: a point estimate $\theta^{\star}$ or a full distribution $p(\theta)$? Recall curve fitting.
Binary Variables (1) Coin flipping: heads = 1, tails = 0, with $p(x = 1 \mid \mu) = \mu$. Bernoulli distribution: $\mathrm{Bern}(x \mid \mu) = \mu^{x}(1 - \mu)^{1 - x}$, with $\mathbb{E}[x] = \mu$ and $\operatorname{var}[x] = \mu(1 - \mu)$.
Binary Variables (2) $N$ coin flips: the number of heads $m$ follows the binomial distribution $\mathrm{Bin}(m \mid N, \mu) = \binom{N}{m} \mu^{m} (1 - \mu)^{N - m}$.
ML Parameter Estimation for Bernoulli (2) Example: if every observed toss lands heads, then $\mu_{\mathrm{ML}} = m/N = 1$. Prediction: all future tosses will land heads up. Overfitting to $\mathcal{D}$.
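The overfitting effect is easy to reproduce; the sketch below (illustrative data, not from the slides) computes $\mu_{\mathrm{ML}} = m/N$ for a small all-heads data set.

```python
import numpy as np

# Small Bernoulli data set: 3 tosses, all heads (1 = heads, 0 = tails).
D = np.array([1, 1, 1])

# Maximum likelihood estimate: mu_ML = m / N (fraction of heads).
mu_ml = D.mean()
print("mu_ML =", mu_ml)  # 1.0 -> predicts ALL future tosses land heads up
```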
Beta Distribution Distribution over $\mu \in [0, 1]$: $\mathrm{Beta}(\mu \mid a, b) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\, \mu^{a - 1} (1 - \mu)^{b - 1}$.
Bayesian Bernoulli The Beta distribution provides the conjugate prior for the Bernoulli distribution.
Properties of the Posterior As the size of the data set, $N$, increases, the posterior becomes more sharply peaked: its mean approaches the maximum likelihood estimate and its variance shrinks toward zero.
Prediction under the Posterior What is the probability that the next coin toss will land heads up? The predictive posterior distribution gives $p(x = 1 \mid \mathcal{D}) = \int_0^1 \mu\, p(\mu \mid \mathcal{D})\, \mathrm{d}\mu = \frac{m + a}{m + a + l + b}$, where $m$ and $l$ count the observed heads and tails.
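Under a Beta(a, b) prior, the posterior after observing m heads and l tails is Beta(m + a, l + b), and the predictive probability of heads is its mean. A small sketch (the hyperparameters and the data are illustrative):

```python
# Beta-Bernoulli: conjugate update and predictive probability of heads.
a, b = 2.0, 2.0   # illustrative prior hyperparameters
m, l = 3, 0       # observed heads and tails (same all-heads data as above)

# Posterior is Beta(m + a, l + b); predictive p(x=1 | D) is its mean.
p_heads = (m + a) / (m + a + l + b)
print("p(next toss = heads | D) =", p_heads)  # 5/7 ~ 0.714, not 1.0
```

Unlike the ML estimate, the Bayesian prediction does not collapse to 1 after a few lucky tosses.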
Multinomial Variables 1-of-K coding scheme: $\mathbf{x} = (0, \dots, 0, 1, 0, \dots, 0)^{\mathrm{T}}$ with $\sum_k x_k = 1$, and $p(\mathbf{x} \mid \boldsymbol{\mu}) = \prod_{k=1}^{K} \mu_k^{x_k}$.
ML Parameter Estimation Given $\mathcal{D} = \{\mathbf{x}_1, \dots, \mathbf{x}_N\}$, maximize the likelihood subject to the constraint $\sum_k \mu_k = 1$; enforce the constraint with a Lagrange multiplier, $\lambda$. The solution is $\mu_k^{\mathrm{ML}} = m_k / N$, the observed fraction of data points in state $k$.
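Carrying out the constrained maximization gives the simple counting estimate above; a sketch with 1-of-K encoded data (the data are illustrative):

```python
import numpy as np

# 1-of-K encoded observations (K = 3 states), one row per data point.
X = np.array([[1, 0, 0],
              [0, 1, 0],
              [1, 0, 0],
              [0, 0, 1],
              [1, 0, 0]])

# m_k = number of observations of state k; mu_k^ML = m_k / N.
m = X.sum(axis=0)
mu_ml = m / X.shape[0]
print("counts:", m, " mu_ML:", mu_ml)  # [3 1 1], [0.6 0.2 0.2]
```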
The Dirichlet Distribution Conjugate prior for the multinomial distribution.
Central Limit Theorem The distribution of the sum of N i.i.d. random variables becomes increasingly Gaussian as N grows. Example: N uniform [0,1] random variables.
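A quick simulation of the example (parameters are illustrative): the mean of N uniform [0,1] variables concentrates around 0.5, its variance shrinks as 1/(12N), and its histogram becomes increasingly Gaussian as N grows.

```python
import numpy as np

rng = np.random.default_rng(0)
trials = 100_000

for N in (1, 2, 10):
    # Mean of N uniform [0,1] variables, repeated over many trials.
    means = rng.uniform(0.0, 1.0, size=(trials, N)).mean(axis=1)
    # Theory: mean 0.5, variance 1/(12 N); the sample moments should match.
    print(f"N={N:2d}  sample mean={means.mean():.3f}  "
          f"sample var={means.var():.4f}  theory var={1/(12*N):.4f}")
```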
Moments of the Multivariate Gaussian (1) $\mathbb{E}[\mathbf{x}] = \int \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})\, \mathbf{x}\, \mathrm{d}\mathbf{x} = \boldsymbol{\mu}$: after the change of variables $\mathbf{z} = \mathbf{x} - \boldsymbol{\mu}$, the term linear in $\mathbf{z}$ vanishes thanks to the anti-symmetry of $\mathbf{z}$.
Bayes’ Theorem for Gaussian Variables Given $p(\mathbf{x}) = \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Lambda}^{-1})$ and $p(\mathbf{y} \mid \mathbf{x}) = \mathcal{N}(\mathbf{y} \mid \mathbf{A}\mathbf{x} + \mathbf{b}, \mathbf{L}^{-1})$, we have $p(\mathbf{y}) = \mathcal{N}(\mathbf{y} \mid \mathbf{A}\boldsymbol{\mu} + \mathbf{b},\, \mathbf{L}^{-1} + \mathbf{A}\boldsymbol{\Lambda}^{-1}\mathbf{A}^{\mathrm{T}})$ and $p(\mathbf{x} \mid \mathbf{y}) = \mathcal{N}(\mathbf{x} \mid \boldsymbol{\Sigma}\{\mathbf{A}^{\mathrm{T}}\mathbf{L}(\mathbf{y} - \mathbf{b}) + \boldsymbol{\Lambda}\boldsymbol{\mu}\},\, \boldsymbol{\Sigma})$, where $\boldsymbol{\Sigma} = (\boldsymbol{\Lambda} + \mathbf{A}^{\mathrm{T}}\mathbf{L}\mathbf{A})^{-1}$.
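These formulas can be verified numerically; the sketch below builds a small linear-Gaussian model with illustrative matrices and compares the closed-form mean and covariance of p(y) against Monte Carlo samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear-Gaussian model: p(x) = N(mu, Lambda^-1), p(y|x) = N(Ax + b, L^-1).
mu = np.array([1.0, -1.0])
Lambda = np.array([[2.0, 0.5], [0.5, 1.0]])   # precision of x
A = np.array([[1.0, 2.0], [0.0, 1.0]])
b = np.array([0.5, 0.0])
L = np.array([[4.0, 0.0], [0.0, 4.0]])        # precision of y given x

# Closed form: p(y) = N(A mu + b, L^-1 + A Lambda^-1 A^T).
mean_y = A @ mu + b
cov_y = np.linalg.inv(L) + A @ np.linalg.inv(Lambda) @ A.T

# Monte Carlo check: sample x, then y = A x + b + noise.
n = 200_000
x = rng.multivariate_normal(mu, np.linalg.inv(Lambda), size=n)
y = x @ A.T + b + rng.multivariate_normal(np.zeros(2), np.linalg.inv(L), size=n)
print("closed-form mean:", mean_y, " sample mean:", y.mean(axis=0))
print("closed-form cov:\n", cov_y, "\nsample cov:\n", np.cov(y.T))
```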
Maximum Likelihood for the Gaussian (1) Given i.i.d. data $\mathbf{X} = \{\mathbf{x}_1, \dots, \mathbf{x}_N\}$, the log likelihood function is $\ln p(\mathbf{X} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = -\tfrac{ND}{2}\ln(2\pi) - \tfrac{N}{2}\ln|\boldsymbol{\Sigma}| - \tfrac{1}{2}\sum_{n=1}^{N} (\mathbf{x}_n - \boldsymbol{\mu})^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{x}_n - \boldsymbol{\mu})$. Sufficient statistics: $\sum_{n} \mathbf{x}_n$ and $\sum_{n} \mathbf{x}_n \mathbf{x}_n^{\mathrm{T}}$.
Maximum Likelihood for the Gaussian (2) Set the derivative of the log likelihood function to zero and solve to obtain $\boldsymbol{\mu}_{\mathrm{ML}} = \tfrac{1}{N}\sum_{n=1}^{N} \mathbf{x}_n$. Similarly, $\boldsymbol{\Sigma}_{\mathrm{ML}} = \tfrac{1}{N}\sum_{n=1}^{N} (\mathbf{x}_n - \boldsymbol{\mu}_{\mathrm{ML}})(\mathbf{x}_n - \boldsymbol{\mu}_{\mathrm{ML}})^{\mathrm{T}}$.
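A direct check of these formulas on synthetic data (parameters are illustrative): the sample mean and the biased, 1/N sample covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mu = np.array([1.0, -2.0])
true_Sigma = np.array([[1.0, 0.6], [0.6, 2.0]])

X = rng.multivariate_normal(true_mu, true_Sigma, size=5000)

# mu_ML = (1/N) sum_n x_n
mu_ml = X.mean(axis=0)
# Sigma_ML = (1/N) sum_n (x_n - mu_ML)(x_n - mu_ML)^T   (note: biased, 1/N)
diff = X - mu_ml
Sigma_ml = diff.T @ diff / X.shape[0]

print("mu_ML   :", mu_ml)
print("Sigma_ML:\n", Sigma_ml)
```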
Maximum Likelihood for the Gaussian (3) Under the true distribution, $\mathbb{E}[\boldsymbol{\mu}_{\mathrm{ML}}] = \boldsymbol{\mu}$ but $\mathbb{E}[\boldsymbol{\Sigma}_{\mathrm{ML}}] = \tfrac{N-1}{N}\boldsymbol{\Sigma}$, so the ML covariance estimate is biased. Hence define the unbiased estimate $\widetilde{\boldsymbol{\Sigma}} = \tfrac{1}{N-1}\sum_{n=1}^{N} (\mathbf{x}_n - \boldsymbol{\mu}_{\mathrm{ML}})(\mathbf{x}_n - \boldsymbol{\mu}_{\mathrm{ML}})^{\mathrm{T}}$.
Sequential Estimation Contribution of the $N$th data point, $\mathbf{x}_N$: $\boldsymbol{\mu}_{\mathrm{ML}}^{(N)} = \boldsymbol{\mu}_{\mathrm{ML}}^{(N-1)} + \tfrac{1}{N}\bigl(\mathbf{x}_N - \boldsymbol{\mu}_{\mathrm{ML}}^{(N-1)}\bigr)$, i.e. old estimate + correction weight × correction given $\mathbf{x}_N$.
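The sequential update can be checked against the batch mean; a sketch that processes one data point at a time:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=1000)

# Sequential ML estimate of the mean:
# mu^(N) = mu^(N-1) + (1/N) * (x_N - mu^(N-1))
mu = 0.0
for N, x_N in enumerate(data, start=1):
    mu += (x_N - mu) / N  # old estimate + correction weight * correction

print("sequential:", mu, " batch:", data.mean())  # identical up to round-off
```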
The Robbins-Monro Algorithm (1) Consider $\theta$ and $z$ governed by the joint distribution $p(z, \theta)$, and define the regression function $f(\theta) \equiv \mathbb{E}[z \mid \theta] = \int z\, p(z \mid \theta)\, \mathrm{d}z$. Seek $\theta^{\star}$ such that $f(\theta^{\star}) = 0$.
The Robbins-Monro Algorithm (2) Assume we are given samples from $p(z, \theta)$, one at a time. The estimate is updated as $\theta^{(N)} = \theta^{(N-1)} + a_{N-1}\, z\bigl(\theta^{(N-1)}\bigr)$ (with the sign convention that $f(\theta) > 0$ for $\theta < \theta^{\star}$ and $f(\theta) < 0$ for $\theta > \theta^{\star}$), where the coefficients satisfy $\lim_{N\to\infty} a_N = 0$, $\sum_{N} a_N = \infty$, and $\sum_{N} a_N^2 < \infty$.
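A classic instance of the Robbins-Monro scheme (an illustrative sketch, not from the slides) is sequential quantile estimation: with $z(\theta) = q - \mathbf{1}\{x \le \theta\}$, the regression function is $f(\theta) = q - P(x \le \theta)$, whose root is the q-quantile; step sizes $a_N = c/N$ satisfy the required conditions (the constant c is an illustrative choice).

```python
import numpy as np

rng = np.random.default_rng(0)
q = 0.5        # target quantile (the median)
theta = 0.0    # initial guess
c = 5.0        # illustrative step-size constant

# Robbins-Monro: theta^(N) = theta^(N-1) + a_{N-1} * z(theta^(N-1)),
# with z = q - 1{x <= theta}, so f(theta) = E[z | theta] = q - P(x <= theta).
for N in range(1, 200_001):
    x = rng.normal(loc=2.0, scale=1.0)  # one sample at a time
    a = c / N                            # a_N -> 0, sum a_N = inf, sum a_N^2 < inf
    theta += a * (q - float(x <= theta))

print("estimated median:", theta)  # should approach the true median, 2.0
```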