4. Maximum Likelihood. Prof. A.L. Yuille, Stat 231, Fall 2004.
Learning Probability Distributions. • Learn the likelihood functions and priors from datasets. • Two main strategies: parametric and non-parametric. • This lecture and the next concentrate on parametric methods (which assume a parametric form for the distributions).
Maximum Likelihood Estimation. • Assume the distribution is of parametric form $p(x|\theta)$. • Independent identically distributed (i.i.d.) samples $x_1, \dots, x_N$. • Choose $\theta^* = \arg\max_\theta \prod_{i=1}^N p(x_i|\theta)$, equivalently the maximizer of the log-likelihood $\sum_{i=1}^N \log p(x_i|\theta)$.
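A minimal sketch of this definition, assuming a Bernoulli model $p(x|\theta)=\theta^x(1-\theta)^{1-x}$ and made-up binary data: the log-likelihood is evaluated on a grid of candidate $\theta$ and the maximizer is picked out.

```python
# Minimal sketch: MLE for a Bernoulli model, choosing theta* by maximizing
# the i.i.d. log-likelihood over a grid. The data below is made up for illustration.
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])    # i.i.d. binary samples
thetas = np.linspace(0.01, 0.99, 99)             # candidate parameter values

# log-likelihood L(theta) = sum_i log p(x_i | theta)
log_lik = np.array([np.sum(x * np.log(t) + (1 - x) * np.log(1 - t)) for t in thetas])

theta_star = thetas[np.argmax(log_lik)]
print(theta_star)        # close to the sample mean, x.mean() = 0.7
```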
Supervised versus Unsupervised Learning. • Supervised learning assumes that we know the class label for each datapoint. • I.e. we are given pairs $(x_i, y_i)$, where $x_i$ is the datapoint and $y_i$ is the class label. • Unsupervised learning does not assume that the class labels are specified. This is a harder task. • But "unsupervised methods" can also be used for supervised data if the goal is to determine structure in the data (e.g. a mixture of Gaussians). • Stat 231 is almost entirely concerned with supervised learning.
Example of MLE. • One-dimensional Gaussian distribution: $p(x|\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$. • Solve for $\mu, \sigma$ by differentiating the log-likelihood and setting the derivatives to zero: $\mu^* = \frac{1}{N}\sum_{i=1}^N x_i$, $\;\sigma^{*2} = \frac{1}{N}\sum_{i=1}^N (x_i - \mu^*)^2$.
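A short numerical check of the closed-form Gaussian MLE on synthetic data (the data and parameter values below are illustrative, not from the lecture):

```python
# Sketch of the closed-form Gaussian MLE: the sample mean and the
# (biased, 1/N) sample variance, computed on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)    # samples from a 1-D Gaussian

mu_hat = x.mean()                                 # mu* = (1/N) sum_i x_i
sigma2_hat = np.mean((x - mu_hat) ** 2)           # sigma*^2 = (1/N) sum_i (x_i - mu*)^2

print(mu_hat, np.sqrt(sigma2_hat))                # should be close to 2.0 and 1.5
```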
MLE • The Gaussian is unusual because the maximum-likelihood parameters can be written as analytic expressions of the data. • More usually, iterative algorithms are required. • Modeling problem: for complicated patterns – the shape of a fish, natural language, etc. – it requires considerable work to find a suitable parametric form for the probability distributions.
MLE and Kullback-Leibler • What happens if the data is not generated by the model that we assume? • Suppose the true distribution is $f(x)$ and our models are of form $p(x|\theta)$. • The Kullback-Leibler divergence is $D(f \,\|\, p_\theta) = \int f(x)\,\log\frac{f(x)}{p(x|\theta)}\,dx$. • This is $\geq 0$, with equality if and only if $f(x) = p(x|\theta)$. • K-L is a measure of the difference between the true distribution $f$ and the model $p(x|\theta)$.
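A small illustration of the divergence on two made-up discrete distributions:

```python
# Kullback-Leibler divergence D(f || p) for two discrete distributions;
# both distributions here are made up for illustration.
import numpy as np

f = np.array([0.5, 0.3, 0.2])    # "true" distribution
p = np.array([0.4, 0.4, 0.2])    # model distribution

kl = np.sum(f * np.log(f / p))   # D(f||p) = sum_x f(x) log(f(x)/p(x))
print(kl)                        # >= 0, and 0 only when f == p
```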
MLE and Kullback-Leibler • Samples $x_1, \dots, x_N$ drawn i.i.d. from $f(x)$. • Approximate the expectation in $D(f \,\|\, p_\theta)$ • by the empirical KL: $\frac{1}{N}\sum_{i=1}^N \log\frac{f(x_i)}{p(x_i|\theta)}$. • Minimizing the empirical KL over $\theta$ is equivalent to MLE, because the $\log f(x_i)$ terms do not depend on $\theta$. • We find the distribution of form $p(x|\theta)$ that is closest, in the KL sense, to the true distribution $f$.
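A numeric sketch of this equivalence, assuming a Gaussian true distribution $f$ so that $\log f(x_i)$ can actually be evaluated (everything below is illustrative):

```python
# Check that the theta minimizing the empirical KL equals the theta maximizing
# the average log-likelihood: the log f(x_i) terms are constant in theta.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.normal(loc=1.0, scale=1.0, size=500)       # samples from f = N(1, 1)
mus = np.linspace(-2.0, 4.0, 601)                  # candidate means (sigma fixed at 1)

log_f = norm.logpdf(x, loc=1.0, scale=1.0)         # log f(x_i), independent of theta
emp_kl  = np.array([np.mean(log_f - norm.logpdf(x, loc=m, scale=1.0)) for m in mus])
log_lik = np.array([np.mean(norm.logpdf(x, loc=m, scale=1.0)) for m in mus])

print(mus[np.argmin(emp_kl)], mus[np.argmax(log_lik)])   # same value of mu
```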
MLE example • We denote the log-likelihood as a function of $\theta$: $L(\theta) = \sum_{i=1}^N \log p(x_i|\theta)$. • $\theta^*$ is computed by solving the equations $\partial L / \partial \theta = 0$. • For example, the Gaussian family gives a closed-form solution.
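When the likelihood equations have no closed-form solution, $\theta^*$ can be found numerically. A sketch, assuming a Gamma model and synthetic data, that applies a generic optimizer to the negative log-likelihood:

```python
# Numerical MLE for a Gamma model: optimize shape and scale by minimizing the
# negative log-likelihood (parameters exponentiated to stay positive).
import numpy as np
from scipy.stats import gamma
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = rng.gamma(shape=3.0, scale=2.0, size=2000)     # synthetic data

def neg_log_lik(params):
    shape, scale = np.exp(params)                  # enforce positivity
    return -np.sum(gamma.logpdf(x, a=shape, scale=scale))

res = minimize(neg_log_lik, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
print(np.exp(res.x))                               # close to (3.0, 2.0)
```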
Learning with a Prior. • We can put a prior $p(\theta)$ on the parameter values and estimate the posterior $p(\theta|x_1,\dots,x_N) \propto p(\theta)\prod_{i=1}^N p(x_i|\theta)$. • We can estimate this recursively (if samples are i.i.d.): $p(\theta|x_1,\dots,x_{N+1}) \propto p(x_{N+1}|\theta)\,p(\theta|x_1,\dots,x_N)$. • Bayes learning: estimate a probability distribution on $\theta$ rather than a single value $\theta^*$.
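A sketch of the recursive update, assuming a Gaussian likelihood with known variance and a conjugate Gaussian prior on the mean (all numbers below are illustrative):

```python
# Recursive Bayesian learning for the mean of a Gaussian with known variance,
# using a conjugate Gaussian prior on mu: after each i.i.d. sample the posterior
# p(mu | x_1..x_n) stays Gaussian and is updated in closed form.
import numpy as np

rng = np.random.default_rng(3)
sigma2 = 1.0                           # known observation variance
x = rng.normal(loc=0.5, scale=np.sqrt(sigma2), size=50)

mu_post, var_post = 0.0, 10.0          # prior p(mu) = N(0, 10)
for xi in x:                           # recursive update, one sample at a time
    var_new = 1.0 / (1.0 / var_post + 1.0 / sigma2)
    mu_post = var_new * (mu_post / var_post + xi / sigma2)
    var_post = var_new

print(mu_post, var_post)               # posterior mean concentrates near 0.5
```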