4. Maximum Likelihood. Prof. A.L. Yuille, Stat 231, Fall 2004.
Learning Probability Distributions. • Learn the likelihood functions and priors from datasets. • Two main strategies: parametric and non-parametric. • This lecture and the next concentrate on parametric methods (which assume a parametric form for the distributions).
Maximum Likelihood Estimation. • Assume the distribution is of parametric form $p(x|\theta)$. • Independent identically distributed (i.i.d.) samples $x_1, \dots, x_N$. • Choose $\theta^* = \arg\max_\theta \prod_{i=1}^N p(x_i|\theta)$, equivalently the maximizer of the log-likelihood $\sum_{i=1}^N \log p(x_i|\theta)$.
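A minimal sketch of this definition, assuming a Bernoulli model $p(x|\theta)=\theta^x(1-\theta)^{1-x}$ and made-up binary data: the log-likelihood is evaluated on a grid of candidate $\theta$ and the maximizer is picked out.

```python
# Minimal sketch: MLE for a Bernoulli model, choosing theta* by maximizing
# the i.i.d. log-likelihood over a grid. The data below is made up for illustration.
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])    # i.i.d. binary samples
thetas = np.linspace(0.01, 0.99, 99)             # candidate parameter values

# log-likelihood L(theta) = sum_i log p(x_i | theta)
log_lik = np.array([np.sum(x * np.log(t) + (1 - x) * np.log(1 - t)) for t in thetas])

theta_star = thetas[np.argmax(log_lik)]
print(theta_star)        # close to the sample mean, x.mean() = 0.7
```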
Supervised versus Unsupervised Learning. • Supervised learning assumes that we know the class label for each datapoint. • I.e. we are given pairs $(x_i, y_i)$, where $x_i$ is the datapoint and $y_i$ is the class label. • Unsupervised learning does not assume that the class labels are specified. This is a harder task. • But "unsupervised methods" can also be used for supervised data if the goal is to determine structure in the data (e.g. a mixture of Gaussians). • Stat 231 is almost entirely concerned with supervised learning.
Example of MLE. • One-dimensional Gaussian distribution: $p(x|\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$. • Solve for $\mu, \sigma$ by differentiating the log-likelihood and setting the derivatives to zero: $\mu^* = \frac{1}{N}\sum_{i=1}^N x_i$, $\;\sigma^{*2} = \frac{1}{N}\sum_{i=1}^N (x_i - \mu^*)^2$.
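A short numerical check of the closed-form Gaussian MLE on synthetic data (the data and parameter values below are illustrative, not from the lecture):

```python
# Sketch of the closed-form Gaussian MLE: the sample mean and the
# (biased, 1/N) sample variance, computed on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)    # samples from a 1-D Gaussian

mu_hat = x.mean()                                 # mu* = (1/N) sum_i x_i
sigma2_hat = np.mean((x - mu_hat) ** 2)           # sigma*^2 = (1/N) sum_i (x_i - mu*)^2

print(mu_hat, np.sqrt(sigma2_hat))                # should be close to 2.0 and 1.5
```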
MLE • The Gaussian is unusual because the maximum-likelihood parameters can be written as analytic expressions of the data. • More usually, iterative algorithms are required. • Modeling problem: for complicated patterns – the shape of a fish, natural language, etc. – it requires considerable work to find a suitable parametric form for the probability distributions.
MLE and Kullback-Leibler • What happens if the data is not generated by the model that we assume? • Suppose the true distribution is $f(x)$ and our models are of form $p(x|\theta)$. • The Kullback-Leibler divergence is $D(f \,\|\, p_\theta) = \int f(x)\,\log\frac{f(x)}{p(x|\theta)}\,dx$. • This is $\geq 0$, with equality if and only if $f(x) = p(x|\theta)$. • K-L is a measure of the difference between the true distribution $f$ and the model $p(x|\theta)$.
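A small illustration of the divergence on two made-up discrete distributions:

```python
# Kullback-Leibler divergence D(f || p) for two discrete distributions;
# both distributions here are made up for illustration.
import numpy as np

f = np.array([0.5, 0.3, 0.2])    # "true" distribution
p = np.array([0.4, 0.4, 0.2])    # model distribution

kl = np.sum(f * np.log(f / p))   # D(f||p) = sum_x f(x) log(f(x)/p(x))
print(kl)                        # >= 0, and 0 only when f == p
```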
MLE and Kullback-Leibler • Samples $x_1, \dots, x_N$ drawn i.i.d. from $f(x)$. • Approximate the expectation in $D(f \,\|\, p_\theta)$ • by the empirical KL: $\frac{1}{N}\sum_{i=1}^N \log\frac{f(x_i)}{p(x_i|\theta)}$. • Minimizing the empirical KL over $\theta$ is equivalent to MLE, because the $\log f(x_i)$ terms do not depend on $\theta$. • We find the distribution of form $p(x|\theta)$ that is closest, in the KL sense, to the true distribution $f$.
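A numeric sketch of this equivalence, assuming a Gaussian true distribution $f$ so that $\log f(x_i)$ can actually be evaluated (everything below is illustrative):

```python
# Check that the theta minimizing the empirical KL equals the theta maximizing
# the average log-likelihood: the log f(x_i) terms are constant in theta.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.normal(loc=1.0, scale=1.0, size=500)       # samples from f = N(1, 1)
mus = np.linspace(-2.0, 4.0, 601)                  # candidate means (sigma fixed at 1)

log_f = norm.logpdf(x, loc=1.0, scale=1.0)         # log f(x_i), independent of theta
emp_kl  = np.array([np.mean(log_f - norm.logpdf(x, loc=m, scale=1.0)) for m in mus])
log_lik = np.array([np.mean(norm.logpdf(x, loc=m, scale=1.0)) for m in mus])

print(mus[np.argmin(emp_kl)], mus[np.argmax(log_lik)])   # same value of mu
```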
MLE example • We denote the log-likelihood as a function of $\theta$: $L(\theta) = \sum_{i=1}^N \log p(x_i|\theta)$. • $\theta^*$ is computed by solving the equations $\partial L / \partial \theta = 0$. • For example, the Gaussian family gives a closed-form solution.
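When the likelihood equations have no closed-form solution, $\theta^*$ can be found numerically. A sketch, assuming a Gamma model and synthetic data, that applies a generic optimizer to the negative log-likelihood:

```python
# Numerical MLE for a Gamma model: optimize shape and scale by minimizing the
# negative log-likelihood (parameters exponentiated to stay positive).
import numpy as np
from scipy.stats import gamma
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = rng.gamma(shape=3.0, scale=2.0, size=2000)     # synthetic data

def neg_log_lik(params):
    shape, scale = np.exp(params)                  # enforce positivity
    return -np.sum(gamma.logpdf(x, a=shape, scale=scale))

res = minimize(neg_log_lik, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
print(np.exp(res.x))                               # close to (3.0, 2.0)
```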
Learning with a Prior. • We can put a prior $p(\theta)$ on the parameter values and estimate the posterior $p(\theta|x_1,\dots,x_N) \propto p(\theta)\prod_{i=1}^N p(x_i|\theta)$. • We can estimate this recursively (if samples are i.i.d.): $p(\theta|x_1,\dots,x_{N+1}) \propto p(x_{N+1}|\theta)\,p(\theta|x_1,\dots,x_N)$. • Bayes learning: estimate a probability distribution on $\theta$ rather than a single value $\theta^*$.
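A sketch of the recursive update, assuming a Gaussian likelihood with known variance and a conjugate Gaussian prior on the mean (all numbers below are illustrative):

```python
# Recursive Bayesian learning for the mean of a Gaussian with known variance,
# using a conjugate Gaussian prior on mu: after each i.i.d. sample the posterior
# p(mu | x_1..x_n) stays Gaussian and is updated in closed form.
import numpy as np

rng = np.random.default_rng(3)
sigma2 = 1.0                           # known observation variance
x = rng.normal(loc=0.5, scale=np.sqrt(sigma2), size=50)

mu_post, var_post = 0.0, 10.0          # prior p(mu) = N(0, 10)
for xi in x:                           # recursive update, one sample at a time
    var_new = 1.0 / (1.0 / var_post + 1.0 / sigma2)
    mu_post = var_new * (mu_post / var_post + xi / sigma2)
    var_post = var_new

print(mu_post, var_post)               # posterior mean concentrates near 0.5
```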