
4. Maximum Likelihood

4. Maximum Likelihood. Prof. A.L. Yuille Stat 231. Fall 2004. Learning Probability Distributions. Learn the likelihood functions and priors from datasets. Two Main Strategies. Parametric and Non-Parametric. This Lecture and the next will concentrate on Parametric methods.


Presentation Transcript


  1. 4. Maximum Likelihood Prof. A.L. Yuille Stat 231. Fall 2004.

  2. Learning Probability Distributions. • Learn the likelihood functions and priors from datasets. • Two Main Strategies. Parametric and Non-Parametric. • This Lecture and the next will concentrate on Parametric methods. (This assumes a parametric form for the distributions).

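  3. Maximum Likelihood Estimation. Assume the distribution is of form $p(x \mid \theta)$. • Independent Identically Distributed (I.I.D.) samples $x_1, \dots, x_N$; $p(x_1, \dots, x_N \mid \theta) = \prod_{i=1}^{N} p(x_i \mid \theta)$. • Choose $\theta^* = \arg\max_{\theta} \prod_{i=1}^{N} p(x_i \mid \theta)$.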
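The "choose the maximizing parameter" step can be sketched in a few lines. This is a minimal illustration, not part of the lecture: it assumes a Bernoulli model $p(x = 1 \mid \theta) = \theta$, a made-up list of coin flips, and a coarse grid search in place of calculus. The names `log_likelihood` and `grid_mle` are hypothetical.

```python
import math

def log_likelihood(theta, samples):
    """Log of the i.i.d. product: sum_i log p(x_i | theta), Bernoulli model (assumed)."""
    return sum(math.log(theta if x == 1 else 1.0 - theta) for x in samples)

def grid_mle(samples, steps=100):
    """Return the grid value of theta in (0, 1) with the largest log-likelihood."""
    # Endpoints 0 and 1 are excluded because log(0) is undefined.
    grid = [k / steps for k in range(1, steps)]
    return max(grid, key=lambda theta: log_likelihood(theta, samples))

# Hypothetical data: six ones out of eight flips.
samples = [1, 0, 1, 1, 0, 1, 1, 1]
theta_star = grid_mle(samples)
```

Working with the sum of logs rather than the product avoids numerical underflow for large N and has the same maximizer, since the logarithm is monotonic.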

  4. Supervised versus Unsupervised Learning. • Supervised Learning assumes that we know the class label for each datapoint. • I.e., we are given pairs $(x_i, y_i)$, where $x_i$ is the datapoint and $y_i$ is the class label. • Unsupervised Learning does not assume that the class labels are specified. This is a harder task. • But “unsupervised methods” can also be used for supervised data if the goal is to determine structure in the data (e.g. mixture of Gaussians). • Stat 231 is almost entirely concerned with supervised learning.

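  5. Example of MLE. • One-Dimensional Gaussian Distribution: $p(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$. • Solve for $\mu^*, \sigma^*$ by differentiation: $\mu^* = \frac{1}{N} \sum_{i=1}^{N} x_i$ and $(\sigma^*)^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu^*)^2$.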
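For the one-dimensional Gaussian, setting the derivatives of the log-likelihood to zero gives the sample mean and the 1/N sample variance. A minimal sketch of that closed form follows. The function name `gaussian_mle` and the data are illustrative assumptions.

```python
import math

def gaussian_mle(samples):
    """Closed-form maximum likelihood estimates (mu, sigma) for a 1-D Gaussian."""
    n = len(samples)
    mu = sum(samples) / n
    # The MLE of the variance divides by N, not N - 1.
    var = sum((x - mu) ** 2 for x in samples) / n
    return mu, math.sqrt(var)

# Hypothetical data chosen so the arithmetic is easy to check by hand.
mu_star, sigma_star = gaussian_mle([2, 4, 4, 4, 5, 5, 7, 9])
```

Note that the 1/N variance estimate is biased. The unbiased estimator divides by N - 1 instead, but it is not the maximum likelihood solution.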

  6. MLE • The Gaussian is unusual because the maximum-likelihood parameters can be written as an analytic expression of the data. • More usually, the maximum has to be found by a numerical algorithm. • Modeling problem: for complicated patterns (shape of fish, natural language, etc.) it requires considerable work to find a suitable parametric form for the probability distributions.
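As an illustration of a case where an algorithm is needed, consider the location parameter of a logistic distribution with scale fixed at 1. This model is chosen here only as an example and does not appear in the lecture. Its score function is $\sum_i \tanh((x_i - \mu)/2)$, which has no closed-form root but is strictly decreasing in $\mu$, so bisection finds the maximizer. The function names are hypothetical.

```python
import math

def score(mu, samples):
    """Derivative of the logistic log-likelihood (scale 1, assumed) with respect to mu."""
    return sum(math.tanh((x - mu) / 2.0) for x in samples)

def logistic_location_mle(samples, tol=1e-10):
    """Find the root of the score by bisection; it lies between min and max of the data."""
    lo, hi = min(samples), max(samples)  # score(lo) >= 0 >= score(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if score(mid, samples) > 0:
            lo = mid  # log-likelihood still increasing, so the root is to the right
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical data.
mu_star = logistic_location_mle([1.0, 2.0, 6.0])
```

Bisection is used because the score is monotone. For models with several local maxima, a more careful search would be needed.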

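  7. MLE and Kullback-Leibler • What happens if the data is not generated by the model that we assume? • Suppose the true distribution is $f(x)$ and our models are of form $p(x \mid \theta)$. • The Kullback-Leibler divergence is: $D(f \,\|\, p) = \int f(x) \log \frac{f(x)}{p(x \mid \theta)} \, dx$. • This is $\geq 0$, with equality only if $f(x) = p(x \mid \theta)$. • K-L is a measure of the difference between $f(x)$ and $p(x \mid \theta)$.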
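A small sketch of the divergence for discrete distributions, where the integral becomes a sum. The lists `f` and `p` are made-up examples, and the function assumes `p[k] > 0` wherever `f[k] > 0`.

```python
import math

def kl_divergence(f, p):
    """D(f || p) = sum_k f[k] * log(f[k] / p[k]) for discrete distributions.

    Terms with f[k] == 0 contribute zero; p[k] must be positive where f[k] > 0.
    """
    return sum(fk * math.log(fk / pk) for fk, pk in zip(f, p) if fk > 0)

# Hypothetical "true" distribution and model over two outcomes.
f = [0.5, 0.5]
p = [0.9, 0.1]

d_fp = kl_divergence(f, p)  # positive, since f and p differ
d_ff = kl_divergence(f, f)  # zero, since the distributions are identical
```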

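  8. MLE and Kullback-Leibler • Samples $x_1, \dots, x_N$ drawn from the true distribution $f(x)$. • Approximate the K-L divergence between $f(x)$ and $p(x \mid \theta)$ • By the empirical KL: $-\frac{1}{N} \sum_{i=1}^{N} \log p(x_i \mid \theta)$, plus a term that does not depend on $\theta$. • Minimizing the empirical KL is equivalent to MLE. • We find the distribution of form $p(x \mid \theta)$ that is closest to $f(x)$ in this sense.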
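The equivalence can be checked numerically on discrete data, where the empirical distribution is just the relative frequency of each outcome. The sketch below assumes a Bernoulli model and made-up samples. It shows that the empirical KL and the average negative log-likelihood differ only by a constant (the entropy of the empirical distribution), so they share the same minimizer. All names are hypothetical.

```python
import math

def avg_neg_log_likelihood(theta, samples):
    """-(1/N) * sum_i log p(x_i | theta) for a Bernoulli model (assumed)."""
    total = sum(math.log(theta if x == 1 else 1.0 - theta) for x in samples)
    return -total / len(samples)

def empirical_kl(theta, samples):
    """D(f_emp || p(. | theta)), with f_emp the relative frequencies of 1 and 0."""
    f1 = sum(samples) / len(samples)
    total = 0.0
    for f_k, p_k in ((f1, theta), (1.0 - f1, 1.0 - theta)):
        if f_k > 0:
            total += f_k * math.log(f_k / p_k)
    return total

samples = [1, 0, 1, 1, 0, 1, 1, 1]  # hypothetical 0/1 data
grid = [k / 100 for k in range(1, 100)]

theta_kl = min(grid, key=lambda t: empirical_kl(t, samples))
theta_mle = min(grid, key=lambda t: avg_neg_log_likelihood(t, samples))

# The gap between the two objectives is the same for every theta on the grid.
gaps = [avg_neg_log_likelihood(t, samples) - empirical_kl(t, samples) for t in grid]
```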

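  9. MLE example. We denote the log-likelihood as a function of $\theta$: $L(\theta) = \sum_{i=1}^{N} \log p(x_i \mid \theta)$. $\theta^*$ is computed by solving the equations $\partial L / \partial \theta = 0$. For example, the Gaussian family gives a closed-form solution.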

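  10. Learning with a Prior. • We can put a prior $p(\theta)$ on the parameter values: $p(\theta \mid x_1, \dots, x_N) = \frac{p(x_1, \dots, x_N \mid \theta)\, p(\theta)}{p(x_1, \dots, x_N)}$. • We can estimate this recursively (if samples are i.i.d.): $p(\theta \mid x_1, \dots, x_N) \propto p(x_N \mid \theta)\, p(\theta \mid x_1, \dots, x_{N-1})$. • Bayes Learning: estimate a probability distribution on $\theta$.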
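The recursive update can be sketched on a discretized parameter: the posterior after each sample becomes the prior for the next. This is only an illustration under stated assumptions: a Bernoulli likelihood, a flat prior on a grid over [0, 1], and made-up data. The function names are hypothetical. For i.i.d. samples, the recursive result should match the posterior computed from all samples at once.

```python
def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def likelihood(x, theta):
    """Bernoulli likelihood p(x | theta), an assumed model for this sketch."""
    return theta if x == 1 else 1.0 - theta

def recursive_posterior(samples, grid, prior):
    """Update the posterior one sample at a time."""
    posterior = normalize(prior)
    for x in samples:
        posterior = normalize([likelihood(x, t) * w for t, w in zip(grid, posterior)])
    return posterior

def batch_posterior(samples, grid, prior):
    """Multiply the prior by the full i.i.d. likelihood, then normalize once."""
    weights = list(prior)
    for i, t in enumerate(grid):
        for x in samples:
            weights[i] *= likelihood(x, t)
    return normalize(weights)

grid = [k / 1000 for k in range(1001)]
prior = [1.0] * len(grid)           # flat prior (assumption)
samples = [1, 0, 1, 1, 0, 1, 1, 1]  # hypothetical 0/1 data

post = recursive_posterior(samples, grid, prior)
posterior_mean = sum(t * w for t, w in zip(grid, post))
```

Unlike MLE, which returns a single value, the output here is a full distribution over the parameter, from which a mean, mode, or credible interval can be read off.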

  11. Recursive Bayes Learning
