
Bayesian Inference



Presentation Transcript


  1. Bayesian Inference Ekaterina Lomakina TNU seminar: Bayesian inference 1 March 2013

  2. Outline • Probability distributions • Maximum likelihood estimation • Maximum a posteriori estimation • Conjugate priors • Conceptualizing models as collections of priors • Noninformative priors • Empirical Bayes

  3. Probability distributions • Density estimation – to model the distribution p(x) of a random variable x given a finite set of observations x1, …, xN. Two families of approaches (a sketch of the nonparametric options follows below): Nonparametric approach – histogram, kernel density estimation, nearest-neighbor approach. Parametric approach – Gaussian distribution, Beta distribution, …
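A minimal illustration of the two nonparametric options named above, a histogram and a Gaussian kernel density estimate, using scipy; the data, bin count, and variable names are illustrative, not from the slides:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)
# Observations x_1, ..., x_N drawn from a bimodal mixture
x = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(1, 1.0, 700)])

# Histogram estimate of p(x): bin counts normalized to integrate to 1
counts, edges = np.histogram(x, bins=30, density=True)

# Kernel density estimate: a smooth p(x) built from Gaussian kernels
kde = gaussian_kde(x)            # bandwidth chosen by Scott's rule
grid = np.linspace(-4, 4, 200)
p_hat = kde(grid)                # estimated density evaluated on a grid
```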

  4. The Exponential Family • Gaussian distribution • Binomial distribution • Beta distribution • etc.
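The formula on this slide was an image that did not survive extraction. In the natural-parameter notation that slide 10 also uses, the exponential family has the following standard form (a reconstruction, assuming the deck follows the usual Bishop-style presentation):

```latex
% Exponential family in natural-parameter form:
%   \eta  - natural parameters,  u(x) - sufficient statistic,
%   h(x)  - base measure,        g(\eta) - normalization coefficient
p(x \mid \eta) = h(x)\, g(\eta) \exp\{\eta^{\top} u(x)\}
```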

  5. Gaussian distribution • The central limit theorem (CLT) states that, given certain conditions, the mean of a sufficiently large number of independent random variables, each with a well-defined mean and well-defined variance, will be approximately normally distributed. (Figure: bean machine by Sir Francis Galton.)
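A quick simulation of the bean machine makes the CLT concrete: each ball's final bin is the sum of independent left/right bounces, so the bin counts approach a Gaussian. A sketch with made-up parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
n_balls, n_rows = 100_000, 12

# Each ball bounces left (0) or right (1) at every row of pegs; its
# final bin is the sum of n_rows independent Bernoulli(0.5) bounces.
positions = rng.integers(0, 2, size=(n_balls, n_rows)).sum(axis=1)

# By the CLT the distribution of positions is approximately Gaussian
# with mean n_rows/2 and variance n_rows/4.
print(positions.mean(), positions.var())  # ~6.0 and ~3.0 here
```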

  6. Maximum likelihood estimation • The frequentist approach to estimating the parameters of a distribution given a set of observations X = {x1, …, xN} is to maximize the likelihood p(X|θ) = ∏n p(xn|θ), which factorizes because the data are i.i.d. • Equivalently, one maximizes the log-likelihood ln p(X|θ) = Σn ln p(xn|θ): the logarithm is a monotonic transformation, so it preserves the maximizer.
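In practice MLE usually means minimizing the negative log-likelihood numerically. A generic sketch with scipy, fitting a Gaussian; the function and variable names are mine, not from the slides:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=1.5, size=500)   # i.i.d. observations

def neg_log_likelihood(params):
    mu, log_sigma = params                     # log-parametrize sigma > 0
    return -norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)).sum()

result = minimize(neg_log_likelihood, x0=[0.0, 0.0])
mu_ml, sigma_ml = result.x[0], np.exp(result.x[1])
```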

  7. MLE for Gaussian distribution • Maximizing the log-likelihood of N(x|μ, σ²) gives μML = (1/N) Σn xn – the simple average – and σ²ML = (1/N) Σn (xn − μML)².
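The closed-form estimates are one line each in numpy; note that the ML variance divides by N and is therefore biased:

```python
import numpy as np

x = np.random.default_rng(2).normal(5.0, 2.0, size=10_000)
mu_ml = x.mean()                      # the simple average
var_ml = ((x - mu_ml) ** 2).mean()    # ML variance, divides by N
```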

  8. Maximum a posteriori estimation • The Bayesian approach to estimating the parameters of a distribution given a set of observations is to maximize the posterior distribution: θMAP = argmaxθ p(θ|X), where p(θ|X) ∝ p(X|θ) p(θ). • It allows prior information to be taken into account.

  9. MAP for Gaussian distribution • With a Gaussian prior N(μ|μ0, σ0²) on the mean (variance σ² known), the posterior distribution over μ is Gaussian with mean μN = (σ² μ0 + N σ0² μML) / (N σ0² + σ²) and precision 1/σN² = 1/σ0² + N/σ² – a weighted average of the prior mean μ0 and the sample average μML, weighted by their respective precisions.
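A sketch of this update in code (function and variable names are mine); the posterior mean interpolates between the prior mean and the sample average, with weights set by the two variances:

```python
import numpy as np

def gaussian_mean_posterior(x, sigma2, mu0, sigma02):
    """Posterior over mu for data x ~ N(mu, sigma2), prior mu ~ N(mu0, sigma02)."""
    n, xbar = len(x), np.mean(x)
    mu_n = (sigma2 * mu0 + n * sigma02 * xbar) / (n * sigma02 + sigma2)
    sigma2_n = 1.0 / (1.0 / sigma02 + n / sigma2)   # posterior variance
    return mu_n, sigma2_n

x = np.random.default_rng(3).normal(2.0, 1.0, size=20)
mu_n, s2_n = gaussian_mean_posterior(x, sigma2=1.0, mu0=0.0, sigma02=10.0)
```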

  10. Conjugate prior • In general, for a given probability distribution p(x|η), we can seek a prior p(η) that is conjugate to the likelihood function, so that the posterior distribution has the same functional form as the prior. • For any member of the exponential family there exists a conjugate prior, which can be written in the form p(η|χ, ν) = f(χ, ν) g(η)^ν exp{ν ηᵀχ}. • Important conjugate pairs include: Binomial – Beta; Multinomial – Dirichlet; Gaussian – Gaussian (for the mean); Gaussian – Gamma (for the precision); Exponential – Gamma.

  11. MLE for Binomial distribution • The binomial distribution models the probability of m “heads” out of N tosses: p(m|N, μ) = C(N, m) μ^m (1 − μ)^(N−m). • The only parameter of the distribution, μ, encodes the probability of a single event (“heads”). • Maximum likelihood estimation gives μML = m/N.

  12. MAP for Binomial distribution • The conjugate prior for this distribution is the Beta distribution: Beta(μ|a, b) ∝ μ^(a−1) (1 − μ)^(b−1). • The posterior is then given by p(μ|m, l, a, b) ∝ μ^(m+a−1) (1 − μ)^(l+b−1), i.e. Beta(μ|m + a, l + b), where l = N − m, simply the number of “tails”. A sketch with scipy follows below.
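A minimal Beta–Binomial sketch; the prior hyperparameters a, b and the data are illustrative:

```python
from scipy.stats import beta

a, b = 2, 2          # Beta(a, b) prior over mu
m, N = 7, 10         # 7 heads out of 10 tosses
l = N - m            # number of tails

posterior = beta(a + m, b + l)          # conjugacy: posterior is Beta again
print(posterior.mean())                 # posterior mean (m + a) / (N + a + b)
print(m / N)                            # compare: the MLE
print((m + a - 1) / (N + a + b - 2))    # MAP estimate (posterior mode)
```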

  13. Models as collections of priors - 1 • Take a simple regression model, e.g. tn = wᵀxn + εn with Gaussian noise εn ~ N(0, β⁻¹). • Add a prior on the weights, e.g. w ~ N(0, α⁻¹I). • And get Bayesian linear regression!
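A sketch of the closed-form posterior over the weights under the usual textbook treatment (zero-mean isotropic Gaussian prior with precision alpha, noise precision beta; the notation and names are assumptions, not from the slides):

```python
import numpy as np

def blr_posterior(Phi, t, alpha, beta):
    """Posterior N(m_N, S_N) over weights w for t = Phi @ w + noise,
    with prior w ~ N(0, alpha^-1 I) and noise precision beta."""
    d = Phi.shape[1]
    S_N = np.linalg.inv(alpha * np.eye(d) + beta * Phi.T @ Phi)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N

rng = np.random.default_rng(4)
X = rng.uniform(-1, 1, size=50)
t = 1.5 * X - 0.3 + rng.normal(0, 0.2, size=50)
Phi = np.column_stack([np.ones(50), X])   # bias + linear feature
m_N, S_N = blr_posterior(Phi, t, alpha=2.0, beta=25.0)
```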

  14. Models as collections of priors - 2 • Take again a simple regression model, tn = yn + εn, where yn is some function of xn. • Add a prior on the function itself, a Gaussian process with covariance (kernel) matrix K. • And get Gaussian processes! (Figure: graphical model with nodes yn, noise precision β, and kernel K.)
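Putting the prior on the function directly: a sketch that draws sample functions from a GP prior with a squared-exponential kernel K (the kernel choice and hyperparameters are illustrative):

```python
import numpy as np

def rbf_kernel(x1, x2, length=0.5, amp=1.0):
    """Squared-exponential covariance K(x1, x2)."""
    d = x1[:, None] - x2[None, :]
    return amp * np.exp(-0.5 * (d / length) ** 2)

x = np.linspace(0, 5, 100)
K = rbf_kernel(x, x) + 1e-8 * np.eye(100)   # jitter for numerical stability
# Each draw is an entire function y(x) ~ GP(0, K), evaluated on the grid
samples = np.random.default_rng(5).multivariate_normal(np.zeros(100), K, size=3)
```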

  15. Models as collections of priors - 3 • Take a model with observations tn whose hidden states xn are discrete and unknown, with emission parameters θ. • Add a prior on the states xn, assuming they are temporally smooth (each xn+1 depends on xn). • And get a Hidden Markov Model! (Figure: graphical model with hidden chain x1, x2, …, xn−1, xn, xn+1 and observations t1, t2, …, tn−1, tn, tn+1.) A small forward-algorithm sketch follows below.
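For concreteness, a forward-algorithm sketch that evaluates p(t1, …, tn) under a small HMM; the sticky transition matrix plays the role of the temporal-smoothness prior on the states, and all numbers are made up:

```python
import numpy as np

pi = np.array([0.6, 0.4])     # p(x_1): initial state distribution
A = np.array([[0.9, 0.1],     # p(x_{n+1} | x_n): sticky transitions
              [0.1, 0.9]])    #   encode temporal smoothness
B = np.array([[0.8, 0.2],     # p(t_n | x_n) for 2 observation symbols
              [0.3, 0.7]])

def forward(obs):
    """Likelihood p(t_1, ..., t_n), summing over all hidden state paths."""
    alpha = pi * B[:, obs[0]]
    for t in obs[1:]:
        alpha = (alpha @ A) * B[:, t]
    return alpha.sum()

print(forward([0, 0, 1, 1, 1]))
```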

  16. Noninformative priors • Sometimes we have no strong prior belief but still want to apply Bayesian inference. Then we need noninformative priors. • If our parameter λ is a discrete variable with K states, we can simply set each prior probability to 1/K. • However, for continuous variables it is not so clear. • One example of a noninformative prior is a prior over μ for the Gaussian distribution: p(μ) = N(μ|μ0, σ0²) with σ0² → ∞. • In this limit the weighted-average posterior mean from slide 9 reduces to μML, so the effect of the prior on the posterior over μ vanishes.

  17. Empirical Bayes • But what if we still want to assume some prior information, yet learn it from the data instead of fixing it in advance? • Imagine the following hierarchical model: hyperparameters λ generate parameters θs for each of S subjects, which in turn generate N observations xn each. • We cannot use full Bayesian inference, but we can approximate it by finding the best λ* that maximizes p(X|λ). (Figure: graphical model λ → θs → xn with plates over N and S.)

  18. Empirical Bayes • We can estimate the result by the following iterative procedure (the EM algorithm): • Initialize λ*. • E-step: compute p(θ|X, λ*) given fixed λ*. • M-step: update λ* to maximize the expected complete-data log-likelihood under that posterior. • This illustrates the other term for empirical Bayes – maximum marginal likelihood. • This is not a fully Bayesian treatment; however, it offers a useful compromise between the Bayesian and frequentist approaches.
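A sketch of this loop for one simple instance of the hierarchical model above: each group s has a mean θs ~ N(m, v), observations xns ~ N(θs, σ²) with σ² known, and λ = (m, v) are the hyperparameters being learned. The model and all names are my illustration, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(6)
S, n, sigma2 = 50, 10, 1.0
theta_true = rng.normal(1.0, 2.0, size=S)            # hidden group means
X = rng.normal(theta_true[:, None], np.sqrt(sigma2), size=(S, n))

m, v = 0.0, 1.0                                      # initialize lambda* = (m, v)
for _ in range(100):
    # E-step: posterior p(theta_s | X, lambda*) is Gaussian for each group
    xbar = X.mean(axis=1)
    post_var = 1.0 / (1.0 / v + n / sigma2)
    post_mean = post_var * (m / v + n * xbar / sigma2)
    # M-step: re-fit the prior (m, v) to the posterior moments
    m = post_mean.mean()
    v = (post_var + (post_mean - m) ** 2).mean()
```

The M-step shrinks v below the raw variance of the group averages because part of that spread is sampling noise; this is the "learning the prior from the data" that the slide describes.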

  19. Thank you for your attention!
