
Bayesian estimation: Why and How to Run Your First Bayesian Model


Presentation Transcript


  1. Bayesian estimation: Why and How to Run Your First Bayesian Model — Rens van de Schoot, rensvandeschoot.com

  2. Classical null hypothesis testing Wainer: "One Cheer for Null-Hypothesis Significance Testing" (1999, Psychological Methods, 4, 212-213) … however …

  3. NHT vs. Bayes Pr(Data | H0) ≠ Pr(Hi | Data): the p value is the probability of the data given the null hypothesis, not the probability of a hypothesis given the data.

  4. Bayes Theorem • Pr(Hi | Data) = Pr(Data | Hi) × Pr(Hi) / Pr(Data) • Posterior ∝ prior × likelihood • Posterior probability is proportional to the product of the prior probability and the likelihood
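
A quick numeric sketch of the theorem (the two hypotheses, their priors, and the likelihoods below are made up purely for illustration):

    # Two hypotheses with prior probabilities (hypothetical numbers).
    priors = {"H0": 0.5, "H1": 0.5}
    # Likelihood of the observed data under each hypothesis (also hypothetical).
    likelihoods = {"H0": 0.02, "H1": 0.10}

    # Posterior is proportional to prior * likelihood; dividing by Pr(Data) normalizes.
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    pr_data = sum(unnormalized.values())
    posteriors = {h: u / pr_data for h, u in unnormalized.items()}
    print(posteriors)  # H0: ~0.167, H1: ~0.833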

  5. Bayes theorem: prior, data and posterior [Figure: the prior and the likelihood of the data combine into the posterior]

  6. Bayes Theorem • Pr(Hi | Data) = Pr(Data | Hi) × Pr(Hi) / Pr(Data) • Posterior ∝ prior × likelihood • Posterior probability is proportional to the product of the prior probability and the likelihood

  7. Intelligence (IQ) [Figure: an IQ axis running from -∞ to ∞]

  8. Prior Knowledge 1 [Figure: a flat prior over the whole IQ axis, -∞ to ∞]

  9. Prior Knowledge [Figure: prior support restricted to IQ between 40 and 180]

  10. Prior Knowledge 2 [Figure: a prior on the IQ axis between 40 and 180]

  11. Prior Knowledge 3 [Figure: a prior centered at 100 on the IQ axis between 40 and 180]

  12. Prior Knowledge 4 [Figure: a prior centered at 100 on the IQ axis between 40 and 180]

  13. Prior Knowledge 5 [Figure: a prior centered at 100 on the IQ axis between 40 and 180]

  14. Prior Knowledge [Figure: priors 1-5 compared on one IQ axis from -∞ to ∞]

  15. Prior [Figure: a prior distribution on the IQ axis, -∞ to ∞]

  16. Data [Figure: the data (likelihood) alongside the prior on the IQ axis]

  17. Posterior [Figure: the posterior lying between the prior and the data on the IQ axis]

  18. Prior - Data [Figure: prior and data on the IQ axis between 40 and 180, around 100]

  19. Prior - Data [Figure: prior and data on the IQ axis between 40 and 180, around 100]

  20. How to obtain the posterior? • In complex models, the posterior is often intractable (it cannot be computed analytically) • Solution: approximate the posterior by simulation • Simulate many draws from the posterior distribution • Compute the mode, median, mean, 95% interval, et cetera, from the simulated draws
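
For instance, given simulated draws from a posterior (here a stand-in normal sample, not output from a fitted model), the summary measures are plain descriptive statistics:

    import numpy as np

    rng = np.random.default_rng(0)
    draws = rng.normal(loc=102.0, scale=3.0, size=10_000)  # stand-in posterior draws

    print("mean:  ", draws.mean())
    print("median:", np.median(draws))
    # Crude mode estimate: midpoint of the tallest histogram bin.
    counts, edges = np.histogram(draws, bins=100)
    peak = counts.argmax()
    print("mode:  ", (edges[peak] + edges[peak + 1]) / 2)
    print("95% interval:", np.percentile(draws, [2.5, 97.5]))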

  21. ANOVA example 4 unknown parameters μj (j = 1, ..., 4) and one common but unknown σ². Statistical model: Y = μ1*D1 + μ2*D2 + μ3*D3 + μ4*D4 + E with E ~ N(0, σ²), where D1-D4 are group-membership dummies.

  22. The Gibbs sampler Specify the prior: Pr(μ1, μ2, μ3, μ4, σ²) • Prior(μj) ~ N(μ0, var0), e.g. Prior(μj) ~ N(0, 10000) • Prior(σ²) ~ IG(0.001, 0.001)

  23. The prior for σ² is an inverse gamma distribution with shape a and scale b [Figure: inverse gamma densities for different a, b]
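
NumPy has no inverse gamma sampler, but a draw can be taken as the reciprocal of a gamma draw; a minimal sketch using the slide's shape a and scale b (note that NumPy's gamma sampler takes a scale parameter, so the rate b enters as 1/b):

    import numpy as np

    rng = np.random.default_rng(0)

    def inv_gamma(a, b, size, rng):
        # 1/X ~ IG(shape=a, scale=b) when X ~ Gamma(shape=a, rate=b).
        return 1.0 / rng.gamma(a, 1.0 / b, size=size)

    # The vague prior from the previous slide; IG(0.001, 0.001) is extremely
    # diffuse, so individual draws can be enormous or tiny.
    print(inv_gamma(0.001, 0.001, size=5, rng=rng))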

  24. The Gibbs sampler Combining the prior with the likelihood yields the posterior: Post(μ1, μ2, μ3, μ4, σ² | data) … this is a 5-dimensional distribution …

  25. The Gibbs sampler • Iterative evaluation via conditional distributions: • Post(μ1 | μ2, μ3, μ4, σ², data) ∝ Prior(μ1) × Data(μ1) • Post(μ2 | μ1, μ3, μ4, σ², data) ∝ Prior(μ2) × Data(μ2) • Post(μ3 | μ1, μ2, μ4, σ², data) ∝ Prior(μ3) × Data(μ3) • Post(μ4 | μ1, μ2, μ3, σ², data) ∝ Prior(μ4) × Data(μ4) • Post(σ² | μ1, μ2, μ3, μ4, data) ∝ Prior(σ²) × Data(σ²)

  26. The Gibbs sampler • 1. Assign starting values • 2. Sample μ1 from its conditional distribution • 3. Sample μ2 from its conditional distribution • 4. Sample μ3 from its conditional distribution • 5. Sample μ4 from its conditional distribution • 6. Sample σ² from its conditional distribution • 7. Return to step 2 until enough iterations have been drawn
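
A sketch of these steps in Python for the four-group ANOVA above, using the conjugate full conditionals and the priors from slide 22 (the data, group sizes, and true means are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(42)

    # Hypothetical data: 4 groups of 25 observations each.
    groups = [rng.normal(m, 15.0, size=25) for m in (98.0, 101.0, 103.0, 106.0)]

    # Priors from slide 22: mu_j ~ N(0, 10000), sigma^2 ~ IG(0.001, 0.001).
    mu0, var0 = 0.0, 10000.0
    a0, b0 = 0.001, 0.001

    J = len(groups)
    n = np.array([len(g) for g in groups])
    N = n.sum()

    # Step 1: starting values.
    mu = np.array([g.mean() for g in groups])
    sigma2 = 225.0

    n_iter = 10_000
    draws_mu = np.empty((n_iter, J))
    draws_s2 = np.empty(n_iter)

    for t in range(n_iter):
        # Steps 2-5: each mu_j has a normal full conditional.
        for j in range(J):
            v = 1.0 / (n[j] / sigma2 + 1.0 / var0)
            m = v * (groups[j].sum() / sigma2 + mu0 / var0)
            mu[j] = rng.normal(m, np.sqrt(v))
        # Step 6: sigma^2 has an inverse gamma full conditional
        # (sampled as the reciprocal of a gamma draw, as above).
        sse = sum(((g - mu[j]) ** 2).sum() for j, g in enumerate(groups))
        sigma2 = 1.0 / rng.gamma(a0 + N / 2.0, 1.0 / (b0 + sse / 2.0))
        draws_mu[t] = mu
        draws_s2[t] = sigma2

    # Discard the first half as burn-in (the Mplus convention), then summarize.
    keep = slice(n_iter // 2, None)
    print("posterior group means:", draws_mu[keep].mean(axis=0))
    print("posterior sigma^2:", draws_s2[keep].mean())

Each full pass over steps 2-6 is one iteration; plotting draws_mu and draws_s2 against the iteration number gives the trace plots shown on the next slides.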

  27. The Gibbs sampler

  28. Trace plot

  29. Trace plot: posterior

  30. Posterior Distribution

  31. Burn In • The Gibbs sampler must run for t 'burn-in' iterations before it reaches the target distribution f(Z) • How many iterations are needed to converge on the target distribution? • Diagnostics: • Examine a graph of the burn-in • Try different starting values • Run several chains in parallel

  32.-36. Convergence [Figures: trace plots of multiple chains illustrating convergence diagnostics]

  37. Conclusions about convergence • Burn-in: Mplus deletes the first half of each chain • Run multiple chains (Mplus default: 2) • Decrease Bconvergence: the default is .05, but .01 is better • ALWAYS do a graphical evaluation of each and every parameter
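
A sketch of the potential scale reduction (PSR) idea behind the Bconvergence criterion, in its basic Gelman-Rubin form (Mplus's exact computation may differ); it assumes chains of draws for a single parameter, burn-in already removed:

    import numpy as np

    def psr(chains):
        # chains: array of shape (n_chains, n_iterations), burn-in removed.
        chains = np.asarray(chains, dtype=float)
        n = chains.shape[1]
        W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
        B = n * chains.mean(axis=1).var(ddof=1)  # between-chain variance
        var_hat = (n - 1) / n * W + B / n        # pooled variance estimate
        return np.sqrt(var_hat / W)

    # Hypothetical check with two well-mixed chains: a PSR close to 1
    # signals no evidence against convergence.
    rng = np.random.default_rng(7)
    two_chains = rng.normal(102.0, 3.0, size=(2, 5000))
    print(psr(two_chains))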

  38. Summing up • Probability: degree of belief • Prior: what is known before observing the data • Posterior: what is known after observing the data • Informative prior: a tool to include subjective knowledge • Non-informative prior: tries to express absence of prior knowledge, so the posterior is mainly determined by the data • MCMC methods: simulation (sampling) techniques to obtain the posterior distribution and all posterior summary measures • Convergence: important to check

  39. IQ • N = 20 • Data are generated • Mean = 102 • SD = 15 [Figure: the generated IQ data]

  40. IQ

  41. IQ

  42. Technical Intermezzo … C.C.I.??? (the central credibility interval)

  43. Uncertainty in Classical Statistics • Uncertainty = sampling distribution • Estimate the population parameter θ by θ̂ • Imagine drawing an infinity of samples • Distribution of θ̂ over samples • Problem is that we have only one sample • Estimate θ̂ and its sampling distribution • Estimate the 95% confidence interval

  44. Inference in Classical Statistics • What does a 95% confidence interval actually mean? • Over an infinity of samples, 95% of these contain the true population value θ • But we have only one sample • We never know whether our present estimate and confidence interval are among those 95% or not

  45. Inference in Classical Statistics • What does a 95% confidence interval NOT mean? • It does NOT mean that we have a 95% probability that the true population value θ is within the limits of our confidence interval • We only have an aggregate assurance that in the long run 95% of our confidence intervals contain the true population value
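
That long-run claim is easy to verify by simulation; a sketch that assumes a known population mean and SD purely so coverage can be counted:

    import numpy as np

    rng = np.random.default_rng(3)
    theta, sd, n = 100.0, 15.0, 20   # 'true' values, known only in a simulation
    half = 1.96 * sd / np.sqrt(n)    # known-sigma 95% interval half-width

    hits = 0
    for _ in range(10_000):
        m = rng.normal(theta, sd, size=n).mean()
        hits += (m - half <= theta <= m + half)

    print(hits / 10_000)  # close to 0.95: ~95% of intervals cover theta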

  46. Uncertainty in Bayesian Statistics • Uncertainty = a probability distribution for the population parameter • In classical statistics the population parameter θ has one single true value • In Bayesian statistics we imagine a distribution of possible values of the population parameter θ

  47. Inference in Bayesian Statistics • What does a 95% central credibility interval mean? • We have a 95% probability that the population value θ is within the limits of our credibility interval
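
By contrast with the confidence interval, a 95% CCI is read straight off the posterior draws and carries that direct probability statement; a sketch with stand-in draws:

    import numpy as np

    # Stand-in posterior draws (e.g. Gibbs sampler output after burn-in).
    draws = np.random.default_rng(1).normal(102.0, 3.0, size=5000)

    # Central 95% credibility interval: the 2.5th and 97.5th percentiles.
    lower, upper = np.percentile(draws, [2.5, 97.5])
    print(f"95% CCI: [{lower:.1f}, {upper:.1f}]")
    # Interpretation: Pr(lower < theta < upper | data) = 0.95.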

  48. What have we learned so far? • Results are a compromise of prior & data • However, beware of: • -> non/low-informative priors • -> informative priors • -> misspecification of the prior • -> convergence • Results are easier to communicate (e.g. a CCI compared to a confidence interval)

  49. Software • WinBUGS / OpenBUGS • BUGS = Bayesian inference Using Gibbs Sampling • Very general; the user must set up the model • R packages • LearnBayes, R2WinBUGS, MCMCpack • MLwiN • Special implementation for multilevel regression • AMOS • Special implementation for SEM • Mplus • Very general (SEM + ML + many other models)
