This lecture introduces the concept of Bayesian statistics and explores its applications and implications. It covers the basic principle of Bayesian estimation, advantages and drawbacks of Bayesian methods, and the use of conjugate priors.
Stat 391 – Lecture 12 Bayesian Statistics on a Shoestring Assaf Oron, May 2008
Bayes’ Rule – and “Bayesians” • Bayes lived and proved his rule a long time ago • The rule, and the updating principle associated with it, belong to all branches of statistics • The term “Bayesian statistics” is modern. Depending upon whom you ask, it may represent: • A perspective and toolset, which are useful for many tasks; • The only way to do statistics intelligently; • …An irrational cult! • (it’s somewhat of a generational gap right now) • … I will try to present Bayesian statistics via the first of these descriptions – a useful perspective and toolset
The Basic Principle • Recall the trick we did a few weeks ago: calling the density “likelihood” and viewing it as a function of the parameters, with the data held fixed • Recall also, more recently, the awkward jargon used to describe confidence intervals • These somewhat inelegant fixes can be traced back to an asymmetry: • The data are modeled as following some probability distribution • The parameters are modeled as fixed, though usually unknown • What if we decided that the parameters are random, too?...
The Basic Principle (2) • Let’s view the data as an r.v. called X • Parameters are, of course, θ • Write down Bayes’ rule, using densities: f(θ | x) = f(x | θ) f(θ) / f(x) • Here f(x | θ) is the ‘regular’ (“frequentist”) likelihood of the data given fixed parameter values • f(θ) is the ‘prior’ density of the parameters (based on previous knowledge, usually unrelated to the current data) • f(x), the marginal probability of the data over all possible parameter configurations, is not a function of θ and is irrelevant for estimation
The Basic Principle (3) • …the Bayesian way of writing Bayes’ rule is usually this: f(θ | x) ∝ f(x | θ) f(θ) • f(θ | x) is the posterior distribution of the parameters, given the data; f(θ) is the prior distribution of the parameters, before the data • (Since we omitted the marginal probability of the data, the equation becomes a proportionality; we don’t care, because we know the LHS is a density, so we can “find” the missing factor automatically by normalizing the LHS to integrate to 1 – see the numerical sketch below)
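For concreteness, here is a minimal numerical sketch of that normalization step, assuming a hypothetical Binomial sample (x = 7 successes in n = 10 trials) and a Beta(2, 2) prior – all numbers here are illustrative, not from the lecture:

```python
import numpy as np
from scipy import stats

# Hypothetical data: x = 7 successes out of n = 10 Binomial trials,
# with a Beta(2, 2) prior on p (all numbers are illustrative).
n, x = 10, 7
grid = np.linspace(0.001, 0.999, 999)   # grid over the parameter p

# The unnormalized posterior: likelihood times prior, evaluated on the grid.
unnormalized = stats.binom.pmf(x, n, grid) * stats.beta.pdf(grid, 2, 2)

# "Find" the missing factor by making the curve integrate to 1
# (trapezoidal approximation of the integral).
posterior = unnormalized / np.trapz(unnormalized, grid)

print(np.trapz(posterior, grid))        # ~1.0: a proper density
```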
Bayesian Estimation • Bayesian estimation is based primarily on probability calculations from the posterior • The most common Bayesian point estimates are the posterior mean (i.e., E[θ|x]), median, or mode • These can be framed as solutions to different loss-minimization problems: the mean minimizes expected squared-error loss, the median minimizes expected absolute-error loss, and the mode (the “MAP” estimate) corresponds to a 0–1 loss (see the sketch below)
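A small sketch of these point estimates, assuming for illustration a Beta(8, 5) posterior – the kind of posterior that the conjugate-prior exercise later in the lecture produces:

```python
from scipy import stats

# Illustrative posterior: Beta(8, 5) (made-up numbers).
a, b = 8, 5
post = stats.beta(a, b)

post_mean = post.mean()            # minimizes expected squared-error loss
post_median = post.median()        # minimizes expected absolute-error loss
post_mode = (a - 1) / (a + b - 2)  # peak of the density (the MAP estimate)

print(post_mean, post_median, post_mode)
```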
A Brief History of Bayesianism • The Bayesian idea has been around for a while, but sat mostly on the shelf for practical reasons: • If you take any two arbitrary distributions for data and prior, you will usually end up with an intractably complicated posterior • (for each “common” data distribution, there exists at least one family of priors that fits it well; it is known as the “conjugate prior”) • With the advent of computing, a statistical-simulation technology known as MCMC (“Markov Chain Monte Carlo”) has made (nearly) any combination of distributions computable, sometimes almost instantly – a minimal sketch follows
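Here is a minimal random-walk Metropolis sketch – one of the simplest MCMC algorithms – using the same hypothetical Binomial-plus-Beta(2, 2) setup as before; real MCMC software involves far more machinery (tuning, convergence diagnostics, etc.):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, x = 10, 7                      # hypothetical Binomial data (illustrative)

def log_post(p):
    """Unnormalized log-posterior: Binomial likelihood + Beta(2,2) prior."""
    if not 0 < p < 1:
        return -np.inf
    return stats.binom.logpmf(x, n, p) + stats.beta.logpdf(p, 2, 2)

# Random-walk Metropolis: propose a nearby point, accept with probability
# min(1, posterior ratio); the unknown normalizing constant cancels out.
samples, p = [], 0.5
for _ in range(20_000):
    prop = p + rng.normal(scale=0.1)
    if np.log(rng.uniform()) < log_post(prop) - log_post(p):
        p = prop
    samples.append(p)

print(np.mean(samples[2000:]))    # posterior mean estimate, after burn-in
```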
Conjugate Prior Hands-on • The conjugate prior for the Binomial is the Beta • That is: X ~ Binomial(n, p) and p ~ Beta(α, β) should match nicely • Write out the kernel of the posterior (i.e., the essential form – only the factors involving p): • Simplify this a bit further; can you recognize the form of the posterior? (a worked derivation follows below)
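For checking your work, here is one way the derivation can go (α, β, n, x as defined on this slide):

```latex
% Kernel of the posterior: keep only the factors involving p
\begin{align*}
f(p \mid x)
  &\propto \underbrace{\binom{n}{x} p^{x} (1-p)^{n-x}}_{\text{Binomial likelihood}}
   \times \underbrace{\frac{p^{\alpha-1}(1-p)^{\beta-1}}{B(\alpha,\beta)}}_{\text{Beta prior}} \\
  &\propto p^{\,x+\alpha-1}\,(1-p)^{\,n-x+\beta-1}
\end{align*}
% This is the kernel of a Beta(alpha + x, beta + n - x) density:
% the posterior is again a Beta, which is exactly what conjugacy means.
```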
Advantages of Bayesian Methods • A symmetry between data and parameters that is conceptually attractive • Can incorporate prior subject-matter information (from scientists, etc.) that should play a role in evaluating the data • Hypothesis tests, model selection, and confidence intervals become easier • The risk of using the wrong model (“model misspecification”) can be reduced • More complete information about the parameters
Advantages of Bayesian Methods (2) • Avoids some of the counter-intuitive side effects of MLE calculations • Ability to fit complicated models, estimate complicated parameters, and accommodate errors in “fixed” values • In many cases, a random interpretation fits the parameters better than a fixed one: • Opinion polls and human behavior • Ecology, demographics (come to think of it, natural populations are never really fixed)
Drawbacks of Bayesian Methods • Symmetry? Not really • “It’s tortoises all the way down”: the prior needs parameters too… and they had better be fixed, or else they need priors of their own – which is exactly the original problem • The prior affects our estimation, whether or not it is really based on expert knowledge • A workaround known as “flat” (often improper) priors has made things worse in many ways: if you use them, you may find yourself without a proper posterior distribution at all
Drawbacks of Bayesian Methods (2) • The choice of prior form and details adds yet another arbitrary element to the already tenuous connection between model and reality • MCMC simulations have a lot of “moving parts” and are not trivial to diagnose for problems • Socially, the approach carries “hype” and dogmatic “group-think” overtones that are not helpful • In many cases, a random interpretation of the parameters is simply not appropriate