Bayesian Statistics Simon French simon.french@warwick.ac.uk
The usual view of statistics What does the data – and only the data – tell us in relation to the research questions of interest? By focusing on the data alone, we are ‘clearly’ being objective….
But … … classical/frequentist statistical methods contain hidden subjective choices …. • Why choose 1% or 5% as significance levels? • Why choose a minimum variance unbiased estimator rather than a maximum likelihood estimator, which might be biased but lead to tighter bounds? • ….
The Bayesian paradigm … … is explicitly subjective. • It models judgements and explores their implications • probabilities to represent beliefs and uncertainties • (and utilities to represent values and costs so that inferences lead transparently to decisions) • is based upon a model of an idealised (consistent, rational) scientist • focuses first on the individual scientist; then, by varying the scientist’s beliefs, enables the exploration of potential consensus. For a Bayesian, knowledge is based on consensus
The Bayesian view of statistics What are we uncertain about, and how does the data reduce that uncertainty? not What does the data – and only the data – tell us in relation to the research questions of interest?
Rev. Thomas Bayes • 1701?-1761 • Main work published posthumously: T. Bayes (1763) An essay towards solving a problem in the doctrine of chances. Phil. Trans. Roy. Soc. 53, 370-418 • Bayes Theorem – inverse probability
Bayes theorem Posterior probability ∝ likelihood × prior probability: p(θ | x) ∝ p(x | θ) × p(θ)
Bayes theorem p(θ | x) ∝ p(x | θ) × p(θ) Our knowledge before the experiment: the probability distribution of the parameters, p(θ)
Bayes theorem p(θ | x) ∝ p(x | θ) × p(θ) Our knowledge of the design of the experiment or survey and the actual data: the likelihood of the data given the parameters, p(x | θ)
Bayes theorem p(θ | x) ∝ p(x | θ) × p(θ) Our knowledge after the experiment: the probability distribution of the parameters given the data, p(θ | x)
Bayes theorem p(θ | x) ∝ p(x | θ) × p(θ) There is a normalising constant, but it is ‘easy’ to find as the probability adds (integrates) to one
Medical Test A two-node belief net: Disease → Test. • Probability of having the disease = 0.001, i.e. 1 in 1000 • Probability of not having the disease = 0.999 • The test has a 95% chance of detecting the disease if present, but a 2% chance of falsely detecting it if absent • False negative rate = 5%; false positive rate = 2% • GeNIe software: http://genie.sis.pitt.edu
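To make the slide concrete, here is a minimal Bayes theorem calculation of the chance of actually having the disease after a positive test, using only the numbers quoted above (the slide itself does this in GeNIe; the code is just a sketch).

```python
# Posterior probability of disease given a positive test, from the figures above.
p_disease = 0.001                 # prior: 1 in 1000
p_pos_given_disease = 0.95        # sensitivity (false negative rate 5%)
p_pos_given_healthy = 0.02        # false positive rate 2%

# Normalising constant: total probability of a positive test result
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes theorem
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(p_disease_given_pos)        # roughly 0.045: only about a 4.5% chance of disease
```

Despite the positive result, the posterior probability of disease is under 5%, because the prior probability of 1 in 1000 dominates the modest false positive rate.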
Bayes Theorem as applied to Statistics Toss a biased coin 12 times; obtain 9 heads. [Plot: prior and posterior densities for the coin’s probability of heads, θ]
Bayesian Estimation Toss a biased coin 12 times; obtain 9 heads. Take the mean, median or mode of the posterior as a point estimate of θ. [Plot: prior and posterior densities]
Bayesian confidence interval Toss a biased coin 12 times; obtain 9 heads. Report the 95% highest posterior density interval for θ. [Plot: prior and posterior densities]
Bayesian hypothesis test Toss a biased coin 12 times; obtain 9 heads. To test H0: θ > 0.6, look at Prob(θ > 0.6) under the posterior. [Plot: prior and posterior densities]
But why do any of these? Just report the posterior. It encodes all that is known about θ
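The four coin-tossing slides can be reproduced in a few lines of code. This is a sketch only: the uniform Beta(1, 1) prior is an assumption made for illustration, since the slides do not say which prior curve is plotted; with it, the posterior after 9 heads in 12 tosses is Beta(10, 4).

```python
import numpy as np
from scipy import stats

a0, b0 = 1.0, 1.0                         # prior Beta parameters (assumed uniform prior)
heads, tosses = 9, 12                     # data from the slides

a, b = a0 + heads, b0 + (tosses - heads)  # conjugate update: posterior is Beta(10, 4)
post = stats.beta(a, b)

print("posterior mean  :", post.mean())             # point estimate, about 0.714
print("posterior median:", post.median())
print("posterior mode  :", (a - 1) / (a + b - 2))   # 0.75
print("P(theta > 0.6)  :", 1 - post.cdf(0.6))       # Bayesian 'test' of H0: theta > 0.6

# Crude 95% highest-posterior-density interval by grid search (assumes unimodality)
grid = np.linspace(0.0, 1.0, 10001)
dens = post.pdf(grid)
order = np.argsort(dens)[::-1]                       # grid points, densest first
mass = np.cumsum(dens[order]) / dens.sum()           # approximate cumulative coverage
hpd_points = grid[order[mass <= 0.95]]
print("95% HPD approx  :", hpd_points.min(), hpd_points.max())
```

The point estimates, the highest posterior density interval and Prob(θ > 0.6) are all just summaries of the one posterior distribution, which is the point of the slide above.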
Bayesian decision analysis [Diagram] A decision question, plus: Science (model uncertainties with probabilities), Values (model preferences with multi-attribute utilities) and Data (observe data X = x from p_X(· | θ)). Bayes Theorem combines these to give advice, with feedback to future decisions, linking Statistics with Decision and Risk Analysis.
Bayes Calculations • Analytic approaches • conjugate families of distributions • Kalman filters • Numerical integration • Quadrature • Asymptotic expansions • Markov Chain Monte Carlo (MCMC) • Gibbs Sampling, Particle filters • Almost any distributions and models
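As a toy illustration of the MCMC idea listed above, the sketch below runs a random-walk Metropolis sampler on the coin-tossing posterior from the earlier slides. The target, proposal scale and run length are illustrative choices, not anything from the talk; the exact Beta(10, 4) answer is printed alongside as a check.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_post(theta):
    """Unnormalised log posterior: binomial log-likelihood plus a flat prior."""
    if theta <= 0.0 or theta >= 1.0:
        return -np.inf                                # outside the parameter space
    return 9.0 * np.log(theta) + 3.0 * np.log(1.0 - theta)

theta, samples = 0.5, []
for _ in range(20000):
    proposal = theta + 0.1 * rng.standard_normal()    # symmetric random-walk proposal
    if np.log(rng.uniform()) < log_post(proposal) - log_post(theta):
        theta = proposal                              # accept; otherwise keep current value
    samples.append(theta)

samples = np.array(samples[2000:])                    # discard burn-in
print("MCMC posterior mean:", samples.mean(), " exact Beta(10, 4) mean:", 10 / 14)
```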
Modelling uncertainty • It might be better to say that Bayesians practise uncertainty modelling • There are simple modelling strategies and tools for this • hierarchical modelling • belief nets • ….
Bayes theorem p(θ | x) ∝ p(x | θ) × p(θ) = p(x, θ) • In real problems, x and θ are multi-dimensional • with ‘big data’, very high dimensional • Can we restructure p(x, θ) to be easier to work with? • e.g. to draw in and use independence structures, etc.
Hierarchical Models Simple Bayes normal model and three-stage Bayes normal model: in the three-stage model each observation Xi has its own parameter θi, and θ1, …, θn are themselves drawn from a common higher-stage distribution. [Diagrams of the two graphical structures]
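A minimal generative sketch of the three-stage structure may help fix ideas. The distributional choices and numerical settings below are illustrative assumptions, not taken from the slide.

```python
import numpy as np

# Three-stage normal hierarchy (all numeric values are illustrative):
#   stage 3:  mu      ~ N(m0, s0^2)           hyperprior
#   stage 2:  theta_i ~ N(mu, tau^2)          one parameter per group
#   stage 1:  X_i     ~ N(theta_i, sigma^2)   one observation per group
rng = np.random.default_rng(1)
n, m0, s0, tau, sigma = 8, 0.0, 10.0, 2.0, 1.0

mu = rng.normal(m0, s0)                 # shared hyperparameter
theta = rng.normal(mu, tau, size=n)     # group-level parameters theta_1..theta_n
x = rng.normal(theta, sigma)            # observations X_1..X_n

print("mu    =", mu)
print("theta =", theta)
print("x     =", x)
```

Inference then runs the other way: Bayes theorem takes the observed x back up through the stages to update the θi and μ jointly.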
The Asia Belief Net Nodes: Visit to Asia?, Smoking?, Tuberculosis, Lung Cancer, Bronchitis, X-Ray Result?, Dyspnea? [Diagram of the belief net]
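The slide lists only the node names, so as a purely illustrative sketch the structure can be written down as parent lists. The arcs below follow the commonly published version of the Asia network (with its intermediate ‘tuberculosis or lung cancer’ node collapsed) and should be treated as an assumption rather than a quote from the talk.

```python
# The Asia network as a dictionary of parent lists (edge structure assumed, see above).
asia_parents = {
    "Visit to Asia?": [],
    "Smoking?": [],
    "Tuberculosis": ["Visit to Asia?"],
    "Lung Cancer": ["Smoking?"],
    "Bronchitis": ["Smoking?"],
    "X-Ray Result?": ["Tuberculosis", "Lung Cancer"],
    "Dyspnea?": ["Tuberculosis", "Lung Cancer", "Bronchitis"],
}

# Each node would also carry a conditional probability table given its parents;
# software such as GeNIe or BUGS then propagates evidence through this graph.
```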
Subjectivity vs Objectivity • Bayesian statistics is explicitly subjective • Science is (thought to be) objective • controversy!
Importance of prior • Different priors lead to different conclusions • subjective, not scientific? • Can use: • an ignorant (vague, non-informative) prior to ‘let the data speak for themselves’ • a precise prior to capture agreed common knowledge • Sensitivity analysis to explore the importance of the priors (a small sketch follows below) • Indeed, sensitivity analysis can be used to explore agreements and disagreements on many aspects of the model, not just the prior • If Science is about a consensus on knowledge, then exploring a range of priors helps establish precisely that
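Continuing the coin example, a prior sensitivity analysis might look like the sketch below; the three Beta priors are illustrative choices, not priors proposed in the talk.

```python
from scipy import stats

# How much do the conclusions for 9 heads in 12 tosses depend on the prior?
priors = {
    "vague Beta(1, 1)":     (1.0, 1.0),
    "symmetric Beta(6, 6)": (6.0, 6.0),
    "sceptical Beta(2, 8)": (2.0, 8.0),
}

for label, (a0, b0) in priors.items():
    post = stats.beta(a0 + 9, b0 + 3)     # conjugate update with the observed data
    print(f"{label:22s} posterior mean {post.mean():.3f}   "
          f"P(theta > 0.6) {1 - post.cdf(0.6):.3f}")
```

If the conclusions agree across the range of priors the parties are willing to entertain, a consensus has effectively been established; if not, the disagreement is made explicit.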
All analysis assumes a model … • Another subjective choice, and one not often addressed in any discussion of methodology • the same is true in classical/frequentist statistics • Bayesian analysis provides an assessment of uncertainties in the context of the assumed model • the same is true in classical/frequentist statistics: • e.g. p values • Real-world uncertainty includes these, but also further uncertainties that arise from the fact that the model is not the real world
BUGS Software • Bayesian inference Using Gibbs Sampling • Lunn, D.J., Thomas, A., Best, N. and Spiegelhalter, D. (2000) WinBUGS – a Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing, 10, 325-337 • Lunn, D.J., Jackson, C., Best, N., Thomas, A. and Spiegelhalter, D. (2013) The BUGS Book: A Practical Introduction to Bayesian Analysis. London, Chapman and Hall • http://www.mrc-bsu.cam.ac.uk/bugs/
Reading W.M. Bolstad (2007). Introduction to Bayesian Statistics. 2nd Edn, Hoboken, NJ, John Wiley and Sons. P.M. Lee (2012). Bayesian Statistics: An Introduction. 4th Edn, Chichester, John Wiley and Sons. R. Christensen, W. Johnson, A. Branscum and T.E. Hanson (2011). Bayesian Ideas and Data Analysis. Boca Raton, CRC/Chapman and Hall. P. Congdon (2001). Bayesian Statistical Modelling. Chichester, John Wiley and Sons. S. French and D. Rios Insua (2000). Statistical Decision Theory. London, Arnold. A. O'Hagan and J. Forster (2004). Bayesian Statistics. London, Edward Arnold. J.M. Bernardo and A.F.M. Smith (1994). Bayesian Theory. Chichester, John Wiley and Sons.
ISBA • International Society for Bayesian Analysis • www.bayesian.org • Many resources and guide to software, literature, etc. • Newsletter • Open journal: Bayesian Analysis