
Applied Bayesian Methods


Presentation Transcript


  1. Applied Bayesian Methods, Phil Woodward. Phil Woodward 2014

  2. Introduction to Bayesian Statistics Phil Woodward 2014

  3. Inferences via Sampling Theory • Inferences made via sampling distribution of statistics • A model with unknown parameters is assumed • Statistics (functions of the data) are defined • These statistics are in some way informative about the parameters • For example, they may be unbiased, minimum variance estimators • Probability is the long-run frequency with which a recurring event occurs • The recurring event is the statistic for fixed parameter values • The probabilities arise by considering data other than those actually seen • Need to decide on most appropriate “reference set” • Confidence and p-values are p(data “or more extreme” | θ) calculations • Difficulties when making inferences • Nuisance parameters are an issue when no suitable sufficient statistics exist • Constraints in the parameter space cause difficulties • Confidence intervals and p-values are routinely misinterpreted • They are not p(θ | data) calculations Phil Woodward 2014

  4. How does Bayes add value? • Informative Prior • Natural approach for incorporating information already available • Smaller, cheaper, quicker and more ethical studies • More precise estimates and more reliable decisions • Sometimes weakly informative priors can overcome model fitting failure • Probability as a “degree of belief” • Quantifies our uncertainty in any unknown quantity or event • Answers questions of direct scientific interest • P(state of world | data) rather than P(data* | state of world) • Model building and making inferences • Nuisance parameters no longer a “nuisance” • Random effects, non-linear terms, complex models all handled better • Functions of parameters estimated with ease • Predictions and decision analysis follow naturally • Transparency in assumptions • Beauty in its simplicity! • p(θ | x) = p(x | θ) p(θ) / p(x) • Avoids issue of identifying “best” estimators and their sampling properties • More time spent addressing issues of direct scientific relevance Phil Woodward 2014

  5. Probability • Most Bayesians treat probability as a measure of belief • Some believe probabilities can be objective (not discussed here) • Probability not restricted to recurring events • E.g. probability it will rain tomorrow is a Bayesian probability • Probabilities lie between 0 (impossible event) and 1 (certain event) • Probabilities between 0 and 1 can be calibrated via the “fair bet” • What is a “fair bet”? • Bookmaker sells a bet by stating the odds for or against an event • Odds are set to encourage a punter to buy the bet • E.g. odds of 2-to-1 against means that for each unit staked two are won, plus the stake • A fair bet is when one is indifferent to being bookmaker or punter • i.e. one doesn’t believe either side has an unfair advantage in the gamble Phil Woodward 2014

  6. Probability • Relationship between odds and probability • One-to-one mapping between odds (O) and probability (P): P = O / (1 + O), where O equals the ratio X/Y for odds of X-to-Y in favour and the ratio Y/X for odds of X-to-Y against an event, e.g. odds of 2-to-1 against, if fair, imply O = ½ and a probability of ⅓ • Probabilities defined this way are inevitably subjective • People with different knowledge may have different probabilities • Controversy occurs when using this definition to interpret data • Science should be “objective”, so “subjectivity” to some is heresy • But where do the models that Frequentists use come from? • Are the decisions made when designing studies purely objective? • Is judgment needed when generalising from a sample to a population? Phil Woodward 2014

  7. Probability • Subjectivity does not mean biased, prejudiced or unscientific • Large body of research into elicitation of personal probabilities • Where frequency interpretation applies, these should support beliefs • E.g. the probability of the next roll of a die coming up a six should be ⅙ for everyone unless you have good reason to doubt the die is fair • An advantage of the Bayesian definition is that it allows all other information to be taken into account • E.g. you may suspect the person offering a bet on the die roll is of dubious character • Bayesians are better equipped to win at poker than Frequentists! • All unknown quantities, including parameters, are considered random variables • each parameter still has only one true value • our uncertainty in this value is represented by a probability distribution Epistemic uncertainty Phil Woodward 2014

  8. Exchangeability • Exchangeability is an important Bayesian concept • exchangeable quantities cannot be partitioned into more similar sub-groups • nor can they be ordered in a way that implies we can distinguish between them • exchangeability often used to justify a prior distribution for parameters analogous to classical random effects Phil Woodward 2014

  9. The Bayesian Paradigm For two events A and B, from p(A, B) = p(A | B) p(B) and p(A, B) = p(B | A) p(A) comes Bayes Theorem: p(A | B) = p(B | A) p(A) / p(B) Nothing controversial yet. Phil Woodward 2014

  10. The Bayesian Paradigm How is Bayes Theorem (mis)used? Coin tossing study: Is the coin fair? Model ri ~ bern(π), i = 1, 2, ..., n, where ri = 1 if the ith toss is a head and 0 if a tail Let the terms in Bayes Theorem be A = π (controversial) and B = r, then p(π | r) = p(r | π) p(π) / p(r) Why controversial? Because the unknown parameter π is being treated as a random variable Phil Woodward 2014

  11. The Bayesian Paradigm What are these terms? p(r|π) is the likelihood = bin(n, Σr| π) (not controversial) p(π) is the prior = ??? (controversial) The prior formally represents our knowledge of π before observing r Phil Woodward 2014

  12. The Bayesian Paradigm What are these terms (continued)? p(r) is the normalising constant = ∫ p(r|π) p(π) dπ (the difficult bit!) p(π|r) is the posterior The posterior formally represents our knowledge of π after observing r MCMC to the rescue! (in general, that is; it is not needed in this particular conjugate case) Phil Woodward 2014

  13. The Bayesian Paradigm A worked example. Coin tossed 5 times giving 4 heads and 1 tail p(r|π) = bin(n=5, Σr=4| π) p(π) = beta(a, b), when a=b=1 ≡ U(0, 1) Why choose a beta distribution?! - conjugacy … posterior p(π|r) = beta(a+Σr, b+n-Σr) - can represent vague belief? - can be an objective reference? - Beta family is flexible (could be informative) What if data were 5 dogs in tox study: 4 OK, 1 with an AE? ...but is a stronger prior justifiable? Phil Woodward 2014
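
A minimal WinBUGS sketch of this coin-tossing model is given below. The node names (p standing in for the slide's π, r, n) and the data values are my own illustrative assumptions; the beta(1, 1) prior matches the U(0, 1) choice above.

model {
 ### Likelihood: total number of heads r in n tosses
 r ~ dbin(p, n)      # p plays the role of pi on the slide
 ### Prior: beta(1, 1), i.e. uniform on (0, 1)
 p ~ dbeta(1, 1)
}

With data list(r = 4, n = 5), monitoring p gives draws from the beta(5, 2) posterior quoted on the next slide.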

  14. The Bayesian Paradigm A worked example (continued). Applying Bayes theorem p(π|r) = beta(5, 2) 95% credible interval for π: (0.36 to 0.96) Pr[π ∈ (0.36 to 0.96) | Σr = 4] = 0.95 95% confidence interval for π: (0.28 to 0.995) Pr[Σr ≥ 4 | π = 0.28] = 0.025, Pr[Σr ≤ 4 | π = 0.995] = 0.025 Phil Woodward 2014
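
For completeness, the conjugate update behind the posterior quoted above, using the formula from the previous slide, is:

 p(π | r) = beta(a + Σr, b + n − Σr) = beta(1 + 4, 1 + 5 − 4) = beta(5, 2)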

  15. The Bayesian Paradigm Bayesian inference for simple Normal model Clinical study: What’s the mean response to placebo? Model yi ~ N(µ, σ2) i = 1, 2, ..., n (placebo subjects only) assume σ known and for convenience will use the precision parameter τ = 1/σ2 (the reciprocal of the variance) Terms in Bayes Theorem are: Phil Woodward 2014
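
The equations on this slide did not survive into the transcript. A standard sketch of the terms, assuming a conjugate N(µ0, 1/τ0) prior for µ (the deck's actual prior choice may differ), is:

 likelihood: p(y | µ) ∝ exp( −(τ/2) Σi (yi − µ)2 ), with τ = 1/σ2 known
 prior: p(µ) = N(µ0, 1/τ0)
 posterior: p(µ | y) ∝ p(y | µ) p(µ)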

  16. The Bayesian Paradigm Improper prior density Phil Woodward 2014
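
Only the slide title survives here. The standard result it presumably illustrates (an assumption on my part) is the improper flat prior, which can be viewed as the limit τ0 → 0 of the Normal prior:

 p(µ) ∝ 1 (improper: it does not integrate to one)
 p(µ | y) = N( ybar, 1/(n τ) )

so the posterior mean is the sample mean and the 95% credible interval coincides with the classical confidence interval, as seen in the worked example on slide 19.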

  17. The Bayesian Paradigm Posterior precision equals sum of prior and data precisions Posterior mean equals weighted mean of prior and data Phil Woodward 2014
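
In symbols, with ybar the sample mean, the slide's statement is the standard conjugate-Normal result:

 τ1 = τ0 + n τ
 µ1 = ( τ0 µ0 + n τ ybar ) / ( τ0 + n τ )
 p(µ | y) = N( µ1, 1/τ1 )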

  18. The Bayesian Paradigm Phil Woodward 2014

  19. The Bayesian Paradigm A worked example (continued). Applying Bayes theorem p(µ | y) = N(80, 0.5) 95% credible interval for µ: (78.6 to 81.4) 95% confidence interval for µ: (78.6 to 81.4) Phil Woodward 2014
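
As a quick arithmetic check of the interval quoted above, reading the posterior variance 0.5 off the stated N(80, 0.5):

 80 ± 1.96 × √0.5 = 80 ± 1.39, i.e. (78.6 to 81.4)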

  20. The Bayesian Paradigm Bayesian inference for simple Normal model The case when both mean and variance are unknown Model yi ~ N(µ, σ2) i = 1, 2, ..., n Terms in Bayes Theorem are Phil Woodward 2014

  21. The Bayesian Paradigm Phil Woodward 2014

  22. The Bayesian Paradigm Phil Woodward 2014

  23. The Bayesian Paradigm Bayesian inference for Normal Linear Model Model y = Xθ + ε, εi ~ N(0, σ2), i = 1, 2, ..., n y and ε are n x 1 vectors of observations and errors X is an n x k matrix of known constants θ is a k x 1 vector of unknown regression coefficients Terms in Bayes Theorem are Phil Woodward 2014
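
A minimal WinBUGS sketch of this model follows. The vague priors and the node names (y, X, theta, prec, n, k) are illustrative assumptions of mine, not the priors discussed on the surrounding slides.

model {
 for (i in 1:n) {
  mu[i] <- inprod( X[i, ], theta[] )   # i-th row of X times theta
  y[i] ~ dnorm(mu[i], prec)            # Normal errors with precision 1/sigma^2
 }
 for (j in 1:k) {
  theta[j] ~ dnorm(0, 1.0E-6)          # vague prior on each regression coefficient
 }
 prec ~ dgamma(0.001, 0.001)
 sigma <- pow(prec, -0.5)
}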

  24. The Bayesian Paradigm Phil Woodward 2014

  25. The Bayesian Paradigm In summary, for the Normal Linear Model (“fixed effects”) Classical confidence intervals can be interpreted as Bayesian credible intervals But, need to be aware of implicit prior distributions Not generally the case for other error distributions But for “large samples”, when the likelihood-based estimator has an approximate Normal distribution, a Bayesian interpretation can again be made “Random effects” models are not so easily compared Don’t assume classical results have a Bayesian interpretation Phil Woodward 2014

  26. The Bayesian Paradigm Posterior distribution for µ Conditional (on µ) distribution for future response Phil Woodward 2014

  27. The Bayesian Paradigm Posterior distribution for µ: N(µ1, 1/τ1) Conditional (on µ) distribution for a future response: N(µ, σ2) Predictive distribution: yf ~ N(µ1, 1/τ1 + 1/τ) i.e. the predictive variance is the sum of the posterior variance of µ and the conditional variance of yf Phil Woodward 2014
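
The variance addition quoted above follows from the standard decomposition (not spelled out in the transcript):

 E(yf | y) = E[ E(yf | µ) | y ] = µ1
 Var(yf | y) = E[ Var(yf | µ) | y ] + Var[ E(yf | µ) | y ] = 1/τ + 1/τ1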

  28. The Bayesian Paradigm Predictive Distributions When are predictive distributions useful? When designing studies we predict the data using priors to assess the design we may use informative priors to reduce study size, these being predictions from historical studies When undertaking interim analyses we can predict the remaining data using the current posterior When checking the adequacy of our assumed model, model checking involves comparing observations with predictions When making decisions after a study has completed we can predict future trial data to assess the probability of success, helping to determine the best strategy or decide to stop Some argue predictive inferences should be our main focus, i.e. be interested in observable rather than unobservable quantities e.g. how many patients will do better on this drug? “design priors” must be informative Phil Woodward 2014

  29. The Bayesian Paradigm δ is treatment effect Phil Woodward 2014

  30. The Bayesian Paradigm Phil Woodward 2014

  31. The Bayesian Paradigm Phil Woodward 2014

  32. The Bayesian Paradigm • Making Decisions • A simple Bayesian approach defines criteria of the form • Pr(δ ≥ Δ) > π • where Δ is an effect size of interest, and π is the probability required to make a positive decision • For example, a Bayesian analogy to significance could be • Pr(δ > 0) > 0.95 • But is believing δ > 0 enough for further investment? Phil Woodward 2014
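
In WinBUGS such probabilities are usually estimated as posterior means of step() nodes. The toy two-group summary-data model below is purely my own illustration (the node names ybar.t, ybar.c, se.t, se.c and Delta are assumptions, not from the deck); it only serves to show how the decision quantities are coded.

model {
 ### summary-data likelihood: observed group means with known standard errors
 ybar.t ~ dnorm(mu.t, prec.t)
 prec.t <- pow(se.t, -2)
 ybar.c ~ dnorm(mu.c, prec.c)
 prec.c <- pow(se.c, -2)
 ### vague priors on the group means
 mu.t ~ dnorm(0, 1.0E-6)
 mu.c ~ dnorm(0, 1.0E-6)
 delta <- mu.t - mu.c                 # treatment effect
 ### posterior means of these step() nodes estimate the decision probabilities
 p.pos <- step(delta)                 # Pr(delta >= 0), i.e. Pr(delta > 0) for continuous delta
 p.success <- step(delta - Delta)     # Pr(delta >= Delta), with Delta supplied as data
}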

  33. END OF PART 1: intro to WinBUGS, illustrating fixed effect models Phil Woodward 2014

  34. Bayesian Model Checking Phil Woodward 2014

  35. Bayesian Model Checking Brief outline of some methods easy to use with MCMC Consider three model checking objectives • Examination of individual observations • Global tests of goodness-of-fit • Comparison between competing models In all cases we compare observed statistics with expectations, i.e. predictions conditional on a model Phil Woodward 2014

  36. Bayesian Model Checking yi is the observation, Yi is the prediction, E(Yi) is the mean of the predictive distribution Bayesian residuals, ri = yi − E(Yi), can be examined as we do classical residuals The p-value concept also carries over: compare each yi with the predictive distribution of Yi Phil Woodward 2014

  37. Bayesian Model Checking Ideally we would have a separate evaluation dataset Predictive distribution for Yi is then independent of yi Typically not available for clinical studies Cross-validation next best, but difficult within WinBUGS Following methods use the data twice, so will be conservative, i.e. overstate how well the model fits the data Will illustrate using WinBUGS code for the simplest NLM Phil Woodward 2014

  38. Bayesian Model Checking (Examination of Individual Observations)
model {
 ### Priors
 mu ~ dnorm(0, 1.0E-6)
 prec ~ dgamma(0.001, 0.001) ; sigma <- pow(prec, -0.5)
 ### Likelihood
 for (i in 1:N) {
  Y[i] ~ dnorm(mu, prec)
 }
 ### Model checking
 for (i in 1:N) {
  ### Residuals and Standardised Residuals
  resid[i] <- Y[i] - mu
  st.resid[i] <- resid[i] / sigma
  ### Replicate data set & Prob observation is extreme
  Y.rep[i] ~ dnorm(mu, prec)
  Pr.big[i] <- step( Y[i] - Y.rep[i] )
  Pr.small[i] <- step( Y.rep[i] - Y[i] )
 }
}
Slide notes: More typically, each Y[i] has a different mean, mu[i]. Each residual has a distribution; use its mean as the residual. Y.rep[i] is a prediction accounting for uncertainty in the parameter values, but not in the type of model assumed. The mean of Pr.big[i] estimates the probability that a future observation is this big; both Pr.big and Pr.small are only needed when Y.rep[i] could exactly equal Y[i]. Phil Woodward 2014

  39. Bayesian Model Checking (Global tests of goodness-of-fit) Identify a discrepancy measure typically a function of the data but could be function of both data and parameters Predict (replicate) values of this measure conditional on the type of model assumed but accounting for uncertainty in parameter values Compute “Bayesian p-value” for observed discrepancy similar approach used for individual observations convention for global tests is to quote “p-value” e.g. a measure of skewness for testing this aspect of Normal assumption Phil Woodward 2014

  40. Bayesian Model Checking (Global tests of goodness-of-fit)
model {
 # … code as before …
 ### Model checking
 for (i in 1:N) {
  ### Residuals and Standardised Residuals
  resid[i] <- Y[i] - mu
  st.resid[i] <- resid[i] / sigma
  m3[i] <- pow( st.resid[i], 3 )
  ### Replicate data set
  Y.rep[i] ~ dnorm(mu, prec)
  resid.rep[i] <- Y.rep[i] - mu
  st.resid.rep[i] <- resid.rep[i] / sigma
  m3.rep[i] <- pow( st.resid.rep[i], 3 )
 }
 skew <- mean( m3[] )
 skew.rep <- mean( m3.rep[] )
 p.skew.pos <- step( skew.rep - skew )
 p.skew.neg <- step( skew - skew.rep )
}
Slide notes: p.skew is interpreted as a classical p-value would be, i.e. a small value is evidence of a discrepancy. Phil Woodward 2014

  41. Bayesian Model Checking (Comparison between competing models) Bayes factors ratio of marginal likelihoods under competing models Bayesian analogy to classical likelihood ratio test not easy to implement using MCMC, so will not be discussed further Phil Woodward 2014
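
For reference, the quantity described above is (standard definition, not shown on the slide):

 BF12 = p(y | M1) / p(y | M2), where p(y | Mk) = ∫ p(y | θk, Mk) p(θk | Mk) dθk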

  42. Bayesian Model Checking (Comparison between competing models) Deviance Information Criterion (DIC) a Bayesian “information criterion” but not the BIC will not discuss theory, focus on practical interpretation WinBUGS & SAS can report this for most models DIC is the sum of two separately interpretable quantities DIC = Dbar + pD Dbar : the posterior mean of the deviance pD : the effective number of parameters in the model pD = Dbar - Dhat Dhat : deviance point estimate using posterior mean of θ Phil Woodward 2014
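
Restating the slide's definitions as formulas (the notation θbar for the posterior mean of θ is mine):

 Dbar = E[ D(θ) | y ], the posterior mean of the deviance D(θ) = −2 log p(y | θ) (up to an additive constant)
 Dhat = D(θbar), the deviance evaluated at the posterior mean θbar
 pD = Dbar − Dhat
 DIC = Dbar + pD = Dhat + 2 pD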

  43. Bayesian Model Checking (Comparison between competing models) Deviance Information Criterion (DIC) DIC = Dbar + pD pD will differ from the total number of parameters when posterior distributions are correlated typically the case for “random effect parameters” non-orthogonal designs, correlated covariates common for non-linear models pD will be smaller because some parameters’ effects “overlap” Phil Woodward 2014

  44. Bayesian Model Checking (Comparison between competing models) Deviance Information Criterion (DIC) DIC = Dbar + pD Measures model’s ability to make short-term predictions Smaller values of DIC indicate a better model Rules of thumb for comparing models fitted to the same data DIC difference > 10 is clear evidence of being better DIC difference between 5 and 10 is still strong evidence There are still some unresolved issues with DIC relatively early days in its use, so use other methods as well Phil Woodward 2014

  45. Bayesian Model Checking (practical advice) “All models are wrong, but some are useful” if we keep looking, or have lots of data, we will find lack-of-fit need to assess whether model’s deficiencies matter depends upon the inferences and decisions of interest judge the model on whether it is fit for purpose Sensitivity analyses are useful when uncertain should assess sensitivity to both the likelihood and the prior Model expansion may be necessary Bayesian approach particularly good here Informative priors and MCMC allow greater flexibility e.g. replace Normal with t distribution Phil Woodward 2014

  46. Introduction to BugsXLA Parallel Group Clinical Study (Analysis of Covariance) Phil Woodward 2014

  47. BugsXLA (case study 3.1) Switch to Excel and demonstrate how BugsXLA facilitates rapid Bayesian model specification and analysis via WinBUGS. Phil Woodward 2014

  48. BugsXLA (case study 3.1) Phil Woodward 2014

  49. BugsXLA (case study 3.1) Settings used by WinBUGS Posterior distributions to be summarised Posterior samples to be imported Save WinBUGS files, create R scripts Suggested settings Phil Woodward 2014

  50. BugsXLA (case study 3.1) Fixed factor effects parameterised as contrasts from a zero constrained level Priors for other parameter types Default priors chosen to be “vague” (no guarantees!) Bayesian model checking options Phil Woodward 2014
