1 / 137

Bayesian data analysis 1 using Bugs 2 and R 3

Bayesian data analysis 1 using Bugs 2 and R 3. 1 Fit complicated models . 2 easily . 3 display and understand results .

ursula
Download Presentation

Bayesian data analysis 1 using Bugs 2 and R 3

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Bayesian data analysis1 using Bugs2 and R3 1Fit complicated models . 2easily . 3display and understand results .

  2. Bayesian data analysisusing BUGS and R Prof. Andrew Gelman Dept. of Statistics and Political Science Columbia University Robert Wood Johnson Health and Society Scholars Program February 22-23, 2006

  3. Introduction to the course

  4. Goals • Should be able to… • Fit multilevel models and display and understand the results • Use Bugs from R • Theoretical • Connection between classical linear models and GLM, multilevel models, and Bayesian inference • Practical • Dealing with data and graphing in R • Reformatting models in Bugs to get them to work

  5. Realistic goals • You won’t be able to go out and fit Bayesian models • But . . . you’ll have a sense of what Bayesian data analysis “feels like”: • Model building • Model fitting • Model checking

  6. Overview • What is Bayesian data analysis? Why Bayes? • Why R and Bugs [schools example] • Hierarchical linear regression [radon example] • Bayesian data analysis: what it is and what it is not [several examples] • Model checking [several examples] • Hierarchical Poisson regression [police stops example] • Understanding the Gibbs sampler and Metropolis algorithm • Computation [social networks example] • Troubleshooting: simulations don’t work, run too slowly, make no sense • Accounting for data collection [pre-election polling example] • If there is time: hierarchical logistic regression with varying intercepts and slopes [red/blue states example] • Loose ends

  7. Structure of this course • Main narrative on powerpoint • Formulas on blackboard • On my computer and yours: • Bugs, R, and a text editor • Data, Bugs models, and R scripts for the examples • We do computer demos together • Pause to work on your own to play with the models—we then discuss together • Lecturing for motivating examples • Interrupt to ask questions

  8. Structure of this course • Several examples. For each: • Set up the Bugs model • Use R to set up the data and initial values, and to run Bugs • Use R to understand the inferences • Play with variations on your own computer • Understanding how Bugs works and how to work with Bugs • More examples • Working around problems in Bugs • Using the inferences to answer real questions • Not much theory. If you want more theory, let me know! • Emphasis on general methods for model construction, expansion, patching, and understanding

  9. Introduction to Bayes

  10. What is Bayesian data analysis? Why Bayes? • Effective and flexible • Modular (“Tinkertoy”) approach • Combines information from different sources • Estimate inferences without overfitting for models with large number of parameters • Examples from sample surveys and public health include: • Small-area estimation, longitudinal data analysis, and multilevel regression

  11. What are BUGS and R? • Bugs • Fit Bayesian statistical models • No programming required • R • Open source language for statistical computing and graphics • Bugs from R • Offers flexibility in data manipulation before the analysis and display of inferences after • Avoids tedious issues of working with Bugs directly [Open R: schools example]

  12. Open R: schools example • Open R: • setwd (“c:/hss.course/schools”) • Open WinEdt • Open file “schools/schools.R” • Open file “schools.bug” • Click on Window, Tile Horizontally • Copy the commands from schools.R and paste them into the R window

  13. 8 schools example • Motivates hierarchical (multilevel) modeling • Easy to do using Bugs • Background: educational testing experiments • Classical analyses (load data into R) • No pooling • Complete pooling • Hierarchical analysis using Bugs • Talk through the 8-schools model • Fit the model from R • Look at the inferences, compare to classical

  14. Playing with the 8 schools • Try it with different n.chains, n.iter • Different starting values • Changing the data values y • Changing the data variances sigma.y • Changing J=#schools • Changing the model

  15. End 8-schools example

  16. A brief prehistory of Bayesian data analysis • Bayes (1763) • Links statistics to probability • Laplace (1800) • Normal distribution • Many applications, including census [sampling models] • Gauss (1800) • Least squares • Applications to astronomy [measurement error models] • Keynes, von Neumann, Savage (1920’s-1950’s) • Link Bayesian statistics to decision theory • Applied statisticians (1950’s-1970’s) • Hierarchical linear models • Applications to animal breeding, education [data in groups]

  17. A brief history of Bayesian data analysis, Bugs, and R • “Empirical Bayes” (1950’s-1970’s) • Estimate prior distributions from data • Hierarchical Bayes (from 1970) • Include hyperparameters as part of the full model • Markov chain simulation (from 1940’s [physics] and 1980’s [statistics]) • Computation with general probability models • Iterative algorithms that give posterior simulations (not point estimates) • Bugs (from 1994) • Bayesian inference Using Gibbs Sampling • Can fit general models with modular structure • S (from 1980’s) • Statistical language with modeling, graphics, and programming • S-Plus (commercial version) • R (open-source)

  18. My own experiences with Bayes, Bugs, and R • Bayesian methods really work, especially for models with lots of parameters • Use R for data manipulations and simple models • Use Bugs (from R) as first try in fitting complex models • Use R to make graphs to check that fitted model makes sense • When Bugs doesn’t work, program in R directly

  19. Bayesian regression using Bugs

  20. R and Bugs for classical inference • Radon example • Fitting a linear regression in Bugs • Displaying the results in R • Complete-pooling and no-pooling models • Model extensions [open R: radon example]

  21. Open R: radon example • In WinEdt • Open file “radon/radon_setup.R” • We’ll talk through the code • In R: • setwd (“../radon”) • source (“radon_setup.R”) • Regression output will appear on the R console and graphics windows

  22. Fitting a linear regression in R and Bugs • The Bugs model • Setting up data and initial values in R • Running Bugs and checking convergence • Displaying the fitted line and residuals in R

  23. Radon example in R(complete pooling) • In WinEdt • Open file “radon/radon.classical1.R” • In other window, open file “radon.classical.bug” • Copy the commands (one paragraph at a time) from radon.classical1.R and paste them into the R window • Fit the model • Plot the data and estimates • Plot the residuals (two versions) • Label the plot

  24. Simple extensions of the radon model • Separate variances for houses with and without basements • t instead of normal errors • Fitting each model, then understanding it

  25. Radon regression with 2 variance parameters • Separate variances for houses with and without basements • Bugs model • Setting it up in R • Using the posterior inferences to compare the two variance parameters

  26. Radon example in R (regression with 2 variance parameters) • In WinEdt • Open file “radon/radon.extend.1.R” • Other window, open file “radon.classical.2vars.bug” • Copy from radon.extend.1.R and paste into the R window • Fit the model • Display a posterior inference • Compute a posterior probability • STOP

  27. Robust regression for radon data • t instead of normal errors • Issues with the degrees-of-freedom parameter • Looking at the posterior simulations and comparing to the prior distribution • Interpreting the scale parameter

  28. Radon example in R (robust regression using the t model) • In WinEdt • Stay with “radon/radon.extend.1.R” • Other window: “radon.classical.t.bug” • Copy rest of radon.extend.1.R and paste into the R window • Fit the t model • Run again with n.iter=1000 iterations • Make some plots

  29. Fit your own model to the radon data • Alter the Bugs model • Change the setup in R • Run the model and check convergence • Display the posterior inferences in R [Suggestions of alternative models?]

  30. Fitting several regressions in R and Bugs • Back to the radon example • Complete pooling [we just did] • No pooling [do now: see next slide] • Bugs model is unchanged • Fit separate model in each county and save them as a list • Displaying data and fitted lines from 11 counties • Next step: hierarchical regression

  31. Radon example in R (no pooling) • In WinEdt • Open file “radon/radon.classical2.R” • In other window, open file “radon.classical.bug” • Copy from radon.classical2.R and paste into the R window • Fit the model (looping through n.county=11 counties) • Plot the data and estimates • Plot the residuals (two versions) • Label the plot

  32. Hierarchical linear regression: models [Show the models on blackboard] • Generalizing the Bugs model • Varying intercepts • Varying intercepts, varying slopes • Adding predictors at the group level • Also write each model using algebra • More than one way to write (or program) each model

  33. Hierarchical linear regression: fitting and understanding [working on blackboard] • Displaying results in R • Comparing models • Interpreting coefficients and variance parameters • Going beyond R2

  34. Hierarchical linear regression: varying intercepts • Example: county-level variation for radon • Recall classical models • Complete pooling • No pooling (separate regressions) • Including county effects hierarchically • Bugs model • Written version • More than one way to write the model • Fitting and understanding • Interpreting the parameters • Displaying results in R

  35. Radon example in R (varying intercepts) • In WinEdt • “radon/radon.multilevel.1.R” • Other window: “radon.multilevel.1.bug” • Copy from radon.multilevel.1.R and paste into the R window • Fit the model (debug=TRUE) • Close the Bugs window • Run for 100, then 1000 iterations • Plot the data and estimates for 11 counties • Plot all 11 regression lines at once

  36. Hierarchical linear regression: varying intercepts, varying slopes • The model • Bugs model • Written version • Setup in R • Running Bugs, checking convergence • Displaying the fitted model in R • Understanding the new parameters

  37. Radon example in R (varying intercepts and slopes) • In WinEdt • “radon/radon.multilevel.2.R” • Other window: “radon.multilevel.2.bug” • Copy from radon.multilevel.2.R and paste into the R window • Fit the model (debug=TRUE) • Close the Bugs window • Run for 500 iterations • Plot the data and estimates for 11 counties • Plot all 11 regression lines at once • Plot slopes vs. intercepts

  38. Playing with hierarchical modeling for the radon example • Uranium level as a county-level predictor • Changing the sample sizes • Perturbing the data • Fitting nonlinear models

  39. Radon example in R (adding a county-level predictor) • In WinEdt • radon/radon.multilevel.3.R • Other window: radon.multilevel.3.bug • Copy from radon.multilevel.3.R and paste into the R window • Fit the model • Plot the data and estimates for 11 counties • Plot all 11 regression lines at once • Plot county parameters vs. county uranium level

  40. Varying intercepts and slopes with correlated group-level errors • The model • Statistical notation • Bugs model • Show on blackboard • Correlation and group-level predictors • More in chapters 13 and 17 of Gelman and Hill (2006)

  41. End radon example

  42. Bayesian data analysis: what it is and what it is not

  43. Bayesian data analysis: what it is and what it is not • Popular view of Bayesian statistics • Subjective probability • Elicited prior distributions • Bayesian data analysis as we do it • Hierarchical modeling • Many applications • Conceptual framework • Fit a probability model to data • Check fit, ride the model as far as it will take you • Improve/expand/extend model

  44. Overview of Bayesian data analysis • Decision analysis for home radon • Where did the “prior distribution” come from? • Quotes illustrating misconceptions of Bayesian inference • State-level opinions from national polls (hierarchical modeling and validation) • Serial dilution assay (handling uncertainty in a nonlinear model) • Simulation-based model checking • Some open problems in BDA

  45. Decision analysis for home radon • Radon gas • Causes 15,000 lung cancers per year in U.S. • Comes from underground; home exposure • Radon webpage • http://www.stat.columbia.edu/~radon/ • Click for recommended decision • Bayesian inference • Prior + data = posterior • Where did the prior distribution come from?

  46. Prior distribution for your home’s radon level • Example of Bayesian data analysis • Radon model • theta_i = log of radon level in house i in county j(i) • linear regression model: theta_i = a_j(i) + b_1*base_i + b_2*air_i + … + e_i • linear regression model for the county levels a_j, given geology and uranium level in the county, with county-level errors • Data model • y_i = log of radon measurement in house i • y_i = theta_i + Bias + error_i • Bias depends on the measurement protocol • error_i is not the same as e_i in radon model above

  47. Radon data sources • National radon survey • Accurate unbiased data—but sparse • 5000 homes in 125 counties • State radon surveys • Noisy biased data, but dense • 80,000 homes in 3000 counties • Other info • House level (basement status, etc.) • County level (geologic type, uranium level)

  48. Bayesian inference for home radon • Set up and compute model • 3000 + 19 + 50 parameters • Inference using iterative simulation (Gibbs sampler) • Inference for quantities of interest • Uncertainty dist for any particular house (use as prior dist in the webpage) • County-level estimates and national averages • Potential $7 billion savings • Model checking • Do inferences make sense? • Compare replicated to actual data, cross-validation • Dispersed model validation (“beta-testing”)

  49. Bayesian inference for home radon • Allows estimation of over 3000 parameters • Summarizes uncertainties using probability • Combines data sources • Model is testable (falsifiable)

  50. End radon example

More Related