Bayesian modelling hevruta: how does it work? JAGS, MCMC, and more…
Our goal
Posterior ∝ Likelihood × Prior: p(θ | D) ∝ p(D | θ) p(θ)
• Examples of statistical analysis parameters:
• μ (and σ) – for simple normally distributed data
• β – GLM coefficients
• ρ – for bivariate distributions
• Σ (covariance matrix) – for multivariate distributions
Data analysis example
We want to say something about the height of a population. We have the following sample (values shown on the original slide).
Data analysis example
The first question is: what is the data-generating process? We assume a normal process, so that y_i ~ Normal(μ, σ²) (the data variance σ² is assumed to be known for simplicity).
Data analysis example
The second question is: what kind of prior do we want to assume?
Data analysis example
For a weakly informative prior we can choose a prior that represents what we know about heights in general, but allows for high variation, e.g. a normal prior on μ centred on a typical adult height with a large standard deviation.
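A minimal sketch of this update in Python (not from the slides): the sample values, the prior mean of 170 cm, the prior SD of 20 cm, and the known data SD of 7 cm are all made-up illustrative numbers. The posterior is computed on a grid, which makes the "posterior ∝ likelihood × prior" step explicit.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of heights (cm); the real sample is only shown on the slide.
heights = np.array([168.0, 175.0, 181.0, 172.0, 169.0, 178.0])
sigma = 7.0  # data SD, assumed known for simplicity (as on the slide)

# Weakly informative prior on the mean: centred on a plausible height,
# but wide enough to allow high variation.
mu_grid = np.linspace(120, 220, 2001)
prior = stats.norm.pdf(mu_grid, loc=170, scale=20)

# Likelihood of the whole sample at each candidate value of mu.
log_lik = stats.norm.logpdf(heights[:, None], loc=mu_grid, scale=sigma).sum(axis=0)
lik = np.exp(log_lik - log_lik.max())  # rescaled for numerical stability

# Posterior ∝ likelihood × prior, normalized on the grid.
posterior = lik * prior
posterior /= np.trapz(posterior, mu_grid)

post_mean = np.trapz(mu_grid * posterior, mu_grid)
print(f"posterior mean of mu ≈ {post_mean:.1f} cm")
```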
Posterior predictive check
Posterior → simulate new data: draw parameter values from the posterior and generate replicate data sets from them.
[Figure: original sample compared with posterior predictive simulations]
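A rough sketch of such a check, continuing the hypothetical heights example above (the conjugate normal-mean posterior with known σ is used here instead of the grid): draw μ values from the posterior, simulate replicate samples, and compare a summary of the replicates with the original sample.

```python
import numpy as np

rng = np.random.default_rng(1)

# Same made-up data and prior as in the grid sketch above.
heights = np.array([168.0, 175.0, 181.0, 172.0, 169.0, 178.0])
sigma = 7.0                       # known data SD
prior_mean, prior_sd = 170.0, 20.0

# Conjugate posterior for the mean of a normal with known SD.
n = len(heights)
post_prec = 1 / prior_sd**2 + n / sigma**2
post_mean = (prior_mean / prior_sd**2 + heights.sum() / sigma**2) / post_prec
post_sd = np.sqrt(1 / post_prec)

# Posterior predictive check: draw mu from the posterior, simulate a new
# sample of the same size, and record a summary statistic of each replicate.
reps = 4000
mu_draws = rng.normal(post_mean, post_sd, size=reps)
sim_means = np.array([rng.normal(mu, sigma, size=n).mean() for mu in mu_draws])

print(f"observed mean: {heights.mean():.1f}")
print(f"95% of simulated sample means: "
      f"{np.percentile(sim_means, 2.5):.1f} to {np.percentile(sim_means, 97.5):.1f}")
```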
Binomial likelihood – coin toss
P-value: one-tailed = 0.028; two-tailed = 0.057
Posterior
Posterior beliefs after experiment 1; posterior beliefs after both experiments
Using the experiment 1 posterior as the prior for experiment 2 gives the same result as combining the data of both experiments in a single update (y = 49 + 40, N = 200).
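A small beta-binomial sketch of this equivalence. The totals (y = 49 + 40, N = 200) are from the slide; the split into two experiments of 100 tosses each and the flat Beta(1, 1) prior are illustrative assumptions.

```python
from scipy import stats

# Assumed split: experiment 1 gives 49/100 heads, experiment 2 gives 40/100
# (the slide only gives the totals y = 49 + 40 and N = 200).
y1, n1 = 49, 100
y2, n2 = 40, 100

# Start from a flat Beta(1, 1) prior on the probability of heads.
a0, b0 = 1, 1

# Route A: update on experiment 1, then use that posterior as the prior for experiment 2.
a1, b1 = a0 + y1, b0 + (n1 - y1)            # posterior after experiment 1
a12, b12 = a1 + y2, b1 + (n2 - y2)          # posterior after both experiments

# Route B: combine both experiments into a single update.
a_all, b_all = a0 + y1 + y2, b0 + (n1 + n2) - (y1 + y2)

print((a12, b12) == (a_all, b_all))          # True: both routes give the same posterior
print(f"posterior mean = {stats.beta(a12, b12).mean():.3f}")
```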
MCMC – why do we need this
• For discrete parameters we can fairly easily sum across all possible values of θ to obtain the normalizing constant p(D), at least when there is a single parameter to estimate.
• For continuous parameters we can't easily integrate across θ, especially when there is more than one parameter to estimate.
• Normalized posterior: p(θ | D) = p(D | θ) p(θ) / p(D); non-normalized posterior: p(θ | D) ∝ p(D | θ) p(θ).
MCMC – how does it work
• A group of algorithms that use our knowledge about the relative (non-normalized) posterior probability of pairs of parameter values to build a representative sample from the posterior distribution
• Kruschke's island-hopping politician metaphor:
• Wants to visit each island proportionally to its population
• Knows only the population of the current island and can find out the population of the two adjacent islands
• The Metropolis algorithm (simplification), simulated in the sketch below:
• Toss a coin to decide whether to check the population of the island to the right or to the left
• If the population of the proposed island is bigger than the population of the current island – go there
• Otherwise, go to the proposed island with a probability of P(proposed) / P(current)
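A small simulation of this simplified Metropolis rule; the seven island populations are made up for illustration, and only their relative sizes matter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up island populations (relative values are all the algorithm needs).
population = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
n_islands = len(population)

current = 3                        # start somewhere in the middle
visits = np.zeros(n_islands)

for _ in range(100_000):
    # Toss a coin: propose the island to the left or to the right.
    proposal = current + rng.choice([-1, 1])
    if 0 <= proposal < n_islands:
        # Move if the proposed island is bigger; otherwise move with
        # probability population[proposal] / population[current].
        if rng.random() < min(1.0, population[proposal] / population[current]):
            current = proposal
    visits[current] += 1

print(np.round(visits / visits.sum(), 3))            # time spent on each island
print(np.round(population / population.sum(), 3))    # ≈ the target proportions
```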
A more realistic example
• Start at an initial value θ_current for which p(θ_current | D) > 0
• Calculate the relative (non-normalized) posterior probability of θ as p(D | θ) p(θ)
• Draw the proposed value θ_proposed from a normal distribution centred on the current value (the SD of the proposal distribution will affect the smoothness of the random walk)
• Use the same decision rule: accept the proposal with probability min(1, [p(D | θ_proposed) p(θ_proposed)] / [p(D | θ_current) p(θ_current)]); see the sketch after this list
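A sketch of this random-walk Metropolis sampler applied to the hypothetical heights example from earlier; the starting value, proposal SD, chain length, and burn-in are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Same hypothetical heights example as before: normal likelihood with known SD,
# weakly informative normal prior on the mean.
heights = np.array([168.0, 175.0, 181.0, 172.0, 169.0, 178.0])
sigma = 7.0

def log_post(mu):
    """Non-normalized log posterior: log likelihood + log prior."""
    return (stats.norm.logpdf(heights, loc=mu, scale=sigma).sum()
            + stats.norm.logpdf(mu, loc=170, scale=20))

current = 170.0                  # initial value with non-zero posterior density
proposal_sd = 2.0                # SD of the proposal distribution (affects the walk)
chain = np.empty(20_000)

for i in range(chain.size):
    proposed = rng.normal(current, proposal_sd)
    # Accept with probability min(1, p(proposed) / p(current)), done on the log scale.
    if np.log(rng.random()) < log_post(proposed) - log_post(current):
        current = proposed
    chain[i] = current

burned = chain[2_000:]           # discard burn-in
print(f"posterior mean ≈ {burned.mean():.1f}, SD ≈ {burned.std():.2f}")
```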
Other (more efficient) samplers
• Gibbs sampler – used by JAGS and WinBUGS. Great for models with a lot of parameters, especially when the parameters are inter-dependent (e.g. hierarchical models).
• Hamiltonian MCMC sampler – used by Stan. We probably won't talk about it unless you want us to. It is faster but more complicated software, and much better than JAGS for more complex models (e.g. time-series models, multivariate models, models with complex covariance structures).
MCMC diagnostics - representativeness • Trace plot
MCMC diagnostics - representativeness • Actual posteriors
MCMC diagnostics - representativeness
• Gelman-Rubin statistic
• Both the point estimate and its upper confidence limit should be around 1, and not higher than 1.1. This can be used to decide on burn-in.
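A rough sketch of the (non-split) Gelman-Rubin shrink factor computed from several chains. The fake chains here are independent normal draws standing in for well-mixed MCMC output, so the statistic should come out close to 1.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for an (m chains x n iterations) array."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()       # mean within-chain variance
    B = n * chain_means.var(ddof=1)             # between-chain variance
    var_hat = (n - 1) / n * W + B / n           # pooled posterior variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(3)
fake_chains = rng.normal(size=(4, 5_000))       # 4 well-mixed chains
print(f"R-hat = {gelman_rubin(fake_chains):.3f}")   # should be ~1, below 1.1
```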
MCMC diagnostics - autocorrelation
• Why is there autocorrelation?
• Why is it a problem?
• Effective sample size – the size of a sample of relatively independent iterations (10,000 is a good heuristic number, per Kruschke).
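A crude effective-sample-size sketch using the standard sum-of-autocorrelations formula, ESS = N / (1 + 2 Σ ρ_k), applied to an artificial autocorrelated chain. Real MCMC packages (e.g. coda's effectiveSize) use more careful truncation rules, so treat this only as an illustration of the idea.

```python
import numpy as np

def autocorr(x, max_lag):
    """Sample autocorrelations of a 1-D chain for lags 1..max_lag."""
    x = x - x.mean()
    var = x.var()
    return np.array([np.mean(x[:-k] * x[k:]) / var for k in range(1, max_lag + 1)])

def effective_sample_size(x, max_lag=200):
    """Crude ESS: N / (1 + 2 * sum of autocorrelations up to the first negative lag)."""
    rho = autocorr(x, max_lag)
    positive = rho[: np.argmax(rho < 0)] if np.any(rho < 0) else rho
    return len(x) / (1 + 2 * positive.sum())

# Artificial autocorrelated chain (AR(1)) standing in for MCMC output.
rng = np.random.default_rng(4)
chain = np.zeros(20_000)
for i in range(1, chain.size):
    chain[i] = 0.9 * chain[i - 1] + rng.normal()

print(f"N = {chain.size}, ESS ≈ {effective_sample_size(chain):.0f}")
```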
Diagnostics – the beta-binomial example
The effective sample size is larger than 9,000 (the number of saved iterations) because of negative autocorrelations; this can be ignored.