Bayesian Parametric and Semi-Parametric Hierarchical Models: An Application to Disinfection By-Products and Spontaneous Abortion. Rich MacLehose, November 9th, 2006
Outline • Brief introduction to hierarchical models • Introduce 2 ‘standard’ parametric models • Extend them with 2 semi-parametric models • Applied example of Disinfection by-products and spontaneous abortion
Bayesian Hierarchical Models • A natural way to model many applied problems: yi ~ f(y | μi), with μi ~ G • The μi are assumed exchangeable • G may depend on further coefficients that extend the hierarchy
Bayesian Hierarchical Models • Frequently, researchers are interested in estimating a large number of coefficients, possibly with substantial correlation between predictors • Hierarchical models can be particularly useful here because they allow ‘borrowing’ of information between groups
Bayesian Hierarchical Models • For example, we wish to estimate the effect of 13 chemicals on early pregnancy loss. • The chemicals are highly correlated, making standard frequentist methods unstable.
Hierarchical Regression Models • Traditional models treat the outcome as random • Hierarchical models also treat coefficients as random
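Written out as a sketch (standard notation, matching the Normal–Normal setting used on later slides):

```latex
% Stage 1 (likelihood): the outcome is treated as random
y \mid \beta \sim N(X\beta,\; \sigma^2 I_n)
% Stage 2 (prior): the coefficients are treated as random too
\beta_j \mid \mu, \phi^2 \sim N(\mu,\; \phi^2), \qquad j = 1, \dots, p
```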
Some Parametric and Semi-Parametric Bayesian Hierarchical Models • Simplest hierarchical model (P1) • Fully Bayesian hierarchical model (P2) • Dirichlet process prior • Dirichlet process prior with selection component
1: The first parametric model (P1) • A simple “one-level” hierarchical model • Popularized by Greenland in epidemiology, who refers to it as “semi-Bayes” • the term may also refer to the asymptotic methods commonly used to fit such models • Has seen use in nutritional, genetic, occupational, and cancer research
Hierarchical Models: Bayes and Shrinkage • μ is our prior belief about the size of the effect and ϕ² is our uncertainty about it • Effect estimates from hierarchical models are ‘shrunk’ (moved) toward the prior distribution • Shrinkage: • for the Bayesian: a natural consequence of combining the prior with the data • for the frequentist: introducing bias to reduce MSE (biased but more precise) • The amount of shrinkage depends on the prior variance
Hierarchical Models: Bayes and Shrinkage In the simple Normal–Normal setting, yi may be either a continuous response or an imputed latent response (via Albert and Chib): y ~ N(Xβ, σ²In), β ~ N(μ1, ϕ²I). So the posterior is β | y ~ N(E, V), where V = (X'X/σ² + I/ϕ²)^(-1), E = V(X'y/σ² + μ1/ϕ²), and I is the p×p identity matrix
Model P1: Shrinkage [figure: posterior estimates under the SB model with μ=0 and ϕ² = 2.0, 1.0, and 0.5]
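To make the shrinkage concrete, here is a minimal numerical sketch (not from the talk; the estimate and standard error are made up) of the Normal–Normal posterior mean for a single coefficient under the three prior variances above:

```python
import numpy as np

# Normal-Normal shrinkage for one coefficient:
# beta_hat ~ N(beta, se^2) (likelihood), beta ~ N(mu, phi2) (prior).
# The posterior mean is a precision-weighted average of beta_hat and mu.
def shrunken_estimate(beta_hat, se, mu=0.0, phi2=1.0):
    w = (1.0 / se**2) / (1.0 / se**2 + 1.0 / phi2)  # weight on the data
    return w * beta_hat + (1.0 - w) * mu

beta_hat, se = np.log(2.5), 0.6   # hypothetical ML log-OR and its SE
for phi2 in (2.0, 1.0, 0.5):
    print(phi2, shrunken_estimate(beta_hat, se, mu=0.0, phi2=phi2))
# Smaller phi2 -> posterior mean pulled harder toward the prior mean of 0.
```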
The problem with model P1 • Assumes the prior variance is known with certainty • resulting in constant shrinkage of all coefficients • Sensitivity analyses can explore how results change under different prior variances • But the data themselves contain information about the prior variance
2: A richer parametric model (P2) • Places a prior distribution on ϕ² • reduces dependence on any single choice of prior variance • Could place a prior on μ as well (useful in some situations)
Properties of model P2 • The prior distribution on ϕ² allows it to be updated by the data • As the variability of the estimates around the prior mean increases, so does ϕ² • As that variability decreases, ϕ² decreases as well • The result is adaptive shrinkage of all coefficients
Posterior Sampling for Model P2 In the simple Normal–Normal setting, with ϕ² ~ Inv-Gamma(α1, α2), the conditional posteriors are: β | ϕ², y ~ N(E, V), where V = (X'X/σ² + I/ϕ²)^(-1) and E = V(X'y/σ² + μ1/ϕ²), and ϕ² | β ~ Inv-Gamma(α1 + p/2, α2 + Σj(βj − μ)²/2)
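A compact Gibbs-sampler sketch of model P2 under these conjugate forms (simulated data; `mu0`, `a1`, `a2` stand in for μ, α1, α2, and σ² is taken as known for simplicity):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 13
X = rng.normal(size=(n, p))
y = X @ rng.normal(0.0, 0.5, size=p) + rng.normal(size=n)  # fake data

sigma2, mu0, a1, a2 = 1.0, 0.0, 3.39, 1.33
phi2 = 1.0
XtX, Xty = X.T @ X, X.T @ y

samples = []
for it in range(2000):
    # beta | phi2, y ~ N(E, V)
    V = np.linalg.inv(XtX / sigma2 + np.eye(p) / phi2)
    E = V @ (Xty / sigma2 + mu0 * np.ones(p) / phi2)
    beta = rng.multivariate_normal(E, V)
    # phi2 | beta ~ Inv-Gamma(a1 + p/2, a2 + sum((beta - mu0)^2)/2)
    rate = a2 + 0.5 * np.sum((beta - mu0) ** 2)
    phi2 = 1.0 / rng.gamma(a1 + p / 2, 1.0 / rate)
    samples.append((beta, phi2))
```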
The Problem with Model P2 • How sure are we of our parametric specification of the prior? • Could we do better by grouping coefficients into clusters and then shrinking the cluster-specific coefficients separately? • The amount of shrinkage would then vary by coefficient
3: Dirichlet Process Priors • Popular Bayesian non-parametric approach • Rather than specifying that βj~N(μ,ϕ2), we specify βj~D • D is an unknown distribution • D needs a prior distribution: D~DP(λ,D0) • D0 is a base distribution such as N(μ,ϕ2) • λ is a precision parameter. As λ gets large, D converges to D0
Dirichlet Process Prior An extension of the finite mixture model David presented last week: βj ~ Σh=1..k πh N(μh, ϕh²), with (π1,…,πk) ~ Dirichlet(λ/k,…,λ/k). As k becomes infinitely large, this specification becomes equivalent to a DPP
Equivalent Representations of the DPP Stick-breaking representation: D = Σh πh δ(θh), where πh = Vh Πl&lt;h (1 − Vl), Vh ~ Beta(1, λ), and θh ~ D0. Polya urn representation: βj | β(−j) ~ (λ/(λ+P−1)) D0 + Σk≠j (1/(λ+P−1)) δ(βk), where P is the number of coefficients and the parameters of D0 are given a Normal–Inverse-Gamma hyperprior (D0 ~ N–Inv.Gamma)
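A truncated stick-breaking simulation (a rough sketch; the truncation level, seed, and function name are my own) makes the role of λ visible: draws from a DP realization are discrete, and small λ concentrates the mass on a few atoms — compare with the realizations on the next slide.

```python
import numpy as np

def draw_dp(lam, n_draws=1000, trunc=500, rng=None):
    """Sample from one realization D ~ DP(lam, D0), D0 = N(0, 1),
    via a truncated stick-breaking construction."""
    rng = rng or np.random.default_rng(1)
    v = rng.beta(1.0, lam, size=trunc)  # stick-breaking fractions
    pi = v * np.cumprod(np.concatenate(([1.0], 1.0 - v[:-1])))  # weights
    theta = rng.normal(0.0, 1.0, size=trunc)  # atoms drawn from D0
    return rng.choice(theta, size=n_draws, p=pi / pi.sum())

small_lam = draw_dp(1.0)    # few distinct values: heavy clustering
large_lam = draw_dp(100.0)  # many distinct values: D is close to D0
print(len(np.unique(small_lam)), len(np.unique(large_lam)))
```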
Realizations from a Dirichlet Process [figure: draws of D with base distribution D0 = N(0,1), for λ=1 and λ=100]
Dirichlet Process Prior • Discrete nature of DPP implies clustering • Probability of clustering increases as λ decreases • In this application, we want to cluster coefficients • Soft clustering: coefficients are clustered at each iteration of the Gibbs sampler, not assumed to be clustered together with certainty
Dirichlet Process Prior [figure: the implied prior for β1, for a given D0 and given β2 through β10]
Posterior Inference for the DPP Use the Polya urn scheme: βj | β(−j), y ~ w0j·Gj + Σk≠j wkj·δ(βk), where w0j ∝ λ ∫ f(y | βj, β(−j)) dD0(βj), wkj ∝ f(y | βj = βk, β(−j)), and Gj is the conditional posterior for βj under the base distribution D0
Posterior Inference for the DPP • Coefficients are assigned to clusters based on the weights w0j through wPj • After assignment, the cluster-specific coefficients are updated to improve mixing • The DPP precision parameter λ can be treated as random as well
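As a sketch of one such Gibbs update (a toy Normal-means version with known unit variances, not the actual regression model from the talk), the assignment step for a single coefficient looks like:

```python
import numpy as np

def update_one(j, beta, beta_hat, lam, mu0=0.0, phi2=1.0, rng=None):
    """Polya-urn Gibbs step for coefficient j in a toy model where
    beta_hat[j] ~ N(beta[j], 1) and the DP base measure is N(mu0, phi2)."""
    rng = rng or np.random.default_rng(2)
    others = np.delete(np.arange(len(beta)), j)
    # w_k: join an existing coefficient's cluster (likelihood at beta[k])
    w = np.exp(-0.5 * (beta_hat[j] - beta[others]) ** 2)
    # w_0: open a new cluster; marginal likelihood integrating beta_j over D0
    s2 = 1.0 + phi2
    w0 = lam * np.exp(-0.5 * (beta_hat[j] - mu0) ** 2 / s2) / np.sqrt(s2)
    probs = np.concatenate(([w0], w))
    pick = rng.choice(len(probs), p=probs / probs.sum())
    if pick == 0:
        # new cluster: draw from the conditional posterior under D0
        post_var = 1.0 / (1.0 + 1.0 / phi2)
        post_mean = post_var * (beta_hat[j] + mu0 / phi2)
        return rng.normal(post_mean, np.sqrt(post_var))
    return beta[others[pick - 1]]  # join an existing cluster
```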
4: Dirichlet Process Prior with Variable Selection • A minor modification to the Dirichlet process prior model • We may desire a more parsimonious model • If some DBPs have no effect, we would prefer to eliminate them from the model • standard approaches such as forward/backward selection result in inappropriate confidence intervals
Dirichlet Process Prior with Variable Selection • We incorporate a selection model in the Dirichlet process’s base distribution: D0 = π·δ(0) + (1 − π)·N(μ, ϕ²) • π is the probability that a coefficient has no effect • (1 − π) is the probability that it is drawn from N(μ, ϕ²)
Dirichlet Process with Variable Selection • A coefficient is equal to zero (no effect) with probability π • A priori, we expect this to happen (π × 100)% of the time • We place a prior distribution on π to allow the data to guide inference
Posterior Inference • Gibbs sampling proceeds as in the previous model, except that the weights are modified • There is an additional weight for the null cluster, as sketched below
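Concretely, a sketch of how the null cluster enters the urn weights (my notation, following the Polya-urn scheme above; the slide does not show the talk’s exact expressions):

```latex
% Weight for the null cluster (coefficient set exactly to zero):
w_{\mathrm{null},j} \propto \lambda\, \pi\, f(y \mid \beta_j = 0, \beta_{-j})
% Weight for a draw from the continuous part of the base distribution:
w_{0j} \propto \lambda (1-\pi) \int f(y \mid \beta_j, \beta_{-j})\, dN(\beta_j;\, \mu, \phi^2)
% Weight for joining coefficient k's existing cluster (unchanged):
w_{kj} \propto f(y \mid \beta_j = \beta_k, \beta_{-j})
```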
Dirichlet Process Prior with Variable Selection [figure: the implied prior for β1, for a given D0 and given β2 through β10]
Simulations • Four hierarchical models: how do they compare? • The increased complexity of these hierarchical models seems sensible, but what does it gain us? • Simulated datasets of size n=500
Example: Spontaneous Abortion and Disinfection By-Products • Pregnancy loss prior to 20 weeks of gestation • Very common (>30% of all pregnancies) • Relatively little known about its causes • maternal age, smoking, prior pregnancy loss, occupational exposures, caffeine • disinfection by-products (DBPs)
Disinfection By-Products (DBPs) • A vast array of DBPs are formed in the disinfection process • We focus on 2 main types: • trihalomethanes (THMs): CHCl3, CHBr3, CHCl2Br, CHClBr2 • haloacetic acids (HAAs): ClAA, Cl2AA, Cl3AA, BrAA, Br2AA, Br3AA, BrClAA, Br2ClAA, BrCl2AA
Specific Aim • To estimate the effect of each of the 13 constituent DBPs (4 THMs and 9 HAAs) on SAB • The Problem: DBPs are very highly correlated • for example: • ρ=0.91 between Cl2AA and Cl3AA
Right From the Start • Enrolled 2507 women from three metropolitan areas in the US, 2001–2004 • Recruitment: • Prenatal care practices (52%) • Health department (32%) • Promotional mailings (3%) • Drug stores, referral, etc. (13%)
Preliminary Analysis • Discrete-time hazard model including all 13 DBPs (categorized into 32 coefficients): logit Pr(yij = 1 | yi,j−1 = 0) = αj + zi'γ + Σk βk·xkij • time to event: gestational weeks until loss • the αj are week-specific intercepts (weeks 5…20) • the z’s are confounders: smoking, alcohol use, ethnicity, maternal age • xkij is the concentration of the kth category of DBP for the ith individual in the jth week
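Discrete-time hazard models of this kind are commonly fit by expanding the data to one record per woman per gestational week at risk and running a logistic regression with week indicators; a schematic sketch of that expansion (hypothetical column names, one illustrative exposure):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Input: one row per woman, with the gestational week of loss or censoring.
# Output: person-weeks, one row per woman per week at risk (weeks 5-20).
def expand_person_weeks(df):
    rows = []
    for _, r in df.iterrows():
        last = int(min(r["last_week"], 20))
        for week in range(5, last + 1):
            rows.append({"id": r["id"], "week": week,
                         "event": int(bool(r["loss"]) and week == last),
                         "smoking": r["smoking"], "thm": r["thm"]})
    return pd.DataFrame(rows)

# Week-specific intercepts via C(week); confounders/exposure enter linearly.
# pw = expand_person_weeks(raw)
# fit = smf.logit("event ~ C(week) + smoking + thm", data=pw).fit()
```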
Results of Logistic Regression • Several large but imprecise effects are seen • 4 of 32 coefficients are statistically significant • The imprecision makes us question the results • and motivates a better analytic approach
DBPs and SAB: model P1 Little prior evidence of an effect: specify μ=0 Calculate ϕ² from the existing literature largest plausible effect: OR=3.0 ϕ = (ln(3.0) − ln(1/3))/(2 × 1.96) ≈ 0.56, so ϕ² ≈ 0.3142
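A quick numerical check of this calibration (my sketch, using scipy for the Normal quantiles): ϕ is chosen so that 95% of the prior mass on the log-OR scale lies between ln(1/3) and ln(3).

```python
import numpy as np
from scipy import stats

# Prior beta_j ~ N(0, phi^2): choose phi so 95% of prior mass lies
# between ln(1/3) and ln(3), i.e. ORs from 1/3 to 3.
phi = (np.log(3.0) - np.log(1.0 / 3.0)) / (2 * 1.96)
print(phi, phi**2)  # ~0.5605 and ~0.3142
lo, hi = stats.norm.ppf([0.025, 0.975], loc=0.0, scale=phi)
print(np.exp(lo), np.exp(hi))  # ~0.333 and ~3.0 on the OR scale
```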
Semi-Bayes Results [figure: red = ML estimates, black = SB estimates]
DBPs and SAB: Model P2 • μ=0 • ϕ² is random: choose α1=3.39, α2=1.33 • E(ϕ²)=0.31 (as in the semi-Bayes analysis) • V(ϕ²)=0.07 (at ϕ²’s 95th percentile, 95% of the β’s fall between OR=6 and OR=1/6, the most extreme effects we believe possible)
Fully-Bayes Results [figure: red = ML & semi-Bayes estimates, black = fully-Bayes estimates]
DBPs and SAB: Dirichlet Process Priors • μ=0, α1=3.39, α2=1.33 • ν1=1, ν2=1: an uninformative choice for the prior on λ
DBPs and SAB: Dirichlet Process Priors with Selection Component • μ=0, α1=3.39, α2=1.33, ν1=1, ν2=1 • ω1=1.5, ω2=1.5, giving E(π)=0.5, 95% CI (0.01, 0.99)
Conclusions (Hierarchical Models) • Semi-Bayes: Assumes β random • Fully-Bayes: Assumes ϕ² random • Dirichlet Process: Assumes the prior distribution is random • Dirichlet Process with Selection Component: Assumes the prior distribution is random and allows coefficients to cluster at the null • Performance (MSE) can improve with this increasing complexity