Impact of the Distributional Assumptions of Random Effects on Model Fitting Xueying (Sherri) Zhang Research Scientist California Department of Health Services CDIC/ Tobacco control section
Overview • Introduction • Data Source and Study Population • Methods: Model Description and Simulations • Results • Discussion and Conclusions
Introduction Random Effect model: Logit [E (Yij | bi)] = ß0 + ß1Xij1 + ß2Xij2 + ß3Xij3 + bi Distribution of the random effects: bi is assumed to follow a normal distribution, bi ~ N(0, σ2).
Introduction Research Questions: • What is the impact of the normality assumption for the random effects on the estimates of the fixed effects? • When a cluster-level confounder is omitted from the model and the random effects are associated with the covariates in the model, are the estimates from the RE model correct?
Data Source and Study Population • The Regional Rural Injury Study-II (RRIS-II) was a population-based, prospective cohort study designed to investigate the incidence and consequences of agricultural injury in the five-state region of Minnesota, Wisconsin, North Dakota, South Dakota, and Nebraska in 1999. • 3,765 households, including 16,538 persons, participated in the study. • We modeled the probability of agricultural activity-related injury (Yes/No). Gender, prior injury, and working hours per week on the agricultural operation were chosen as the covariates. • The data are clustered binary outcomes: members of a household work on the same operation, and behaviors may be similar between parents and children.
Model Description A generalized linear mixed model (GzLMM) with a random intercept is expressed as: Logit [E (Yij | bi)] = ß0 + ß1Xij1 + ß2Xij2 + ß3Xij3 + bi, where Yij indicates whether an agricultural activity-related injury occurred in 1999 for the jth person in the ith family, bi is the random effect for the ith family, and Xijk denotes the covariates: gender, age, education, marital status, prior injury, working hours on the farm, and the percentage of prior injury within the family (PPI).
Methods Random effect model: The marginal likelihood for Bernoulli data is obtained by integrating the conditional likelihood over the distribution of the random effects, where Pij = E(Yij | bi) is modeled with a logit link and bi ~ N(0, σ2).
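Written out, a standard form of this marginal likelihood for a random-intercept logistic model is sketched here. The symbols N (number of families), n_i (size of family i), and y_ij (observed outcome) are notation assumed for this sketch; the normal density is written explicitly to match bi ~ N(0, σ2).

```latex
L(\beta, \sigma^2)
  = \prod_{i=1}^{N} \int_{-\infty}^{\infty}
    \left[ \prod_{j=1}^{n_i} P_{ij}^{\,y_{ij}} \, (1 - P_{ij})^{1 - y_{ij}} \right]
    \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{b_i^2}{2\sigma^2} \right) db_i ,
\qquad
P_{ij} = \frac{\exp(\beta_0 + \beta_1 X_{ij1} + \beta_2 X_{ij2} + \beta_3 X_{ij3} + b_i)}
              {1 + \exp(\beta_0 + \beta_1 X_{ij1} + \beta_2 X_{ij2} + \beta_3 X_{ij3} + b_i)}
```

The integral has no closed form, so in practice it is approximated numerically, for example by Gaussian quadrature.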
Methods Conditional model: Logit [E (Yij | bi)] = ß’0 + ß’1Xij1 + ß’2Xij2 + ß’3Xij3 + bi Conditioning on the sufficient statistics Sij for bi yields a conditional likelihood L that no longer involves bi, so no distribution needs to be assumed for the random effects.
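For a random-intercept logit model, the usual conditional likelihood can be sketched as follows. This assumes the sufficient statistic for bi is the number of injured persons in family i, written S_i here; the set D_i, the vector x_ij (covariates without the intercept), and the sizes N and n_i are notation introduced for this sketch and may differ from the form used in the talk.

```latex
L_c(\beta')
  = \prod_{i=1}^{N}
    \frac{\exp\!\left( \sum_{j=1}^{n_i} y_{ij} \, x_{ij}^{\top} \beta' \right)}
         {\sum_{d \in D_i} \exp\!\left( \sum_{j=1}^{n_i} d_j \, x_{ij}^{\top} \beta' \right)},
\qquad
S_i = \sum_{j=1}^{n_i} y_{ij},
\qquad
D_i = \Bigl\{ d \in \{0,1\}^{n_i} : \sum_{j=1}^{n_i} d_j = S_i \Bigr\}
```

Both the intercept ß’0 and bi cancel between numerator and denominator, as does any covariate that is constant within a family.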
Methods Marginal model: • The marginal model takes only the fixed effects into account: Logit [E (Yij)] = ß*0 + ß*1Xij1 + ß*2Xij2 + ß*3Xij3, with Var(Yij) = φ v(µij), where µij = E(Yij) and v(·) is the variance function. • ß* is estimated by solving a quasi-score equation.
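The quasi-score equation usually takes the generalized estimating equation (GEE) form sketched here. The working correlation matrix R_i(α), the stacked vectors Y_i and µ_i, and the number of families N are assumptions of this sketch; the slides do not state which working correlation was used.

```latex
U(\beta^{*})
  = \sum_{i=1}^{N} D_i^{\top} V_i^{-1} \bigl( Y_i - \mu_i(\beta^{*}) \bigr) = 0,
\qquad
D_i = \frac{\partial \mu_i}{\partial \beta^{*}},
\qquad
V_i = \phi \, A_i^{1/2} R_i(\alpha) A_i^{1/2},
\qquad
A_i = \operatorname{diag}\bigl\{ \mu_{ij} (1 - \mu_{ij}) \bigr\}
```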
Simulations---(1) • True model for the first simulation: Logit [E (Yij | bi)] = ß0 + ß1*gender + ß2*workhour + ß3*priorinj + bi • Random effects: nbi = sqrt(σ2/v2)*(bi-µ), so that nbi has mean 0 and variance σ2, where σ2 is the estimated variance of the random effects from the true model, v2 is the variance of the predicted bi from the true model, µ is the mean of the predicted bi from the true model, and bi are the predicted random effects from the true model. • Pij = Exp(Xijß+nbi)/(1+Exp(Xijß+nbi))
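A minimal Python sketch of the rescaling and of Pij, assuming the predicted random effects are available as a plain list. The function names and all numeric inputs are hypothetical, and using the population variance for v2 is an assumption, since the slides do not say which variance estimator was used.

```python
import math
import statistics

def rescale_random_effects(b, sigma2):
    """Return nb_i = sqrt(sigma2 / v2) * (b_i - mu) for each predicted b_i."""
    mu = statistics.fmean(b)           # mean of the predicted random effects
    v2 = statistics.pvariance(b, mu)   # their variance (population form: an assumption)
    scale = math.sqrt(sigma2 / v2)
    return [scale * (bi - mu) for bi in b]

def injury_probability(x, beta, nb_i):
    """P_ij = exp(x'beta + nb_i) / (1 + exp(x'beta + nb_i)); x starts with 1 for the intercept."""
    eta = sum(bk * xk for bk, xk in zip(beta, x)) + nb_i
    return math.exp(eta) / (1.0 + math.exp(eta))

# Hypothetical inputs for illustration only (not values from the study).
b_hat = [0.30, -0.10, 0.05, 0.45, -0.20]        # predicted random effects, one per family
nb = rescale_random_effects(b_hat, sigma2=0.8)  # rescaled to mean 0 and variance 0.8
p = injury_probability([1.0, 1.0, 40.0, 0.0], [-3.0, 0.5, 0.02, 0.9], nb[0])
```

The rescaled effects keep the shape of the empirical distribution of the predicted bi, which need not be normal, while matching the estimated variance σ2.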
Simulations---(1) • 1000 different seeds were used. • A random number R was drawn from U(0,1) for each individual. • If R ≤ Pij, SimY=1; else, SimY=0, so that Pr(SimY=1) = Pij. • Covariates remain the same as in the real data. • A marginal model, an RE model, and a conditional model were fit to each data set.
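The data-generating step can be sketched in Python as follows. The design rows, coefficients, and rescaled random effects are hypothetical placeholders, and `simulate_dataset` is an illustrative name rather than code from the study; model fitting is left out.

```python
import math
import random

def simulate_dataset(X, cluster, beta, nb, seed):
    """Draw SimY for every person: SimY = 1 if R <= P_ij with R ~ U(0,1)."""
    rng = random.Random(seed)  # one seed per simulated data set
    sim_y = []
    for x, i in zip(X, cluster):
        eta = sum(bk * xk for bk, xk in zip(beta, x)) + nb[i]
        p_ij = 1.0 / (1.0 + math.exp(-eta))  # same as exp(eta) / (1 + exp(eta))
        r = rng.random()
        sim_y.append(1 if r <= p_ij else 0)
    return sim_y

# Hypothetical inputs: rows are [1, gender, workhour, priorinj]; cluster gives the family index.
X = [[1.0, 1.0, 40.0, 0.0], [1.0, 0.0, 10.0, 1.0], [1.0, 1.0, 25.0, 0.0]]
cluster = [0, 0, 1]
beta = [-3.0, 0.5, 0.02, 0.9]
nb = [0.4, -0.4]

# Covariates stay fixed; only the outcomes change from one seed to the next.
datasets = [simulate_dataset(X, cluster, beta, nb, seed) for seed in range(1000)]
```

Each element of `datasets` would then be passed to the three fitting routines (marginal, RE, and conditional).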
Simulations---(2) • True model for the second simulation: Logit [E (Yij | bi)] = ß0 + ß1*gender + ß2*workhour + ß3*priorinj + ß4*PPI + bi, where PPI is the percentage of prior injury within the family. • Random effects: nbi = sqrt(σ2/v2)*(bi-µ), so that nbi has mean 0 and variance σ2, where σ2 is the estimated variance of the random effects from the true model, v2 is the variance of the predicted bi from the true model, µ is the mean of the predicted bi from the true model, and bi are the predicted random effects from the true model. • Pij = Exp(Xijß+nbi)/(1+Exp(Xijß+nbi))
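As an illustration of how a cluster-level covariate such as PPI could be built, the sketch below computes the share of family members with a prior injury and assigns it to every member. Whether the study counted the person themselves and whether PPI was scaled 0 to 100 or 0 to 1 are not stated in the slides; both choices here are assumptions, and the data are made up.

```python
from collections import defaultdict

def percent_prior_injury(family_ids, prior_injury):
    """Return, for each person, the percentage of their family with a prior injury."""
    totals = defaultdict(int)
    injured = defaultdict(int)
    for fam, inj in zip(family_ids, prior_injury):
        totals[fam] += 1
        injured[fam] += inj
    # Every member of a family receives the same cluster-level value.
    return [100.0 * injured[fam] / totals[fam] for fam in family_ids]

# Hypothetical data: family A has 1 of 3 members injured, family B has 2 of 2.
family_ids = ["A", "A", "A", "B", "B"]
prior_injury = [1, 0, 0, 1, 1]
ppi = percent_prior_injury(family_ids, prior_injury)
```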
Simulations---(2) • 1000 different seeds were used. • A random number R was drawn from U(0,1) for each individual. • If R ≤ Pij, SimY=1; else, SimY=0, so that Pr(SimY=1) = Pij. • Covariates remain the same as in the real data. • A marginal model, an RE model, and a conditional model were fit to each data set.
Results--Simulation (1) of the model without PPI: The true model for the simulation is the RE model: Logit [E (Yij | bi)] = ß0 + ß1*gender + ß2*workhour + ß3*priorinj + bi • The average estimates of the RE model are closer to those of the conditional model. • The bias for prior injury in the RE model is 0.1130, much larger than the biases in the marginal and conditional models: 0.0573 and -0.0036. • The MSE for prior injury in the RE model is 0.0182, much larger than the MSEs in the marginal and conditional models: 0.0073 and 0.0089.
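Bias and MSE over the simulated data sets are computed as the mean error and the mean squared error of the estimates around the true coefficient. A minimal sketch, with made-up estimates rather than results from the study:

```python
import statistics

def bias_and_mse(estimates, true_value):
    """Bias = mean(estimate - truth); MSE = mean((estimate - truth)^2)."""
    errors = [e - true_value for e in estimates]
    bias = statistics.fmean(errors)
    mse = statistics.fmean([err * err for err in errors])
    return bias, mse

# Hypothetical estimates of one coefficient from four simulated data sets.
estimates = [0.9, 1.1, 1.3, 1.1]
bias, mse = bias_and_mse(estimates, true_value=1.0)
```

Because MSE equals the variance of the estimates plus the squared bias, a large bias alone can push the MSE of one model above the others.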
Results -Simulation of the Model with PPI: • Hypothesis for the incorrect prior-injury estimates in the RE model: an important cluster-level confounder related to prior injury was omitted from the model. • The random effects were significantly associated with prior injury (ß=0.0162, p=0.0021). • After PPI was included in the model, the random effects were independent of prior injury (ß=0.0027, p=0.6101) and PPI (ß=0.0084, p=0.3668).
Results -Simulation of the Model with PPI: • PPI is a confounder for the effect of prior injury because it is significant in the model (ß=0.5205, p<0.001) and also associated with prior injury. • True model for the second simulation: Logit [E (Yij | bi)] = ß0 + ß1*gender + ß2*workhour +ß3 *priorinj + ß4 *PPI + bi
Results -Simulation of the Model with PPI: • The average estimates of the RE model are almost equal to those of the conditional model. • The biases of the estimates for these three models are very similar to each other. • The mean squared errors (MSE) of the estimates of these three models are also similar.
Results -Simulation of the Model with PPI: • The empirical C.I. coverage is higher than the corresponding nominal confidence level. For instance, the coverage of the 80% C.I. for working hours in the marginal model is 83.4%, higher than 80%. • The percentages of C.I. coverage are greatly improved relative to the model without PPI, especially for prior injury in the RE model. For instance, without PPI only 48.3% of the 80% confidence intervals of the prior injury estimates cover the true value, but in Table 7, 83.5% of the 80% confidence intervals cover the true value.
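Empirical coverage is the share of simulated data sets whose confidence interval contains the true coefficient. The sketch below assumes Wald intervals (estimate ± z × standard error), which the slides do not confirm; the estimates and standard errors are made up.

```python
from statistics import NormalDist

def coverage_percent(estimates, std_errors, true_value, level):
    """Percentage of Wald intervals (estimate +/- z * SE) that contain true_value."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.2816 for an 80% interval
    hits = 0
    for est, se in zip(estimates, std_errors):
        if est - z * se <= true_value <= est + z * se:
            hits += 1
    return 100.0 * hits / len(estimates)

# Hypothetical estimates and standard errors from four simulated data sets.
estimates = [0.95, 1.30, 1.02, 0.70]
std_errors = [0.10, 0.10, 0.10, 0.10]
pct = coverage_percent(estimates, std_errors, true_value=1.0, level=0.80)
```

A value well below the nominal level, such as the 48.3% reported for prior injury, signals biased estimates, understated standard errors, or both.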
Discussion and Conclusions • The fixed effects from the RE model are correct even in cases where the distribution of random effects does not follow the normal distribution. • The random effects should be independent of the covariates in the model.
Discussion and Conclusions • In this project, after PPI was included, the random effects were independent of all the covariates but still did not follow a normal distribution. Unknown or unmeasured variables that affect the probability of injury within the family may exist.
Discussion and Conclusions • One limitation of the project is that the true values for the marginal model were not available. • For further study, the impact of the normality assumption on the estimates of the random effects themselves is of interest.