Generalized Estimating Equations (GEE): A Modern Love Story
April 18, 2011
DαSAL
Brandi Stupica
Data for today on the H: drive in the DaSAL folder: GEE Talk Data_041811.sav
PART I.
• What are generalized estimating equations?
• Applications
• Why you should love GEEs
What are Generalized Estimating Equations (GEE)?
• Extension of the Generalized Linear Model (GZLM), which is an extension of the General Linear Model (GLM)
• GLM analyzes models with normally distributed DVs that are linearly linked to predictors
• GZLM extends GLM to analyze non-normally distributed DVs that may be non-linearly linked to predictors
  • Easily handles interactions between discrete and continuous IVs
  • Cannot analyze correlated, non-independent, clustered, nested, repeated measures, within-subjects data
• GEE extends GZLM and analyzes correlated data with
  • Normal and non-normal DVs
  • DVs that are linearly or non-linearly linked to IVs
  • Full factorial models with any combo of discrete and continuous IVs
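For reference, the estimating equations that give GEE its name can be sketched as below. This is the textbook Liang and Zeger form, not something shown on the slides, and every symbol is an assumption introduced here: K clusters (subjects), y_i the vector of repeated responses for cluster i, μ_i its mean vector, g the link function, A_i the diagonal matrix of variance functions, φ a scale parameter, and R(α) the working correlation matrix.

```latex
g(\mu_{ij}) = x_{ij}^{\top}\beta,
\qquad
\sum_{i=1}^{K} D_i^{\top} V_i^{-1}\,\bigl(y_i - \mu_i(\beta)\bigr) = 0,
\qquad
D_i = \frac{\partial \mu_i}{\partial \beta},
\qquad
V_i = \phi\, A_i^{1/2}\, R(\alpha)\, A_i^{1/2}
```

With R(α) set to the identity matrix, these reduce to the GZLM score equations for independent observations, which is one way to see GEE as an extension of GZLM.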
Applications of GEE
• Nested data
  • Dyadic relationships
  • Family studies
  • School and organizational studies
• Repeated measures
  • Longitudinal data analysis
  • Within-subjects designs
  • Pre/post designs
Why You Should Love GEEs for Correlated Data
• Compared to rANOVA
  • Doesn’t assume DV is normal or that it is linearly linked to predictors
  • Can model DVs that are binomial, multinomial, Poisson, negative binomial, and more!
  • Can model interactions between factors and covariates with ease
• Compared to Linear Mixed Models
  • Doesn’t require that repeated responses have a multivariate normal distribution
  • Data are unlikely to meet this assumption when the DV is binary or count data
• Rather than combining multiple assessments into one score, analyze them all with improved power by including a within-subjects factor
• Uses all available data by default rather than complete cases only
• Extraordinary flexibility can streamline results sections
PART II.
• Conducting a GEE
Conducting a GEE: First Step
A. How data usually look
B. How data need to look for GEE
• Arrange your data in “long form”
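The wide-to-long rearrangement can be sketched in a few lines of Python. This is only an illustration, not the SPSS restructuring procedure: the rows, the `id` key, and the `score_t` column prefix are hypothetical, and dropping missing occasions is an assumption made here to mirror the idea of using all available data.

```python
# Hypothetical wide-format data: one row per subject, one column per occasion.
wide_rows = [
    {"id": 1, "score_t1": 10, "score_t2": 12, "score_t3": 15},
    {"id": 2, "score_t1": 8, "score_t2": None, "score_t3": 11},
]


def wide_to_long(rows, id_key, prefix):
    """Return one record per subject per occasion (long form)."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if not key.startswith(prefix):
                continue
            if value is None:
                # Assumption: a missing occasion is dropped, while the
                # subject's other occasions are kept.
                continue
            time = int(key[len(prefix):])
            long_rows.append({id_key: row[id_key], "time": time, "score": value})
    long_rows.sort(key=lambda r: (r[id_key], r["time"]))
    return long_rows


long_rows = wide_to_long(wide_rows, "id", "score_t")
for record in long_rows:
    print(record)
```

Each subject now contributes one row per measured occasion, with `id` identifying the subject and `time` acting as the within-subjects variable.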
Selecting the Model Type
• Dozens of model combinations with GEE
• DV can be discrete, follow any of several distributions, and be nonlinearly linked to IVs
• Must select distribution of DV and link function
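To make "distribution plus link function" concrete, here is a small sketch of three common links and their inverses. The pairings in `MODEL_TYPES` are the usual canonical choices, listed here as an assumption and not as the options the talk used; all function names are hypothetical.

```python
import math


def identity(mu):
    """Identity link: the linear predictor is the mean itself."""
    return mu


def logit(mu):
    """Logit link for a probability strictly between 0 and 1."""
    return math.log(mu / (1.0 - mu))


def inv_logit(eta):
    """Inverse logit: maps a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def log_link(mu):
    """Log link for a positive mean, such as an expected count."""
    return math.log(mu)


def inv_log(eta):
    """Inverse log link: maps a linear predictor back to a positive mean."""
    return math.exp(eta)


# Assumed canonical pairings: DV type -> (distribution, link, inverse link).
MODEL_TYPES = {
    "continuous": ("normal", identity, identity),
    "binary": ("binomial", logit, inv_logit),
    "count": ("Poisson", log_link, inv_log),
}

distribution, link, inverse = MODEL_TYPES["binary"]
print(distribution, inverse(link(0.25)))
```

The link maps the mean of the DV onto the scale where the predictors act linearly, and the inverse link maps model predictions back to the scale of the DV.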
Response Variable
• Also known as outcome variable, DV
• Category order is for multinomial DVs
• For binary outcomes, can specify reference category
Predictors
• Options for factors allow you to specify the reference category and how to handle missing data
Model
• Full factorial is a few clicks away
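As a rough illustration of what a full factorial model contains, the sketch below lists every main effect and every interaction for a set of predictors. The predictor names are hypothetical and the `*` notation is just a label for an interaction term, not SPSS syntax.

```python
from itertools import combinations


def full_factorial_terms(predictors):
    """List all main effects and all interactions of every order."""
    terms = []
    for order in range(1, len(predictors) + 1):
        for combo in combinations(predictors, order):
            terms.append("*".join(combo))
    return terms


# Hypothetical predictors: two factors and one covariate.
terms = full_factorial_terms(["group", "time", "age"])
print(terms)
```

With k predictors a full factorial model has 2^k - 1 terms, so three predictors already give seven.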
EM Means (estimated marginal means)
• Several options for controlling family-wise error
• Several options for contrasts, including
  • Simple
  • Pairwise
  • Deviation
  • Difference
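Two widely used family-wise adjustments, Bonferroni and Sidak, can be sketched as follows for a set of pairwise contrasts. The level names and p-values are hypothetical placeholders, not results from the talk's data, and the functions are illustrative helpers written for this example.

```python
from itertools import combinations


def pairwise_pairs(levels):
    """All unordered pairs of factor levels, one per pairwise contrast."""
    return list(combinations(levels, 2))


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def sidak(p_values):
    """Sidak adjustment: 1 - (1 - p)^m for m tests."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


pairs = pairwise_pairs(["pre", "post", "followup"])  # hypothetical levels
raw_p = [0.01, 0.04, 0.20]                           # hypothetical p-values
for pair, b, s in zip(pairs, bonferroni(raw_p), sidak(raw_p)):
    print(pair, round(b, 4), round(s, 4))
```

The Sidak-adjusted value is never larger than the Bonferroni-adjusted one, which is why Bonferroni is usually described as the more conservative of the two.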
Working Correlation Matrix
• What is a working correlation matrix?
  • Correlated data could be correlated in many ways
  • Specify at the outset what should be assumed about how the correlated data are correlated
  • “Working” comes from the structure being re-estimated at each iteration
• GEE is robust to misspecification
  • Then, why bother picking the best one?
  • Small gain in efficiency by selecting the correct underlying structure
• In the “Repeated” tab I picked the Unstructured correlation matrix
  • Why?
Working Correlation Matrix Options
• Unstructured
  • No assumption about the relative magnitude of the correlation between any pair of observations
  • Must estimate many parameters
  • Most efficient and conservative but can lead to poor estimates with small samples
• Independent
  • Assumes measurements for the repeated measure are uncorrelated
  • Default in SPSS
  • 1’s on the diagonal and 0’s off the diagonal
  • Signifies variables correlated with themselves at any given time but not correlated with measurements at other times
  • Illogical assumption and often wrong given that data are correlated and non-independent by nature!!!
  • Thus, I always start with something other than independent, and choose unstructured because it is the most conservative and efficient and makes no assumptions
• AR(1): Auto-regressive, order 1
  • Correlation diminishes exponentially over time
  • Assumes equal time intervals
  • 1’s on the diagonal; alpha for observations one apart; alpha-squared for two apart; alpha-cubed for three apart, and so on
• Exchangeable
  • Correlation does not change with time
  • Correlations for within-subjects variables are homogeneous
  • 1’s on the diagonal and equal correlation for all off-diagonal elements
• M-dependent
  • Correlation depends only on how far apart two observations are, and drops to zero beyond M time points
  • 1’s on the diagonal, a common correlation at each separation up to M time points, and 0 for observations more than M time points apart
  • Researcher specifies M
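The structures listed above are easier to compare when written out as matrices. The sketch below builds each one for a given number of time points; the function names and the alpha values are hypothetical, and in a real analysis the alphas are estimated from the data, not supplied by hand. The M-dependent builder takes one correlation per lag, so a single shared value can be passed as a repeated list.

```python
def independent(n):
    """1's on the diagonal, 0's everywhere else."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def exchangeable(n, alpha):
    """Same correlation alpha for every pair of time points."""
    return [[1.0 if i == j else alpha for j in range(n)] for i in range(n)]


def ar1(n, alpha):
    """Correlation alpha**lag, so it decays as time points move apart."""
    return [[float(alpha) ** abs(i - j) for j in range(n)] for i in range(n)]


def m_dependent(n, alphas):
    """alphas[k-1] is the correlation at lag k; longer lags get 0."""
    m = len(alphas)

    def entry(i, j):
        lag = abs(i - j)
        if lag == 0:
            return 1.0
        return alphas[lag - 1] if lag <= m else 0.0

    return [[entry(i, j) for j in range(n)] for i in range(n)]


def n_unstructured_params(n):
    """Unstructured estimates a separate correlation for every pair."""
    return n * (n - 1) // 2


for name, matrix in [
    ("independent", independent(4)),
    ("exchangeable", exchangeable(4, 0.3)),
    ("AR(1)", ar1(4, 0.5)),
    ("2-dependent", m_dependent(4, [0.4, 0.2])),
]:
    print(name)
    for row in matrix:
        print("  ", row)
print("unstructured parameters for 4 time points:", n_unstructured_params(4))
```

The parameter count is what drives the small-sample warning for the unstructured option: four time points need 6 correlations, while ten time points need 45.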
Choosing the Best Fitting Working Correlation Matrix
• Run the model under different working correlation structures and choose the structure with the lowest QIC value
• But, wait…What is the QICC?
Bonus! Choosing the Best Subset of Predictors
• QICC is used for choosing the best subset of predictors
• Penalizes for model complexity
• Run a model and a nested model that drops one of the predictors, then compare their QICC values
• Lower QICC indicates better fit
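For readers who want the definitions behind the two criteria, Pan's (2001) formulation is sketched below. The symbols are assumptions introduced here and do not appear on the slides: Q is the quasi-likelihood evaluated under the independence model at the estimates obtained with working structure R, Ω̂_I is the inverse of the model-based covariance under independence, V̂_R is the robust covariance estimate under R, and p is the number of regression parameters. As I understand it, the QICC reported by SPSS corresponds to the simpler form with the 2p penalty, but treat that mapping as an assumption to check against the SPSS documentation.

```latex
\mathrm{QIC}(R) = -2\,Q\bigl(\hat{\beta}_R;\, I\bigr)
                  + 2\,\operatorname{tr}\bigl(\hat{\Omega}_I\,\hat{V}_R\bigr),
\qquad
\mathrm{QIC}_u(R) = -2\,Q\bigl(\hat{\beta}_R;\, I\bigr) + 2p
```

The trace term reacts to the choice of working correlation structure, while the 2p term only counts parameters. That is consistent with using QIC to compare correlation structures and the parameter-count version to compare sets of predictors.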
Continuous Normal DV
• Example, if time permits
More Information on GEEs
• Hardin, J. W., & Hilbe, J. M. (2003). Generalized estimating equations. Boca Raton, FL: Chapman and Hall/CRC Press.
• Norusis, M. (2011). IBM SPSS Statistics 19 Advanced Statistical Procedures Companion. Upper Saddle River, NJ: Pearson.
• http://faculty.chass.ncsu.edu/garson/PA765/gzlm_gee.htm