Generalized Estimating Equations (GEEs)
Purpose: to introduce GEEs. These are used to model correlated data from:
• Longitudinal/repeated measures studies
• Clustered/multilevel studies
Outline
• Examples of correlated data
• Successive generalizations
  • Normal linear model
  • Generalized linear model
  • GEE
• Estimation
• Example: stroke data
  • exploratory analysis
  • modelling
Correlated data
• Repeated measures: same subjects, same measure, successive times; we expect successive measurements to be correlated
[Figure: subjects i = 1,…,n are randomized to treatment groups A, B and C and followed over successive measurement times]
Correlated data
• Clustered/multilevel studies
[Figure: three-level hierarchy (Level 3, Level 2, Level 1)]
E.g.,
Level 3: populations
Level 2: age-sex groups
Level 1: blood pressure measurements in a sample of people in each age-sex group
We expect correlations within populations and within age-sex groups due to genetic, environmental and measurement effects
Notation
• Repeated measurements: $y_{ij}$, $i = 1, \dots, N$ subjects; $j = 1, \dots, n_i$ times for subject $i$
• Clustered data: $y_{ij}$, $i = 1, \dots, N$ clusters; $j = 1, \dots, n_i$ measurements within cluster $i$
• Use “unit” for subject or cluster
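Normal Linear Model
For unit $i$: $E(y_i) = \mu_i = X_i\beta$; $y_i \sim N(\mu_i, V_i)$
$X_i$: $n_i \times p$ design matrix
$\beta$: $p \times 1$ parameter vector
$V_i$: $n_i \times n_i$ variance-covariance matrix, e.g., $V_i = \sigma^2 I$ if measurements are independent
For all units: $E(y) = \mu = X\beta$, $y \sim N(\mu, V)$, where $y$ and $X$ stack the $y_i$ and $X_i$, and $V$ is block diagonal with blocks $V_1, \dots, V_N$
This $V$ is suitable if the units are independent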
Normal linear model: estimation
We want to estimate $\beta$ and $V$
Use
$$U(\beta) = \sum_{i=1}^{N} X_i^T V_i^{-1} (y_i - X_i\beta) = 0$$
Solve this set of score equations to estimate $\beta$
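When the $V_i$ are known, the score equations for the normal linear model are linear in $\beta$ and can be solved in closed form. The following is a sketch of that standard generalized least squares step, assuming the score equations have the usual form $\sum_i X_i^T V_i^{-1}(y_i - X_i\beta) = 0$:

```latex
\begin{align*}
\sum_{i=1}^{N} X_i^{T} V_i^{-1} (y_i - X_i \beta) &= 0 \\
\Big( \sum_{i=1}^{N} X_i^{T} V_i^{-1} X_i \Big) \beta &= \sum_{i=1}^{N} X_i^{T} V_i^{-1} y_i \\
\hat{\beta} &= \Big( \sum_{i=1}^{N} X_i^{T} V_i^{-1} X_i \Big)^{-1} \sum_{i=1}^{N} X_i^{T} V_i^{-1} y_i
\end{align*}
```

With $V_i = \sigma^2 I$ the factor $\sigma^2$ cancels and this reduces to the ordinary least squares estimate.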
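Generalized estimating equations
$$U(\beta) = \sum_{i=1}^{N} D_i^T V_i^{-1} (y_i - \mu_i) = 0$$
$D_i$ is the matrix of derivatives $\partial\mu_i / \partial\beta_j$
$V_i$ is the ‘working’ covariance matrix of $Y_i$: $V_i = \phi A_i^{1/2} R_i A_i^{1/2}$
$A_i = \mathrm{diag}\{\mathrm{var}(Y_{ik})\}$, $R_i$ is the correlation matrix for $Y_i$
$\phi$ is an overdispersion parameter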
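Overdispersion parameter
Estimated using the formula:
$$\hat\phi = \frac{1}{N - p} \sum_{i} \sum_{j} \frac{(y_{ij} - \hat\mu_{ij})^2}{\mathrm{var}(\hat\mu_{ij})}$$
where $N$ is the total number of measurements (here $N = \sum_i n_i$, not the number of units) and $p$ is the number of regression parameters
The square root of the overdispersion parameter is called the scale parameter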
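As a rough illustration of this estimate, the sketch below assumes the usual Pearson form: the sum of squared Pearson residuals divided by N − p. The function name, the data and the Poisson-type variance function are hypothetical and are not taken from the stroke example.

```python
def estimate_overdispersion(y, mu, variance, n_params):
    """Sum of squared Pearson residuals divided by (N - p).

    y, mu    : observed values and fitted means, one entry per measurement
    variance : variance function of the mean, e.g. lambda m: m for Poisson
    n_params : number of regression parameters p
    """
    n_obs = len(y)  # N = total number of measurements
    pearson_sq = sum((yi - mi) ** 2 / variance(mi) for yi, mi in zip(y, mu))
    return pearson_sq / (n_obs - n_params)


# Hypothetical counts and fitted means, with variance function var(mu) = mu
y = [1, 6, 12, 2]
mu = [2.0, 4.0, 8.0, 4.0]
phi = estimate_overdispersion(y, mu, lambda m: m, n_params=1)
scale = phi ** 0.5  # the scale parameter is the square root of phi
print(phi, scale)
```

A value of phi well above 1 suggests more variability than the assumed variance function allows for.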
Estimation (1)
• More generally, unless $V_i$ is known, iteration is needed to solve $U(\beta) = 0$:
1. Guess $V_i$ and estimate $\beta$ by $b$ and hence $\mu$
2. Calculate residuals, $r_{ij} = y_{ij} - \hat\mu_{ij}$
3. Estimate $V_i$ from the residuals
4. Re-estimate $b$ using the new estimate of $V_i$
Repeat steps 2-4 until convergence
Iterative process for GEEs
Start with $R_i$ = identity (i.e., independence) and $\phi = 1$: estimate $\beta$
Use the estimates to calculate fitted values $\hat\mu_i$ and residuals $y_i - \hat\mu_i$
These are used to estimate $A_i$, $R_i$ and $\phi$
Then the GEEs are solved again to obtain improved estimates of $\beta$
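The iteration can be sketched in a few lines for the simplest case. The code below is a minimal illustration, not the procedure used by any particular package. It assumes an identity link with constant variance (so $A_i = I$), one covariate plus an intercept, an exchangeable working correlation, and simple moment estimators for $\phi$ and $\rho$. All function names and the data are hypothetical.

```python
def apply_exch_inverse(v, rho):
    """Return R^{-1} v for an exchangeable correlation matrix R = (1-rho) I + rho J."""
    n = len(v)
    c = rho / (1.0 + (n - 1) * rho)
    s = sum(v)
    return [(vj - c * s) / (1.0 - rho) for vj in v]


def solve_beta(units, rho):
    """Solve sum_i X_i^T R^{-1} (y_i - X_i beta) = 0 for beta = (b0, b1)."""
    a00 = a01 = a11 = c0 = c1 = 0.0
    for xs, ys in units:
        w1 = apply_exch_inverse([1.0] * len(xs), rho)  # R^{-1} 1
        wx = apply_exch_inverse(xs, rho)               # R^{-1} x
        a00 += sum(w1)
        a01 += sum(wx)
        a11 += sum(x * w for x, w in zip(xs, wx))
        c0 += sum(y * w for y, w in zip(ys, w1))
        c1 += sum(y * w for y, w in zip(ys, wx))
    det = a00 * a11 - a01 * a01
    return ((a11 * c0 - a01 * c1) / det, (a00 * c1 - a01 * c0) / det)


def fit_gee(units, tol=1e-10, max_iter=100):
    """Alternate between estimating beta and estimating (phi, rho) from residuals."""
    p = 2
    rho = 0.0                       # start from independence
    beta = solve_beta(units, rho)
    phi = 1.0
    for _ in range(max_iter):
        resid = [[y - beta[0] - beta[1] * x for x, y in zip(xs, ys)]
                 for xs, ys in units]
        n_total = sum(len(r) for r in resid)
        phi = sum(e * e for r in resid for e in r) / (n_total - p)
        # Sum over pairs j < k of r_ij * r_ik, via ((sum r)^2 - sum r^2) / 2
        cross = sum((sum(r) ** 2 - sum(e * e for e in r)) / 2.0 for r in resid)
        n_pairs = sum(len(r) * (len(r) - 1) / 2.0 for r in resid)
        rho = cross / ((n_pairs - p) * phi)
        new_beta = solve_beta(units, rho)
        change = max(abs(a - b) for a, b in zip(new_beta, beta))
        beta = new_beta
        if change < tol:
            break
    return beta, rho, phi


# Hypothetical data: three units measured at the same four times
times = [0.0, 1.0, 2.0, 3.0]
units = [
    (times, [1.1, 2.9, 5.0, 7.2]),
    (times, [3.0, 5.1, 6.9, 9.1]),
    (times, [4.9, 7.0, 9.2, 10.9]),
]
beta, rho, phi = fit_gee(units)
print(beta, rho, phi)
```

Because every unit in this toy data set shares the same measurement times, the exchangeable fit gives the same coefficients as the independence fit. In that setting the working correlation changes the standard errors rather than the estimates.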
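Correlation
For unit $i$, the correlation matrix $R_i$ has $(l, m)$ element $\rho_{lm}$, with ones on the diagonal
For repeated measures, $\rho_{lm}$ = correlation between times $l$ and $m$
For clustered data, $\rho_{lm}$ = correlation between measures $l$ and $m$
For all models considered here, $V_i$ is assumed to be the same for all units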
Types of correlation
1. Independent: $V_i$ is diagonal
2. Exchangeable: all measurements on the same unit are equally correlated, $\rho_{lm} = \rho$
• Plausible for clustered data
• Other terms: spherical and compound symmetry
Types of correlation
3. Correlation depends on the time or distance between measurements $l$ and $m$, e.g., the first-order auto-regressive model has terms $\rho$, $\rho^2$, $\rho^3$ and so on
• Plausible for repeated measures where correlation is known to decline over time
4. Unstructured correlation: no assumptions about the correlations
• Lots of parameters to estimate; may not converge
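To make the four structures concrete, here is a small sketch that builds the working correlation matrix for a unit with n measurements. The function names are illustrative only. The last helper counts the free parameters of the unstructured case.

```python
def independence(n):
    """Identity matrix: no correlation between measurements on a unit."""
    return [[1.0 if l == m else 0.0 for m in range(n)] for l in range(n)]


def exchangeable(n, rho):
    """Every pair of distinct measurements has the same correlation rho."""
    return [[1.0 if l == m else rho for m in range(n)] for l in range(n)]


def ar1(n, rho):
    """First-order auto-regressive: correlation rho ** |l - m| decays with lag."""
    return [[rho ** abs(l - m) for m in range(n)] for l in range(n)]


def n_unstructured_params(n):
    """Number of distinct correlations when no structure is assumed."""
    return n * (n - 1) // 2


for row in ar1(4, 0.5):
    print(row)
print(n_unstructured_params(8))  # 8 weekly measurements
```

Independence needs no correlation parameters, while exchangeable and AR(1) each need one. The unstructured form needs n(n − 1)/2, which is why it can be hard to fit.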
Missing Data
When some data are missing, the working correlation can be estimated using the all-available-pairs method, in which all non-missing pairs of data are used in the estimators of the working correlation parameters.
Choosing the Best Model
Standard Regression (GLM)
AIC = -2 × (log-likelihood) + 2 × (number of parameters)
• Smaller values indicate a better balance of fit and parsimony (AIC can be negative, so compare values rather than their distance from zero).
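A short sketch of how the AIC comparison works in practice. The model names and log-likelihood values below are invented for illustration and do not come from the stroke data.

```python
def aic(log_likelihood, n_params):
    """AIC = -2 * log-likelihood + 2 * (number of parameters)."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Hypothetical candidates: name -> (maximized log-likelihood, number of parameters)
candidates = {
    "time only": (-310.2, 2),
    "time + group": (-305.9, 4),
    "time * group": (-305.1, 6),
}
scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # smallest AIC is preferred
print(scores, best)
```

The interaction model has the highest log-likelihood here, but its two extra parameters are not repaid by the gain in fit, so the additive model has the smallest AIC.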
Choosing the Best Model
GEE
• QIC(V): a function of V, so it can be used to choose the best correlation structure.
• QICu: a measure that can be used to determine the best subsets of covariates for a particular model.
• The best model is the one with the smallest value!
Other approaches – alternatives to GEEs
• Multivariate modelling: treat all measurements on the same unit as dependent variables (even though they are measurements of the same variable) and model them simultaneously (Hand and Crowder, 1996)
• e.g., SPSS uses this approach (with exchangeable correlation) for repeated measures ANOVA
Other approaches – alternatives to GEEs
• Mixed models: fixed and random effects (Verbeke and Molenberghs, 1997)
• e.g., $y = X\beta + Zu + e$
• $\beta$: fixed effects; $u$: random effects $\sim N(0, G)$
• $e$: error terms $\sim N(0, R)$
• $\mathrm{var}(y) = ZGZ^T + R$
• so correlation between the elements of $y$ is due to random effects
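Example of correlation from random effects
Cluster sampling: randomly select areas (PSUs), then households within areas
$Y_{ij} = \mu + u_i + e_{ij}$
$Y_{ij}$: income of household $j$ in area $i$
$\mu$: average income for the population
$u_i$: random effect of area $i$, $\sim N(0, \sigma_u^2)$; $e_{ij}$: error, $\sim N(0, \sigma_e^2)$
$E(Y_{ij}) = \mu$; $\mathrm{var}(Y_{ij}) = \sigma_u^2 + \sigma_e^2$
$\mathrm{cov}(Y_{ij}, Y_{km}) = \sigma_u^2$ provided $i = k$ (and $j \ne m$); $\mathrm{cov}(Y_{ij}, Y_{km}) = 0$ otherwise
So $V_i$ is exchangeable with elements $\rho = \sigma_u^2 / (\sigma_u^2 + \sigma_e^2)$ = ICC (ICC: intraclass correlation coefficient)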
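The exchangeable structure can be checked numerically. The sketch below assumes a random-intercept model with area variance `var_u` and error variance `var_e` (names and values are hypothetical). It builds the covariance matrix for one area, converts it to a correlation matrix, and compares the off-diagonal entries with the ICC.

```python
def random_intercept_cov(n, var_u, var_e):
    """Covariance of n households in one area: var_u everywhere, plus var_e on the diagonal."""
    return [[var_u + (var_e if j == m else 0.0) for m in range(n)] for j in range(n)]


def to_correlation(cov):
    """Scale a covariance matrix to a correlation matrix."""
    n = len(cov)
    return [[cov[j][m] / (cov[j][j] * cov[m][m]) ** 0.5 for m in range(n)]
            for j in range(n)]


def icc(var_u, var_e):
    """Intraclass correlation: share of total variance due to the area effect."""
    return var_u / (var_u + var_e)


V = random_intercept_cov(3, var_u=4.0, var_e=12.0)
R = to_correlation(V)
for row in R:
    print(row)
print(icc(4.0, 12.0))
```

Every off-diagonal entry of `R` equals the ICC, which is exactly the exchangeable pattern.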
Numerical example: Recovery from stroke
Treatment groups
A = new OT intervention
B = special stroke unit, same hospital
C = usual care in different hospital
8 patients per group
Measurements of functional ability: Barthel index measured weekly for 8 weeks
$Y_{ijk}$: patients $i$, groups $j$, times $k$
• Exploratory analyses: plots
• Naïve analyses
• Modelling
Numerical example: time plots Individual patients and overall regression line
Numerical example: research questions • Primary question: do slopes differ (i.e. do treatments have different effects)? • Secondary question: do intercepts differ (i.e. are groups same initially)?
Numerical example Correlation matrix
Numerical example
1. Pooled analysis ignoring correlation within patients
Numerical example
2. Repeated measures analyses using various variance-covariance structures
For the stroke data, from the scatter plot matrix and correlations, an auto-regressive structure (e.g. AR(1)) seems most appropriate
Use GEEs to fit models
Numerical example
4. Mixed/Random effects model
• Use model $Y_{ijk} = (\alpha_j + a_{ij}) + (\beta_j + b_{ij})k + e_{ijk}$
• $\alpha_j$ and $\beta_j$ are fixed effects for groups
• other effects ($a_{ij}$, $b_{ij}$, $e_{ijk}$) are random, and all are independent
• Fit model and use estimates of fixed effects to compare $\alpha_j$’s and $\beta_j$’s
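To see what this model implies for the data, the sketch below simulates scores from a random-intercept, random-slope model of this form. It assumes normal random effects. The group intercepts, slopes and standard deviations are made-up values, not estimates from the stroke study, and the simulated scores are not restricted to the 0-100 range of the Barthel index.

```python
import random


def simulate_patient(alpha, beta, sd_a, sd_b, sd_e, weeks=8, rng=random):
    """One patient's weekly scores: (alpha + a) + (beta + b) * k + e_k, k = 1..weeks."""
    a = rng.gauss(0.0, sd_a)  # patient-specific shift in intercept
    b = rng.gauss(0.0, sd_b)  # patient-specific shift in slope
    return [(alpha + a) + (beta + b) * k + rng.gauss(0.0, sd_e)
            for k in range(1, weeks + 1)]


def simulate_study(group_effects, n_patients=8, sd_a=10.0, sd_b=2.0, sd_e=5.0, seed=1):
    """Simulate n_patients per group; group_effects maps group -> (alpha, beta)."""
    rng = random.Random(seed)
    return {
        group: [simulate_patient(alpha, beta, sd_a, sd_b, sd_e, rng=rng)
                for _ in range(n_patients)]
        for group, (alpha, beta) in group_effects.items()
    }


# Hypothetical fixed effects (intercept, slope per week) for groups A, B and C
groups = {"A": (30.0, 6.0), "B": (30.0, 5.0), "C": (30.0, 4.0)}
data = simulate_study(groups)
print(data["A"][0])
```

Because each patient keeps the same `a` and `b` across all eight weeks, measurements on one patient are correlated even though the error terms are independent.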
Numerical example: Results for intercepts Results from Stata 8
Numerical example: Results for slopes Results from Stata 8
Numerical example: Summary of results
• All models produced similar results leading to the same conclusion: no treatment differences
• Pooled analysis and data reduction are useful for exploratory analysis: easy to follow, give good approximations for estimates, but variances may be inaccurate
• Random effects models give very similar results to GEEs
  • don’t need to specify variance-covariance matrix
  • model specification may/may not be more natural