Longitudinal Data & Mixed Effects Models. Danielle J. Harvey UC Davis. Disclaimer. Funding for this conference was made possible, in part by Grant R13 AG030995 from the National Institute on Aging.
Longitudinal Data & Mixed Effects Models Danielle J. Harvey UC Davis
Disclaimer • Funding for this conference was made possible, in part by Grant R13 AG030995 from the National Institute on Aging. • The views expressed do not necessarily reflect the official policies of the Department of Health and Human Services; nor does mention by trade names, commercial practices, or organizations imply endorsement by the U.S. Government. • Dr. Harvey has no conflicts of interest to report.
Outline • Intro to longitudinal data • Notation • General model formulation • Random effects • Assumptions • Example • Interpretation of coefficients • Model diagnostics
Longitudinal data features • Three or more waves of data on each unit/person (some with two waves okay). • Outcome values. • Preferably continuous (although categorical outcomes are possible); • Systematically change over time; • Metric, validity and precision of the outcome must be preserved across time. • Sensible metric for clocking time. • Automobile study: months since purchase, miles, or number of oil changes?
Data Format • Person-level (multivariate or wide format): • One line/record for each person which contains the data for all assessments • Person-period (univariate or long format): • One line/record for each assessment • Person-period data format is usually preferable: • Contains time and predictors at each occasion; • More efficient format for unbalanced data.
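The conversion from person-level to person-period format can be sketched in plain Python. This is a minimal illustration, not a procedure from the talk: the records, the column names (`id`, `group`, `score_0`, ...) and the helper `wide_to_long` are all hypothetical. In practice a data-frame library would usually do this step.

```python
# Hypothetical wide-format records: one row per person, one column per visit.
wide = [
    {"id": 1, "group": "control", "score_0": 10.0, "score_1": 9.5, "score_2": 9.1},
    {"id": 2, "group": "case", "score_0": 8.0, "score_1": 7.0, "score_2": None},
]

def wide_to_long(rows, prefix="score_"):
    """Return one record per person per assessment, skipping missing visits."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key.startswith(prefix) and value is not None:
                long_rows.append({
                    "id": row["id"],
                    "group": row["group"],
                    "time": int(key[len(prefix):]),  # visit index taken from the column name
                    "score": value,
                })
    return long_rows

long_data = wide_to_long(wide)
for record in long_data:
    print(record)
```

Person 2 has no third assessment, so the long format simply has one fewer row for that person, which is why it handles unbalanced data more efficiently.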
Exploring longitudinal data • Empirical growth plots. • If too many, select a random sample. • Reveal how each person changes over time. • Smoothing techniques for trends: • Nonparametric: moving averages, splines, lowess and kernel smoothers. • Examine intra- and inter-individual differences in the outcome. • Gather ideas about functional form of change.
Exploring longitudinal data (cont) • More formally: use OLS regression methods. • Estimate within-person regressions. • Record summary statistics (OLS parameter estimates, their standard errors, R²). • Evaluate the fit for each person. • Examine summary statistics across individuals (obtain their sample means and variances). • Known bias: the sample variance of the estimated slopes exceeds the population variance in the rate of change, because each estimated slope also carries sampling error.
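The within-person regression step can be sketched as follows. The data and the helper `ols_line` are hypothetical and only illustrate the idea of fitting one straight line per person and then summarizing the estimates across people.

```python
from statistics import mean, variance

def ols_line(times, values):
    """Least-squares intercept and slope for one person's trajectory."""
    t_bar, y_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope

# Hypothetical data: person id -> (times in years, outcome values).
people = {
    "A": ([0, 1, 2, 3], [10.0, 9.0, 8.0, 7.0]),
    "B": ([0, 1, 2], [12.0, 12.5, 13.0]),
    "C": ([0, 2, 4], [8.0, 7.0, 6.0]),
}

# One (intercept, slope) pair per person, then summaries across people.
fits = {pid: ols_line(t, y) for pid, (t, y) in people.items()}
slopes = [slope for _, slope in fits.values()]
print(fits)
print("mean slope:", mean(slopes), "sample variance of slopes:", variance(slopes))
```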
Exploring longitudinal data (cont) • To explore effects of categorical predictors: • Group individual plots. • Examine smoothed individual growth trajectories for groups. • Examine relationship between OLS parameter estimates and categorical predictors.
Selected References • Singer, J. D., & Willett, J. B. (2003). Applied Longitudinal Data Analysis, Oxford University Press. • Diggle, P. J., Heagerty, P., Liang, K.-Y., & Zeger, S. L. (2002). Analysis of Longitudinal Data, Oxford University Press. • Weiss, R. (2005). Modeling Longitudinal Data, Springer.
Random Effects Models - Notation • Let Yij = outcome for ith person at the jth time point • Let Y be a vector of all outcomes for all subjects • X is a matrix of independent variables (such as baseline diagnosis and time) • Z is a matrix associated with random effects (typically includes a column of 1s and time)
Mixed Model Formulation • $Y = X\beta + Z\gamma + \varepsilon$ • $\beta$ are the “fixed effect” parameters • Similar to the coefficients in a regression model • Coefficients tell us how variables are related to baseline level and change over time in the outcome • $\gamma$ are the “random effects”, $\gamma \sim N(0, \Sigma)$ • $\varepsilon$ are the errors, $\varepsilon \sim N(0, \sigma^2)$
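To make the matrix form concrete, here is how the pieces might look for a single person $i$ with three assessments, assuming a random intercept and a random slope on time. The symbols $t_{ij}$ (time of the $j$th assessment), $\gamma_{0i}$ and $\gamma_{1i}$ (that person's random intercept and slope) are notation chosen for this sketch, with $\beta$ for the fixed effects, $\gamma$ for the random effects and $\varepsilon$ for the errors.

```latex
\begin{pmatrix} Y_{i1} \\ Y_{i2} \\ Y_{i3} \end{pmatrix}
=
\underbrace{\begin{pmatrix} 1 & t_{i1} \\ 1 & t_{i2} \\ 1 & t_{i3} \end{pmatrix}}_{X_i}
\begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}
+
\underbrace{\begin{pmatrix} 1 & t_{i1} \\ 1 & t_{i2} \\ 1 & t_{i3} \end{pmatrix}}_{Z_i}
\begin{pmatrix} \gamma_{0i} \\ \gamma_{1i} \end{pmatrix}
+
\begin{pmatrix} \varepsilon_{i1} \\ \varepsilon_{i2} \\ \varepsilon_{i3} \end{pmatrix}
```

Row by row this reads $Y_{ij} = (\beta_0 + \gamma_{0i}) + (\beta_1 + \gamma_{1i})\, t_{ij} + \varepsilon_{ij}$: the column of 1s in $Z_i$ carries the random intercept and the time column carries the random slope.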
Random Effects • Why use them? • Not everybody responds the same way (even people with similar demographic and clinical information respond differently) • Want to allow for random differences in baseline level and rate of change that remain unexplained by the covariates
Random Effects Cont. • Way to think about them • Two bins with numbers in them • Every person draws a number from each bin and carries those numbers with them • Predicted baseline level and change based on “fixed effects” adjusted according to a person’s random number
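The two-bin picture can be simulated in a few lines. All numbers below are hypothetical, and for simplicity the two draws are made independently, whereas a fitted model may allow the random intercept and slope to be correlated.

```python
import random

random.seed(0)  # reproducible draws

# Hypothetical fixed effects: average baseline level and average annual change.
FIXED_INTERCEPT, FIXED_SLOPE = 50.0, -1.0
# Hypothetical spread of the two "bins" (SDs of the random intercepts and slopes).
SD_INTERCEPT, SD_SLOPE = 5.0, 0.5

def draw_person():
    """Each person draws one number from each bin and keeps both."""
    return random.gauss(0.0, SD_INTERCEPT), random.gauss(0.0, SD_SLOPE)

def predicted(person, years):
    """Fixed-effect prediction adjusted by the person's own two numbers."""
    u0, u1 = person
    return (FIXED_INTERCEPT + u0) + (FIXED_SLOPE + u1) * years

people = [draw_person() for _ in range(3)]
for person in people:
    print([round(predicted(person, t), 2) for t in range(4)])
```

A person whose two numbers are both zero follows the fixed-effect line exactly; everyone else starts higher or lower and changes faster or slower than that line.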
Random Effects Cont. • Accounts for correlation in observations • Correlation structures • Compound symmetry (common within-individual correlation) • Autoregressive - AR(1) (each assessment most strongly correlated with previous one) • Unstructured (most flexible)
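The first two structures are easy to write down as correlation matrices. The sketch below uses hypothetical helper names and a hypothetical correlation of 0.5 over four assessments; the AR(1) form shown assumes equally spaced assessments.

```python
def compound_symmetry(n, rho):
    """Same correlation rho between every pair of assessments."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]

def ar1(n, rho):
    """Correlation rho**|i - j|: strongest between adjacent assessments."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

for name, matrix in (("compound symmetry", compound_symmetry(4, 0.5)),
                     ("AR(1)", ar1(4, 0.5))):
    print(name)
    for row in matrix:
        print(["%.3f" % value for value in row])
```

An unstructured matrix has no such formula: every variance and covariance is a separate parameter, which is what makes it the most flexible and the most expensive to estimate.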
Assumptions of Model • Linearity • Homoscedasticity (constant variance) • Errors are normally distributed • Random effects are normally distributed • Typically assume data are missing at random (MAR)
Interpretation of parameter estimates • Main effects • Continuous variable: average association of one unit change in the independent variable with the baseline level of the outcome • Categorical variable: how baseline level of outcome compares to “reference” category • Time • Average annual change in the outcome for “reference individual” • Interactions with time • How annual change varies by one unit change in an independent variable • Covariance parameters
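A small worked example may help with reading these coefficients. The estimates below are hypothetical (they do not come from the talk), and the model assumed is one with a two-level group variable, time in years, and a group-by-time interaction.

```python
# Hypothetical fixed-effect estimates (not taken from the talk's data).
COEF = {
    "intercept": 0.50,     # baseline level in the reference (control) group
    "case": -0.80,         # baseline difference: case group vs. control
    "time": -0.02,         # annual change in the control group
    "case_x_time": -0.10,  # extra annual change in the case group
}

def predicted_mean(is_case, years):
    """Average outcome for a group (is_case = 0 or 1) after a number of years."""
    return (COEF["intercept"] + COEF["case"] * is_case
            + (COEF["time"] + COEF["case_x_time"] * is_case) * years)

def annual_change(is_case):
    """Time coefficient plus the interaction term, if it applies."""
    return COEF["time"] + COEF["case_x_time"] * is_case

print("baseline difference:", predicted_mean(1, 0) - predicted_mean(0, 0))
print("annual change, control:", annual_change(0))
print("annual change, case:", annual_change(1))
```

The main effect only shifts the baseline, the time coefficient is the slope for the reference group, and the interaction is the difference in slopes between the groups.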
Graphical Tools for Checking Assumptions • Scatter plot • Plot one variable against another one (such as random slope vs. random intercept) • E.g. Residual plot • Scatter plot of residuals vs. fitted values or a particular independent variable • Quantile-Quantile plot (QQ plot) • Plots quantiles of the data against quantiles from a specific distribution (normal distribution for us)
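The points of a normal QQ plot can be computed with the standard library alone. This is a sketch with hypothetical residuals; the plotting position `(i + 0.5) / n` is one common convention among several, and the helper name `normal_qq_points` is made up for this example.

```python
from statistics import NormalDist

def normal_qq_points(values):
    """Pair each sorted value with the matching standard-normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))

# Hypothetical residuals from a fitted model.
residuals = [0.3, -1.2, 0.8, -0.1, 1.5]
for theoretical, observed in normal_qq_points(residuals):
    print(round(theoretical, 3), observed)
```

If the residuals are roughly normal, plotting the second column against the first gives points close to a straight line.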
Residual Plot • Ideal residual plot: • “Cloud” of points • No pattern • Evenly distributed about zero
Non-linear relationship • Residual plot shows a non-linear pattern (in this case, a quadratic pattern) • Best to determine which independent variable has this relationship, then include the square of that variable in the model
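The quadratic pattern is easy to reproduce with made-up data: fit a straight line to an outcome that truly depends on the square of a predictor and look at the residuals. Everything below (the data and the helper `fit_line`) is hypothetical and uses a simple single-predictor regression rather than a mixed model.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data generated from y = 1 + 2x + 0.5x^2 with no noise.
xs = list(range(7))
ys = [1 + 2 * x + 0.5 * x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print([round(r, 2) for r in residuals])
```

The residuals are positive at both ends and negative in the middle, the U shape that signals a missing squared term.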
Non-constant variance • Residual plot exhibits a “funnel-like” pattern • Residuals are further from the zero line as you move along the fitted values • Typically suggests transforming the outcome variable (ln transform is most common)
Example • Back to some data • Interested in differences in change between diagnostic groups • Outcomes = episodic memory and working memory • X includes diagnostic group (control = reference group) and time • Incorporate a random intercept and slope, with unstructured covariance (allows for correlation between the random effects)
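Written out for person $i$ at assessment $j$, the example model might look like the following. This is a sketch under assumptions: it supposes only two diagnostic groups, coded by an indicator $D_i$ (0 = control), and the symbols $t_{ij}$, $\gamma_{0i}$, $\gamma_{1i}$ and $\tau$ are notation chosen here rather than taken from the slides.

```latex
Y_{ij} = \beta_0 + \beta_1 D_i + \beta_2 t_{ij} + \beta_3 D_i t_{ij}
         + \gamma_{0i} + \gamma_{1i} t_{ij} + \varepsilon_{ij},
\qquad
\begin{pmatrix} \gamma_{0i} \\ \gamma_{1i} \end{pmatrix}
\sim N\!\left(
  \begin{pmatrix} 0 \\ 0 \end{pmatrix},
  \begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{pmatrix}
\right),
\qquad
\varepsilon_{ij} \sim N(0, \sigma^2)
```

Here $\beta_3$ is the difference in annual change between the groups, and the off-diagonal term $\tau_{01}$ is what the unstructured covariance adds: it lets a person's random intercept and random slope be correlated.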
Advanced topics • Time-varying covariates • Simultaneous growth models (modeling two types of longitudinal outcomes together) • Allows you to directly compare associations of specific independent variables with the different outcomes • Allows you to estimate the correlation between change in the two processes