APHEO: Longitudinal Data Analysis; Overview and Conceptual Ideas October 15, 2007. Dr. N. Birkett, Department of Epidemiology & Community Medicine, University of Ottawa. Overview. What is longitudinal analysis? Data structure Graphical approaches Analytical methods Multi-level model
APHEO: Longitudinal Data Analysis; Overview and Conceptual Ideas. October 15, 2007. Dr. N. Birkett, Department of Epidemiology & Community Medicine, University of Ottawa
Overview • What is longitudinal analysis? • Data structure • Graphical approaches • Analytical methods • Multi-level model • Growth curves • Emphasis on examples, not mathematics
Intro (1) • Poverty in Canada • Survey of Labour and Income Dynamics (SLID) • Statistics Canada (1993-1996)
Intro(2) • Poverty percent seems fairly constant • Slight increase but always around 12% • Interpretation • There is a group of people in Canada who live in constant poverty • There are many people who live in poverty for parts of their lives. • Very different policy implications
Intro(3) • How to tell these situations apart? • Need within-subject data over multiple years (time points) • Panel survey • Cohort study • Etc.
Intro (4) • SLID is a panel study. • Can estimate the percent of people with 0, 1, 2, 3 or 4 years of ‘exposure’ • If the same people are always in poverty, we’d expect to see: • 0 years: 88% • 4 years: 12% • If the poverty group turned over completely every year, we’d expect to see: • 0 years: 52% • 1 year: 48%
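The two expected distributions above follow from simple arithmetic on a constant 12% yearly prevalence over 4 years. The sketch below reproduces them; the function name and the restriction to these two extreme scenarios are illustrative assumptions, not part of the SLID analysis.

```python
def expected_years_distribution(prevalence, n_years, same_people):
    """Share of the population by number of years spent in poverty.

    Covers two extreme scenarios only (an illustrative assumption):
    same_people=True  -> the same group is poor in every year
    same_people=False -> a completely new group is poor each year
    """
    if same_people:
        return {0: 1 - prevalence, n_years: prevalence}
    ever_poor = prevalence * n_years  # nobody is poor in more than one year
    if ever_poor > 1:
        raise ValueError("complete turnover is impossible at this prevalence")
    return {0: 1 - ever_poor, 1: ever_poor}


constant = expected_years_distribution(0.12, 4, same_people=True)
turnover = expected_years_distribution(0.12, 4, same_people=False)

for label, dist in (("same people", constant), ("full turnover", turnover)):
    print(label, {years: f"{share:.0%}" for years, share in dist.items()})
```

Real panel data will usually fall between these two extremes, which is why the observed distribution of years in poverty is informative.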
Intro(5) • Results
Intro(6) • Panel data shows • Poverty affects more people than suggested by the cross-sectional prevalence • Few people are exposed to constant poverty • Suggests need for policy interventions to keep people out of poverty more than to get them out of poverty
Design (1) • Some other longitudinal studies • NDIT • 2-4 monthly examination of teens from age 12 to 18 • VA Normative Aging Study • Up to 9 measures from 1978 to 1999 • Framingham Study • Biennial measures over about 50 years • Nurses’ Health Study • Biennial measures x 20+ years • Studies of HIV seroconversion
Design (2) • Key features • Multiple (repeat) measures on the same subjects • Three or more ‘waves’ of measurement are required • Looks at within-subject changes, not group changes. • Need a ‘metric’ for time. • Can be pre-fixed or allowed to vary across subjects • Main focus is on within-subject change and determinants thereof. • Outcome must be something which can change • Death is a poor outcome choice • Use survival analysis methods • Can be continuous or categorical
Design (3) • Data structure • Normal approach for data is to define one record per subject (Broad/person-level) • If subjects have measures at more than one time, define two or more variables: • smoke1, smoke2, smoke3 • Longitudinal analysis needs a different layout: Long/person-period • One record per time point per subject.
Design (4) Broad/person-level layout
id  time1  time2  time3  time4
1   31     29     15     26
2   24     28     20     32
3   14     20     28     30
4   38     34     30     34
5   25     29     25     29
6   30     28     16     34
Design (5) Long/Person-period layout
id  time  score
1   1     31
1   2     29
1   3     15
1   4     26
2   1     24
2   2     28
2   3     20
2   4     32
3   1     14
3   2     20
3   3     28
3   4     30
4   1     38
4   2     34
4   3     30
4   4     34
5   1     25
5   2     29
5   3     25
5   4     29
6   1     30
6   2     28
6   3     16
6   4     34
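Moving from the Broad/person-level layout to the Long/person-period layout is a mechanical reshape. A minimal sketch in plain Python, using the six example subjects; the function name `broad_to_long` and the record structure are assumptions made for illustration.

```python
# Broad/person-level data: one record per subject.
broad = [
    {"id": 1, "time1": 31, "time2": 29, "time3": 15, "time4": 26},
    {"id": 2, "time1": 24, "time2": 28, "time3": 20, "time4": 32},
    {"id": 3, "time1": 14, "time2": 20, "time3": 28, "time4": 30},
    {"id": 4, "time1": 38, "time2": 34, "time3": 30, "time4": 34},
    {"id": 5, "time1": 25, "time2": 29, "time3": 25, "time4": 29},
    {"id": 6, "time1": 30, "time2": 28, "time3": 16, "time4": 34},
]


def broad_to_long(records, prefix="time"):
    """Return one record per subject per time point."""
    long_records = []
    for record in records:
        for name, value in record.items():
            if name.startswith(prefix):
                wave = int(name[len(prefix):])  # "time3" -> 3
                long_records.append(
                    {"id": record["id"], "time": wave, "score": value}
                )
    return long_records


long_records = broad_to_long(broad)
for row in long_records[:4]:  # the four person-period records for subject 1
    print(row["id"], row["time"], row["score"])
```

Statistical packages offer the same reshape as a built-in operation; the loop above only makes the one-record-per-time-point idea explicit.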
Analysis approaches • Exploratory data analyses are common • Graphical methods • Trajectories • Hypothesis testing • Model building • Identification of ‘latent classes’ • Unknown groups which show similar trajectories within group but different among groups
Graphs (1) • Plot a ‘trajectory’ for each subject. • X-axis: time of measure • Y-axis: value of outcome at time point • Connect the ‘dots’ for each subject. • ‘Mean response’ • Compute the mean outcome at each time point • Connect the dots • Can also apply non-parametric smoothing methods (e.g. cubic splines)
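The quantities behind these two graphs can be computed directly from long-format data. The sketch below builds each subject's trajectory and the mean response at each time point for the six example subjects; it stops short of plotting, and the variable names are illustrative assumptions.

```python
from collections import defaultdict

# Example scores for six subjects at time points 1..4.
scores = {
    1: [31, 29, 15, 26],
    2: [24, 28, 20, 32],
    3: [14, 20, 28, 30],
    4: [38, 34, 30, 34],
    5: [25, 29, 25, 29],
    6: [30, 28, 16, 34],
}

# Long/person-period layout: (id, time, score).
long_records = [
    (subject, wave, value)
    for subject, values in scores.items()
    for wave, value in enumerate(values, start=1)
]

trajectories = defaultdict(list)  # points to connect for each subject
by_time = defaultdict(list)       # all outcomes observed at each time point
for subject, wave, value in long_records:
    trajectories[subject].append((wave, value))
    by_time[wave].append(value)

# 'Mean response': mean outcome at each time point.
mean_response = {
    wave: sum(values) / len(values) for wave, values in sorted(by_time.items())
}

for wave, mean in mean_response.items():
    print(f"time {wave}: mean = {mean:.1f}")
```

Passing each list in `trajectories` to a line-plotting routine gives the per-subject graph; plotting `mean_response` gives the mean curve.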
Graphs (2) • Problems arise when there are many subjects and time points • Next example shows CD4+ cell numbers over 8 years of follow-up around time of seroconversion. • 2,376 data points • 369 infected men • Some studies get much bigger than this
Graphs (3) • Very busy graph • Not too useful • ‘Except from the perspective of an ink manufacturer’ • Options • Fit mean response only (LOESS smooth) • Ignores within subject changes • Plot random sub-set of subjects • Arbitrary selection • Plot subjects based on percentile of overall response
Graphs(4): Group Comparisons
id  group  time1  time2  time3  time4
1   A      31     29     15     26
2   A      24     28     20     32
3   A      14     20     28     30
4   B      38     34     30     34
5   B      25     29     25     29
6   B      30     28     16     34
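For a group comparison graph, one would typically plot a mean trajectory per group. A minimal sketch using the example data with groups A and B; the function name is an assumption, and with three subjects per group the numbers are purely illustrative.

```python
# (group, scores at time points 1..4) for each example subject.
subjects = [
    ("A", [31, 29, 15, 26]),
    ("A", [24, 28, 20, 32]),
    ("A", [14, 20, 28, 30]),
    ("B", [38, 34, 30, 34]),
    ("B", [25, 29, 25, 29]),
    ("B", [30, 28, 16, 34]),
]


def group_mean_trajectories(rows):
    """Mean outcome at each time point, computed separately per group."""
    grouped = {}
    for group, values in rows:
        grouped.setdefault(group, []).append(values)
    return {
        group: [sum(column) / len(column) for column in zip(*series)]
        for group, series in grouped.items()
    }


group_means = group_mean_trajectories(subjects)
# Difference between the group means (B minus A) at each time point.
differences = [b - a for a, b in zip(group_means["A"], group_means["B"])]

for group, means in group_means.items():
    print(group, [round(m, 1) for m in means])
print("B - A:", [round(d, 1) for d in differences])
```

These are descriptive summaries only; they say nothing yet about whether the groups differ beyond chance.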
Types of questions you might ask • What is the pattern of change over time? • Are there significant changes from baseline? • Do all people show the same trajectories? • If not, are there factors associated with different trajectories? • In a two group comparison, do the two groups differ in their responses over time? • Do the two groups differ at any time points?
Analytic Strategies (1) • T-test at final time point • ANCOVA at final time point • ANOVA using wave as factor • Could use multiple comparison methods (e.g. Tukey’s test) to look for wave specific differences. • MANOVA • Regression models (Ordinary Least Squares) • Uses the time point as an ‘x’ variable to model overall trajectory for study group • Ignores within subject change; looks at group mean changes • Can model non-linear predictors and multiple predictors.
Analytic Strategies (2) • Previous methods all suffer a serious weakness • They ignore the within-subject change (except ANCOVA). • They assume that each subject has the same trajectory • They ignore the correlation between measures on the same subject at different time points • Variance estimates and significance tests are biased. • Tests at final time point ignore intermediate results which could be of value in programme design.
Analytic Strategies (3) • Change scores • Repeated Measures ANOVA • Random coefficient models • Multilevel Model of Change • Growth Models • Latent class growth models
Analytic Strategies (4) • Simple within-subject trajectory analysis • Perform separate OLS regression models for each subject • For linear model, gives one intercept and one slope for each subject. • If subjects follow different trajectories, then the distribution of the betas will show a pattern
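The per-subject regression idea can be sketched with the closed-form least-squares formulas, fitting a straight line to each subject's four scores. The helper `ols_line` and the use of the six example subjects are assumptions for illustration.

```python
def ols_line(times, values):
    """Least-squares intercept and slope for one subject's trajectory."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    return intercept, slope


times = [1, 2, 3, 4]
scores = {
    1: [31, 29, 15, 26],
    2: [24, 28, 20, 32],
    3: [14, 20, 28, 30],
    4: [38, 34, 30, 34],
    5: [25, 29, 25, 29],
    6: [30, 28, 16, 34],
}

# One (intercept, slope) pair per subject.
fits = {subject: ols_line(times, values) for subject, values in scores.items()}

for subject, (intercept, slope) in fits.items():
    print(f"id {subject}: intercept = {intercept:.2f}, slope = {slope:.2f}")

slopes = [slope for _, slope in fits.values()]
print("slope range:", round(min(slopes), 2), "to", round(max(slopes), 2))
```

A wide spread in the fitted slopes, as here, is the informal signal that a single common trajectory may not describe all subjects.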
Multilevel model of change (1) • Standard OLS regression model is: • Betas are assumed to be fixed. That is, there is a single common trajectory for ALL subjects. • Any departure from this predicted/expected trajectory is due to random variation (or unmeasured predictors) and shows up in the error term.
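The equation on this slide did not survive extraction. A common way to write a linear OLS trajectory model, offered here as an assumed form rather than the slide's exact notation, is shown below, where $Y_{ij}$ is the outcome for subject $i$ at measurement occasion $j$ and $\mathrm{TIME}_{ij}$ is the time metric.

```latex
Y_{ij} = \beta_0 + \beta_1 \,\mathrm{TIME}_{ij} + \varepsilon_{ij}
```

Note that $\beta_0$ and $\beta_1$ carry no subject index: every subject shares the same intercept and slope, and all individual departures are absorbed by $\varepsilon_{ij}$.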
Multilevel model of change (2) • In my previous example, betas are NOT the same for each subject. • If we had a predictor (e.g. sex), we could model it directly by adding a sex term (or a time*sex interaction if the slope varied). • Predictors may be unknown. • Multilevel model of change allows the betas to be RANDOM VARIABLES, not fixed. • Allows modeling of measured predictors
Multilevel model of change (3) • Level 1 model (similar to OLS model) • Coefficients have a new sub-script: subject (i). • Each subject is allowed a different set of ‘betas’ (π’s). • The π’s are the ‘true’ intercept and slope for subject ‘i’ • For the entire sample (or population), the π’s will vary, showing some distribution. • External factors (e.g. sex) may affect the distribution. • We can model this explicitly
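The level 1 equation itself is missing from the extracted slide. Under the assumption of a linear trajectory, and using the π notation the slide describes, it would typically be written as follows, with $i$ indexing subjects and $j$ measurement occasions.

```latex
Y_{ij} = \pi_{0i} + \pi_{1i} \,\mathrm{TIME}_{ij} + \varepsilon_{ij}
```

Here $\pi_{0i}$ is the ‘true’ intercept and $\pi_{1i}$ the ‘true’ slope for subject $i$; the only change from the OLS form is the subject subscript on the coefficients.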
Multilevel model of change (4) • Level 2 model (models the coefficients at level 1) • The ζ’s follow a bivariate normal distribution with means ‘0’. • These models can be made more complex • Add additional functions of time for the trajectory • Add additional level 2 predictors.
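The level 2 equations were lost in extraction. A typical form with a single level 2 predictor is sketched below; the use of sex as the predictor follows the earlier example, while the γ and σ symbols are assumed notation rather than the slide's own. The subject-specific intercept $\pi_{0i}$ and slope $\pi_{1i}$ from level 1 become the outcomes.

```latex
\begin{aligned}
\pi_{0i} &= \gamma_{00} + \gamma_{01}\,\mathrm{SEX}_i + \zeta_{0i} \\
\pi_{1i} &= \gamma_{10} + \gamma_{11}\,\mathrm{SEX}_i + \zeta_{1i} \\
\begin{pmatrix} \zeta_{0i} \\ \zeta_{1i} \end{pmatrix}
&\sim N\!\left(
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{01} & \sigma_1^2 \end{pmatrix}
\right)
\end{aligned}
```

The γ’s are the fixed effects (the average intercept and slope, and how they shift with the predictor), while the variances and covariance of the ζ’s describe how much subjects still differ after the predictor is taken into account.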
Multilevel model of change (5) • Model can be fit using • Multilevel software package • Mixed-model methods • Output includes the ‘normal’ fixed effect model (the ‘best’ trajectory fitting all of the data) • Estimates of the extent of variation in the random component • Large values suggest the need for more predictors • Variances, etc. reflect within subject effects and correlations • Can model correlation more explicitly (GEE methods) • Does not require time points to be evenly spaced
Latent Class Growth Models • Similar modeling approaches as multi-level methods • Assumes that there are distinct ‘classes’ of trajectories but we don’t know which people belong to which class (or why) • Fits the ‘best’ model with the pre-specified number of classes • Explore classes for potential predictors, differences, etc.
Application (1) • Smoking onset in teens • McGill Natural History of Nicotine Dependence (NDIT) • PI: Dr. J. O’Loughlin • Published in Ann Epidemiol, 2005 • 1,293 grade 7 students • 369 ‘novice’ smokers with 3+ measures • Followed every 3-4 months for 3.5 years • Main outcome: smoking intensity • Estimate of # of cigarettes smoked in previous 3 months
Application (2) • Analysis • Step 1: multi-level model of change • Used a quadratic trajectory model (time and time²) • Mean smoking curve • 18 cigarettes per month in the first three months after onset • Increased by 13.3 cigs/month every three months • Strongly significant random effects • Evidence for strong heterogeneity in the trajectories
[Figure: ‘Fixed effect’ trajectory, all subjects. 1 ‘survey’ = about 3 months.]
Application (3) • Step 2: • Explored a range of latent class models. • Best fit found with four latent groups • Graphs on next slide. Note that the proportion of sample in each class varied greatly: • Class 1: 73% • Class 2: 11% • Class 3: 11% • Class 4: 6%
[Figure: ‘Latent class’ trajectories for Classes 1 to 4. Note that the Y-axis reaches 750, not 200. 1 ‘survey’ = about 3 months.]
Application (4) • Evidence of different characteristics among classes • Parents smoke: OR=2.2 (class 4) • Poor marks: OR > 4 (classes 3 & 4) • 50+% friends smoke: OR= 6 (class 3) OR=12 (class 4) • Clear smoking rules: OR=0.3 (class 3) OR=0.6 (class 4)