
APHEO: Longitudinal Data Analysis; Overview and Conceptual Ideas October 15, 2007

APHEO: Longitudinal Data Analysis; Overview and Conceptual Ideas October 15, 2007. Dr. N. Birkett, Department of Epidemiology & Community Medicine, University of Ottawa. Overview. What is longitudinal analysis? Data structure Graphical approaches Analytical methods Multi-level model


Presentation Transcript


  1. APHEO: Longitudinal Data Analysis; Overview and Conceptual Ideas. October 15, 2007. Dr. N. Birkett, Department of Epidemiology & Community Medicine, University of Ottawa

  2. Overview • What is longitudinal analysis? • Data structure • Graphical approaches • Analytical methods • Multi-level model • Growth curves • Emphasis on examples, not mathematics

  3. Intro (1) • Poverty in Canada • Survey of Labour and Income Dynamics (SLID) • Statistics Canada (1993-1996)

  4. Intro(2) • Poverty percent seems fairly constant • Slight increase but always around 12% • Interpretation • There is a group of people in Canada who live in constant poverty • There are many people who live in poverty for parts of their lives. • Very different policy implications

  5. Intro(3) • How to tell these situations apart? • Need within-subject data over multiple years (time points) • Panel survey • cohort study • Etc.

  6. Intro (4) • SLID is a panel study. • Can estimate the percent of people with 0, 1, 2, 3 or 4 years of ‘exposure’ • If the same people are always in poverty, we’d expect to see: • 0 years: 88% • 4 years: 12% • If the poverty group turned over every year, we’d expect to see: • 0 years: 52% • 1 year: 48%
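The two extreme scenarios on this slide follow from simple arithmetic. The sketch below assumes a constant 12% annual poverty rate over 4 panel years and, for the turnover case, that nobody is poor in more than one year. The function and variable names are illustrative and do not come from the presentation.

```python
ANNUAL_RATE = 0.12  # assumed constant cross-sectional poverty rate
YEARS = 4           # number of panel years considered


def same_people_every_year(rate, years):
    """Scenario 1: the same people are poor in every year."""
    dist = {k: 0.0 for k in range(years + 1)}
    dist[0] = 1.0 - rate   # never poor
    dist[years] = rate     # poor in all years
    return dist


def complete_turnover(rate, years):
    """Scenario 2: a fresh group is poor each year, so no one is poor twice."""
    dist = {k: 0.0 for k in range(years + 1)}
    dist[1] = rate * years   # each year contributes a new set of people
    dist[0] = 1.0 - dist[1]  # everyone else is never poor
    return dist


constant = same_people_every_year(ANNUAL_RATE, YEARS)
turnover = complete_turnover(ANNUAL_RATE, YEARS)

# Share of people by number of years spent in poverty (0 to 4).
print("same people:", constant)
print("full turnover:", turnover)
```

Real panel data would fall somewhere between these two bounds, which is why the within-subject counts are informative.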

  7. Intro(5) • Results

  8. Intro(6) • Panel data shows • Poverty affects more people than suggested by the cross-sectional prevalence • Few people are exposed to constant poverty • Suggests need for policy interventions to keep people out of poverty more than to get them out of poverty

  9. Design (1) • Some other longitudinal studies • NDIT • 2-4 monthly examination of teens from age 12 to 18 • VA Normative Aging Study • Up to 9 measures from 1978 to 1999 • Framingham Study • Biennial measures over about 50 years • Nurses Health Study • Biennial measures x 20+ years • Studies of HIV seroconversion

  10. Design (2) • Key features • Multiple (repeat) measures on the same subjects • Three or more ‘waves’ of measurement are required • Looks at within-subject changes, not group changes. • Need a ‘metric’ for time. • Can be pre-fixed or allowed to vary across subjects • Main focus is on within-subject change and determinants thereof. • Outcome must be something which can change • Death is a poor outcome choice • Use survival analysis methods • Can be continuous or categorical

  11. Design (3) • Data structure • Normal approach for data is to define one record per subject (Broad/person-level) • If subjects have measures at more than one time, define two or more variables: • smoke1, smoke2, smoke3 • Longitudinal analysis needs a different layout: Long/person-period • One record per time point per subject.

  12. Design (4) Broad/person-level layout
      id  time1  time2  time3  time4
      1   31     29     15     26
      2   24     28     20     32
      3   14     20     28     30
      4   38     34     30     34
      5   25     29     25     29
      6   30     28     16     34

  13. Design (5) Long/Person-period layout
      id  time  score
      1   1     31
      1   2     29
      1   3     15
      1   4     26
      2   1     24
      2   2     28
      2   3     20
      2   4     32
      3   1     14
      3   2     20
      3   3     28
      3   4     30
      4   1     38
      4   2     34
      4   3     30
      4   4     34
      5   1     25
      5   2     29
      5   3     25
      5   4     29
      6   1     30
      6   2     28
      6   3     16
      6   4     34
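Converting from the broad layout to the long layout is mechanical. The following is a minimal sketch in plain Python using the six example subjects; the helper name `wide_to_long` and the `prefix` parameter are illustrative assumptions, and a statistics package would normally provide its own reshape command.

```python
# Broad (person-level) records: one dict per subject, values from the example.
wide = [
    {"id": 1, "time1": 31, "time2": 29, "time3": 15, "time4": 26},
    {"id": 2, "time1": 24, "time2": 28, "time3": 20, "time4": 32},
    {"id": 3, "time1": 14, "time2": 20, "time3": 28, "time4": 30},
    {"id": 4, "time1": 38, "time2": 34, "time3": 30, "time4": 34},
    {"id": 5, "time1": 25, "time2": 29, "time3": 25, "time4": 29},
    {"id": 6, "time1": 30, "time2": 28, "time3": 16, "time4": 34},
]


def wide_to_long(records, prefix="time"):
    """Return one row per subject per time point (person-period layout)."""
    long_rows = []
    for rec in records:
        # Wave numbers are the digits that follow the prefix, e.g. "time3" -> 3.
        waves = sorted(int(key[len(prefix):]) for key in rec if key.startswith(prefix))
        for wave in waves:
            long_rows.append(
                {"id": rec["id"], "time": wave, "score": rec[f"{prefix}{wave}"]}
            )
    return long_rows


long_rows = wide_to_long(wide)
for row in long_rows[:4]:
    print(row["id"], row["time"], row["score"])
```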

  14. Analysis approaches • Exploratory data analyses are common • Graphical methods • Trajectories • Hypothesis testing • Model building • Identification of ‘latent classes’ • Unknown groups which show similar trajectories within group but different among groups

  15. Graphs (1) • Plot a ‘trajectory’ for each subject. • X-axis: time of measure • Y-axis: value of outcome at time point • Connect the ‘dots’ for each subject. • ‘Mean response’ • Compute the mean outcome at each time point • Connect the dots • Can also apply non-parametric smoothing methods (e.g. cubic splines)
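The quantities behind these two plots can be computed directly from long-form data. The sketch below uses the six example subjects from the layout slides and only prepares the points to plot (no plotting library is assumed); the variable names are illustrative.

```python
from collections import defaultdict

# Long (person-period) data as (id, time, score) tuples, from the example.
long_data = [
    (1, 1, 31), (1, 2, 29), (1, 3, 15), (1, 4, 26),
    (2, 1, 24), (2, 2, 28), (2, 3, 20), (2, 4, 32),
    (3, 1, 14), (3, 2, 20), (3, 3, 28), (3, 4, 30),
    (4, 1, 38), (4, 2, 34), (4, 3, 30), (4, 4, 34),
    (5, 1, 25), (5, 2, 29), (5, 3, 25), (5, 4, 29),
    (6, 1, 30), (6, 2, 28), (6, 3, 16), (6, 4, 34),
]

# Trajectory per subject: the (time, score) points to connect with a line.
trajectories = defaultdict(list)
for subject, time, score in long_data:
    trajectories[subject].append((time, score))

# Mean response: average outcome at each time point across subjects.
scores_by_time = defaultdict(list)
for _, time, score in long_data:
    scores_by_time[time].append(score)
mean_response = {t: sum(v) / len(v) for t, v in sorted(scores_by_time.items())}

print(trajectories[1])
print(mean_response)
```

Passing each subject's points to any line-plot routine gives the profile plot; plotting `mean_response` gives the mean response curve.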

  16. Profile plots (use long form)

  17. Mean response plot

  18. Superimposed…

  19. smoothed

  20. smoothed

  21. Superimposed…

  22. Graphs (2) • Problems arise when there are many subjects and time points • Next example shows CD4+ cell numbers over 8 years of follow-up around time of seroconversion. • 2,376 data points • 369 infected men • Some studies get much bigger than this

  23. Graphs (3) • Very busy graph • Not too useful • ‘Except from the perspective of an ink manufacturer’  • Options • Fit mean response only (LOESS smooth) • Ignores within subject changes • Plot random sub-set of subjects • Arbitrary selection • Plot subjects based on percentile of overall response

  24. Y-axis is now ‘residuals’

  25. Graphs (4): Group Comparisons
      id  group  time1  time2  time3  time4
      1   A      31     29     15     26
      2   A      24     28     20     32
      3   A      14     20     28     30
      4   B      38     34     30     34
      5   B      25     29     25     29
      6   B      30     28     16     34

  26. Profile plots by group (groups B and A)

  27. Mean plots by group (groups B and A)
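The points in a mean plot by group are group-specific averages at each time point. A minimal sketch using the six-subject group comparison example follows; the data structure and names are illustrative.

```python
from collections import defaultdict

# (id, group, [time1, time2, time3, time4]) from the group comparison example.
subjects = [
    (1, "A", [31, 29, 15, 26]),
    (2, "A", [24, 28, 20, 32]),
    (3, "A", [14, 20, 28, 30]),
    (4, "B", [38, 34, 30, 34]),
    (5, "B", [25, 29, 25, 29]),
    (6, "B", [30, 28, 16, 34]),
]

# Collect each group's scores, then average column-wise (one column per wave).
scores_by_group = defaultdict(list)
for _, group, scores in subjects:
    scores_by_group[group].append(scores)

group_means = {
    group: [sum(column) / len(column) for column in zip(*rows)]
    for group, rows in scores_by_group.items()
}

for group, means in sorted(group_means.items()):
    print(group, [round(m, 2) for m in means])
```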

  28. Types of questions you might ask • What is the pattern of change over time? • Are there significant changes from baseline? • Do all people show the same trajectories? • If not, are there factors associated with different trajectories? • In a two group comparison, do the two groups differ in their responses over time? • Do the two groups differ at any time points?

  29. Analytic Strategies (1) • T-test at final time point • ANCOVA at final time point • ANOVA using wave as factor • Could use multiple comparison methods (e.g. Tukey’s test) to look for wave specific differences. • MANOVA • Regression models (Ordinary Least Squares) • Uses the time point as an ‘x’ variable to model overall trajectory for study group • Ignores within subject change; looks at group mean changes • Can model non-linear predictors and multiple predictors.

  30. Analytic Strategies (2) • Previous methods all suffer a serious weakness • They ignore the within-subject change (except ANCOVA). • They assume that each subject has the same trajectory • They ignore the correlation between measures on the same subject at different time points • Variance estimates and significance tests are biased. • Tests at final time point ignore intermediate results which could be of value in programme design.

  31. Analytic Strategies (3) • Change scores • Repeated Measures ANOVA • Random coefficient models • Multilevel Model of Change • Growth Models • Latent class growth models

  32. Analytic Strategies (4) • Simple within subject trajectory analysis • Perform separate OLS regression models for each subject • For linear model, gives one intercept and one slope for each subject. • If subjects follow different trajectories, then the distribution of the betas will show a pattern
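Fitting a separate straight line to each subject needs nothing more than the closed-form least-squares formulas. The sketch below applies them to the six example subjects, assuming the time metric is simply the wave number 1 to 4; the function name `ols_line` is illustrative.

```python
def ols_line(times, values):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return intercept, slope


times = [1, 2, 3, 4]  # assumed time metric: wave number
scores = {
    1: [31, 29, 15, 26],
    2: [24, 28, 20, 32],
    3: [14, 20, 28, 30],
    4: [38, 34, 30, 34],
    5: [25, 29, 25, 29],
    6: [30, 28, 16, 34],
}

# One (intercept, slope) pair per subject.
subject_fits = {sid: ols_line(times, y) for sid, y in scores.items()}

for sid, (intercept, slope) in subject_fits.items():
    print(f"subject {sid}: intercept={intercept:.2f}, slope={slope:.2f}")
```

With only four points per subject these estimates are noisy, but the spread of the slopes (negative for some subjects, positive for others) is the kind of pattern the slide refers to.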

  33. Subject-specific OLS models

  34. Multilevel model of change (1) • Standard OLS regression model is: • Betas are assumed to be fixed. That is, there is a single common trajectory for ALL subjects. • Any departure from this predicted/expected trajectory is due to random variation (or unmeasured predictors) and shows up in the error term.
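For reference, a conventional way to write this model, assuming a single linear time predictor, is shown below. The notation is an assumption rather than the slide's own: Y_{ij} is the outcome for subject i at measurement occasion j, and TIME_{ij} is the corresponding value of the time metric.

```latex
Y_{ij} = \beta_0 + \beta_1\,\mathrm{TIME}_{ij} + \varepsilon_{ij}
```

Neither \beta_0 nor \beta_1 carries a subject index, which is exactly the "single common trajectory" assumption.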

  35. Multilevel model of change (2) • In my previous example, betas are NOT the same for each subject. • If we had a predictor (e.g. sex), we could model it directly by adding a sex term (or a time*sex interaction if the slope varied). • Predictors may be unknown. • Multilevel model of change allows the betas to be RANDOM VARIABLES, not fixed. • Allows modeling of measured predictors

  36. Multilevel model of change (3) • Level 1 model (similar to OLS model) • Coefficients have a new sub-script: subject (i). • Each subject is allowed a different set of ‘betas’ (π’s). • The π’s are the ‘true’ intercept and slope for subject ‘i’ • For the entire sample (or population), the π’s will vary, showing some distribution. • External factors (e.g. sex) may affect the distribution. • We can model this explicitly
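A common way to write a linear level 1 model in this π notation is sketched below. The symbols Y_{ij} (outcome for subject i at occasion j) and TIME_{ij} are assumed here for illustration.

```latex
Y_{ij} = \pi_{0i} + \pi_{1i}\,\mathrm{TIME}_{ij} + \varepsilon_{ij}
```

The only change from an ordinary regression line is the subscript i on the coefficients: \pi_{0i} is subject i's own intercept and \pi_{1i} is subject i's own slope.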

  37. Multilevel model of change (4) • Level 2 model (models the coefficients at level 1) • The ζ’s follow a bivariate normal distribution with means ‘0’. • These models can be made more complex • Add additional functions of time for the trajectory • Add additional level 2 predictors.
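A typical level 2 model with one subject-level predictor can be sketched as follows. The γ coefficients, the variance symbols, and the use of SEX_i as the predictor (echoing the earlier sex example) are assumptions for illustration; \pi_{0i} and \pi_{1i} are the subject-specific intercept and slope from level 1.

```latex
\begin{aligned}
\pi_{0i} &= \gamma_{00} + \gamma_{01}\,\mathrm{SEX}_i + \zeta_{0i} \\
\pi_{1i} &= \gamma_{10} + \gamma_{11}\,\mathrm{SEX}_i + \zeta_{1i} \\
\begin{pmatrix} \zeta_{0i} \\ \zeta_{1i} \end{pmatrix}
  &\sim N\!\left(
    \begin{pmatrix} 0 \\ 0 \end{pmatrix},
    \begin{pmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{01} & \sigma_1^2 \end{pmatrix}
  \right)
\end{aligned}
```

The γ's are the fixed effects; \sigma_0^2, \sigma_1^2 and \sigma_{01} describe how much intercepts and slopes still vary between subjects after accounting for the predictor.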

  38. Multilevel model of change (5) • Model can be fit using • Multilevel software package • Mixed-model methods • Output includes the ‘normal’ fixed effect model (the ‘best’ trajectory fitting all of the data) • Estimates of the extent of variation in the random component • Large values suggest the need for more predictors • Variances, etc. reflect within subject effects and correlations • Can model correlation more explicitly (GEE methods) • Does not require time points to be evenly spaced

  39. Latent Class Growth Models • Similar modeling approaches as multi-level methods • Assumes that there are distinct ‘classes’ of trajectories but we don’t know which people belong to which class (or why) • Fits the ‘best’ model with the pre-specified number of classes • Explore classes for potential predictors, differences, etc.

  40. Application (1) • Smoking onset in teens • McGill Natural History of Nicotine Dependence (NDIT) • PI: Dr. J. O’Loughlin • Published in Ann Epidemiol, 2005 • 1,293 grade 7 students • 369 ‘novice’ smokers with 3+ measures • Followed every 3-4 months for 3.5 years • Main outcome: smoking intensity • Estimate of # of cigarettes smoked in previous 3 months

  41. Application (2) • Analysis • Step 1: multi-level model of change • Used a quadratic trajectory model (time and time²) • Mean smoking curve • 18 cigarettes per month in the first three months after onset • Increased by 13.3 cigs/month every three months • Strongly significant random effects • Evidence for strong heterogeneity in the trajectories

  42. ‘Fixed effect’ trajectory – all subjects (1 ‘survey’ = about 3 months)

  43. Application (3) • Step 2: • Explored a range of latent class models. • Best fit found with four latent groups • Graphs on next slide. Note that the proportion of sample in each class varied greatly: • Class 1: 73% • Class 2: 11% • Class 3: 11% • Class 4: 6%

  44. ‘Latent class’ trajectories • Y-axis now reaches 750, not 200 • Curves labelled Class 4, Class 3, Class 2, Class 1 • 1 ‘survey’ = about 3 months

  45. Application (4) • Evidence of different characteristics among classes • Parents smoke: OR=2.2 (class 4) • Poor marks: OR > 4 (classes 3 & 4) • 50+% friends smoke: OR= 6 (class 3) OR=12 (class 4) • Clear smoking rules: OR=0.3 (class 3) OR=0.6 (class 4)
