Statistical methods in longitudinal studies. Jouko Miettunen, PhD, Department of Psychiatry, University of Oulu. E-mail: jouko.miettunen@oulu.fi
Topics of this presentation • Logistic regression analysis • Survival analysis • Analysis of variance • Random regression analysis • Structural equation modeling • Latent class analysis • Imputing missing data
Logistic regression analysis (1) • The most common modeling method for analyzing confounders in epidemiology, especially in longitudinal studies • The outcome variable must be dichotomous (no/yes, healthy/sick) • Exposure variables can be either dichotomous or continuous
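As a rough illustration of what a logistic regression fit does, the sketch below estimates an intercept and one slope by Newton-Raphson in plain Python. The data are hypothetical (not from the cohort), and the function name `fit_logistic` is an assumption made for this example. With a single dichotomous exposure, the exponentiated slope equals the odds ratio of the 2x2 table.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            # Gradient of the log-likelihood
            g0 += yi - p
            g1 += (yi - p) * xi
            # Information matrix (negative Hessian)
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: 50 exposed (30 events) and 50 unexposed (15 events)
x = [1] * 50 + [0] * 50
y = [1] * 30 + [0] * 20 + [1] * 15 + [0] * 35
b0, b1 = fit_logistic(x, y)
odds_ratio = math.exp(b1)
print(round(odds_ratio, 3))
```

In practice a statistics package would be used; the point here is only that Exp(B) for a dichotomous exposure is an odds ratio.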
Variables in logistic regression • Include sociodemographic variables, e.g. sex, social class • Include previously known risk factors • Especially if statistically significant in the model • Do not include too many variables • Depends on data size and distribution of variables • Do not include strongly intercorrelated variables
Example data set Northern Finland 1966 Birth Cohort • Women who were living in the provinces of Oulu and Lapland and were due to deliver during 1966 • N = 12,058 live births • N = 10,934 living in Finland in 1997 • Data on biological, socio-economic and health conditions collected prospectively from pregnancy up to the age of 35 years • Data from several registers and e.g. from large follow-ups at 14 and 31 years
Example question Northern Finland 1966 Birth Cohort • What predicts rehospitalization in psychoses? • N = 158 hospital treated cases • Exposure variables • sex • father’s social class (1980) • familial risk • onset age • length of first hospitalization • diagnosis (schz / other psychosis)
SPSS Output (1): Categorical Variables Codings

Variable                            Category       Frequency   Coding (1)   Coding (2)
Father's social class 1980          I,II                  48        0.000        0.000
                                    III,IV                96        1.000        0.000
                                    V                     14        0.000        1.000
Parent has psychotic dg 1972-2000   no                   133        0.000
                                    yes                   25        1.000
Sex                                 male                  93        1.000
                                    female                65        0.000
Diagnosis                           schizophrenia        108        1.000
                                    other psych           50        0.000
Length of first hospitalization     < 1 month             94        1.000
                                    > 1 month             64        0.000
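The parameter codings in this output are indicator (dummy) codings: a variable with k categories becomes k-1 columns of 0/1 values, and the reference category is coded all zeros. A minimal sketch for father's social class follows; the helper name `indicator_coding` is an assumption made for this example and is not an SPSS function.

```python
def indicator_coding(value, categories, reference):
    """Return one 0/1 indicator per non-reference category, in the given order."""
    return [1 if value == c else 0 for c in categories if c != reference]

classes = ["I,II", "III,IV", "V"]
# Reference category I,II is coded (0, 0), as in the SPSS output
codes = {c: indicator_coding(c, classes, reference="I,II") for c in classes}
for category, coding in codes.items():
    print(category, coding)
```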
SPSS Output (2): Variables in the Equation

                                                                       95.0% C.I. for Exp(B)
Variable                          B     S.E.     Wald   df    Sig.   Exp(B)    Lower    Upper
Length of 1st hospital.(1)    1.048    0.375    7.805    1   0.005    2.852    1.367    5.948
Sex(1)                       -0.559    0.366    2.331    1   0.127    0.572    0.279    1.172
Onset age                    -0.047    0.043    1.199    1   0.274    0.954    0.876    1.038
Diagnosis(1)                  0.839    0.385    4.740    1   0.029    2.314    1.087    4.926
FSC 1980                                        0.651    2   0.722
FSC 1980(1)                   0.309    0.392    0.622    1   0.430    1.362    0.632    2.934
FSC 1980(2)                   0.109    0.647    0.028    1   0.866    1.115    0.314    3.960
Parental psych(1)             0.612    0.513    1.423    1   0.233    1.845    0.675    5.045
Constant                      0.488    1.100    0.197    1   0.657    1.629
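The columns of this output are linked by simple formulas: Wald = (B / S.E.)², Exp(B) = e^B is the odds ratio, and the 95% confidence limits are e^(B ± 1.96 × S.E.). The sketch below recomputes the first row (length of first hospitalization) from its B and S.E.; the function name `odds_ratio_summary` is an assumption made for this example. Small differences from the printed values come from rounding of B and S.E. in the output.

```python
import math

def odds_ratio_summary(b, se, z=1.959964):
    """Wald statistic, odds ratio and 95% CI from a coefficient and its standard error."""
    return {
        "wald": (b / se) ** 2,
        "exp_b": math.exp(b),
        "lower": math.exp(b - z * se),
        "upper": math.exp(b + z * se),
    }

# B and S.E. for length of first hospitalization, taken from the output above
row = odds_ratio_summary(1.048, 0.375)
for name, value in row.items():
    print(name, round(value, 3))
```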
Survival analysis (1) • Examines time between two events, e.g. • from birth to illness onset • from illness onset to death • from end of treatment to rehospitalization • The Kaplan-Meier method estimates the probability of remaining event-free at each time point
Survival analysis (2) • Required information • Event (0,1) • Time to event (days, months,…) or to censoring • Data is censored due to • End of follow-up time • Loss of contact • Or e.g. death from a cause other than the one of interest
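The two required pieces of information (event indicator and time) are enough to compute a Kaplan-Meier curve by hand. A minimal sketch follows, using hypothetical follow-up times rather than cohort data; the function name `kaplan_meier` is an assumption made for this example. At each event time the survival probability is multiplied by (1 - events / number at risk), and censored subjects simply leave the risk set.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        censored = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 0)
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Subjects with an event or censoring at t leave the risk set
        at_risk -= deaths + censored
    return curve

# Hypothetical data: time in months, event 1 = occurred, 0 = censored
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```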
Example question Northern Finland 1966 Birth Cohort • What predicts age at suicide? • People alive and living in Finland at 16 years (N=10,934) • Data until the end of 2001 • 58 (0.5%) suicides • 140 (1.3%) other deaths • 10,736 (98.2%) alive • Predictor variable: • family type at birth (full, single)
SPSS Output (1): Test Statistics for Equality of Survival Distributions • Log-rank test, p = 0.002
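The log-rank test compares observed and expected numbers of events in one group, summed over all event times. The sketch below is a minimal two-group version on hypothetical data (it does not reproduce the cohort result); the function name `logrank_test` is an assumption made for this example. The p-value uses the chi-square distribution with 1 degree of freedom, written with `math.erfc`.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, p-value)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e == 1})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)   # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)   # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e == 1)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square with 1 df
    return chi2, p_value

# Hypothetical data: all events observed, group A fails earlier than group B
chi2, p_value = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(round(chi2, 2), round(p_value, 3))
```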
Survival analysis (3) • If groups are compared statistically, the difference (or trend in difference) between them should stay about the same across time (proportional hazards); at the very least, the curves should not cross • Can be done also with small samples • Curve can be presented as survival or as hazard function • References, e.g. • Parmar & Machin: Survival analysis. A practical approach. John Wiley & Sons, 1995.
Example question (2) Age of suicide and family type • Possible confounding variables • sex • social class 1966 (I-II,III-IV,V) • average school mark at 14 • psychiatric diagnosis (no, yes) • crime (no, violent, non-violent) • Cox regression analysis
SPSS Output (3), Cox regression analysis: Categorical Variable Codings

Variable                     Category        Frequency   (1)   (2)
Sex                          1=male               5425     1
                             2=female             5222     0
Psych dg                     0=healthy           10197     0
                             1=any dg              450     1
Fathers Social Class 1966    1=I,II                783     0     0
                             2=III,IV             7823     1     0
                             3=V                  2041     0     1
Family type 1966             1=single             1975     1
                             2=full               8672     0
Criminality                  0=no crimes         10019     0     0
                             1=violent             200     1     0
                             2=nonviolent          428     0     1
SPSS Output (4), Cox regression analysis: Variables in the Equation

                                                              95.0% CI for Exp(B)
Variable             B      SE      Wald   df    Sig.   Exp(B)    Lower     Upper
SEX              0.812   0.340     5.720    1   0.017    2.253    1.158     4.383
PSYCH DG         2.463   0.303    66.085    1   0.000   11.740    6.483    21.260
FAM TYPE         0.728   0.287     6.429    1   0.011    2.072    1.180     3.637
FSC 1966                           1.514    2   0.469
FSC 1966(1)      0.451   0.715     0.398    1   0.528    1.570    0.386     6.377
FSC 1966(2)      0.536   0.436     1.513    1   0.219    1.710    0.727     4.018
SCHOOL MARK     -0.276   0.164     2.825    1   0.093    0.759    0.550     1.047
CRIMES                             3.398    2   0.183
CRIMES(1)        0.239   0.454     0.276    1   0.600    1.269    0.521     3.093
CRIMES(2)       -1.011   0.625     2.613    1   0.106    0.364    0.107     1.239
Analysis of variance • ANOVA • One continuous outcome (dependent) variable • MANOVA • Several continuous outcome variables • Repeated measurements ANOVA • Same measurements are made several times on each subject • ANOVA, MANOVA and rANOVA • Only categorical predictors • ANCOVA, MANCOVA, rANCOVA • Also continuous predictors
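For the simplest case on this slide (one continuous outcome, one categorical predictor), the ANOVA F statistic is the ratio of between-group to within-group mean squares. A minimal sketch with hypothetical measurements follows; the function name `one_way_anova_f` is an assumption made for this example.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA: between-group MS / within-group MS."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical outcome values in three groups
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_value = one_way_anova_f(groups)
print(round(f_value, 2))
```

The resulting F is compared with the F distribution on (k-1, N-k) degrees of freedom, which statistical packages report as the Sig. value.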
Example question Difference in size of hippocampus • Northern Finland 1966 Birth Cohort • Follow-up study 1999-2001 • Schizophrenia patients (N=56) vs. healthy controls (N=104) • Repeated measurements ANCOVA • Measurements of the right and left side were treated as repeated measurements
Example table: Hippocampus volumes, schizophrenia and comparison subjects

                                  F         Sig.
Model 1
  Within effect: side          20.3      < 0.001
  Diagnosis                     1.2         0.28
  Gender                        6.5         0.01
Model 2
  Within effect: side          0.81         0.37
  Covariate: brain vol.        35.0      < 0.001
  Diagnosis                  < 0.01         0.89
  Gender                        0.7         0.41
  Familial psychosis            1.9         0.17
  Perinatal risk                0.8         0.38
  Handedness                    0.3         0.61

Tanskanen et al. Schizophrenia Research (in press)
Random regression analysis • Random regression analysis = Random-effects (multilevel) models = … • Allows missing data • Allows time-varying covariates • Allows subjects to be measured at different timepoints • Takes into account several levels of subjects (multilevel analysis)
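As an illustration, the simplest model of this family is a random-intercept model for repeated measurements. The notation below is an assumption made for this sketch and does not come from the slides: y_ij is the outcome of subject i at occasion j, t_ij is the measurement time, u_i is the subject-specific deviation from the overall intercept, and ε_ij is the residual.

```latex
y_{ij} = \beta_0 + \beta_1 t_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim N(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim N(0, \sigma_e^2)
```

Because each subject contributes whatever occasions j were actually observed, subjects with missing visits or different measurement times can still be included.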
Random regression analysis • Available software • SAS Proc Mixed • Stata (GLLAMM) • Specific multilevel modeling software • MLWin • http://multilevel.ioe.ac.uk/index.html • HLM • http://www.ssicentral.com/hlm/hlm.htm
Random regression analysis • References • Goldstein et al. Tutorial in biostatistics. Multilevel modelling of medical data. Stat Med, 21, 3291-315, 2002. • Hedeker & Mermelstein. Application of random-effects regression models in relapse research, Addiction, 91, S211-30, 1996. • Sharma et al. A longitudinal study of plasma cortisol and depressive symptomatology by random regression analysis. Biol Psychiatry 31, 304-14, 1992. • Tilling et al. A new method for predicting recovery after stroke. Stroke 32, 2867-73, 2001. • Homepage of Don Hedeker: • http://tigger.uic.edu/~hedeker/ • Homepage of Sophia Rabe-Hesketh (GLLAMM) • http://www.gllamm.org/sophia.html
Structural Equation Modeling • Combination of factor analysis and regression • Continuous and discrete predictors and outcomes • Relationships among measured or latent variables
Example: Nursing orientation [Path diagram. Constructs shown: orientation to learning nursing, orientation to nursing, caring orientation, expertise orientation, life orientation, catalytic-co-operational nursing, controlling nursing and confirming nursing. Paths are positive, with correlations from r = .11 to r = .64; background predictors include sex, age, language (Swedish/Finnish) and having children.] Vanhanen-Nuutinen et al. (manuscript)
Structural Equation Modeling • References • Bentler & Stein. Structural equation models in medical research. Stat Methods Med Res 1: 159–181, 1992. • Bollen. Structural equations with latent variables. John Wiley & Sons, Inc, New York, 1989. • Finch & West. The investigation of personality structure: statistical models. J Res Pers 31: 439–485, 1997. • MacCallum & Austin. Applications of structural equation modeling in psychological research. Annu Rev Psychol 51: 201–226, 2000.
Latent class analysis • Specific statistical method developed to group subjects according to selected characteristics • Classifies subjects into groups • Identifies characteristics that indicate groups
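Once a latent class model has been estimated, subjects are classified with Bayes' rule: the probability of belonging to a class is proportional to the class size times the probability of the subject's responses within that class. The sketch below assumes dichotomous items that are independent within a class; all parameter values are hypothetical and the function name `class_posteriors` is an assumption made for this example.

```python
def class_posteriors(responses, class_sizes, item_probs):
    """Posterior class membership probabilities for one subject's 0/1 responses."""
    joint = []
    for size, probs in zip(class_sizes, item_probs):
        likelihood = size
        for y, p in zip(responses, probs):
            # p is the probability of a positive response to the item in this class
            likelihood *= p if y == 1 else 1.0 - p
        joint.append(likelihood)
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical two-class model with two items
class_sizes = [0.7, 0.3]
item_probs = [[0.1, 0.2],   # class 0: low endorsement probabilities
              [0.8, 0.6]]   # class 1: high endorsement probabilities
posteriors = class_posteriors([1, 1], class_sizes, item_probs)
print([round(p, 3) for p in posteriors])
```

Each subject is assigned to the class with the highest posterior probability.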
Example: Anti-Social Behavior • National Longitudinal Survey of Youth (NLSY) • Respondent ages between 16 and 23 • Background information: age, gender and ethnicity • N=7,326 • 17 dichotomously scored antisocial behavior items: • Damaged property • Fighting • Shoplifting • Stole <$50 • Stole >$50 • Use of force • Seriously threaten • Intent to injure • Use Marijuana • Use other drug • Sold Marijuana • Sold hard drugs • ‘Con’ somebody • Stole an Automobile • Broken into a building • Held stolen goods • Gambling Operation • Reference: http://www.ats.ucla.edu/stat/mplus/seminars/lca/default.htm
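Example: Anti-Social Behavior [Model diagram: latent class variable C with the behavior items (Damaged property, Fighting, Shoplifting, Stole <$50, ..., Gambling operation) and the covariates Male, Race and Age.]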
[Figure: Relationship between class probabilities and age (16 to 23 years) by gender; panels: Females, Males.]
Example: Anti-Social Behavior • Summary of four classes: • Property Offense Class (9.8%) • Substance Involvement Class (18.3%) • Person Offenses Class (27.9%) • Normative Class (44.1%) • Classification table (columns: latent class; rows: average latent class probability for most likely latent class membership)
Latent class analysis • References • Muthén & Muthén. Integrating person-centered and variable-centered analyses: Growth mixture modeling with latent trajectory classes. Alcohol Clin Exp Res, 24, 882-91, 2000. • http://www.ats.ucla.edu/stat/mplus/seminars/lca/default.htm • More references and examples • Homepage of Mplus software: www.statmodel.com
Missing data • Major problem in longitudinal studies • Usually data is not missing at random • One “solution” • Compare included and excluded cases • Not very good! • A smaller sample size gives less power (chance of obtaining low p-values)
Imputing single missing data • With mean of sample (or subsample) • Gives less variability to data • Nearest neighbour imputation • Gives less variability to data • Use regression techniques to predict missing data • Mean of the same subject's other variables that measure approximately the same thing • e.g. in psychological scales • Now “missing value analysis” also in SPSS
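The loss of variability under mean imputation is easy to demonstrate: every imputed value sits exactly at the mean, so the sum of squared deviations stays the same while the sample size grows. A minimal sketch with hypothetical scores, where `None` marks a missing value:

```python
import statistics

# Hypothetical scores; None marks a missing value
scores = [4.0, 6.0, None, 8.0, None, 10.0]

observed = [v for v in scores if v is not None]
sample_mean = statistics.mean(observed)

# Replace each missing value with the mean of the observed values
imputed = [sample_mean if v is None else v for v in scores]

variance_observed = statistics.variance(observed)
variance_imputed = statistics.variance(imputed)
print(round(variance_observed, 3), round(variance_imputed, 3))
```

The mean is unchanged, but the variance (and therefore standard errors computed from the imputed data) is too small.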
Multiple imputation • Requires special software • SAS/STAT (PROC MI & PROC MIANALYZE) • S-PLUS (MICE) • SOLAS for Missing Data Analysis 3.0 • References • Kmetic et al. Multiple imputation to account for missing data in a survey: estimating the prevalence of osteoporosis. Epidemiology, 13, 437-44, 2002. • McCleary. Using multiple imputation for analysis of incomplete data in clinical research. Nurs Res, 51, 339-43, 2002. • Streiner. The case of the missing data: methods of dealing with dropouts and other research vagaries. Can J Psychiatry, 47, 68-75, 2002.
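Multiple imputation creates several completed data sets, analyzes each one, and pools the results. The standard pooling step (Rubin's rules) is sketched below: the pooled estimate is the mean of the per-data-set estimates, and the total variance adds the between-imputation variance to the average within-imputation variance. The estimates and variances are hypothetical, and the function name `pool_rubin` is an assumption made for this example; it is not part of any of the packages listed.

```python
def pool_rubin(estimates, variances):
    """Pool m estimates and their variances with Rubin's rules."""
    m = len(estimates)
    pooled = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total

# Hypothetical regression coefficients and squared standard errors
# from three imputed data sets
pooled, total_variance = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(round(pooled, 3), round(total_variance, 4))
```

Unlike single imputation, the between-imputation term carries the uncertainty about the missing values into the final standard error.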
General references in Finnish • Metsämuuronen. Tutkimuksen tekemisen perusteet ihmistieteissä (2003) • Nummenmaa et al. Tutkimusaineiston analyysi (1997) • Uhari & Nieminen. Epidemiologia & Biostatistiikka (2001) • SPSS, SAS, etc. manuals