Statistical methods in longitudinal studies

Statistical methods in longitudinal studies. Jouko Miettunen, PhD Department of Psychiatry University of Oulu e-mail: jouko.miettunen@oulu.fi. Topics of this presentation. Logistic regression analysis Survival analysis Analysis of variance Random regression analysis


Presentation Transcript


  1. Statistical methods in longitudinal studies Jouko Miettunen, PhD Department of Psychiatry University of Oulu e-mail: jouko.miettunen@oulu.fi

  2. Topics of this presentation • Logistic regression analysis • Survival analysis • Analysis of variance • Random regression analysis • Structural equation modeling • Latent class analysis • Imputing missing data

  3. Logistic regression analysis (1) • Most common modeling method to analyze confounders in epidemiology, especially in longitudinal studies • Outcome variable should be dichotomous (no/yes, healthy/sick) • Exposure variables can be either dichotomous or continuous

  4. Variables in logistic regression • Include sociodemographic variables, e.g. sex, social class • Include previously known risk factors • Especially if statistically significant in the model • Do not include too many variables • Depends on data size and distribution of variables • Do not include intercorrelated variables

  5. Example data set Northern Finland 1966 Birth Cohort • Women who were living in the provinces of Oulu and Lapland and were due to deliver during 1966 • N = 12,058 live births • N = 10,934 living in Finland in 1997 • Data on biological, socio-economic and health conditions collected prospectively from pregnancy up to the age of 35 years • Data from several registers and e.g. from large follow-ups at 14 and 31 years

  6. Example question Northern Finland 1966 Birth Cohort • What predicts rehospitalization in psychoses? • N = 158 hospital treated cases • Exposure variables • sex • father’s social class (1980) • familial risk • onset age • length of first hospitalization • diagnosis (schz / other psychosis)

  7. SPSS Output (1) • Categorical Variables Codings

      Variable                            Category       Frequency  Coding (1)  Coding (2)
      Father's social class 1980          I,II                  48       0.000       0.000
                                          III,IV                96       1.000       0.000
                                          V                     14       0.000       1.000
      Parent has psychotic dg 1972-2000   no                   133       0.000
                                          yes                   25       1.000
      Sex                                 male                  93       1.000
                                          female                65       0.000
      Diagnosis                           schizophrenia        108       1.000
                                          other psych           50       0.000
      Length of first hospitalization     < 1 month             94       1.000
                                          > 1 month             64       0.000

  8. SPSS Output (2) • Variables in the Equation

                                                                              95.0% C.I. for Exp(B)
      Variable                     B       S.E.    Wald    df   Sig.   Exp(B)   Lower    Upper
      Length of 1st hospital.(1)   1.048   0.375   7.805   1    0.005  2.852    1.367    5.948
      Sex(1)                      -0.559   0.366   2.331   1    0.127  0.572    0.279    1.172
      Onset age                   -0.047   0.043   1.199   1    0.274  0.954    0.876    1.038
      Diagnosis(1)                 0.839   0.385   4.740   1    0.029  2.314    1.087    4.926
      FSC 1980                                     0.651   2    0.722
      FSC 1980(1)                  0.309   0.392   0.622   1    0.430  1.362    0.632    2.934
      FSC 1980(2)                  0.109   0.647   0.028   1    0.866  1.115    0.314    3.960
      Parental psych(1)            0.612   0.513   1.423   1    0.233  1.845    0.675    5.045
      Constant                     0.488   1.100   0.197   1    0.657  1.629
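The Exp(B) and confidence interval columns in the SPSS output on slide 8 follow directly from B and S.E. The sketch below is an illustration only, not part of the original slides: the function names are hypothetical, and it assumes the usual normal-approximation interval exp(B ± 1.96 × S.E.) and the Wald statistic (B / S.E.)².

```python
import math

def odds_ratio_with_ci(b, se, z=1.96):
    """Return (odds ratio, lower, upper) for a logistic coefficient b with standard error se."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

def wald_statistic(b, se):
    """Wald chi-square statistic with 1 degree of freedom."""
    return (b / se) ** 2

# B and S.E. for length of first hospitalization, taken from slide 8
odds_ratio, lower, upper = odds_ratio_with_ci(1.048, 0.375)
print(f"OR = {odds_ratio:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
print(f"Wald = {wald_statistic(1.048, 0.375):.3f}")
```

Small differences from the SPSS figures in the last decimal are expected, because the slide reports B and S.E. rounded to three decimals.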

  9. Survival analysis (1) • Examines time between two events, e.g. • from birth to illness onset • from illness onset to death • from end of treatment to rehospitalization • Kaplan-Meier method estimates the probability of remaining event-free (surviving) at each time point

  10. Survival analysis (2) • Required information • Event (0,1) • Time to event (days, months,…) or to censoring • Data are censored due to • End of follow-up time • Loss of contact • Or e.g. a death other than the one of interest
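The two required pieces of information (event indicator and time to event or censoring) are enough to compute a Kaplan-Meier curve. The following is a minimal sketch with made-up follow-up times, not the cohort data; the function name `kaplan_meier` is hypothetical. It assumes the usual convention that subjects censored at time t are still counted as at risk at t.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Count everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving time t
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up times in months; 0 marks a censored subject
times = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 0, 1, 0]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Censored subjects never lower the curve themselves; they only shrink the number at risk for later event times.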

  11. Example question Northern Finland 1966 Birth Cohort • What predicts age at suicide? • People alive and living in Finland at 16 years (N=10,934) • Data until the end of 2001 • 58 (0.5%) suicides • 140 (1.3%) other deaths • 10,736 (98.2%) alive • Predictor variable: • family type at birth (full, single)

  12. SPSS Output (1) • Test Statistics for Equality of Survival Distributions • Log rank test, p=0.002
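For readers who want to see what the log rank test computes, here is a rough two-group sketch. It is not the SPSS procedure itself: the function name and the toy data are hypothetical, and it uses the standard hypergeometric variance and a chi-square distribution with 1 degree of freedom.

```python
import math

def logrank_test(times, events, groups):
    """Two-group log rank test; groups holds 0 or 1 for each subject.

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    observed = expected = variance = 0.0
    for t in event_times:
        n = sum(1 for ti in times if ti >= t)                       # at risk, both groups
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e == 1 and g == 1)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (observed - expected) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical data: group 1 has the earlier events
chi2, p_value = logrank_test([1, 2, 3, 4], [1, 1, 1, 1], [1, 1, 0, 0])
print(round(chi2, 3), round(p_value, 3))
```

The sketch compares observed and expected events in one group at every event time, which is why it needs only the event indicator, the time and the group label.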

  13. Survival analysis (3) • The difference between groups, or the trend in that difference, should stay about the same over time; at the very least the curves should not cross (if the groups are compared statistically) • Can also be done with small samples • Curve can be presented as a survival or as a hazard function • References, e.g. • Parmar & Machin: Survival analysis. A practical approach. John Wiley & Sons, 1995.

  14. SPSS Output (2)

  15. Example question (2) Age at suicide and family type • Possible confounding variables • sex • social class 1966 (I-II,III-IV,V) • average school mark at 14 • psychiatric diagnosis (no, yes) • crime (no, violent, non-violent) • Cox regression analysis

  16. SPSS Output (3) Cox regression analysis • Categorical Variable Codings

      Variable                     Category        Frequency   (1)   (2)
      Sex                          1=male               5425     1
                                   2=female             5222     0
      Psych dg                     0=healthy           10197     0
                                   1=any dg              450     1
      Father's Social Class 1966   1=I,II                783     0     0
                                   2=III,IV             7823     1     0
                                   3=V                  2041     0     1
      Family type 1966             1=single             1975     1
                                   2=full               8672     0
      Criminality                  0=no crimes         10019     0     0
                                   1=violent             200     1     0
                                   2=nonviolent          428     0     1

  17. SPSS Output (4) Cox regression analysis • Variables in the Equation

                                                                95.0% CI for Exp(B)
      Variable       B       SE      Wald     df   Sig.   Exp(B)    Lower    Upper
      SEX            0.812   0.340   5.720    1    0.017  2.253     1.158    4.383
      PSYCH DG       2.463   0.303   66.085   1    0.000  11.740    6.483    21.260
      FAM TYPE       0.728   0.287   6.429    1    0.011  2.072     1.180    3.637
      FSC 1966                       1.514    2    0.469
      FSC 1966(1)    0.451   0.715   0.398    1    0.528  1.570     0.386    6.377
      FSC 1966(2)    0.536   0.436   1.513    1    0.219  1.710     0.727    4.018
      SCHOOL MARK   -0.276   0.164   2.825    1    0.093  0.759     0.550    1.047
      CRIMES                         3.398    2    0.183
      CRIMES(1)      0.239   0.454   0.276    1    0.600  1.269     0.521    3.093
      CRIMES(2)     -1.011   0.625   2.613    1    0.106  0.364     0.107    1.239

  18. Analysis of variance • ANOVA • One continuous outcome (dependent) variable • MANOVA • Several continuous outcome variables • Repeated measurements ANOVA • Same measurements are made several times on each subject • ANOVA, MANOVA and rANOVA • Only categorical predictors • ANCOVA, MANCOVA, rANCOVA • Also continuous predictors
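As a reminder of what the simplest case computes, here is a small sketch of the one-way ANOVA F statistic for one continuous outcome and one categorical predictor. The function name and the data are hypothetical, and the sketch covers only plain ANOVA, not MANOVA, repeated measurements or covariates.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical measurements in three groups
f, df_between, df_within = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f, 2), df_between, df_within)
```

A large F means the group means differ more than the within-group scatter would suggest; the p-value then comes from the F distribution with the two degrees of freedom returned.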

  19. Example question Difference in size of hippocampus • Northern Finland 1966 Birth Cohort • Follow-up study 1999-2001 • Schizophrenia patients (N=56) vs. healthy controls (N=104) • Repeated measurements ANCOVA • Measurements of the right and left side were treated as repeated measurements

  20. Example table • Schizophrenia and comparison subjects: hippocampus volumes

      Effect                      F        Sig.
      Model 1
        Within effect: side       20.3     < 0.001
        Diagnosis                 1.2      0.28
        Gender                    6.5      0.01
      Model 2
        Within effect: side       0.81     0.37
        Covariate: brain vol.     35.0     < 0.001
        Diagnosis                 < 0.01   0.89
        Gender                    0.7      0.41
        Familial psychosis        1.9      0.17
        Perinatal risk            0.8      0.38
        Handedness                0.3      0.61

      Tanskanen et al. Schizophrenia Research (in press)

  21. Random regression analysis • Random regression analysis = Random-effects (multilevel) models = … • Allow missing data • Allow time-varying covariates • Allow subjects to be measured at different time points • Take into account several levels of subjects (multilevel analysis)
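The slides do not give a formula, so the following is only one common way to write a random regression model, here with a random intercept and a random slope for time. All symbols are assumptions introduced for illustration: y_ij is the outcome of subject i at occasion j, t_ij the measurement time, the u terms are subject-specific deviations and ε_ij is the residual.

```latex
y_{ij} = \beta_0 + \beta_1 t_{ij} + u_{0i} + u_{1i}\, t_{ij} + \varepsilon_{ij},
\qquad
(u_{0i}, u_{1i})^{\top} \sim N(\mathbf{0}, \Sigma_u),
\qquad
\varepsilon_{ij} \sim N(0, \sigma^2)
```

Because each subject contributes whatever occasions j were actually observed, a model of this form can accommodate missing visits and unequal measurement times.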

  22. Random regression analysis • Available software • SAS Proc Mixed • Stata (GLLAMM) • Specific multilevel modeling software • MLwiN • http://multilevel.ioe.ac.uk/index.html • HLM • http://www.ssicentral.com/hlm/hlm.htm

  23. Random regression analysis • References • Goldstein et al. Tutorial in biostatistics. Multilevel modelling of medical data. Stat Med, 21, 3291-315, 2002. • Hedeker & Mermelstein. Application of random-effects regression models in relapse research, Addiction, 91, S211-30, 1996. • Sharma et al. A longitudinal study of plasma cortisol and depressive symptomatology by random regression analysis. Biol Psychiatry 31, 304-14, 1992. • Tilling et al. A new method for predicting recovery after stroke. Stroke 32, 2867-73, 2001. • Homepage of Don Hedeker: • http://tigger.uic.edu/~hedeker/ • Homepage of Sophia Rabe-Hesketh (GLLAMM) • http://www.gllamm.org/sophia.html

  24. Structural Equation Modeling • Combination of factor analysis and regression • Continuous and discrete predictors and outcomes • Relationships among measured or latent variables

  25. Example: Nursing orientation • [Path diagram, not reproduced: column headings "Orientation to learning nursing" and "Orientation to nursing"; nodes Caring orientation, Expertise orientation, Life orientation, Catalytic-co-operational nursing, Controlling nursing and Confirming nursing; positive paths with r = .11 to .64; background predictors include sex, age, language and having children] • Vanhanen-Nuutinen et al. (manuscript)

  26. Structural Equation Modeling • References • Bentler & Stein. Structural equation models in medical research. Stat Methods Med Res 1: 159–181, 1992. • Bollen. Structural equations with latent variables. John Wiley & Sons, Inc, New York, 1989. • Finch & West. The investigation of personality structure: statistical models. J Res Pers 31: 439–485, 1997. • MacCallum & Austin. Applications of structural equation modeling in psychological research. Annu Rev Psychol 51: 201–226, 2000.

  27. Latent class analysis • Specific statistical method developed to group subjects according to selected characteristics • Classifies subjects to groups • Identifies characteristics that indicate groups

  28. Example: Anti-Social Behavior • National Longitudinal Survey of Youth (NLSY) • Respondent ages between 16 and 23 • Background information: age, gender and ethnicity • N=7,326 17 antisocial dichotomously scored behavior items: • Damaged property • Fighting • Shoplifting • Stole <$50 • Stole >$50 • Use of force • Seriously threaten • Intent to injure • Use Marijuana • Use other drug • Sold Marijuana • Sold hard drugs • ‘Con’ somebody • Stole an Automobile • Broken into a building • Held stolen goods • Gambling Operation Reference: http://www.ats.ucla.edu/stat/mplus/seminars/lca/default.htm

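  29. Example: Anti-Social Behavior • [Model diagram, not reproduced: latent class variable C, the behavior items (Damage Property, Fighting, Shoplifting, Stole <$50, ..., Gambling) and the covariates Male, Race and Age]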

  30. Example: Anti-Social Behavior probabilities

  31. Relationship between class probabilities and age by gender • [Figure, not reproduced: separate panels for females and males; x-axis: age, 16 to 23]

  32. Example: Anti-Social Behavior • Summary of four classes: • Property Offense Class (9.8%) • Substance Involvement Class (18.3%) • Person Offenses Class (27.9%) • Normative Class (44.1%) • Classification Table: Columns: Latent class Rows: Average latent class probability for most likely latent class membership
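Assigning a subject to the most likely latent class is an application of Bayes' rule to the class prevalences and the item probabilities. The sketch below uses the four class prevalences from slide 32, but the three item probabilities per class are invented for illustration and are not the estimates from the NLSY example; the function name is also hypothetical.

```python
def class_posteriors(responses, prevalences, item_probs):
    """Posterior probability of each latent class given 0/1 item responses.

    prevalences -- prior probability of each class
    item_probs  -- item_probs[c][j] = P(item j endorsed | class c)
    """
    joint = []
    for prior, probs in zip(prevalences, item_probs):
        likelihood = 1.0
        # Items are assumed independent within a class
        for r, p in zip(responses, probs):
            likelihood *= p if r == 1 else 1 - p
        joint.append(prior * likelihood)
    total = sum(joint)
    return [j / total for j in joint]

classes = ["property offense", "substance involvement", "person offenses", "normative"]
prevalences = [0.098, 0.183, 0.279, 0.441]      # from slide 32
# Hypothetical P(yes) for: damaged property, fighting, use marijuana
item_probs = [
    [0.80, 0.50, 0.60],
    [0.10, 0.20, 0.90],
    [0.20, 0.70, 0.20],
    [0.02, 0.05, 0.05],
]
posteriors = class_posteriors([0, 0, 1], prevalences, item_probs)
most_likely = classes[posteriors.index(max(posteriors))]
print(most_likely, [round(p, 3) for p in posteriors])
```

Averaging these posterior probabilities within each assigned class gives the kind of classification table described on slide 32.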

  33. Latent class analysis • References • Muthén & Muthén. Integrating person-centered and variable-centered analyses: Growth mixture modeling with latent trajectory classes. Alcohol Clin Exp Res, 24, 882-91, 2000. • http://www.ats.ucla.edu/stat/mplus/seminars/lca/default.htm • More references and examples • Homepage of Mplus software: www.statmodel.com

  34. Missing data • Major problem in longitudinal studies • Usually data are not missing at random • One “solution” • Compare included and excluded cases • Not very good! • Smaller sample size gives less power (chance to get low p-values)

  35. Imputing single missing data • With mean of sample (or subsample) • Gives less variability to data • Nearest neighbour imputation • Gives less variability to data • Use regression techniques to predict missing data • Mean of the same subject's variables that measure approximately the same thing • e.g. in psychological scales • Now “missing value analysis” also in SPSS
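The remark that mean imputation gives less variability to the data can be checked with a few lines of code. This is a toy sketch with hypothetical scores; `mean_impute` is an illustrative helper, not a function from SPSS or any other package.

```python
from statistics import mean, variance

def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical scale scores; None marks a missing value
scores = [4.0, None, 7.0, 5.0, None, 9.0, 6.0, None]
observed = [v for v in scores if v is not None]
completed = mean_impute(scores)

print("variance, observed only:", round(variance(observed), 3))
print("variance, after mean imputation:", round(variance(completed), 3))
```

The mean is unchanged, but every imputed value sits exactly at the mean, so the variance shrinks and standard errors computed from the completed data are too small.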

  36. Multiple imputation • Requires special software • SAS/STAT (PROC MI & PROC MIANALYZE) • S-PLUS (MICE) • SOLAS for Missing Data Analysis 3.0 • References • Kmetic et al. Multiple imputation to account for missing data in a survey: estimating the prevalence of osteoporosis. Epidemiology, 13, 437-44, 2002. • McCleary. Using multiple imputation for analysis of incomplete data in clinical research. Nurs Res, 51, 339-43, 2002. • Streiner. The case of the missing data: methods of dealing with dropouts and other research vagaries. Can J Psychiatry, 47, 68-75, 2002.
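In multiple imputation the same analysis is run on each imputed data set and the results are then pooled, usually with Rubin's rules. The sketch below shows only that pooling step, under the assumption that the m estimates and their squared standard errors are already available; the numbers and the function name are hypothetical, and this is not the code used by any of the packages listed above.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m estimates and their squared standard errors with Rubin's rules.

    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)               # average within-imputation variance
    between = variance(estimates)          # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5

# Hypothetical regression coefficients from five imputed data sets
estimates = [0.50, 0.55, 0.45, 0.60, 0.40]
variances = [0.04, 0.04, 0.04, 0.04, 0.04]
pooled, se = pool_rubin(estimates, variances)
print(round(pooled, 3), round(se, 3))
```

The between-imputation term is what single imputation leaves out: it carries the extra uncertainty caused by not knowing the missing values.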

  37. General references in Finnish • Metsämuuronen. Tutkimuksen tekemisen perusteet ihmistieteissä (2003) • Nummenmaa et al. Tutkimusaineiston analyysi (1997) • Uhari & Nieminen. Epidemiologia & Biostatistiikka (2001) • SPSS, SAS, etc. manuals
