Group-Based Trajectories for the analysis of observational cohorts
Michael W. Plankey, PhD, Professor, Georgetown University Medical Center, Washington, DC, USA
Multicenter AIDS Cohort Study (MACS)
7,087 men who have sex with men (MSM) enrolled at 4 United States sites (Baltimore, Maryland; Chicago, Illinois; Los Angeles, California; and Pittsburgh, Pennsylvania):
• 4,954 in 1984-85
• 668 in 1987-90
• 1,350 in 2001-03
• 115 in 2010+
Study visits every 6 months:
• Standardized interviews
• Physical examination
• Quality-controlled flow cytometry
• HIV RNA quantification
Storage of bio-specimens in local/national repositories
Continuous outcome ascertainment:
• Seroconversion
• Clinical outcomes (confirmed by medical records):
  – AIDS diagnoses
  – Non-AIDS diagnoses: cardiovascular disease, cerebrovascular disease, kidney disease, liver disease, lung infection/bacteremia/septicemia, malignancies, neurologic disease
• Mortality
Source: www.aidscohortstudy.org
Multicenter AIDS Cohort Study (MACS)
MACS Active Cohort* (n=2,185)
• Median age (years): 59.2
HIV serostatus:
• Seronegative – 1,016 (46%)
• Seropositive – 1,169 (54%)
Education:
• Less than high school – 96 (4%)
• High school/some college – 808 (37%)
• College or higher – 1,278 (59%)
Race:
• White, non-Hispanic – 1,356 (62%)
• Black, non-Hispanic – 505 (23%)
• Other, non-Hispanic – 39 (2%)
• Hispanic – 285 (13%)
HIV-seropositive participants only:
• Median CD4 cell count (cells/μL): 681
• Median CD8 cell count (cells/μL): 773
• Median HIV RNA (copies/mL): <20
* As of September 2018
Source: www.aidscohortstudy.org
Group-based Trajectory Analysis
What is a trajectory? It is “the evolution of an outcome over age or time.”
What is group-based trajectory analysis? Its goal is to identify clusters of individuals who follow similar patterns of a behavior or outcome over age or time.
Sources: Nagin, Daniel S. Group-Based Modeling of Development. Harvard University Press, 2005. Jones, Bobby L., and Daniel S. Nagin. "Advances in group-based trajectory modeling and an SAS procedure for estimating them." Sociological Methods & Research 35.4 (2007): 542-571.
Group-based Trajectory Analysis vs Other Analyses of Change
Other analyses of change include calculating aggregate prevalence over time, hierarchical modeling, and latent curve analysis. These approaches summarize change with a single population-average trajectory (hierarchical and latent curve models add continuous individual variation around that average). In contrast, group-based trajectory modeling assumes that the population is composed of distinct groups, each with its own underlying trajectory.
Prevalence of Marijuana Use in the Multicenter AIDS Cohort Study
Okafor et al. examined long-term trends in the prevalence and correlates of current and daily marijuana use among 2,742 HIV+ and 3,172 HIV- participants in the MACS from 1984 to 2013. The yearly prevalence of current and daily marijuana use was calculated and plotted over the follow-up period. HIV-seropositive status was associated with a higher prevalence of marijuana use.
Okafor et al. Prevalence and correlates of marijuana use among HIV-seropositive and seronegative men in the Multicenter AIDS Cohort Study (MACS), 1984–2013. Am J Drug Alcohol Abuse. 2017 Sep;43(5):556–566.
Trajectories of Marijuana Use in the Multicenter AIDS Cohort Study
• Okafor et al. then constructed longitudinal trajectories of marijuana use in 3,658 HIV+ and HIV- participants in the MACS using group-based trajectory analysis.
• Four distinct trajectory groups were identified:
  – Abstainer/infrequent (65%)
  – Decreaser (13%)
  – Increaser (12%)
  – Chronic high (10%)
• Being HIV+ was associated with increased odds of membership in the decreaser, increaser and chronic high groups.
Okafor et al. Trajectories of Marijuana Use among HIV-seropositive and HIV-Seronegative MSM in the Multicenter AIDS Cohort Study (MACS), 1984–2013. AIDS Behav. 2017 Apr;21(4):1091–1104.
Group-based Trajectory Analysis: Conceptual Model
• Time-dependent covariates influence the shape of a trajectory; that is, they can be used to analyze the effect of time-varying events on the trajectory.
• Time-stable covariates act as predictors of trajectory group membership.
Source: Jones, Bobby L., Daniel S. Nagin, and Kathryn Roeder. "A SAS procedure based on mixture models for estimating developmental trajectories." Sociological Methods & Research 29.3 (2001): 374-393.
Group-based Trajectory Modeling: Brief Statistical Overview
The group membership probabilities $\pi_j$, $j = 1, \dots, J$, are estimated with a multinomial logit function:
$$\pi_j = \frac{e^{\theta_j}}{\sum_{k=1}^{J} e^{\theta_k}}$$
where the $\theta_j$ are normalized to have a mean of zero. This form guarantees that each estimated probability falls between 0 and 1 and that the $J$ probabilities sum to 1.
Source: Jones, Bobby L., and Daniel S. Nagin. "Advances in group-based trajectory modeling and an SAS procedure for estimating them." Sociological Methods & Research 35.4 (2007): 542-571.
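A minimal Python sketch of the multinomial logit step, using made-up $\theta$ values (not estimates from any study). It shows that every $\pi_j$ lands in (0, 1), that the probabilities sum to 1, and that centering $\theta$ to mean zero leaves $\pi$ unchanged: the normalization only pins down $\theta$ so that it is identifiable. The helper names `group_probabilities` and `center` are hypothetical.

```python
import math


def group_probabilities(theta):
    """Multinomial logit (softmax): pi_j = exp(theta_j) / sum_k exp(theta_k)."""
    m = max(theta)  # subtracting the max avoids overflow and does not change the result
    weights = [math.exp(t - m) for t in theta]
    total = sum(weights)
    return [w / total for w in weights]


def center(theta):
    """Normalize theta to have a mean of zero (the constraint stated on the slide)."""
    mean = sum(theta) / len(theta)
    return [t - mean for t in theta]


# Hypothetical theta values for a 3-group model (illustration only).
theta = [0.0, -0.8, -1.5]
pi = group_probabilities(theta)
pi_centered = group_probabilities(center(theta))

print([round(p, 3) for p in pi])
print(all(abs(a - b) < 1e-12 for a, b in zip(pi, pi_centered)))
```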
Group-based Trajectory Modeling: Brief Statistical Overview
The group-based trajectory model assumes that the population is composed of $J$ distinct underlying trajectory groups, so the probability of each subject's data is a finite mixture over the groups:
$$P(Y_i) = \sum_{j=1}^{J} \pi_j \, P^j(Y_i)$$
where
• $Y_i$ = trajectory data (the longitudinal sequence of measurements) for subject $i$
• $P(Y_i)$ = unconditional probability of $Y_i$
• $\pi_j$ = probability of membership in group $j$
• $P^j(Y_i)$ = probability of $Y_i$ given membership in group $j$
Source: Jones, Bobby L., and Daniel S. Nagin. "Advances in group-based trajectory modeling and an SAS procedure for estimating them." Sociological Methods & Research 35.4 (2007): 542-571.
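The mixture formula can be sketched in a few lines of Python. This assumes, as group-based trajectory models typically do, that a subject's measurements are independent over time once group membership is known, so $P^j(Y_i)$ is a product of per-visit probabilities. The two-group numbers are hypothetical, and the function names are illustrative, not Proc Traj internals.

```python
import math


def group_likelihood(period_probs):
    """P^j(Y_i): product over time of the probabilities of the observed values,
    assuming measurements are independent over time given group membership."""
    return math.prod(period_probs)


def mixture_likelihood(pi, period_probs_by_group):
    """P(Y_i) = sum_j pi_j * P^j(Y_i) for one subject."""
    return sum(p_j * group_likelihood(probs)
               for p_j, probs in zip(pi, period_probs_by_group))


# Hypothetical two-group example for one subject observed at three visits:
# each inner list holds the probability of the value actually observed at each visit.
pi = [0.6, 0.4]
period_probs_by_group = [
    [0.9, 0.8, 0.7],  # group 1 fits this subject well
    [0.2, 0.3, 0.4],  # group 2 fits poorly
]
p_yi = mixture_likelihood(pi, period_probs_by_group)
print(round(p_yi, 4))  # 0.6*0.504 + 0.4*0.024 = 0.312
```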
Group-based Trajectory Modeling: Missing Values
Group-based trajectory models assume that values are missing at random. Under that assumption, subjects with some missing longitudinal outcome values or time-dependent covariate values can still be included in the analysis.
Source: Jones, Bobby L., and Daniel S. Nagin. "Advances in group-based trajectory modeling and an SAS procedure for estimating them." Sociological Methods & Research 35.4 (2007): 542-571.
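Here is a rough sketch of why partial records can stay in the model, written for a binary outcome with hypothetical per-visit group probabilities. A missed visit is simply left out of the product (equivalently, it adds 0 to the log-likelihood), so the subject contributes whatever was observed. This shows only the mechanics; it does not test whether the missing-at-random assumption actually holds.

```python
import math


def bernoulli_loglik(y, p):
    """Log-likelihood of a subject's binary outcomes y under one group's
    per-visit probabilities p. Missing visits (None) are skipped, i.e. they
    contribute a factor of 1 to the likelihood."""
    total = 0.0
    for y_t, p_t in zip(y, p):
        if y_t is None:
            continue  # missing value: no contribution
        total += math.log(p_t if y_t == 1 else 1.0 - p_t)
    return total


p_group = [0.2, 0.3, 0.5, 0.6]     # hypothetical group probabilities at 4 visits
with_gaps = [0, None, 1, None]     # subject attended visits 1 and 3 only
print(math.exp(bernoulli_loglik(with_gaps, p_group)))  # equals (1 - 0.2) * 0.5 up to rounding
```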
Distributions Handled
• Censored normal model
• Poisson-based model
• Logit-based model
Censored Normal Model
The censored normal (CNORM) model is useful for modeling the distribution of psychometric scale data. It is also appropriate for continuous data that are approximately normally distributed, with or without censoring.
Source: Jones, Bobby L., and Daniel S. Nagin. "Advances in group-based trajectory modeling and an SAS procedure for estimating them." Sociological Methods & Research 35.4 (2007): 542-571.
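To see what "censoring" adds, the sketch below evaluates the likelihood of a single observation under a normal distribution censored at a scale's minimum and maximum. Scores at a bound receive the normal probability mass beyond that bound, and interior scores receive the ordinary density. The 0 to 60 scale, group mean, and SD are hypothetical, and this is a textbook censored-normal formula, not Proc Traj's source code.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cnorm_likelihood(y, mu, sigma, lower, upper):
    """Likelihood of one observation from a normal(mu, sigma) distribution
    censored at [lower, upper] (e.g. a scale's minimum and maximum score)."""
    if y <= lower:
        # Piled up at the floor: probability that the latent value is <= lower.
        return normal_cdf((lower - mu) / sigma)
    if y >= upper:
        # Piled up at the ceiling: probability that the latent value is >= upper.
        return 1.0 - normal_cdf((upper - mu) / sigma)
    # Interior value: ordinary normal density.
    return normal_pdf((y - mu) / sigma) / sigma


# Hypothetical scale scored 0-60, group mean 4 and SD 5:
print(cnorm_likelihood(0, mu=4.0, sigma=5.0, lower=0, upper=60))   # mass at the floor
print(cnorm_likelihood(10, mu=4.0, sigma=5.0, lower=0, upper=60))  # density at 10
```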
Zero-inflated Poisson Model
The zero-inflated Poisson (ZIP) model is useful for modeling the distribution of count data when there are more zeros than the Poisson assumption implies. Example: number of convictions.
Source: Jones, Bobby L., and Daniel S. Nagin. "Advances in group-based trajectory modeling and an SAS procedure for estimating them." Sociological Methods & Research 35.4 (2007): 542-571.
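A small sketch of the zero-inflated Poisson probability shows where the extra zeros come from. With probability ω a subject is a "structural zero"; otherwise the count is Poisson(λ). Here ω is held constant for simplicity (an assumption of this sketch), and the values λ = 1.5 and ω = 0.3 are hypothetical.

```python
import math


def zip_pmf(y, lam, omega):
    """Zero-inflated Poisson probability of count y: with probability omega the
    subject is a 'structural zero'; otherwise the count is Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        return omega + (1.0 - omega) * poisson
    return (1.0 - omega) * poisson


# Hypothetical group: Poisson mean 1.5 among non-structural zeros, 30% structural zeros.
print(zip_pmf(0, lam=1.5, omega=0.3))  # probability of a zero under ZIP
print(zip_pmf(0, lam=1.5, omega=0.0))  # smaller: what a plain Poisson(1.5) predicts
```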
Logit Model
The logistic (LOGIT) model is used to model the distribution of dichotomous data. Example: smoking/non-smoking.
Source: Jones, Bobby L., and Daniel S. Nagin. "Advances in group-based trajectory modeling and an SAS procedure for estimating them." Sociological Methods & Research 35.4 (2007): 542-571.
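For the logit model, each group's trajectory can be pictured as a probability curve whose log-odds is a polynomial in time. The polynomial degree corresponds to the flat/linear/quadratic/cubic shape chosen for that group. The sketch below uses hypothetical coefficients and unscaled time; it illustrates the functional form only, not how Proc Traj parameterizes or scales time internally.

```python
import math


def logit_trajectory(betas, times):
    """Group-specific probability of the outcome at each time.

    The log-odds is a polynomial in time, beta_0 + beta_1*t + beta_2*t^2 + ...,
    whose degree len(betas) - 1 plays the role of the group's shape
    (0 = flat/intercept, 1 = linear, 2 = quadratic, 3 = cubic)."""
    probs = []
    for t in times:
        log_odds = sum(b * t ** k for k, b in enumerate(betas))
        probs.append(1.0 / (1.0 + math.exp(-log_odds)))
    return probs


times = [0, 1, 2, 3, 4]
flat = logit_trajectory([0.0], times)            # flat: constant 50% probability
rising = logit_trajectory([-2.0, 0.5], times)    # linear: log-odds rises steadily
print([round(p, 3) for p in rising])
```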
Proc Traj
• A SAS procedure that identifies clusters of individuals following similar progressions of an outcome over time or age by fitting a group-based model.
• Proc Traj is used like any other SAS procedure.
• The Traj download, online documentation, examples, and installation information are available at http://www.andrew.cmu.edu/~bjones
Source: Jones, Bobby L. “Proc Traj: A SAS Procedure for Group Based Modeling of Longitudinal Data.” Presentation by Bobby L. Jones, Carnegie Mellon University.
Proc Traj
• Easy-to-use PC SAS procedure
• Handles missing data: as previously mentioned, subjects with some missing longitudinal data are still included in the model.
• Handles sample weights: sampling weights can be incorporated into the model.
• Allows irregular spacing of measurements
Source: Jones, Bobby L. “Proc Traj: A SAS Procedure for Group Based Modeling of Longitudinal Data.” Presentation by Bobby L. Jones, Carnegie Mellon University.
Basic Proc Traj Model Estimates
• Number of groups
• Group trajectory shapes
• Group sizes
• Individual probabilities of group membership
Source: Jones, Bobby L. “Proc Traj: A SAS Procedure for Group Based Modeling of Longitudinal Data.” Presentation by Bobby L. Jones, Carnegie Mellon University.
Calculation and Use of Posterior Probability of Group Membership
The posterior probability of membership in group $j$ for subject $i$ follows from Bayes' theorem:
$$P(j \mid Y_i) = \frac{\pi_j \, P^j(Y_i)}{\sum_{k=1}^{J} \pi_k \, P^k(Y_i)}$$
Maximum-probability assignment rule: each subject is assigned to the group for which the subject's posterior probability of membership is highest.
Proc Traj computes the posterior probabilities of group membership and assigns group memberships.
Source: Jones, Bobby L., and Daniel S. Nagin. "Advances in group-based trajectory modeling and an SAS procedure for estimating them." Sociological Methods & Research 35.4 (2007): 542-571.
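The Bayes' theorem step and the maximum-probability rule described above take only a few lines. The sketch uses a hypothetical subject with made-up prior group sizes $\pi_j$ and group-conditional likelihoods $P^j(Y_i)$. The function names are illustrative and do not correspond to Proc Traj output variables.

```python
def posterior_probabilities(pi, group_likelihoods):
    """Bayes' theorem: P(j | Y_i) = pi_j * P^j(Y_i) / sum_k pi_k * P^k(Y_i)."""
    joint = [p_j * l_j for p_j, l_j in zip(pi, group_likelihoods)]
    total = sum(joint)  # this is P(Y_i), the mixture likelihood
    return [x / total for x in joint]


def assign_group(posteriors):
    """Maximum-probability assignment rule; groups are numbered from 1."""
    best = max(range(len(posteriors)), key=lambda j: posteriors[j])
    return best + 1


# Hypothetical subject: prior group sizes pi and likelihoods P^j(Y_i).
post = posterior_probabilities([0.5, 0.3, 0.2], [0.01, 0.05, 0.02])
print([round(p, 3) for p in post], assign_group(post))
```

Note that the largest group a priori (group 1 here) does not automatically win. The subject's own data, through $P^j(Y_i)$, can pull the posterior toward a smaller group.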
Model Building Overview
Model building with Proc Traj is an iterative process and requires a priori decisions based on one's own knowledge of the data. In the most basic process, the following steps should be followed:
• Decide on the maximum number of groups using a priori knowledge.
• Fit the number of groups to the data: start by fitting a one-group model, then fit models with more groups, one step at a time, up to the maximum logical number of groups.
  – Compare the Bayesian information criterion (BIC), group sizes and entropy at each step.
• Select the shape of the trajectory for each group (e.g. flat/intercept, linear, quadratic, or cubic):
  – Start with a quadratic shape.
  – Adjust the shape up or down based on statistical significance.
Source: Arrandale, Victoria, Mieke Koehoorn, Ying MacNab, and Susan M. Kennedy. “How to use SAS Proc Traj and SAS Proc Glimmix in Respiratory Epidemiology.” December 2006.
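The numerical part of the group-count search above can be sketched as a filter-then-rank step. Candidate models that fail the group-size or classification thresholds are discarded, and the one with the best BIC is kept from the rest. The sketch assumes BIC is on Proc Traj's log-likelihood scale (negative values, where larger, i.e. closer to zero, is better). The thresholds (10% and 0.7) follow the model-selection guidance in this deck, and the fit summaries are hypothetical. This does not replace the a priori judgement the slides emphasize.

```python
from dataclasses import dataclass


@dataclass
class FitSummary:
    n_groups: int
    bic: float            # log-likelihood scale: ln L - 0.5 * p * ln(n); larger is better
    min_group_pct: float  # size of the smallest group, in percent
    min_avepp: float      # smallest per-group mean posterior probability


def choose_n_groups(fits, min_pct=10.0, min_avepp=0.7):
    """Among fits meeting the size and classification thresholds,
    return the number of groups with the largest (best) BIC, or None."""
    eligible = [f for f in fits
                if f.min_group_pct >= min_pct and f.min_avepp >= min_avepp]
    if not eligible:
        return None
    return max(eligible, key=lambda f: f.bic).n_groups


# Hypothetical summaries for 1- to 4-group models (not results from any study).
fits = [
    FitSummary(1, -30000.0, 100.0, 1.00),
    FitSummary(2, -28500.0, 40.0, 0.92),
    FitSummary(3, -28000.0, 15.0, 0.88),
    FitSummary(4, -27900.0, 6.0, 0.81),   # best BIC, but smallest group is only 6%
]
print(choose_n_groups(fits))
```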
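Model Selection
We use the Bayesian information criterion (BIC) to aid in model selection. Proc Traj reports the BIC on the log-likelihood scale,
$$\mathrm{BIC} = \ln L - \tfrac{1}{2}\, p \ln n$$
where $L$ is the maximized likelihood, $p$ the number of estimated parameters and $n$ the sample size. Its values are therefore negative, and larger values (closer to zero) indicate a better model. We prefer a k-group model when its BIC is better than that of the model with one fewer group.
The change in BIC between two models can also be used as a measure of the evidence against the (k−1)-group model (H0):
$$\Delta\mathrm{BIC} = \mathrm{BIC}(k \text{ groups}) - \mathrm{BIC}(k-1 \text{ groups})$$
A positive ΔBIC favors the k-group model.
The Akaike information criterion (AIC) estimates the relative quality of statistical models and can also help in model selection. For all but very small samples, the BIC penalizes additional parameters (here, additional groups) more heavily than the AIC does.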
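A sketch of the BIC arithmetic on the log-likelihood scale, with hypothetical log-likelihoods and parameter counts for 2- and 3-group models. It uses the standard Schwarz approximation, under which exp(ΔBIC) roughly approximates the Bayes factor in favor of the k-group model. The AIC helper is the conventional AIC (2p − 2 ln L) divided by −2, chosen here only so that both criteria share one scale (an assumption; check how your Proc Traj output defines AIC). Which n to use (subjects or total observations) is also a modelling choice to confirm in the Proc Traj documentation.

```python
import math


def proc_traj_bic(loglik, n_params, n):
    """BIC on the log-likelihood scale: ln L - 0.5 * n_params * ln(n).
    Values are negative; larger (closer to zero) is better."""
    return loglik - 0.5 * n_params * math.log(n)


def aic_loglik_scale(loglik, n_params):
    """Conventional AIC (2p - 2 ln L) divided by -2, so it is on the same scale."""
    return loglik - n_params


def delta_bic(bic_k, bic_k_minus_1):
    """Delta BIC = BIC(k groups) - BIC(k-1 groups); positive favors k groups."""
    return bic_k - bic_k_minus_1


# Hypothetical logit fits on n = 500 subjects, quadratic shapes:
# 3 groups x 3 coefficients + 2 free thetas = 11; 2 groups x 3 + 1 = 7.
n = 500
bic3 = proc_traj_bic(loglik=-1200.0, n_params=11, n=n)
bic2 = proc_traj_bic(loglik=-1250.0, n_params=7, n=n)
d = delta_bic(bic3, bic2)
print(round(d, 2), d > 0)  # positive: evidence favors 3 groups over 2
# Per-parameter penalty: 0.5 * ln(n) for BIC vs 1 for AIC on this scale.
print(0.5 * math.log(n), 1.0)
```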
Model Selection
We also calculate “entropy”, here meaning how well subjects are assigned to groups: the mean posterior probability of group membership among the subjects assigned to a given group (often called the average posterior probability). We generally prefer models in which this value is 0.7 or higher for each group.
Group sizes of 10% or higher (depending on the sample size) are preferred.
Use a priori knowledge of the data to help make an informed decision, i.e. “What makes sense?”
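Given a matrix of posterior probabilities (one row per subject), the classification diagnostics above can be computed as follows. The sketch reports group sizes as the share of subjects assigned by the maximum-probability rule (Proc Traj also reports the model-estimated group sizes, which can differ slightly) and the per-group mean posterior probability. It additionally computes the normalized "relative entropy" statistic (1 = perfect classification), which is what the word entropy usually denotes in the mixture-model literature. The posterior values are hypothetical.

```python
import math


def classification_summary(posteriors):
    """posteriors: one row per subject, one column per group; each row sums to 1.

    Returns (group sizes in % of subjects assigned by the maximum-probability
    rule, mean posterior probability per group, relative entropy)."""
    n, n_groups = len(posteriors), len(posteriors[0])
    assigned = [max(range(n_groups), key=lambda g: row[g]) for row in posteriors]
    sizes, avepp = [], []
    for g in range(n_groups):
        members = [row[g] for row, a in zip(posteriors, assigned) if a == g]
        sizes.append(100.0 * len(members) / n)
        # Mean posterior probability among subjects assigned to group g.
        avepp.append(sum(members) / len(members) if members else float("nan"))
    if n_groups == 1:
        return sizes, avepp, 1.0  # a single group is trivially classified
    ent = -sum(p * math.log(p) for row in posteriors for p in row if p > 0)
    relative_entropy = 1.0 - ent / (n * math.log(n_groups))
    return sizes, avepp, relative_entropy


# Hypothetical posteriors for four subjects in a 2-group model.
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
sizes, avepp, rel = classification_summary(posteriors)
print(sizes, avepp, round(rel, 3))
print(all(a >= 0.7 for a in avepp), all(s >= 10.0 for s in sizes))
```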
Utilizing a priori knowledge in model selection
In a population of HIV+ individuals, suppose you want to identify patterns of viremia over time. You suspect that there will be one group that always has an undetectable viral load, one group that always has a detectable viral load, and a few groups that fall in between. You wouldn't try to fit up to 10 trajectory groups when, logically, there may be only 3 or 4 in your data.
Source: Ocampo et al. "Trajectory analyses of virologic outcomes reflecting community-based HIV treatment in Washington DC 1994–2012." BMC Public Health 15.1 (2015): 1277.
Post-hoc Analyses
Post-hoc analyses can be performed on group assignments:
• Example 1: Assess characteristics associated with group membership in multinomial models.
• Example 2: Use group assignments as predictors of a distal outcome.
PROC TRAJ Syntax
PROC TRAJ statement-options;
   ID variable;          /* Variable that identifies subjects */
   VAR variables;        /* Outcome variables measured at different times */
   INDEP variables;      /* Independent time variables: when the VAR variables were measured */
   NGROUPS n;            /* Number of groups */
   MODEL distribution;   /* Outcome distribution: CNORM, ZIP, or LOGIT */
   ORDER orders;         /* Trajectory shape for each group: 0=flat/intercept, 1=linear, 2=quadratic, 3=cubic */
   WEIGHT variable;      /* Weight variable */
RUN;
%TRAJPLOT(OP, OS, "Title", "Y axis", "X axis")   /* Graph of group trajectories */
For information on additional options in PROC TRAJ, please visit: https://www.andrew.cmu.edu/user/bjones/documentation.htm
Proc Traj Example: Polypharmacy
Background: Polypharmacy is generally defined as the concurrent use of 5 or more medications.
Objective: We investigated patterns of polypharmacy from 2004 to 2016.
Population: 3,160 HIV-positive and HIV-negative participants in the Multicenter AIDS Cohort Study
Methods: Group-based trajectory analysis using PROC TRAJ
Ware et al. Examination of polypharmacy trajectories among HIV-positive and -negative men in an ongoing longitudinal cohort from 2004 to 2016. Submitted for review.
Data Organization
[Figure: abbreviated data file]
The dataset must be organized in a “wide” format, with one row per subject. Time-stable variables (e.g., race) have a single column. Time-dependent variables (the outcome and visit) have multiple columns, one for each time point.
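If the data start in "long" format (one row per subject-visit), they have to be reshaped before Proc Traj can read them. In SAS this is often done with PROC TRANSPOSE or a DATA step. The Python sketch below only illustrates the target layout. The field names `macsid`, `race`, `visit`, `polypharm` and the records themselves are hypothetical stand-ins chosen to match the column names used in the polypharmacy example. A visit a subject did not attend stays missing (None) rather than being dropped, so the subject can still enter the model.

```python
def long_to_wide(records, visits):
    """Reshape long records (one per subject-visit) into one wide row per subject.

    Each record is a dict with the hypothetical keys 'macsid', 'race', 'visit'
    and 'polypharm'. For every visit v in `visits` the wide row gets the
    columns 'polypharm_<v>' (outcome) and 'visit_<v>' (time value); visits a
    subject did not attend stay None, i.e. missing.
    """
    wide = {}
    for rec in records:
        v = rec["visit"]
        if v not in visits:
            continue  # ignore visits outside the analysis window
        row = wide.get(rec["macsid"])
        if row is None:
            # Time-stable variable (race): a single column per subject.
            row = {"macsid": rec["macsid"], "race": rec["race"]}
            for w in visits:
                row[f"polypharm_{w}"] = None
                row[f"visit_{w}"] = None
            wide[rec["macsid"]] = row
        # Time-dependent variables: one column per time point.
        row[f"polypharm_{v}"] = rec["polypharm"]
        row[f"visit_{v}"] = v
    return list(wide.values())


# Hypothetical long-format records: subject 101 missed visit 41.
records = [
    {"macsid": 101, "race": "White", "visit": 40, "polypharm": 0},
    {"macsid": 101, "race": "White", "visit": 42, "polypharm": 1},
    {"macsid": 102, "race": "Black", "visit": 40, "polypharm": 1},
    {"macsid": 102, "race": "Black", "visit": 41, "polypharm": 1},
]
for row in long_to_wide(records, visits=range(40, 43)):
    print(row)
```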
Proc Traj Syntax: Polypharmacy
Statement options:
• data= input dataset
• out= group assignments and membership probabilities
• outstat= parameter estimates used by the plot macro
• outplot= trajectory plot data
proc traj data=poly_data out=out outstat=os outplot=op;
   var polypharm_40-polypharm_65;   /* Outcome variables */
   indep visit_40-visit_65;         /* Time variables */
   model logit;                     /* Logit model */
   ngroups 4;                       /* Number of groups */
   order 1 3 3 3;                   /* Linear shape for group 1, cubic shapes for groups 2-4 */
   id macsid;                       /* Subject IDs */
run;
Ware et al. Examination of polypharmacy trajectories among HIV-positive and -negative men in an ongoing longitudinal cohort from 2004 to 2016. Submitted for review.
Proc Traj Output: Polypharmacy
• Parameters refer to the shape of the trajectory for each group (1-4). The intercept (flat), linear, quadratic and cubic terms are specified in the ORDER statement of the syntax.
• Prob > |T| is the p-value of each parameter. It refers to the statistical significance of the trajectory shape term (flat/intercept, linear, quadratic, or cubic); we look for p-values less than 0.05.
• Generally, we begin with quadratic shapes in the ORDER statement, then adjust the orders up or down based on statistical significance until all parameters are significant.
• Group sizes are presented as percentages for each group.
• BIC, AIC and likelihood values are reported.
Ware et al. Examination of polypharmacy trajectories among HIV-positive and -negative men in an ongoing longitudinal cohort from 2004 to 2016. Submitted for review.
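One downward step of the shape-adjustment loop described above can be sketched as a small helper. It looks at the p-value of each group's highest-order term (a common reading of the rule, assumed here) and drops that group's order by one when the term is not significant, then formats the result as the next ORDER statement to run. The p-values are hypothetical. Moving a group up to a higher order still requires refitting and checking the new term, which this sketch does not attempt.

```python
def next_orders(orders, highest_term_pvalues, alpha=0.05):
    """One downward step: for each group whose highest-order term is not
    significant (p >= alpha), drop one polynomial degree, never below 0 (flat)."""
    return [max(o - 1, 0) if p >= alpha else o
            for o, p in zip(orders, highest_term_pvalues)]


def order_statement(orders):
    """Format per-group orders as a PROC TRAJ ORDER statement."""
    return "order " + " ".join(str(o) for o in orders) + ";"


# Hypothetical 4-group model started with quadratic shapes.
orders = next_orders([2, 2, 2, 2], [0.40, 0.01, 0.003, 0.02])
print(order_statement(orders))  # next model to fit
```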
Proc Traj Graph: Polypharmacy
%trajplot(OP, OS, "Prob of Polypharm", "Visit");
Ware et al. Examination of polypharmacy trajectories among HIV-positive and -negative men in an ongoing longitudinal cohort from 2004 to 2016. Submitted for review.
Model Selection Details: Polypharmacy
Trajectory models were generated for 1, 2, 3, 4, 5, and 6 groups, with BIC, group size, and entropy recorded at each step. After the optimal number of groups was established, we determined the appropriate trajectory shape for each group, adjusting to higher- or lower-order terms (flat/intercept, linear, quadratic, or cubic) depending on the significance of those terms.
Final polypharmacy model:
• 4 groups; shapes: 1 linear, 3 cubic
• BIC: -27,497.70 (best among the models compared)
• Group sizes: 12.8% to 46.0%
• Entropy values: 0.85 to 0.90
Ware et al. Examination of polypharmacy trajectories among HIV-positive and -negative men in an ongoing longitudinal cohort from 2004 to 2016. Submitted for review.
Proc Traj Results/Post-hoc Analyses: Polypharmacy
Four distinct groups of polypharmacy emerged over time among all participants:
• Non-polypharmacy (46.0%)
• Slowly increasing polypharmacy (25.9%)
• Rapidly increasing polypharmacy (12.8%)
• Sustained polypharmacy (15.2%)
Post-hoc analyses revealed that being HIV-positive, being aged 50 or older, having medication insurance, and greater health care use were positively associated with membership in the sustained polypharmacy group.
Ware et al. Examination of polypharmacy trajectories among HIV-positive and -negative men in an ongoing longitudinal cohort from 2004 to 2016. Submitted for review.
Summary
• Group-based trajectory analysis identifies clusters of individuals following similar patterns of an outcome over time.
• Proc Traj is a powerful SAS procedure for generating group-based trajectories; it is available for download from the developer's website, along with examples and documentation.
• It can handle censored normal, (zero-inflated) Poisson, and logit-based models.
• The maximum-probability assignment rule is used to assign subjects to groups.
• Post-hoc analyses can test the association of group membership with subject characteristics, or use group assignments as predictors of distal outcomes.
Acknowledgements
Data in this presentation were collected by the Multicenter AIDS Cohort Study (MACS) with centers at Baltimore (U01-AI35042): The Johns Hopkins University Bloomberg School of Public Health; Chicago (U01-AI35039): Feinberg School of Medicine, Northwestern University, and Cook County Bureau of Health Services; Los Angeles (U01-AI35040): University of California, UCLA Schools of Public Health and Medicine; Pittsburgh (U01-AI35041); and the Data Coordinating Center (UM1-AI35043): The Johns Hopkins University Bloomberg School of Public Health. The MACS is funded primarily by the National Institute of Allergy and Infectious Diseases (NIAID), with additional co-funding from the National Cancer Institute (NCI), the National Institute on Drug Abuse (NIDA), and the National Institute of Mental Health (NIMH). Targeted supplemental funding for specific projects was also provided by the National Heart, Lung, and Blood Institute (NHLBI) and the National Institute on Deafness and Other Communication Disorders (NIDCD). MACS data collection is also supported by UL1-TR001079 (JHU ICTR) from the National Center for Advancing Translational Sciences (NCATS), a component of the National Institutes of Health (NIH), and NIH Roadmap for Medical Research. The contents of this publication are solely the responsibility of the authors and do not represent the official views of the National Institutes of Health (NIH), Johns Hopkins ICTR, or NCATS. The MACS website is located at http://aidscohortstudy.org/.
Resources
• Multicenter AIDS Cohort Study website: http://aidscohortstudy.org/
• Proc Traj website: https://www.andrew.cmu.edu/user/bjones/example.htm
• Mentioned articles:
  – Marijuana prevalence: https://www.ncbi.nlm.nih.gov/pubmed/27808576
  – Marijuana trajectory: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5136352/
References
• Nagin, Daniel S. Group-Based Modeling of Development. Harvard University Press, 2005.
• Jones, Bobby L., and Daniel S. Nagin. "Advances in group-based trajectory modeling and an SAS procedure for estimating them." Sociological Methods & Research 35.4 (2007): 542-571.
• Jones, Bobby L., Daniel S. Nagin, and Kathryn Roeder. "A SAS procedure based on mixture models for estimating developmental trajectories." Sociological Methods & Research 29.3 (2001): 374-393.
• Jones, Bobby L. "Proc Traj: A SAS Procedure for Group Based Modeling of Longitudinal Data." Presentation by Bobby L. Jones, Carnegie Mellon University.
• Arrandale, Victoria, Mieke Koehoorn, Ying MacNab, and Susan M. Kennedy. "How to use SAS Proc Traj and SAS Proc Glimmix in Respiratory Epidemiology." December 2006.
• Ware et al. "Examination of polypharmacy trajectories among HIV-positive and -negative men in an ongoing longitudinal cohort from 2004 to 2016." Submitted for review.
• Okafor et al. "Trajectories of Marijuana Use among HIV-seropositive and HIV-Seronegative MSM in the Multicenter AIDS Cohort Study (MACS), 1984–2013." AIDS Behav. 2017 Apr;21(4):1091–1104.