How to use propensity scores in the analysis of nonrandomized designs Patrick G. Arbogast Department of Biostatistics Vanderbilt University Medical Center GCRC Research-Skills Workshop

Motivation
• Randomized clinical trials: randomization guarantees that, on avg, there are no systematic differences between groups in observed or unobserved covariates.
• Observational studies: no control over tx assignment, and the exposed/unexposed (E+/E-) groups may differ greatly in observed covariates.
• Can adjust for this via study design (matching) or during estimation of tx effect (stratification/regression).

Analysis limitations
• With <10 events per variable (EPV), estimated reg coeffs may be biased & SEs may be incorrect (Peduzzi et al, 1996).
  • Simulation study for logistic reg.
• Harrell et al (1985) also advocate a min no. of EPV.
• A solution: propensity scores (Rosenbaum & Rubin, 1983).
  • Probability that patient receives E+ given risk factors.
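
The EPV rule of thumb above is simple arithmetic. A minimal sketch follows; the function name and the study counts (60 deaths, 10 candidate confounders) are hypothetical and chosen only for illustration.

```python
def events_per_variable(n_events: int, n_covariates: int) -> float:
    """Outcome events divided by the number of covariates in the outcome model."""
    if n_covariates <= 0:
        raise ValueError("need at least one covariate")
    return n_events / n_covariates


# Hypothetical study: 60 deaths and 10 candidate confounders.
epv = events_per_variable(60, 10)
if epv < 10:
    print(f"EPV = {epv:.1f}: below the 10-EPV rule of thumb")
else:
    print(f"EPV = {epv:.1f}: meets the 10-EPV rule of thumb")
```

In a setting like this one, direct regression adjustment for all 10 confounders would fall under the threshold, which is the situation where a single summary score becomes attractive.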

Intuition
• Covariate is a confounder only if its distribution differs between E+ & E-.
• Consider 1-factor matching: low-dose aspirin & mortality.
  • Age, a strong confounder, can be controlled by matching.
• Can extend to many risk factors, but becomes cumbersome.
• Propensity scores provide a summary measure to control for multiple confounders simultaneously.

Propensity score estimation
• Identify potential confounders.
  • Current conventional wisdom: if uncertain whether covariate is confounder, include it.
• Model E+ (typically dichotomous) as function of covariates using entire cohort.
  • E+ is the outcome for propensity score estimation.
  • Do not include D+ (the study outcome).
  • Logistic reg typically used.
• Propensity score = estimated Pr(E+|covariates).
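
The estimation step can be sketched in a few lines. This is an illustration under stated assumptions, not the workshop's own code: the cohort is synthetic (one standardized covariate, 400 subjects), all function names are hypothetical, and the model is fitted by plain gradient ascent so that the example needs only the standard library. In practice a statistics package would do the fitting.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_predictor(beta, x):
    """Intercept plus the weighted sum of the covariates."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def fit_propensity_model(X, exposed, step=0.5, n_iter=500):
    """Logistic regression of exposure (not outcome) on covariates."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for x, e in zip(X, exposed):
            resid = e - sigmoid(linear_predictor(beta, x))
            grad[0] += resid
            for j, v in enumerate(x):
                grad[j + 1] += resid * v
        # Move along the gradient of the mean log-likelihood.
        beta = [b + step * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X, beta):
    """Estimated Pr(E+ | covariates) for every subject in the cohort."""
    return [sigmoid(linear_predictor(beta, x)) for x in X]


# Synthetic cohort (assumption): exposure is more likely when the covariate is high.
random.seed(0)
X = [[random.gauss(0.0, 1.0)] for _ in range(400)]
exposed = [1 if random.random() < sigmoid(-0.5 + x[0]) else 0 for x in X]

beta = fit_propensity_model(X, exposed)
ps = propensity_scores(X, beta)
print("coefficients:", beta)
print("score range:", min(ps), max(ps))
```

Note that the outcome D+ never appears: only exposure and covariates enter the model, and every subject, exposed or not, receives a score.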

Counterintuitive?
• Natural question: why estimate the probability that a patient receives E+ when we already know exposure status?
• Answer: adjusting observed E+ for the probability of E+ (“propensity”) creates a “quasi-randomized” experiment.
  • For E+ & E- patients with the same propensity score, can imagine they were “randomly” assigned to each group.
  • Subjects in E+/E- groups with equal (or nearly equal) propensity scores tend to have similar distributions of the covariates used to estimate the propensity score.

Balancing score
• For a given propensity score, one gets unbiased estimates of avg E+ effect.
• Can include large no. of covariates for propensity score estimation.
  • In fact, original paper applied propensity score methodology to an observational study comparing CABG to medical tx, adjusting for 74 covariates in propensity model.
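
The balancing property can be written compactly. The notation here is an assumption introduced for illustration: $X$ denotes the covariates in the propensity model, $E$ the exposure indicator (1 = E+), and $e(x)$ the propensity score.

```latex
e(x) = \Pr(E = 1 \mid X = x), \qquad X \perp E \mid e(X)
```

The second statement says that, among subjects with the same score, exposure carries no further information about the modeled covariates. Unbiased effect estimates additionally require that no unmeasured confounders remain.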

Applications
• Matching.
• Regression adjustment/stratification.
• Weighting.

Propensity score matching
• Match on single summary measure.
• Useful for studies with a limited no. of E+ patients and a larger (usually much larger) no. of E- patients, where add’l measures (eg, blood samples) must be collected.

Matching techniques
• Nearest available matching on estimated propensity score.
  • Select E+ subject.
  • Find E- subject w/ closest propensity score.
  • Repeat until all E+ subjects matched.
  • Easiest in terms of computational considerations.
• Others:
  • Mahalanobis metric matching.
  • Nearest available Mahalanobis metric matching w/ propensity score-based calipers.
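
The nearest available matching steps above can be sketched as a greedy 1:1 match without replacement. The function name and the scores are hypothetical, and processing E+ subjects in list order is an assumption, since the slide does not say how the next E+ subject is selected.

```python
def nearest_available_match(ps_exposed, ps_unexposed):
    """Greedy 1:1 matching without replacement on the propensity score.

    Returns (exposed_index, unexposed_index) pairs.
    """
    available = set(range(len(ps_unexposed)))
    pairs = []
    for i, p in enumerate(ps_exposed):
        if not available:
            break  # no E- subjects left to match
        # Closest remaining E- subject; ties go to the lower index.
        j = min(available, key=lambda k: (abs(ps_unexposed[k] - p), k))
        pairs.append((i, j))
        available.remove(j)
    return pairs


# Hypothetical estimated propensity scores.
ps_exposed = [0.62, 0.35, 0.80]
ps_unexposed = [0.10, 0.33, 0.58, 0.61, 0.90, 0.79]

pairs = nearest_available_match(ps_exposed, ps_unexposed)
print(pairs)  # [(0, 3), (1, 1), (2, 5)]
```

Because each E- subject is used at most once, the result can depend on the order in which E+ subjects are processed.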

Illustrative example
• Consider an HIV database:
  • E+: patients receiving a new antiretroviral drug (N=500).
  • E-: patients not receiving the drug (N=10,000).
  • D+: mortality.
• Need to manually measure CD4.
• May be potential confounding by other HIV drugs as well as 10 prognostic factors, which are identified & stored in the database.

Illustrative example (2)
• Option 1:
  • Collect blood samples from all 10,500 patients.
  • Costly & impractical.
• Option 2:
  • For all patients, estimate Pr(E+|other HIV drugs & prognostic factors).
  • For each E+ patient, find E- patient with closest propensity score.
  • Continue until every E+ patient is matched with an E- patient.
  • Collect blood samples from the 500 propensity-matched pairs.

The effectiveness of right heart catheterization in the initial care of critically ill patients (Connors et al, 1996)

RHC: add’l background
• Teaching hospitals:
  • Beth Israel Hospital, Boston.
  • Duke University Medical Center, Durham.
  • Metro-Health Medical Center, Cleveland.
  • St Joseph’s Hospital, Marshfield, WI.
  • UCLA.
• Prespecified disease categories:
  • Acute respiratory failure.
  • COPD.
  • CHF.
  • Cirrhosis.
  • Nontraumatic coma.
  • Colon cancer metastatic to liver.
  • Non-small cell cancer of lung.
  • Multiorgan system failure with malignancy or sepsis.

RHC: differential E+/E-
• Decision to use RHC left to discretion of physician.
• Thus, tx selection may be confounded with patient factors related to outcome.
  • eg, patients with low BP may be more likely to receive RHC, & such patients may also be more likely to die.

RHC: propensity score estimation
• Panel of 7 specialists in critical care specified variables related to decision to use RHC.
• Compute propensity score, Pr(RHC|covariates), via logistic regression.
• Covariates:
  • age, sex, yrs of education, medical insurance, primary & secondary disease category, admission dx, ADL & DASI, DNR status, cancer, 2-month survival probability, acute physiology component of APACHE III score, Glasgow Coma Score, wt, temperature, BP, respiratory rate, heart rate, PaO2/FiO2, PaCO2, pH, WBC count, hematocrit, sodium, potassium, creatinine, bilirubin, albumin, urine output, comorbid illnesses.

RHC: propensity score assessment
• Adequacy of propensity score to adjust for effects of covariates assessed by testing for differences in individual covariates between RHC+/RHC- patients after stratifying by PS quintiles.
  • Model each covariate as function of RHC & PS quintiles.
  • Covariates balanced if not related to RHC after PS adjustment.
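
A rough, descriptive version of this balance check is sketched below. It is a simplification, not the regression-based test the slide describes: instead of modeling each covariate on RHC and PS quintile, it reports the E+ minus E- difference in a covariate's mean inside each PS quintile. Function names and the ten-subject data set are hypothetical.

```python
def quintile_labels(scores):
    """Assign each score to a quintile 0..4 by rank (roughly equal-sized strata)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // n
    return labels


def within_quintile_differences(covariate, exposed, scores):
    """Mean covariate difference (E+ minus E-) inside each PS quintile.

    A quintile lacking one of the two groups yields None.
    """
    labels = quintile_labels(scores)
    diffs = []
    for q in range(5):
        plus = [c for c, e, l in zip(covariate, exposed, labels) if l == q and e == 1]
        minus = [c for c, e, l in zip(covariate, exposed, labels) if l == q and e == 0]
        if plus and minus:
            diffs.append(sum(plus) / len(plus) - sum(minus) / len(minus))
        else:
            diffs.append(None)
    return diffs


# Hypothetical data: propensity scores, exposure flags, and age for 10 subjects.
scores = [0.05, 0.10, 0.20, 0.25, 0.40, 0.45, 0.60, 0.65, 0.85, 0.90]
exposed = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
age = [50, 52, 55, 55, 60, 61, 66, 64, 70, 73]

diffs = within_quintile_differences(age, exposed, scores)
print(diffs)  # [2.0, 0.0, 1.0, -2.0, 3.0]
```

Within-quintile differences that are small and show no consistent sign suggest the covariate is balanced after PS adjustment; a formal assessment would test the exposure term in a model for each covariate.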

RHC: propensity score matching
• For each RHC+, RHC- w/ same disease category & closest PS (+/- 0.03) identified.
  • Continued until all pairs identified.
• PS difference for each pair calculated. Each pair w/ positive difference matched with pair w/ negative difference closest in magnitude.
  • Ensures equal no. of pairs w/ positive & negative PS differences.
• Final matched set: 1008 matched pairs.
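
The first matching step (same disease category, closest PS within +/- 0.03) can be sketched as follows. The function name, the patient tuples, and the list-order processing are assumptions for illustration, and the second step that pairs positive with negative PS differences is not implemented here.

```python
def caliper_match(exposed, unexposed, caliper=0.03):
    """Greedy 1:1 matching within disease category and a PS caliper.

    exposed / unexposed: lists of (disease_category, propensity_score).
    E+ subjects with no eligible E- subject stay unmatched.
    """
    available = set(range(len(unexposed)))
    pairs = []
    for i, (category, p) in enumerate(exposed):
        eligible = [
            j for j in available
            if unexposed[j][0] == category and abs(unexposed[j][1] - p) <= caliper
        ]
        if not eligible:
            continue
        # Closest eligible E- subject; ties go to the lower index.
        j = min(eligible, key=lambda k: (abs(unexposed[k][1] - p), k))
        pairs.append((i, j))
        available.remove(j)
    return pairs


# Hypothetical patients: (disease category, estimated propensity score).
rhc_plus = [("ARF", 0.40), ("CHF", 0.55), ("COPD", 0.70)]
rhc_minus = [("ARF", 0.42), ("CHF", 0.41), ("CHF", 0.60), ("COPD", 0.69), ("ARF", 0.39)]

matched = caliper_match(rhc_plus, rhc_minus)
print(matched)  # [(0, 4), (2, 3)]
```

The CHF patient stays unmatched because neither CHF control lies within the caliper, which is how a caliper trades sample size for closer matches.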

RHC: PS-matched analysis of RHC & survival

RHC: PS-matched analysis of RHC & resource use
* Mean (25th, 50th, 75th %-tiles); ** Therapeutic Intervention Scoring System.

Regression adjustment/stratification
• Stratification on PS alone can balance distributions of covariates in E+/E- groups w/o exponential increase in no. of strata.
• Rosenbaum & Rubin (1983) showed that perfect stratification based on PS will produce strata where avg tx effect w/i strata is unbiased estimate of true tx effect.
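
A stratified estimate can be sketched as an average of within-stratum risk differences. Pooling the strata by their size is an assumption made for this sketch, and the function names and the 12-subject data set are hypothetical; the data are built so that exposure is more common in the high-PS stratum.

```python
def risk(outcomes):
    """Proportion of subjects with the outcome."""
    return sum(outcomes) / len(outcomes)


def crude_risk_difference(outcome, exposed):
    """Unadjusted risk difference, E+ minus E-."""
    plus = [y for y, e in zip(outcome, exposed) if e == 1]
    minus = [y for y, e in zip(outcome, exposed) if e == 0]
    return risk(plus) - risk(minus)


def stratified_risk_difference(outcome, exposed, stratum):
    """Within-stratum risk differences averaged with stratum-size weights."""
    weighted_sum, total = 0.0, 0
    for s in sorted(set(stratum)):
        plus = [y for y, e, g in zip(outcome, exposed, stratum) if g == s and e == 1]
        minus = [y for y, e, g in zip(outcome, exposed, stratum) if g == s and e == 0]
        if not plus or not minus:
            continue  # no within-stratum comparison possible
        size = len(plus) + len(minus)
        weighted_sum += size * (risk(plus) - risk(minus))
        total += size
    return weighted_sum / total


# Hypothetical cohort split into a low-PS stratum (0) and a high-PS stratum (1).
outcome = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0]
exposed = [0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1]
stratum = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

crude = crude_risk_difference(outcome, exposed)
adjusted = stratified_risk_difference(outcome, exposed, stratum)
print(round(crude, 3), round(adjusted, 3))  # 0.333 0.25
```

The crude comparison overstates the difference because exposed subjects cluster in the higher-risk stratum; comparing within strata removes that part.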

RHC: regression adjustment
• Full cohort: N=5735.
• PH regression:
  • Adjusted for PS, age, sex, no. of comorbid illnesses, ADL & DASI 2 wks prior to admission, 2-month prognosis, day 1 Acute Physiology Score, Glasgow Coma Score, & disease category.
• Question: why include covariates in main model in addition to PS (especially covariates already used to estimate PS)?

RHC: 30-day survival, entire cohort
ARF: acute respiratory failure; MOSF: multiorgan system failure.

RHC: resource utilization

Propensity score weighted regression adjustment
• Weight patient’s contribution to reg model.
• Inverse-probability-of-tx-weighted (IPTW) estimator (Robins et al, 2000):
  • Estimates tx effect in pop whose distribution of risk factors equals that found in all study subjects.
  • Wts: 1/PS(X) for E+ & 1/(1-PS(X)) for E-.
• Standardized mortality ratio (SMR)-weighted estimator (Sato et al, 2003):
  • Estimates tx effect in pop whose distribution of risk factors equals that found in E+ subjects only.
  • Wts: 1 for E+ & PS(X)/(1-PS(X)) for E-.
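
The two weighting schemes translate directly into code. A minimal sketch, with hypothetical function names and a propensity score of 0.25 chosen only for illustration:

```python
def iptw_weight(exposed: int, ps: float) -> float:
    """IPTW weight: 1/PS for E+, 1/(1 - PS) for E-."""
    return 1.0 / ps if exposed else 1.0 / (1.0 - ps)


def smr_weight(exposed: int, ps: float) -> float:
    """SMR weight: 1 for E+, PS/(1 - PS) for E-."""
    return 1.0 if exposed else ps / (1.0 - ps)


# Two hypothetical subjects who share a propensity score of 0.25.
for label, e in (("E+", 1), ("E-", 0)):
    print(label, round(iptw_weight(e, 0.25), 3), round(smr_weight(e, 0.25), 3))
# E+ 4.0 1.0
# E- 1.333 0.333
```

Under IPTW an exposed subject who was unlikely to be exposed counts heavily, whereas under SMR weighting exposed subjects keep weight 1 and unexposed subjects are reweighted to resemble them. Scores near 0 or 1 produce very large weights, so the weight distribution is worth inspecting before fitting.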

Comparison of propensity score methods
• Example: tissue plasminogen activator (t-PA) in 6269 ischemic stroke patients (Kurth et al, 2006):
  • Multivariable logistic reg.
  • Logistic reg after matching on PS +/- 0.05.
  • Logistic reg adjusting for PS (linear term & deciles).
  • IPTW.
  • SMR.

Propensity score distribution by t-PA+/t-PA-

Propensity analysis results

Propensity analyses restricting to PS 0.05+

Propensity score vs other methods
• Matching on individual factors:
  • Too cumbersome (eg, matching on 10 factors, each having 4 categories, results in ~1,000,000 combinations of patient characteristics).
• Stratified analyses: same problem.
• Regression (Cepeda et al, 2003):
  • <7 events/confounder: PS less biased, more robust, & more precise.
  • 8+ events/confounder: multiple reg preferable:
    • Bias from multiple reg goes away, but still present for PS analysis (eg, ~25-30% bias when OR=2.0).
    • Coverage probability (% of 95% CIs containing true OR) decreases for PS analysis.
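
The "~1,000,000 combinations" figure is a direct count of the cells formed by cross-classifying the factors. A short check, with variable names chosen here for illustration:

```python
n_factors = 10          # matching factors
levels_per_factor = 4   # categories per factor

# Every subject falls into exactly one cell of the full cross-classification.
n_cells = levels_per_factor ** n_factors
print(n_cells)  # 1048576
```

Each added factor multiplies the count by 4, which is why matching or stratifying on individual factors quickly runs out of subjects per cell.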

Benefits
• Useful when adjusting for large no. of risk factors & small no. of EPV.
• Useful for matched designs (saving time & money).
• Can be applied to exposure with 3+ levels (Rosenbaum, 2002).

Limitations
• Can only adjust for observed covariates.
• Propensity score methods work better in larger samples to attain distributional balance of observed covariates.
  • In small studies, imbalances may be unavoidable.
• Including irrelevant covariates in propensity model may reduce efficiency.
• Bias may occur.
• Non-uniform tx effect.

Sample propensity analysis: RHC
• E+: RHC use.
  • swang1 (0=RHC-, 1=RHC+)
• D+: time-to-death, min(obs time, 30d).
  • Events after 30d censored.
    • RHC could not have a long-term effect.
    • Such ill patients more affected by later tx decisions.
  • t3d30, censor var=censor
• N=5735 patients, N=1918 deaths w/i 30d.
  • 38.0% RHC+ & 30.6% RHC- died w/i 30d.
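
The outcome definition above, min(obs time, 30d) with later events censored, can be sketched as a small helper. The function name and the example follow-up times are hypothetical; counting a death on day 30 exactly as an event is an assumption about the boundary.

```python
def censor_at_horizon(obs_time, died, horizon=30.0):
    """Truncate follow-up at `horizon` days.

    Returns (analysis_time, event): deaths after the horizon become censored
    observations at the horizon; earlier records are unchanged.
    """
    if obs_time > horizon:
        return horizon, 0
    return obs_time, died


# Hypothetical patients: (days of follow-up, died during follow-up).
patients = [(12, 1), (45, 1), (20, 0), (90, 0)]
analysis = [censor_at_horizon(t, d) for t, d in patients]
print(analysis)  # [(12, 1), (30.0, 0), (20, 0), (30.0, 0)]
```

The patient who died on day 45 contributes 30 days of follow-up and no event, matching the rule that events after 30d are censored.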

Kaplan-Meier plot by RHC status

Propensity model
• Logistic reg: RHC+/- dependent var.
• Adjusts for 50 risk factors.
• Propensity score distribution by RHC groups:

Confounders related to RHC after propensity score (quintiles) adjustment (selected risk factors)?

RHC & survival, entire cohort

References
• Cepeda MS, Boston R, Farrar JT, Strom BL. Comparison of logistic regression versus propensity score when the number of events is low and there are multiple confounders. Am J Epidemiol 2003; 158: 280-287.
• Connors Jr AF, Speroff T, Dawson NV, et al. The effectiveness of right heart catheterization in the initial care of critically ill patients. JAMA 1996; 276: 889-897.
• D’Agostino Jr RB. Tutorial in biostatistics: propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med 1998; 17: 2265-2281.
• Gum PA, Thamilarasan M, Watanabe J, Blackstone EH, Lauer MS. Aspirin use and all-cause mortality among patients being evaluated for known or suspected coronary artery disease. JAMA 2001; 286: 1187-1194.
• Harrell FE, Lee KL, Matchar DB, Reichart TA. Regression models for prognostic prediction: advantages, problems, and suggested solutions. Cancer Treatment Reports 1985; 69: 1071-1077.
• Kurth T, Walker AM, Glynn RJ, Chan KA, Gaziano JM, Berger K, Robins JM. Results of multivariable logistic regression, propensity matching, propensity adjustment, and propensity-based weighting under conditions of nonuniform effect. Am J Epidemiol 2006; 163: 262-270.
• Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol 1996; 49: 1373-1379.
• Robins JM, Hernan MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology 2000; 11: 550-560.
• Rosenbaum PR. Observational Studies. New York, NY: Springer-Verlag, 2002.
• Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983; 70: 41-55.
• Rubin DB. Estimating causal effects from large data sets using propensity scores. Annals of Internal Medicine 1997; 127: 757-763.
• Sato T, Matsuyama Y. Marginal structural models as a tool for standardization. Epidemiology 2003; 14: 680-686.