Evaluating Risk Adjustment Models Andy Bindman MD Department of Medicine, Epidemiology and Biostatistics
Goals of Risk-Adjustment • Account for pertinent patient characteristics before making inferences about effectiveness, efficiency, or quality of care • Minimize confounding bias due to nonrandom assignment of patients to different providers or systems of care • Confirm the importance of specific predictors
Why Risk-Adjustment? • Monitoring and comparing outcomes of care (death, readmission, adverse events, functional status, quality of life) • Monitoring and comparing utilization of services and resources (LOS, cost) • Monitoring and comparing patient satisfaction • Monitoring and comparing processes of care
How Is Risk Adjustment Done? • Performed on large datasets • Uses measured differences between the compared groups • Models the impact of measured differences between groups on variables shown, known, or thought to predict the outcome, so as to isolate the effect of the predictor variable of interest
When Risk-Adjustment May Be Inappropriate • Processes of care which virtually every patient should receive (e.g., immunizations, discharge instructions) • Adverse outcomes which virtually no patient should experience (e.g., incorrect amputation) • Nearly certain outcomes (e.g., death in a patient with prolonged CPR in the field) • Too few adverse outcomes per provider
When Risk-Adjustment May Be Unnecessary • If inclusion and exclusion criteria can adequately adjust for differences • If assignment of patients is random or quasi-random
When Risk-Adjustment May Be Impossible • If selection bias is an overwhelming problem • If outcomes are missing or unknown for a large proportion of the sample • If risk factor data (predictors) are extremely unreliable, invalid, or incomplete
Data Sources for Risk-Adjustment • Administrative data are collected primarily for a different purpose, but are commonly used for risk-adjustment • Medical records data are more difficult to use, but contain far more information • Patient surveys may complement either or both of the other sources
Advantages of Administrative Data • Universally inclusive, population-based • Computerized, inexpensive to obtain and use • Uniform definitions • Ongoing data monitoring and evaluation • Diagnostic coding (ICD-9-CM) guidelines • Opportunities for linkage (vital statistics, cancer registries)
Disadvantages of Administrative Data • Missing key information about physiologic and functional status • No control over the data collection process • Quality of diagnostic coding varies across hospitals • Incentives to upcode (DRG creep) and possibly to avoid coding complications • Inherent limitations of ICD-9-CM
Doing Your Own Risk-Adjustment vs. Using an Existing Product • Is an existing product available or affordable? • Would an existing product meet my needs? - Developed on similar patient population - Applied previously to the same condition or procedure - Data requirements match availability - Conceptual framework is plausible and appropriate - Known validity
Conditions Favoring Use of an Existing Product • Need to study multiple diverse conditions or procedures • Limited analytic resources • Need to benchmark performance using an external norm • Need to compare performance with other providers using the same product • Focus on resource utilization, possibly mortality
A Quick Survey of Existing Products: Hospital/General Inpatient • APR-DRGs (3M) • Disease Staging (SysteMetrics/MEDSTAT) • Patient Management Categories (PRI) • RAMI/RACI/RARI (HCIA) • Atlas/MedisGroups (MediQual) • Cleveland Health Quality Choice • Public domain (MMPS, CHOP, CSRS, etc.)
A Quick Survey of Existing Products: Intensive Care • APACHE • MPM • SAPS • PRISM
A Quick Survey of Existing Products: Outpatient Care • Resource-Based Relative Value Scale (RBRVS) • Ambulatory Patient Groups (APGs) • Physician Care Groups (PCGs) • Ambulatory Care Groups (ACGs)
How Do Commercial Risk-Adjustment Tools Perform? • Better predictors of utilization and death than age and sex alone • Perform better retrospectively (~30-50% of variation explained) than prospectively (~10-20%) • Lack of agreement among measures • More than 20% of inpatients were assigned very different severity scores depending on which tool was used (Iezzoni, Ann Intern Med, 1995)
Building Your Own Risk-Adjustment Model • Previous literature • Expert opinion - Generate specific hypotheses, plausible mechanisms - Translate clinically important concepts into measurable variables (e.g., cardiogenic shock) - Separate factors that could be risk for disease or complication of treatment • Data dredging (retrospective)
Empirical Testing of Risk Factors • Univariate/bivariate analyses to eliminate low-frequency, insignificant, or counterintuitive factors • Test variables for linear, exponential, or threshold effects • Test for interactions
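The univariate screening step above can be sketched in Python. This is a minimal illustration, not a tool named in the talk: the function name, the prevalence cutoff, and the renal-failure numbers are all invented for the example.

```python
def screen_factor(exposed_deaths, exposed_total,
                  unexposed_deaths, unexposed_total,
                  min_prevalence=0.01):
    """Crude univariate screen for one candidate risk factor:
    drop factors too rare to model reliably, otherwise report the
    unadjusted odds ratio for the outcome (e.g., hospital death)."""
    prevalence = exposed_total / (exposed_total + unexposed_total)
    if prevalence < min_prevalence:
        return None  # too few exposed patients; exclude the factor
    odds_exposed = exposed_deaths / (exposed_total - exposed_deaths)
    odds_unexposed = unexposed_deaths / (unexposed_total - unexposed_deaths)
    return odds_exposed / odds_unexposed

# Hypothetical: 20/100 deaths with renal failure vs. 100/1900 without.
odds_ratio = screen_factor(20, 100, 100, 1900)
```

Factors that survive this kind of crude screen would then go into the multivariable model, where linearity, thresholds, and interactions are tested.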
Potential Risk Factors for CABG Outcomes • Age, gender, race, height, weight, BMI • Ejection fraction, NY Heart Class, # of vessels • Comorbidity - hypertension, CHF, COPD, DM, hepatic failure, renal failure, calcified aorta • Acute treatment/complications - IABP, thrombolysis, PTCA, PTCA complication, hemodynamic instability • Past hx - previous surgery, PTCA, MI, stroke, fem-pop bypass • Behaviors - smoking
Significant Risk Factors for Hospital Mortality for Coronary Artery Bypass Graft Surgery in New York State, 1989-1992
Risk Factors in Large Data Sets: Can you have too much power? • Clinical vs. statistical importance • Risk of overfitting, and need for a comprehensible model, mandate data reduction • Consider forcing in clinically important predictors
Evaluating Model Quality • Linear regression (continuous outcomes) • Logistic regression (dichotomous outcomes)
Evaluating Linear Regression Models • R2 is the percentage of variation in outcomes explained by the model - best for continuous dependent variables • Ranges from 0-100% • Generally more is better, but R2 is biased upward as predictors are added • Sometimes explaining a small amount of variation is still important
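As a sketch, R2 can be computed directly from observed and predicted values. The outcome values below are made up purely for illustration (think length of stay in days).

```python
def r_squared(observed, predicted):
    """R^2 = 1 - SS_residual / SS_total: the share of outcome
    variation explained by the model's predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_res / ss_tot

observed = [4.0, 6.0, 5.0, 8.0, 7.0]   # hypothetical LOS in days
predicted = [4.5, 5.5, 5.0, 7.5, 7.5]  # model's predictions
```

Here R2 = 0.9: the model explains 90% of the variation in this toy sample.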
Evaluating Logistic Models • c statistic - compares all pairs of individuals with different outcomes (one alive, one dead) to see whether the risk-adjustment model predicts a higher likelihood of death for the patient who died • Ranges from 0-1 • A c value of 0.5 means the model is no better than chance • A c value of 1.0 indicates perfect discrimination
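The pairwise definition of the c statistic translates directly into code. This is a sketch with invented predicted risks; ties are counted as half-concordant, the usual convention.

```python
def c_statistic(pred, outcome):
    """Fraction of (died, survived) pairs in which the model gave the
    higher predicted risk to the patient who died; ties count 0.5."""
    died = [p for p, y in zip(pred, outcome) if y == 1]
    alive = [p for p, y in zip(pred, outcome) if y == 0]
    concordant = sum(1.0 if d > a else 0.5 if d == a else 0.0
                     for d in died for a in alive)
    return concordant / (len(died) * len(alive))

pred = [0.9, 0.7, 0.4, 0.3, 0.1]   # hypothetical predicted death risks
outcome = [1, 0, 1, 0, 0]          # 1 = died, 0 = survived
```

For these five patients the model orders 5 of the 6 (died, survived) pairs correctly, giving c ≈ 0.83.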
How Well the Model Predicts Outcomes Across the Range of Risks - Hosmer-Lemeshow • Stratify individuals into groups of equal size (e.g., 10 groups) according to predicted likelihood of the adverse outcome (e.g., death) • Compare actual vs. predicted deaths for each stratum • Hosmer-Lemeshow chi-square statistic (8 degrees of freedom for 10 deciles) • The goal is a non-significant p value, i.e., no detectable lack of fit
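The Hosmer-Lemeshow procedure above can be sketched in plain Python. The two-group toy example is for illustration only; real applications use around 10 deciles, and the statistic is compared to a chi-squared distribution with (groups - 2) degrees of freedom.

```python
def hosmer_lemeshow(pred, outcome, n_groups=10):
    """Stratify patients by predicted risk into equal-size groups and
    compare observed vs. expected deaths in each stratum."""
    pairs = sorted(zip(pred, outcome))   # sort by predicted risk
    size = len(pairs) // n_groups
    stat = 0.0
    for g in range(n_groups):
        # last group takes any leftover patients
        chunk = pairs[g * size:] if g == n_groups - 1 \
            else pairs[g * size:(g + 1) * size]
        n = len(chunk)
        expected = sum(p for p, _ in chunk)   # sum of predicted risks
        observed = sum(y for _, y in chunk)   # actual deaths
        p_bar = expected / n
        stat += (observed - expected) ** 2 / (n * p_bar * (1 - p_bar))
    return stat

# Toy data: two low-risk survivors, two high-risk deaths.
stat = hosmer_lemeshow([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1], n_groups=2)
```

A small statistic (non-significant p) means observed and expected deaths agree across the risk strata, which is what a well-calibrated model should show.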
Actual and Expected Mortality Rates for Different Levels of Patient Severity of Illness (chi-squared p = .16)
Goodness-of-Fit Tests for AMI Mortality Models (OSHPD AMI Outcomes Project, 1996)
Aggregating to the Group Level • Sum observed and predicted events for each group (e.g., each provider) • Statistical problems arise when the total number of predicted events is small • As a rule of thumb, chi-squared comparisons of groups require a minimum of five expected events per group
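A minimal sketch of the five-expected-events rule of thumb; the function name and the numbers are hypothetical.

```python
def safe_to_compare(expected_events_by_group, minimum=5.0):
    """Rule of thumb: chi-squared comparisons across groups need at
    least ~5 expected events per group; pool or drop small groups."""
    return all(e >= minimum for e in expected_events_by_group)

ok = safe_to_compare([12.3, 7.8, 5.1])   # every group has enough events
too_small = safe_to_compare([12.3, 2.4]) # one group needs pooling first
```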
Comparing Observed and Expected Outcomes • Observed events or rates of events • Expected events or rates of events • Risk-adjusted events or rates = (site-specific observed/expected) × average observed rate across all sites
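The formula above (indirect standardization) amounts to one line of arithmetic. The counts below are invented for illustration.

```python
observed = 12          # deaths observed at this site
expected = 8.0         # deaths the risk model predicts for this site's patients
overall_rate = 0.03    # average observed mortality across all sites

# Risk-adjusted rate = (site-specific O/E) x overall average rate.
risk_adjusted_rate = (observed / expected) * overall_rate
# O/E = 1.5 > 1, so the site's adjusted rate exceeds the overall average.
```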
Validating Model • Face validity/Content validity • Gold standard = external validation with new data • Separate development and validation data sets - Randomly split samples - Samples from different time periods/areas • Re-estimate model using all available data
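The randomly-split-samples approach above can be sketched as follows; the function name and the 50/50 split fraction are illustrative choices, not prescriptions from the talk.

```python
import random

def split_sample(records, dev_fraction=0.5, seed=0):
    """Randomly split records into development and validation sets,
    so the model is fit on one set and its performance checked on
    data it has never seen."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * dev_fraction)
    return shuffled[:cut], shuffled[cut:]

dev, val = split_sample(range(100))
```

After the model is validated this way, it is common (as the slide notes) to re-estimate it on all available data for the final coefficients.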
Bootstrap Procedure: “If things had been a little different” • Multiple (e.g., 1,000) random samples drawn from the original sample with replacement • Estimate the model’s performance in each new random sample • Confidence intervals for model coefficients can be derived from the empirical results of the “new samples”
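A percentile-bootstrap sketch of the procedure above, applied here to a simple mortality rate rather than full model coefficients; the cohort and all parameter choices are hypothetical.

```python
import random

def bootstrap_ci(data, statistic, n_samples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample with replacement many times,
    recompute the statistic each time, and take the empirical
    percentiles as a confidence interval."""
    rng = random.Random(seed)
    stats = sorted(
        statistic([rng.choice(data) for _ in data])
        for _ in range(n_samples)
    )
    lo = stats[int(n_samples * alpha / 2)]
    hi = stats[int(n_samples * (1 - alpha / 2)) - 1]
    return lo, hi

# Hypothetical cohort: 10 deaths among 100 patients (10% mortality).
outcomes = [1] * 10 + [0] * 90
lo, hi = bootstrap_ci(outcomes, lambda s: sum(s) / len(s))
```

The same resampling loop works for a model coefficient or a c statistic: refit the model in each resample and collect the quantity of interest.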
Consistency in the evidence • Similar findings over time help to rule out random effects • Differences between observed and expected may be due to things other than ‘quality’ • Confirmation through very different types of evidence is a major goal • View the risk adjusted estimates as ‘yellow flags’, not ‘smoking guns’