Building Risk Adjustment Models
Andy Bindman MD
Department of Medicine, Epidemiology and Biostatistics
Building Your Own Risk-Adjustment Model
• What population?
  - Generic or disease specific
• What time period?
  - Single visit/hospitalization, or a disease state that includes multiple observations
• What outcomes?
  - Must be clearly defined and frequent enough for modeling
• What purpose?
  - Has implications for how good the model needs to be
Inclusion/Exclusion: Hospital Survival for Pneumonia
• Include
  - Primary ICD-9 code 480-487 (viral/bacterial pneumonias)
  - Secondary ICD-9 code 480-487 and a primary diagnosis of empyema (510), pleurisy (511), pneumothorax (512), lung abscess (513), or respiratory failure (518)
• Exclude
  - Age <18 years
  - Admission in the prior 10 days
  - Other diagnoses of acute trauma
  - HIV, cystic fibrosis, tuberculosis, postoperative pneumonia
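The pneumonia criteria can be written as a cohort filter. What follows is a minimal sketch, not a validated case definition. The `Admission` record layout and its field names are hypothetical. The sketch assumes prior admissions have already been linked, so that `days_since_prior_admission` is available, and it reads "prior 10 days" as 10 days or fewer. The slide gives no codes for trauma, HIV, cystic fibrosis, tuberculosis, or postoperative pneumonia, so the analyst passes those in as code prefixes instead of having them guessed here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence

# ICD-9 categories from the slide's inclusion criteria.
PNEUMONIA = {str(c) for c in range(480, 488)}        # 480-487
QUALIFYING_PRIMARY = {"510", "511", "512", "513", "518"}


def category(code: str) -> str:
    """First three characters of an ICD-9 code, e.g. '518.81' -> '518'."""
    return code.replace(".", "")[:3]


@dataclass
class Admission:
    # Hypothetical record layout; real discharge abstracts differ.
    age: int
    primary_dx: str
    secondary_dx: List[str] = field(default_factory=list)
    days_since_prior_admission: Optional[int] = None  # None = no linked prior stay


def in_pneumonia_cohort(adm: Admission, exclusion_prefixes: Sequence[str]) -> bool:
    primary = category(adm.primary_dx)
    secondary = {category(c) for c in adm.secondary_dx}

    # Include: primary pneumonia, or secondary pneumonia with a qualifying primary.
    included = primary in PNEUMONIA or (
        primary in QUALIFYING_PRIMARY and bool(secondary & PNEUMONIA)
    )
    if not included:
        return False

    # Exclude: patients under 18.
    if adm.age < 18:
        return False
    # Exclude: admission in the prior 10 days (assumes stays are already linked).
    if adm.days_since_prior_admission is not None and adm.days_since_prior_admission <= 10:
        return False
    # Exclude: any diagnosis matching an analyst-supplied exclusion prefix
    # (trauma, HIV, cystic fibrosis, tuberculosis, postoperative pneumonia).
    codes = [c.replace(".", "") for c in [adm.primary_dx, *adm.secondary_dx]]
    prefixes = [p.replace(".", "") for p in exclusion_prefixes]
    return not any(code.startswith(p) for code in codes for p in prefixes)
```

As an example, `in_pneumonia_cohort(Admission(age=70, primary_dx="518.81", secondary_dx=["486"]), exclusion_prefixes=[])` returns `True`. Respiratory failure is a qualifying primary diagnosis, and pneumonia appears among the secondary diagnoses. The same structure carries over to the acute MI criteria with different code sets.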
Episode of Care
• Does the dataset include multiple observations (visits) of the same individual over time?
  - Re-hospitalizations
  - Hospital transfers
• Can the dataset support linking observations (visits) over time?
• Inclusion and exclusion criteria should describe how multiple observations are handled
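Once a dataset can link stays, the stays can be grouped into episodes. The sketch below is one possible approach. It assumes a hypothetical `patient_id` that links stays across hospitals. A stay joins the current episode when it begins within `max_gap_days` of the latest discharge in that episode. Both the field names and the gap rule are illustrative choices, not a standard.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List


@dataclass
class Stay:
    patient_id: str  # hypothetical identifier that links stays across hospitals
    admit: date
    discharge: date


def build_episodes(stays: List[Stay], max_gap_days: int = 0) -> List[List[Stay]]:
    """Group each patient's stays into episodes of care.

    A stay joins the current episode if it begins no more than `max_gap_days`
    after the latest discharge in that episode; with the default of 0, a
    same-day transfer is merged into the preceding stay.
    """
    by_patient: Dict[str, List[Stay]] = {}
    for stay in stays:
        by_patient.setdefault(stay.patient_id, []).append(stay)

    episodes: List[List[Stay]] = []
    for patient_stays in by_patient.values():
        patient_stays.sort(key=lambda s: s.admit)
        current = [patient_stays[0]]
        end = patient_stays[0].discharge
        for stay in patient_stays[1:]:
            if stay.admit <= end + timedelta(days=max_gap_days):
                current.append(stay)          # transfer or continuation
                end = max(end, stay.discharge)
            else:
                episodes.append(current)      # gap too long: start a new episode
                current, end = [stay], stay.discharge
        episodes.append(current)
    return episodes
```

A larger `max_gap_days` (for example, 30) would fold early re-hospitalizations into the same episode rather than only transfers. Which rule fits depends on the study question, and the choice belongs in the inclusion/exclusion criteria.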
Inclusion/Exclusion: Hospital Survival for Acute MI
• Include
  - Primary ICD-9 code 410 (AMI)
  - Secondary ICD-9 code 410 and a primary diagnosis of ventricular arrhythmia (427), cardiac rupture (429), pulmonary edema (518), syncope (780), or shock (785)
• Exclude
  - Age <18 years
  - Length of stay <48 hours
  - AMI admissions with a previous AMI admission in the last month
Risk Factors for Outcomes
• Although the two may overlap, the risk factors for an outcome among patients who have a health condition are not the same as the risk factors for developing the condition
  - Hyperlipidemia is a risk factor for MI but not for survival following an MI
Identifying Risk Factors for Model
• Previous literature
• Expert opinion
• Data dredging (retrospective)
Reliability of Measurement
• Is the ascertainment and recording of the variable standardized within and across sites?
• Are there audits of data quality and attempts to correct errors?
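An audit typically re-abstracts a sample of records and compares them with the values originally coded. The slide does not name an agreement statistic. One common choice is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below assumes the two codings are aligned record by record.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(original: Sequence[Hashable], audit: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between originally coded values and an
    independent re-abstraction of the same records."""
    if len(original) != len(audit) or not original:
        raise ValueError("need two equal-length, non-empty sequences")
    n = len(original)
    observed = sum(a == b for a, b in zip(original, audit)) / n
    count_o, count_a = Counter(original), Counter(audit)
    # Agreement expected if both sources coded independently at their own rates.
    expected = sum(count_o[k] * count_a[k] for k in count_o) / (n * n)
    if expected == 1:
        # Kappa is undefined (both sources used one identical category);
        # this sketch returns 1.0 in that case.
        return 1.0
    return (observed - expected) / (1 - expected)
```

Computing kappa separately for each site is one way to check whether recording is standardized across sites as well as within them.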
Missing Data
• Amount
• Why is it missing? Biased ascertainment?
• Does missing indicate a normal value or some other value?
• Can missing data be minimized by inclusion/exclusion criteria?
• May want to impute missing values
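The first and last questions on the slide can be handled together. The approach is to measure how much is missing, fill the gap, and keep a flag marking which values were filled. The sketch below uses hypothetical helper names. The fill value could be an assumed-normal value, if clinical judgment says missing usually means normal, or the median of the observed values. Single imputation like this understates uncertainty, and multiple imputation is a more rigorous alternative.

```python
from statistics import median
from typing import List, Optional, Tuple


def missing_fraction(values: List[Optional[float]]) -> float:
    """Share of records with no recorded value."""
    return sum(v is None for v in values) / len(values)


def impute_with_indicator(
    values: List[Optional[float]], fill: float
) -> Tuple[List[float], List[int]]:
    """Replace missing values with `fill` and return a 0/1 missingness flag.

    Keeping the flag as a separate predictor lets the model estimate whether
    patients with a missing value differ in risk from those whose value equals `fill`.
    """
    imputed = [fill if v is None else v for v in values]
    flag = [1 if v is None else 0 for v in values]
    return imputed, flag


def observed_median(values: List[Optional[float]]) -> float:
    """Median of the recorded values, one possible choice of `fill`."""
    return median(v for v in values if v is not None)
```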
Risk Factors: Which Value With Multiple Measurements?
• First? Worst? Last?
• Consider whether the timing of a risk factor's measurement accurately reflects the relevant health state; the choice of timing can confound the rating of quality and affects the number of missing values
• May be able to improve the estimate of some risk factors by using multiple measures
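The sketch below implements the first/worst/last choice, plus a mean as one way of combining multiple measures. It assumes a hypothetical `(hours since admission, value)` format for each measurement. The optional `window_hours` restricts the calculation to early values. Values taken later in the stay may reflect the care being evaluated rather than the patient's condition on arrival.

```python
from statistics import mean
from typing import List, Optional, Tuple

Measurement = Tuple[float, float]  # (hours since admission, value)


def summarize(
    measurements: List[Measurement],
    rule: str,
    higher_is_worse: bool = True,
    window_hours: Optional[float] = None,
) -> float:
    """Collapse repeated measurements of one risk factor to a single value.

    rule: 'first', 'last', 'worst', or 'mean'.
    window_hours: if given, only values taken within this many hours of
    admission are used.
    """
    points = sorted(measurements)  # chronological order
    if window_hours is not None:
        points = [(t, v) for t, v in points if t <= window_hours]
    if not points:
        raise ValueError("no measurements available")
    values = [v for _, v in points]
    if rule == "first":
        return values[0]
    if rule == "last":
        return values[-1]
    if rule == "worst":
        return max(values) if higher_is_worse else min(values)
    if rule == "mean":
        return mean(values)
    raise ValueError(f"unknown rule: {rule}")
```

A narrow window avoids values influenced by treatment but can increase the number of missing values. That is the trade-off the slide describes.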
Gaming
• A situation in which the coding of risk factors is influenced by the coder’s knowledge of, or assumptions about, how the data will be used to create a performance report or to calculate payment
• The potential for gaming to alter results (e.g., quality comparisons of providers) depends on whether it occurs to a similar or different degree across providers
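A small worked example shows why differential gaming matters more than uniform gaming. Everything in it is hypothetical: a one-factor logistic risk model with made-up coefficients, and two hospitals with identical patients and identical deaths. Quality is compared with the observed-to-expected (O/E) mortality ratio.

```python
import math


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical one-factor risk model: log-odds of death = B0 + B1 * risk_factor.
B0, B1 = -3.0, 1.0


def expected_deaths(n_patients: int, n_coded: int) -> float:
    """Deaths predicted by the model when n_coded patients carry the risk factor."""
    return n_coded * logistic(B0 + B1) + (n_patients - n_coded) * logistic(B0)


def oe_ratio(observed: int, n_patients: int, n_coded: int) -> float:
    """Observed-to-expected mortality ratio; lower looks like better quality."""
    return observed / expected_deaths(n_patients, n_coded)


# Two hypothetical hospitals with identical patients: 1000 admissions,
# 300 truly have the risk factor, 60 deaths each.
honest = oe_ratio(60, 1000, 300)

# Differential gaming: only hospital A codes the factor in 450 patients.
a_differential = oe_ratio(60, 1000, 450)
b_differential = honest

# Uniform gaming: both hospitals code it in 450 patients.
a_uniform = b_uniform = oe_ratio(60, 1000, 450)
```

Under differential gaming, hospital A's ratio falls below B's even though their patients and outcomes are the same. Under uniform gaming, both ratios fall together and the comparison between the two hospitals is unchanged.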
Co-Morbidity or Complication
• It may be difficult to determine whether a condition is a co-morbidity or a complication
  - Example: shock following an MI
• Including complications in risk-adjustment models gives providers credit for poor-quality care
• True co-morbidities may be dropped from risk-adjustment models out of concern that they sometimes represent complications
Strategies to Separate Co-morbidities from Complications
• Generally not an issue for chronic diseases
• Linking to earlier records (e.g., previous admissions) can be helpful
• Condition present at admission (CPAA) coding is now a standard part of California hospital discharge data
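When a present-at-admission flag is available, it can separate candidate risk factors from likely complications. The sketch below is an assumption-laden illustration. Real discharge files encode the flag as single-character codes, not booleans, so the mapping to `True`/`False`/`None` shown here is hypothetical. Only the diagnoses present at admission would be offered to the risk model. How to treat unknown flags is a decision the analyst must make explicitly.

```python
from typing import List, Optional, Tuple

# (ICD-9 code, present-at-admission flag); None means the flag is missing or unknown.
Diagnosis = Tuple[str, Optional[bool]]


def split_by_poa(
    secondary: List[Diagnosis],
) -> Tuple[List[str], List[str], List[str]]:
    """Separate secondary diagnoses into likely co-morbidities (present at
    admission), likely complications (arose after admission), and unknowns."""
    comorbidities = [code for code, poa in secondary if poa is True]
    complications = [code for code, poa in secondary if poa is False]
    unknown = [code for code, poa in secondary if poa is None]
    return comorbidities, complications, unknown
```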
Risk Factors: Patient Characteristics, Not Process of Care
• Processes of care can be indicative of severity
• However, treatments also reflect practice style/quality
• Process measures can be explored as possible mediators rather than as risk factors for outcomes
Coronary Artery Disease: Mortality Rates by Race
[Figure: mortality rates by race, adjusted for age, coronary anatomy, ejection fraction, CHF, angina, AMI, mitral regurgitation, peripheral vascular disease, and coexisting illnesses. Source: Peterson et al, NEJM, 1997]
Empirical Testing of Risk Factors
• Univariate analyses to perform range checks and eliminate invalid values and low-frequency factors
• Bivariate analyses to identify insignificant or counterintuitive factors
• Test variables for linear, exponential, U-shaped, or threshold effects
• Test for interactions
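The first and third checks can be sketched with plain Python. A range check sets implausible values to missing. Tabulating the outcome rate across bins of a continuous risk factor then gives a rough view of its shape. The plausible limits and bin edges are the analyst's choice and are hypothetical here. Binning is only one simple approach, and splines or fractional polynomials are common alternatives.

```python
from typing import List, Optional, Sequence, Tuple


def range_check(
    values: Sequence[Optional[float]], low: float, high: float
) -> Tuple[List[Optional[float]], int]:
    """Set values outside [low, high] to None and count how many were invalid."""
    cleaned = [v if v is not None and low <= v <= high else None for v in values]
    n_invalid = sum(1 for v, c in zip(values, cleaned) if v is not None and c is None)
    return cleaned, n_invalid


def event_rate_by_bin(
    x: Sequence[Optional[float]], y: Sequence[int], edges: Sequence[float]
) -> List[Optional[float]]:
    """Outcome rate within each bin [edges[i], edges[i+1]) of a continuous factor.

    A steady rise suggests a linear (or exponential) effect, high rates at both
    ends suggest a U shape, and a jump between adjacent bins suggests a threshold.
    """
    events = [0] * (len(edges) - 1)
    totals = [0] * (len(edges) - 1)
    for xi, yi in zip(x, y):
        if xi is None:
            continue  # missing, or set to None by range_check
        for i in range(len(edges) - 1):
            if edges[i] <= xi < edges[i + 1]:
                events[i] += yi
                totals[i] += 1
                break
    return [e / n if n else None for e, n in zip(events, totals)]
```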
Building Multivariate Models
• Start with a conceptual framework from the literature and expert opinion
• Stepwise addition (or subtraction) of variables
• Pre-specify the statistical significance level for retaining variables
• Limit the number of predictor variables: a maximum of 1 per 10-30 outcomes of interest (e.g., deaths)
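The predictor limit is simple arithmetic on the number of outcome events (for example, deaths), not on the number of patients. A registry with 300 in-hospital deaths would support at most 30 predictors under the lenient 1-per-10 rule and 10 under the strict 1-per-30 rule. The helper names below are hypothetical.

```python
from typing import Tuple


def max_predictors(n_events: int, events_per_variable: int) -> int:
    """Largest number of predictors allowed when each needs `events_per_variable` outcome events."""
    if events_per_variable <= 0:
        raise ValueError("events_per_variable must be positive")
    return n_events // events_per_variable


def predictor_range(n_events: int) -> Tuple[int, int]:
    """(strict, lenient) limits implied by the 1-per-30 to 1-per-10 rule."""
    return max_predictors(n_events, 30), max_predictors(n_events, 10)
```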
CABG Registry in New York State: Significant Risk Factors for Hospital Mortality for Coronary Artery Bypass Graft Surgery, 1989-1992
Risk Factors in Large Data Sets: Can You Have Too Much Power?
• In large datasets, even very small differences tend to be statistically significant
• Consider whether a statistically significant finding is clinically significant
• Conversely, consider forcing in clinically important predictors even if they are not statistically significant
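A two-proportion z-test makes the point concrete. The mortality rates below are hypothetical. A difference of 5.0% versus 5.2% is nowhere near significant with 500 patients per group, but it becomes highly significant with 500,000 per group, although the clinical difference is the same.

```python
import math


def two_proportion_p_value(deaths_a: int, n_a: int, deaths_b: int, n_b: int) -> float:
    """Two-sided p-value for a difference in proportions (pooled z-test)."""
    p_a, p_b = deaths_a / n_a, deaths_b / n_b
    pooled = (deaths_a + deaths_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


small = two_proportion_p_value(25, 500, 26, 500)              # 5.0% vs 5.2%
large = two_proportion_p_value(25_000, 500_000, 26_000, 500_000)
```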
Counterintuitive Findings
• Example: hypertension appears protective (decreased risk of mortality)
  - Perhaps a surrogate for patients on beta blockers
• If you don’t believe hypertension is truly protective, it is best to drop it from the model
Smaller Models Are Preferred
• More comprehensible
• Less risk of “overfitting” the data
Overfitting Data: Overspecified Model
• Model performs much better in the development (fitted) data set than in the validation data set
• May be due to
  - Infrequent predictors
  - Unreliable predictors
  - Including variables that do not meet the pre-specified statistical significance level
Internal Validation
• Take advantage of the large size of administrative datasets
• Establish development and validation data sets
  - Randomly split samples
  - Samples from different time periods/areas
• Determine the stability of the model’s predictive power
• Re-estimate the model using all available data
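The random-split approach needs two pieces: a reproducible split and a measure of predictive power to compare across the two samples. The sketch below uses the c-statistic (area under the ROC curve) as that measure. This is one common choice and not necessarily the one used in this course. Model fitting is left out. The idea is to fit on the development sample, score both samples, and compare. A much higher c-statistic in development than in validation is the overfitting signal described earlier. The pairwise calculation is quadratic in sample size, which is fine for a sketch but slow for very large datasets.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(
    records: Sequence[T], dev_fraction: float = 0.5, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Reproducibly shuffle records and split them into development and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * dev_fraction)
    return shuffled[:cut], shuffled[cut:]


def c_statistic(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """Probability that a random patient with the outcome has a higher predicted
    risk than a random patient without it (ties count as 1/2)."""
    pos = [r for r, y in zip(risks, outcomes) if y == 1]
    neg = [r for r, y in zip(risks, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need patients with and without the outcome")
    concordant = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return concordant / (len(pos) * len(neg))
```

To test stability over time instead, the random split would be replaced by a split on admission year or region. This corresponds to the slide's second validation design.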
Summary: Risk Adjustment Using Secondary Data
• Requires large datasets
• Risk factors are patient characteristics that predict outcomes, not processes of care and not complications
• Multivariate model building should be guided by the literature and expert opinion
• The smallest model that performs well is generally best
• Next time we will evaluate model performance