Methods Matter: Methodological Considerations in Generating Provider Performance Scores for Use in Public Reporting. AHRQ Annual Meeting. Mark Friedberg, MD, MPP and Cheryl Damberg, PhD. September 27, 2010.
Background • Many communities are developing public reports to compare the performance of health care providers • Examples: Chartered Value Exchanges (CVEs), Aligning Forces for Quality communities
Road to Producing Performance Reports [Flow diagram with the following boxes: Goals of CVE stakeholder A; Goals of CVE stakeholder B; Negotiation; Available data; Decisions about methods issues (the focus of the RAND “Methods” white paper); Performance reports]
Motivation for the Methods White Paper • Comparative ratings of providers in public score cards are influenced by earlier decisions about: • “Value judgments” • Methods • (The distinction between the two is not always clear) • The RAND white paper identifies 23 methods decisions, and options for addressing each of them, that should be considered when generating provider performance scores for public reporting
Format Used to Discuss Methods Decisions: Example: How to Handle the Small Numbers Problem? • Why are small numbers important to consider? • Sample size affects reliability • …which affects the risk of misclassifying a provider’s performance • Identify alternative options for addressing the problem and discuss the advantages and disadvantages of each option • A. Report at a higher level of aggregation • B. Combine data across years • C. Use composite measures • Illustrate with a CVE example: • Massachusetts Health Quality Partners combined options A and C
Decision Points Encountered in Producing a Performance Report • 1. Negotiating consensus on “value judgments” of performance reporting • What are the purposes of reporting? • 2. Determining the measures that will be used to evaluate providers • Which measures? • How will measures be specified? • 3. Data sources and aggregation • What kinds of data? • How will sources be combined? • 4. Checking data quality and completeness • How will missing data be detected and handled? • 5. Computing provider scores • How will data be attributed to providers? • How will performance scores be calculated? • 6. Creating performance reports • How will performance be reported? • Will composite measures be used?
Today’s Agenda: Review 4 Key Methods Issues in White Paper • Risk of misclassifying provider performance • Most misclassification issues are fundamentally about “unknowable unknowns”: the information necessary to know “true” provider performance usually does not exist • Validity • Systematic misclassification due to differences in patient characteristics (adjustment or stratification) • Analogy: Validity is influenced by whether you tend to bat against better pitchers than others do • Reliability • Misclassification due to chance or noise in the estimate • Analogy: Reliability is influenced by how many times you've been at bat and how widely batting averages vary between players • Composite measures • Analogy: A “triple-double” summary measure in the NBA, where the player accumulates a double-digit total in 3 of 5 statistical categories in a game (points, rebounds, assists, steals and blocked shots)
Methods Issue #1: Misclassification of Performance • All reports of provider performance classify providers by “categorizing” their performance • Providers can be classified in various ways, such as: • Relative to each other • Relative to a specified level of performance (e.g., above or below national average performance) • Provider rankings are a kind of classification system, since each rank is a class • Reports that show confidence intervals also allow the end user of the information to classify the performance of a provider • Typical comparison is whether the provider’s performance is different from the mean performance of all providers
Issue #1: Misclassification of Performance • Misclassification refers to reporting a provider’s performance in a way that does not reflect the provider’s true performance • Example: provider’s performance may be reported as being in category “1” when true performance is in category “2” • Two types of error:* • False negative: high quality provider is labeled as ordinary • False positive: ordinary provider is labeled as high performing *Source: Adams JL, The Reliability of Provider Profiling: A Tutorial, Santa Monica, Calif.: RAND Corporation, TR-653-NCQA, 2009. As of June 8, 2010: http://www.rand.org/pubs/technical_reports/TR653/
Issue #1: Misclassification of Performance • Misclassifying the performance of too many providers (and by too great an amount) may prevent the reports from having the best possible impact: • Patients may choose low-performing providers, incorrectly believing that they are high-performing • Providers may prioritize the wrong areas for improvement: • Devoting scarce resources to areas in which they are truly doing fine • Neglecting areas in which they need to improve
Issue #1: Misclassification of Performance • Problem: We can’t observe a provider’s “true” performance • For the purpose of public reporting, future performance is what really matters • Two major sources of misclassification: • Systematic performance misclassification • A validity problem: When the performance being reported is determined by something other than what the performance is supposed to reflect • Example: Differences in mortality rates between hospitals being determined by differences in patient mix, rather than the delivery of “right care” • Performance misclassification due to chance • Random measurement error • It is not possible to know exactly which providers are misclassified due to chance, but we can calculate the “risk” or probability that each provider’s performance is misclassified
Issue #1: Misclassification of Performance • Misclassification is related to:* • The reliability of a measure • Which depends on sample size (which can vary provider to provider) • Variation between providers (so population dependent) • Number of cutpoints in the classification scheme • How close the performance score is to the cutpoint *Source: Safran, D. “Preparing Measures for High Stakes Use: Beyond Basic Psychometric Testing.” AcademyHealth presentation, June 27, 2010.
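The factors on this slide can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the white paper. It assumes a single cutpoint, treats the uncertainty about true performance as normally distributed around the observed score, and uses a hypothetical function name, `misclassification_risk`.

```python
from statistics import NormalDist


def misclassification_risk(observed, std_error, cutpoint):
    """Approximate probability that true performance lies on the opposite
    side of the cutpoint from the observed score.

    Assumption (not from the source): true performance is treated as
    normally distributed around the observed score with the given
    standard error.
    """
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    # Probability that true performance is below the cutpoint.
    p_below = NormalDist(mu=observed, sigma=std_error).cdf(cutpoint)
    # A provider reported at or above the cutpoint is misclassified if
    # true performance is below it, and vice versa.
    return p_below if observed >= cutpoint else 1.0 - p_below


if __name__ == "__main__":
    # Same cutpoint and standard error, different distance from the cutpoint.
    print(misclassification_risk(0.82, 0.05, 0.80))
    print(misclassification_risk(0.90, 0.05, 0.80))
```

Under these assumptions the risk rises as the score approaches the cutpoint and falls as the standard error shrinks, which matches the last two bullets on the slide.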
Issue #1: Misclassification of Performance • Examples of options for addressing misclassification discussed in white paper: • Exclude providers from reporting for whom the risk of misclassification due to chance is too high • Exclude measures for which the risk of misclassification due to chance is too high for too many providers • Modify the classification system used in the performance report • Report using fewer categories • Change the thresholds for deciding categories • Introduce a zone of uncertainty around performance cutpoints • Report shrunken estimates
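Two of the options above, shrunken estimates and a zone of uncertainty around a cutpoint, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the white paper's procedure: the shrinkage rule is the common reliability-weighted average of the observed score and the overall mean, and the function names and the `half_width` parameter are hypothetical.

```python
def shrunken_estimate(observed, overall_mean, reliability):
    """Pull an observed score toward the overall mean.

    Assumption: the reliability (between 0 and 1) is used directly as
    the weight on the provider's own observed score.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return reliability * observed + (1.0 - reliability) * overall_mean


def classify_with_zone(score, cutpoint, half_width):
    """Classify a score against a cutpoint, reporting 'uncertain' when the
    score falls within half_width of the cutpoint (hypothetical rule)."""
    if half_width < 0:
        raise ValueError("half_width must be non-negative")
    if abs(score - cutpoint) <= half_width:
        return "uncertain"
    return "above" if score > cutpoint else "below"


if __name__ == "__main__":
    # A low-reliability score moves most of the way to the overall mean.
    print(shrunken_estimate(observed=0.90, overall_mean=0.70, reliability=0.25))
    print(classify_with_zone(score=0.81, cutpoint=0.80, half_width=0.02))
```

The design trade-off is visible in the parameters: lower reliability pulls scores further toward the mean, and a wider zone leaves more providers unclassified in exchange for fewer wrong labels.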
Methods Issue #2: Validity • Validity: the extent to which the performance information means what it is supposed to mean, rather than meaning something else • Ask yourself: Does the measurement measure what it claims to measure? • Consider whether these threats to validity exist: • Is the measure controllable by the provider? • Does patient behavior affect the measure? • Is the measure affected by differences in the patients being treated? • Is the measure controlled by factors other than the provider?
Methods Issue #2: Validity • Lack of validity can lead to systematic misclassification of performance • Potential threats to validity: • Statistical bias (i.e., omitted variable bias, such as differences in case mix) • Selection bias (e.g., patients for whom performance data are available are not representative of patients who will use the report) • Information bias (e.g., providers differ in the amount of missing data they have, such that lower performing providers have more missing data)
Methods Issue #3: Reliability* • A statistical concept that describes how confidently one can distinguish the performance of one provider from another • Measured as the ratio of the “signal” to the “noise” • The between-provider variation in performance is the “signal” • The within-provider measurement error is the “noise” • Measured on a 0.0 to 1.0 scale • Zero = all variability is due to noise or measurement error • 1.0 = all the variability is due to real differences in performance *Source: Adams JL, The Reliability of Provider Profiling: A Tutorial, Santa Monica, Calif.: RAND Corporation, TR-653-NCQA, 2009. As of June 8, 2010: http://www.rand.org/pubs/technical_reports/TR653/
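For a quantity that runs from 0.0 to 1.0, the signal-to-noise idea is commonly written as signal variance divided by the sum of signal and noise variance. The sketch below uses that formulation as an assumption. It also assumes, for illustration, a pass/fail measure whose error variance is approximated by p(1 - p)/n. The function names are hypothetical.

```python
def binomial_error_variance(pass_rate, n_observations):
    """Approximate within-provider error variance for a pass/fail measure.

    Assumption: each observation is an independent success or failure,
    so the variance of the observed rate is p * (1 - p) / n.
    """
    if n_observations <= 0:
        raise ValueError("n_observations must be positive")
    return pass_rate * (1.0 - pass_rate) / n_observations


def reliability(between_variance, error_variance):
    """Share of total variance attributable to real provider differences
    (the signal) rather than measurement error (the noise)."""
    total = between_variance + error_variance
    if total <= 0:
        raise ValueError("total variance must be positive")
    return between_variance / total


if __name__ == "__main__":
    between = 0.01  # hypothetical between-provider variance
    for n in (25, 100, 400):
        noise = binomial_error_variance(0.5, n)
        print(n, round(reliability(between, noise), 3))
```

Holding the between-provider variance fixed, a larger number of observations shrinks the noise term and moves reliability toward 1.0.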
Between-Provider Performance Variation [Figure: two panels on a 0 to 100 performance scale, with markers showing the average performance for each provider. Panel 1: lower between-provider variation (harder to tell who is best). Panel 2: higher between-provider variation (easier to tell who is best).]
Different Levels of Measurement Error (Uncertainty about the “true” average performance) [Figure: two panels on a 0 to 100 performance scale, with markers showing the average performance for each provider and bars showing the range of uncertainty about “true” average performance. Panel 1: higher measurement error (harder to tell who is best). Panel 2: lower measurement error (easier to tell who is best).]
Methods Issue #3: Link between Reliability and Misclassification* • Reliability is a function of: • Provider-to-provider variation (which depends on the population) • Sample size • Providers typically vary in their number of “measured events” (i.e., some providers have more information than others) • Higher reliability in a measure: • Means more signal, less noise • Reduces likelihood that you will classify provider in “wrong” category • Per Adams*: “Reliability ASSUMES validity” *Source: Adams JL, The Reliability of Provider Profiling: A Tutorial, Santa Monica, Calif.: RAND Corporation, TR-653-NCQA, 2009. As of June 8, 2010: http://www.rand.org/pubs/technical_reports/TR653/
Misclassification Risk: Various Factors Contribute to the Risk [Diagram of contributing factors: lower number of observations; higher average error per observation; higher within-provider measurement error; lower between-provider variation in performance; lower reliability; classification system with more categories; all pointing toward higher misclassification risk]
Methods Issue #4: Composite Measures • Composite measures are “summary measures” • Combine data from 2 or more individual measures into a single measure • Example: 4 separate preventive care measures may be combined into an overall preventive care composite • Potential advantages are: • Fewer measures may be easier for patients to digest • May increase reliability, thereby lowering risk of misclassification • Key decision questions: • Will composites be used? • If used, which measures will be combined? • How will individual measures be combined (i.e., the construction of the composite)?
Methods Issue #4: Composite Measures • There are different types of composite measures and methods to construct composites • Options for combining measures include: • “Reflective” or “latent” composites: • Let the data decide which measures to include (e.g., via factor analytic methods) • “Formative” composites: • Based on judgment as to what to include • Nationally-endorsed composites • Options for constructing: • “All-or-none” methods (success on every measured service) • Weighted average methods (need to define the weights)
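The two construction options named on this slide behave differently on the same data. The sketch below is a minimal illustration, assuming per-patient pass/fail results for the all-or-none method and per-measure scores with analyst-chosen weights for the weighted average. The data layout, function names, and measure names are hypothetical.

```python
def all_or_none(patients):
    """Fraction of patients who received every service they were eligible for.

    `patients` is a list where each item is a list of booleans, one per
    service the patient was eligible for (assumed layout).
    """
    eligible = [services for services in patients if services]
    if not eligible:
        raise ValueError("no patients with eligible services")
    return sum(all(services) for services in eligible) / len(eligible)


def weighted_average(scores, weights):
    """Weighted average of per-measure scores; weights must be defined
    for exactly the same measures and sum to a positive number."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same measures")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[m] * weights[m] for m in scores) / total_weight


if __name__ == "__main__":
    patients = [[True, True], [True, False], [True, True, True], [False]]
    print(all_or_none(patients))
    print(weighted_average({"screening": 0.8, "immunization": 0.6},
                           {"screening": 1.0, "immunization": 1.0}))
```

All-or-none scores are never higher than the lowest individual measure rate, so they tend to look stricter than a weighted average built from the same measures.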
Other Topics Addressed in White Paper: Creating Reports of Provider Performance • Negotiating consensus on goals and “value judgments” of performance reporting • What are the purposes of publicly reporting provider performance? • What will be the general format of performance reports? • What will be the acceptable level of performance misclassification due to chance? • Selecting the measures that will be used to evaluate provider performance • Which measures will be included in a performance report? • How will the performance measures be specified? • What patient populations will be included?
Other Topics Addressed in White Paper: Creating Reports of Provider Performance • Identifying data sources and aggregating performance data • What kinds of data sources will be included? • How will data sources be combined? • How frequently will data be updated? • Checking data quality and completeness • How will tests for missing data be performed? • How will missing data be handled? • How will accuracy of data interpretation be assessed?
Other Topics Addressed in White Paper: Creating Reports of Provider Performance • Computing provider-level performance scores • How will performance data be attributed to providers? • What are the options for handling outlier observations? • Will case mix adjustment be performed, and if yes, how? • What strategies will be used to limit the risk of misclassification due to chance? • Creating performance reports • How will performance be reported? • Single points in time, trends • Numerical scores • Categorizing performance • Will composite measures be used? If yes, how will measures be combined and the composite constructed? • What final validity checks might improve the accuracy and acceptance of reports?
Concluding Thoughts • Groups of stakeholders engaging in measurement for public reporting may choose different options at each decision point • Our goal was to illustrate the advantages and disadvantages of various options at each decision point • Consultation with a statistician may yield tailored advice • The menu of options for each decision point is not exhaustive • Going through the options may stimulate discussion and negotiation
To obtain a copy of the White Paper, please email peggy.mcnamara@ahrq.hhs.gov to be added to the distribution list. For more details on the information in the report, contact: Mark Friedberg: mfriedbe@rand.org; Cheryl Damberg: damberg@rand.org