Tackling over-dispersion in NHS performance indicators Robert Irons (Analyst – Statistician) Dr David Cromwell (Team Leader) 20/10/2004
Outline of presentation • NHS Star Ratings Model • Criticism of some of the indicators • The reason – overdispersion • Options for tackling the problem • Our solution – an additive random effects model • Effects on the ratings indicators
Performance Assessment in the UK • 1990s: Government focused on efficiency • 1997: Labour replaces Conservative government • Late 90s: Labour focus on quality & efficiency • Define Performance Assessment Framework • Publish NHS Plan in 2000 • Commission for Health Improvement (CHI) created • Performance ratings first published in 2001, responsibility passed to CHI for 2003 publication • Healthcare Commission replaces CHI in April 2004, has broader inspection role
NHS Performance Ratings • An ‘at a glance’ assessment of NHS trusts’ performance • Performance rated as 0, 1, 2, or 3 stars • Yearly publication • Focus on how trusts deliver government priorities • Linked to implementation of key policies • Priorities and Planning Framework • National Service Frameworks • Have limited role in direct quality improvement • Modernisation Agency helps trusts with low ratings
The ratings model • Overall rating derived from many different indicators • and affected by Clinical Governance Reviews • Two types of indicators, organised in 4 groups • Key targets & Balanced Scorecard indicators • BS indicators grouped into 3 focus areas • Patient focus, clinical focus, capacity & capability
Combining the indicators • Indicators are measured on different scales • Categorical (e.g. Yes/No) • Proportional (e.g. proportion of patients waiting longer than 15 months) • Rates (e.g. mortality rate within 30 days following selected surgical procedures) • Further complication • Performance on some indicators is measured against published targets, which define thresholds • Performance on other indicators is based on relative differences between trusts
Combining the indicators • Indicators first transformed so they are all on an equivalent scale • Key targets assigned to three levels: • achieved • under-achieved • significantly under-achieved • Balanced scorecard indicators • 1 – significantly below average (worst performance) • 2 – below average • 3 – average • 4 – above average • 5 – significantly above average (best performance)
Transforming the indicators • Key target indicators transformed using thresholds defined by government policy • Balanced scorecard indicators transformed via several methods • Percentile method • Statistical method • Absolute method, if policy target exists • Mapping method (for indicators with ordinal scales)
The old statistical method • Based on simple confidence intervals • 95% and 99% confidence intervals calculated for a trust’s indicator value • Trust confidence interval compared with the overall national rate (effectively a single point) • Resulting bands: • 1 – significantly below average: no 99% confidence interval overlap, higher values • 2 – below average: no 95% confidence interval overlap, higher values • 3 – average: overlapping 95% confidence intervals, e.g. England: 5.51% to 5.55% • 4 – above average: no 95% confidence interval overlap, lower values • 5 – significantly above average: no 99% confidence interval overlap, lower values
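The banding rule on this slide can be sketched in a few lines. This is only an illustration: the function name `old_method_band`, the normal-approximation intervals (critical values 1.96 and 2.576), and the example numbers are assumptions, not taken from the presentation. It treats the national rate as a single point and lower values as better, as described above.

```python
def old_method_band(rate, se, national_rate):
    """Return a band from 1 (worst) to 5 (best) for one trust.

    Assumes lower indicator values mean better performance and that
    the trust's confidence intervals use a normal approximation.
    """
    z95, z99 = 1.96, 2.576  # normal critical values (an assumption)
    lo95, hi95 = rate - z95 * se, rate + z95 * se
    lo99, hi99 = rate - z99 * se, rate + z99 * se

    if lo99 > national_rate:   # whole 99% interval above the national rate
        return 1
    if lo95 > national_rate:   # whole 95% interval above the national rate
        return 2
    if hi99 < national_rate:   # whole 99% interval below the national rate
        return 5
    if hi95 < national_rate:   # whole 95% interval below the national rate
        return 4
    return 3                   # 95% interval covers the national rate


# Hypothetical trusts compared with a national rate of 5.53%
bands = [old_method_band(r, 0.004, 0.0553)
         for r in (0.070, 0.064, 0.055, 0.046, 0.040)]
```

Because only the trust's own sampling error enters the comparison, a large trust with a tiny standard error is banded as significant for even a small departure from the national rate.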
The old statistical method: problematic • Not a proper statistical hypothesis test • Differentiates between trusts based on differences that exceed levels of sampling variation • On some indicators, this led to too many NHS trusts being assigned to the significantly good/bad bands
Working example: standardised readmission rate of patients within 28 days of initial discharge
Readmissions within 28 days of discharge: funnel plot (2003/04 data)
Mortality within 30 days of selected surgical procedures: funnel plot (2003/04 data)
Z scores • Standardised residual • Z scores are used to summarise ‘extremeness’ of the indicators • Funnel plot limits approximate to the naïve Z score • Naïve Z score given by • $Z_i = (y_i - t) / s_i$ • Where $y_i$ is the indicator value, $t$ is the target value (here the overall national rate), and $s_i$ is the local standard error
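A minimal sketch of the naïve Z score and the matching funnel limits follows. The function names and the default critical value of 1.96 are illustrative assumptions; the numbers in the usage lines are made up.

```python
def naive_z(y, t, s):
    """Naive Z score: distance of indicator y from target t, in standard errors."""
    return (y - t) / s


def funnel_limits(t, s, z_crit=1.96):
    """Lower and upper funnel limits around target t for a trust with standard error s."""
    return t - z_crit * s, t + z_crit * s


# Hypothetical trust: rate 7.0%, target 5.5%, standard error 0.5%
z = naive_z(0.070, 0.055, 0.005)
lower, upper = funnel_limits(0.055, 0.005)
```

A trust falls outside the funnel exactly when the absolute value of its naïve Z score exceeds the chosen critical value.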
Dealing with over-dispersion • Three options were considered • Use of an ‘interval null hypothesis’ • Allow for over-dispersion using a ‘multiplicative variance model’ • …or a ‘random-effects additive variance model’
Interval null hypothesis • Similar to the naïve Z score or standard funnel limits • Uses a judgement of what constitutes a normal range for the indicator • Define normal range (e.g. percentiles, national rate ± x%) • Funnel limits then defined as: • Upper/lower limit = range limit ± (x * si0) • Reduces number of significant results • But might be considered somewhat arbitrary • Interval could be defined based on previous years’ data, or prior knowledge • Makes minimal use of the sampling error
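One possible reading of the limit formula is sketched below. It assumes that the multiplier in the formula is a normal critical value, that the upper limit adds it to the top of the normal range, and that the lower limit subtracts it from the bottom. The function name and example values are hypothetical.

```python
def interval_null_limits(range_low, range_high, s, z_crit=1.96):
    """Funnel limits around a judged 'normal range' rather than a single target.

    range_low, range_high: bounds of the normal range (a judgement call)
    s: the trust's standard error
    z_crit: critical value (assumed to be the multiplier in the slide's formula)
    """
    lower = range_low - z_crit * s
    upper = range_high + z_crit * s
    return lower, upper


# Hypothetical normal range of 5% to 6% and a standard error of 0.4%
lower, upper = interval_null_limits(0.05, 0.06, 0.004)
```

The limits are always wider than the standard funnel by the width of the normal range, which is why fewer trusts are flagged.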
Multiplicative variance model • Inflates the variance associated with each observation by an over-dispersion factor ($\phi$): • $\sum_i Z_i^2 = X^2$, the Pearson $X^2$ statistic • $\hat{\phi} = X^2 / I$, where $I$ is the number of trusts • Limits on funnel plot are then expanded by $\sqrt{\hat{\phi}}$ • Do not want to be influenced by the outliers we are trying to identify • Data are first winsorised (shrinks the extreme Z-values in) • Over-dispersion factor could be provisionally defined based on previous years’ data • Statistically respectable, based on a ‘quasi-likelihood’ approach
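The over-dispersion factor and the widened limits can be sketched as follows, assuming the factor is the mean of the squared Z scores and the limits widen by its square root. Function names and example values are illustrative assumptions; in practice the Z scores would be winsorised first.

```python
import math


def overdispersion_factor(z_scores):
    """Estimate phi as the Pearson X^2 (sum of squared Z scores) divided by I."""
    n_trusts = len(z_scores)
    pearson_x2 = sum(z * z for z in z_scores)
    return pearson_x2 / n_trusts


def inflated_limits(t, s, phi, z_crit=1.96):
    """Funnel limits around target t, widened by sqrt(phi)."""
    half_width = z_crit * s * math.sqrt(phi)
    return t - half_width, t + half_width


# Hypothetical Z scores for four trusts
phi_hat = overdispersion_factor([2.0, -2.0, 2.0, -2.0])
limits = inflated_limits(0.0, 1.0, phi_hat, z_crit=2.0)
```

Since the half-width scales with each trust's own standard error, trusts with large standard errors gain more width in absolute terms than precise ones.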
Multiplicative over-dispersion: a funnel plot (not winsorised, $\hat{\phi}$ = 21.45)
Multiplicative over-dispersion: a funnel plot (10% winsorised, $\hat{\phi}$ = 13.97)
Winsorising • Winsorising consists of shrinking in the extreme Z-scores to some selected percentile, using the following method. • Rank cases according to their naïve Z-scores. • Identify $Z_q$ and $Z_{1-q}$, the (100*q)% most extreme bottom and top naïve Z-scores, where q might, for example, be 0.1 • Set the lowest (100*q)% of Z-scores to $Z_q$, and the highest (100*q)% of Z-scores to $Z_{1-q}$. These are the winsorised statistics. • This retains the same number of Z-scores but discounts the influence of outliers.
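The steps above can be sketched as below. The exact percentile convention is an assumption: here the cut-offs are taken as the ranked values just inside the lowest and highest `int(n * q)` scores. The function name and data are hypothetical.

```python
def winsorise(z_scores, q=0.1):
    """Pull the lowest and highest (100*q)% of Z scores in to Z_q and Z_{1-q}."""
    ranked = sorted(z_scores)
    n = len(ranked)
    k = int(n * q)              # number of scores pulled in at each end
    if k == 0:
        return list(z_scores)   # too few cases to winsorise at this q
    z_low = ranked[k]           # Z_q: first value inside the lower tail
    z_high = ranked[n - 1 - k]  # Z_{1-q}: last value inside the upper tail
    # Clamp every score into [z_low, z_high], keeping the original order
    return [min(max(z, z_low), z_high) for z in z_scores]


# Hypothetical Z scores for ten trusts, with one extreme value at each end
raw = [-5.0, -1.0, -0.5, 0.0, 0.2, 0.4, 0.8, 1.0, 1.5, 9.0]
win = winsorise(raw, q=0.1)
```

The two extreme scores are replaced by their nearest neighbours, so the count of Z scores is unchanged while their squared sum, and hence any over-dispersion estimate built on it, is far less dominated by outliers.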
Winsorising • Figure panels: non-winsorised; 10% winsorised
Random effects additive variance model • Based on a technique developed for meta-analysis • Originally designed for combining the results of disparate studies into a single estimate of the same effect • In meta-analysis terms, consider the indicator value of each trust to be a separate study • Essentially seeks to compare each trust to a ‘null distribution’ instead of a point • Assumes that $E[y_i] = \theta_i$ and $V[\theta_i] = \tau^2$ • Uses a method-of-moments estimator for $\tau^2$ (DerSimonian and Laird, 1986) • Based on the winsorised estimate of $\phi$
Random effects additive variance model • If $I \hat{\phi} < (I - 1)$ then • the data are not over-dispersed, and $\hat{\tau}^2 = 0$ • use standard funnel limits/naïve Z scores • Otherwise: • $\hat{\tau}^2 = \dfrac{I \hat{\phi} - (I - 1)}{\sum_i w_i - \sum_i w_i^2 / \sum_i w_i}$ • Where $w_i = 1 / s_i^2$ • The new random-effects Z score is then calculated as: • $Z_i^D = (y_i - t) / \sqrt{s_i^2 + \hat{\tau}^2}$
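A minimal sketch of the additive calculation follows, assuming the DerSimonian and Laird method-of-moments form for the between-trust variance and an adjusted Z score that adds this variance to each trust's squared standard error. Function names and the example data are hypothetical.

```python
import math


def tau_squared(z_winsorised, std_errors):
    """Method-of-moments estimate of the between-trust variance tau^2.

    z_winsorised: winsorised naive Z scores, one per trust
    std_errors:   the trusts' standard errors s_i, in the same order
    """
    n_trusts = len(z_winsorised)
    phi_hat = sum(z * z for z in z_winsorised) / n_trusts
    if n_trusts * phi_hat < n_trusts - 1:
        return 0.0  # no over-dispersion: fall back to naive Z scores
    weights = [1.0 / (s * s) for s in std_errors]  # w_i = 1 / s_i^2
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    return (n_trusts * phi_hat - (n_trusts - 1)) / (sum_w - sum_w2 / sum_w)


def random_effects_z(y, t, s, tau2):
    """Z score against a null distribution with extra variance tau2."""
    return (y - t) / math.sqrt(s * s + tau2)


# Hypothetical example: four trusts with unit standard errors
tau2 = tau_squared([2.0, -2.0, 2.0, -2.0], [1.0, 1.0, 1.0, 1.0])
z_adj = random_effects_z(3.0, 0.0, 1.0, 3.0)
```

With `tau2` equal to zero the adjusted score reduces to the naïve Z score, so the same function can be used whether or not over-dispersion is detected.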
Why we chose the additive variance method • Generally avoids situations where two trusts which have the same value for the indicator get put in different bands because of precision • A multiplicative model would increase the variance at some trusts more than at others • e.g. a small trust with large variance would be affected much more than a large trust with small variance • By contrast, an additive model increases the variance at all trusts by the same amount • Better conceptual fit with our understanding of the problem, that the factors inflating variance affect all trusts equally, so an additive model is preferable
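The contrast between the two models can be illustrated with made-up numbers. The standard errors, the over-dispersion factor of 4 and the between-trust variance of 0.0001 below are purely hypothetical.

```python
def multiplicative_variance(s, phi):
    """Variance after scaling by an over-dispersion factor phi."""
    return phi * s * s


def additive_variance(s, tau2):
    """Variance after adding a common between-trust variance tau2."""
    return s * s + tau2


small_trust_se = 0.02    # hypothetical small trust: large standard error
large_trust_se = 0.005   # hypothetical large trust: small standard error

# Increase in variance under each model
mult_small = multiplicative_variance(small_trust_se, 4.0) - small_trust_se ** 2
mult_large = multiplicative_variance(large_trust_se, 4.0) - large_trust_se ** 2
add_small = additive_variance(small_trust_se, 1e-4) - small_trust_se ** 2
add_large = additive_variance(large_trust_se, 1e-4) - large_trust_se ** 2
```

Under the multiplicative model the small trust's variance grows sixteen times more than the large trust's, while under the additive model both grow by the same amount.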
References • DJ Spiegelhalter (2004) Funnel plots for comparing institutional performance. Statistics in Medicine, 24 (to appear) • DJ Spiegelhalter (2004) Handling over-dispersion of performance indicators (submitted) • R DerSimonian & N Laird (1986) Meta-analysis in clinical trials. Controlled Clinical Trials, 7:177-188
Acknowledgements • David Spiegelhalter • Adrian Cook • Theo Georghiou
Thank you