The “TOTAL” in TOTAL ERROR

Jan S. Krouwer, Ph.D.

Contents • Total Error models including the Westgard model • Glucose meter simulation using the Westgard model • Performance Standards - glucose meter example • How our company specified and evaluated performance

Total Error “The physician thinks rather in terms of total analytical error, which includes both random and systematic components. From his point of view, all types of analytic error are acceptable as long as the total analytic error is less than a specified amount.” (Westgard, Clin Chem 1974) • Better if the word “analytic” were dropped

Total error models • 1) TE = systematic error + random error • 2) TE = (average) bias + Z x imprecision • 1) and 2) are not the same! • 1) is true but not very useful • 2) is not true but is often very useful

Total error models • TE = (average) bias + Z x imprecision • However the above equation is used and by whomever, it is called the Westgard model* • Typically used 3 ways: • Assess performance from a method comparison • Assess performance from QC and set up QC • Set performance specifications • *Mandel’s 1964 book (The Statistical Analysis of Experimental Data) predates the Westgard model

The Westgard Model for total error • Model (1): TE = average bias + Z x imprecision is useful but an incorrect (incomplete) model • A better model is model (2): TE_i = average bias + Z x imprecision + Z x (random interferences)_i • (1) is used, (2) is ignored, but (2) is still better • Reference for (2): Lawton WH, Sylvester EA, Young-Ferraro BJ. Statistical comparison of multiple analytic procedures: application to clinical chemistry. Technometrics. 1979;21:397-409
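A minimal sketch of how the two models above diverge, with the terms summed linearly exactly as written on the slide. The function names and all numeric values (bias, SDs, Z = 1.96) are hypothetical and chosen only for illustration; they do not come from the cited papers.

```python
def westgard_te(avg_bias, sd_imprecision, z=1.96):
    """Model (1): TE = average bias + Z x imprecision."""
    return avg_bias + z * sd_imprecision


def lawton_te(avg_bias, sd_imprecision, sd_interference, z=1.96):
    """Model (2): model (1) plus a Z x (random interferences) term."""
    return avg_bias + z * sd_imprecision + z * sd_interference


# Two hypothetical assays with identical average bias and imprecision.
# Only assay B is affected by random patient interferences.
assay_a = {"avg_bias": 2.0, "sd_imprecision": 3.0, "sd_interference": 0.0}
assay_b = {"avg_bias": 2.0, "sd_imprecision": 3.0, "sd_interference": 5.0}

for name, assay in (("A", assay_a), ("B", assay_b)):
    w = westgard_te(assay["avg_bias"], assay["sd_imprecision"])
    l = lawton_te(**assay)
    print(f"assay {name}: Westgard TE = {w:.2f}, Lawton TE = {l:.2f}")
```

Model (1) cannot tell the two assays apart because the interference term never enters it; model (2) assigns the larger total error to the assay with interferences.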

The Westgard Model for total error • With the Westgard model, both assays have the same total error, which can’t be! • Lawton model says the assay on the left has more error.

Total Error Models • A fault tree model of total error adds even more terms (Krouwer, Clin Chem 1991; Krouwer, Arch Pathol Lab Med 1992) • TE = Z x sd + int. + slope + nonlinearity + drift + sample carryover + reagent carryover + Z x random interferences

Total Error Models • But even these expanded models are incomplete. • They fail to deal with user error, software errors, and some other analytical errors which cannot be easily modeled. • And any model (including measurement uncertainty) that deals with a multiple of imprecision (e.g. 95% of results) makes no sense: how can TE not be 100%! • The multipliers (Z) must be small enough for TE to be reasonable • If TE accounted for 100% of the results, limits = - infinity to + infinity
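The point about multipliers can be illustrated with a short sketch. It assumes errors follow a normal distribution, which is an assumption made here for illustration only; `z_for_coverage` is a hypothetical helper name.

```python
from statistics import NormalDist


def z_for_coverage(coverage):
    """Two-sided Z multiplier covering `coverage` of a normal distribution."""
    if not 0.0 < coverage < 1.0:
        # 100% coverage has no finite multiplier under a normal model.
        raise ValueError("coverage must be strictly between 0 and 1")
    return NormalDist().inv_cdf(0.5 + coverage / 2.0)


# Z keeps growing as the covered fraction approaches 100%.
for coverage in (0.95, 0.99, 0.999, 0.999999):
    print(f"{coverage:.6f} -> Z = {z_for_coverage(coverage):.2f}")
```

Under this assumption, a limit built from a finite Z always leaves some fraction of results outside it, and pushing the coverage toward 100% pushes the limits toward infinity.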

Westgard model summary • The Westgard model (and its ugly cousin, six sigma) is very useful in assessing performance, in spite of not including all TE error sources • But the Westgard model is unsuitable for establishing specifications or judging an assay’s medical acceptability*, because it does not include all TE error sources • *The Westgard model can be used to show an assay is medically unacceptable.

Glucose Meter Example • Boyd and Bruns (2001 Clin Chem) used the Westgard model to establish acceptable limits for glucose meters • They provided limits for ave. bias and imprecision to model TE. • If the modeled TE was less than the limits, the meter was medically acceptable • Krouwer (2001 Clin Chem), in a letter to the editor, says their model is wrong because it fails to account for patient interferences

Boyd and Bruns Respond In their response, they said I was correct but the sources of error I mentioned (patient interferences) were: “outside the scope of our study, in part because it is difficult to know how one might model the interferences.” Interferences are easy to model and if you don’t know how to do it, don’t publish an incomplete model!

Boyd and Bruns Respond They went on to say that in their article they discussed the need for manufacturers to: “design instruments that avoid sources of error, such as those encountered by patients with special needs.” This is messed up! Models are supposed to fit actual data – are they asking manufacturers to design instruments that fit their model?

Boyd and Bruns Respond “The points raised in Dr. Krouwer’s letter do point out that our estimates of quality requirements, as demanding as they may seem, would become even more demanding if the additional sources of error were included.” Sorry! Not so. In the example figure, the Westgard ave. bias is zero. Reducing imprecision will not move the outliers closer to the Y=X line.

And we’re going to ignore this • Boyd JC, Bruns DE. Monte Carlo simulation in establishing analytical quality requirements for clinical laboratory tests: meeting clinical needs. Methods Enzymol. 2009;467:311-433. • Karon BS, Boyd JC, Klee GG. Glucose meter performance criteria for tight glycemic control estimated by simulation modeling. Clin Chem. 2010;56:1091-1097. (Cited as an “outcome study” - Milan) • Krouwer JS. The danger of using total error models to compare glucose meter performance. Journal of Diabetes Science and Technology, 2014;8:419-421. • Brazg RL, Klaff LJ, Parkin CG. Performance Variability of Seven Commonly Used Self-Monitoring of Blood Glucose Systems: Clinical Considerations for Patients and Providers. Journal of Diabetes Science and Technology, 2013;7:144-152.

Follow-up paper on why Boyd & Bruns are wrong • Picture shows hemoglobin interference for some glucose meters • One can model the expected glucose meter result as a function of hemoglobin • Next slide shows that Westgard model misses interference error compared to Lawton model. • Westgard model gives blue line for meters A and B • Lawton model gives blue line for meter A and red line for meter B
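A rough sketch of the idea, not the simulation shown in the slides: two hypothetical meters share the same average bias (zero) and the same imprecision, but meter B's result also depends on the patient's hemoglobin. The slope, the hemoglobin distribution, the 5 mg/dL imprecision, and the function names are all assumptions made for illustration.

```python
import random


def simulate_errors(hb_slope, n=5000, sd_imprecision=5.0, seed=1):
    """Return meter-minus-reference errors (mg/dL) for n simulated patients.

    hb_slope is a hypothetical interference: mg/dL of glucose error per
    g/dL of hemoglobin away from an assumed population mean of 14 g/dL.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n):
        hemoglobin = rng.gauss(14.0, 2.0)           # differs per patient
        interference = hb_slope * (hemoglobin - 14.0)
        noise = rng.gauss(0.0, sd_imprecision)      # repeatability
        errors.append(interference + noise)
    return errors


def fraction_outside(errors, limit):
    """Fraction of results whose absolute error exceeds the limit."""
    return sum(abs(e) > limit for e in errors) / len(errors)


# Westgard-style limit: average bias (0) + 1.96 x imprecision (5 mg/dL).
westgard_limit = 0.0 + 1.96 * 5.0

meter_a = simulate_errors(hb_slope=0.0)   # no hemoglobin interference
meter_b = simulate_errors(hb_slope=4.0)   # hemoglobin-dependent error

print("meter A outside limit:", fraction_outside(meter_a, westgard_limit))
print("meter B outside limit:", fraction_outside(meter_b, westgard_limit))
```

Both meters yield the same Westgard limit because the interference averages out to zero bias across patients, yet meter B places far more individual results outside that limit.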

My simulation

Still no Effect • December 2014 Clin Chem: • Boyd JC, Bruns DE. Performance requirements for glucose assays in intensive care units • Wilinska ME, Hovorka R. Glucose control in the intensive care unit by use of continuous glucose monitoring: what level of measurement error is acceptable • Van Herpe T, de Moor B, Van den Berghe G, Mesotten D. Modeling of effect of glucose sensor errors on insulin dosage and glucose bolus computed by LOGIC-insulin • My response: Krouwer JS. Using the wrong model can lead to unsupported conclusions about glucose meters. Clinical Chemistry, 2015;61:666

So now I pursue my hobbies

Summary The Westgard model is incomplete – it does not account for all errors. Yet it is useful because it provides the location for most errors. A danger in using the Westgard model is to set specifications as Boyd and Bruns did and to say that if limits are met, the glucose meter will meet medical needs. I published flaws in the Boyd and Bruns approach. Their reply did nothing to diminish the flaws in their model and its use. But they persist in publishing and have ignored my concerns. A conference devoted to how to set specifications (Milan conference) recommended outcome studies as the best way to set specifications. There was no reference to “direct” outcome studies but the Boyd and Bruns model was the only citation as a way to perform “indirect” outcome studies. I gave up and pursue my hobbies.

Performance standards: ISO 15197 Glucose meter (2003) “Ninety-five percent (95%) of the individual glucose results shall fall within ± 0.83 mmol/liter (15 mg/dl) procedure at glucose concentrations ≤4.2 mmol/liter (75 mg/dl) and within ±20% at glucose concentrations >4.2 mmol/liter (75 mg/dl).” According to ISO, this requirement is “based on the medical requirements for glucose monitoring.” But 5% unspecified results means that once a week, a glucose meter that meets the ISO standard can kill you.
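A small sketch of the 2003 criterion as quoted above, in mg/dL, showing why the unspecified 5% matters. The function names and the data are hypothetical; the limits are taken from the quoted text, with the reference value used to pick the concentration range.

```python
def within_iso_2003(meter, reference):
    """True if one result is inside the quoted ISO 15197:2003 limits (mg/dL)."""
    error = abs(meter - reference)
    if reference <= 75.0:
        return error <= 15.0
    return error <= 0.20 * reference


def meets_iso_2003(pairs):
    """True if at least 95% of (meter, reference) pairs are within limits."""
    n_ok = sum(within_iso_2003(meter, ref) for meter, ref in pairs)
    # Integer comparison avoids floating-point issues at exactly 95%.
    return n_ok * 100 >= 95 * len(pairs)


# Hypothetical data: 95 perfect results plus 5 results that read 400 mg/dL
# when the true glucose is 40 mg/dL.
pairs = [(100.0, 100.0)] * 95 + [(400.0, 40.0)] * 5

print(meets_iso_2003(pairs))
```

The criterion says nothing about how large the remaining 5% of errors may be, so this data set passes even though five results are off by a factor of ten.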

Industry controls the standards groups • Head of ISO 15197 committee was the regulatory affairs manager for Ortho • Golden rule of regulatory affairs – “Thou shall never refer to results that can cause serious harm or death” • May be the reason that ISO didn’t use glucose meter error grids

95% Specified in an error grid [Figure: error grid with one region labeled “Specified” and two regions labeled “Unspecified”]

Some of my responses • Krouwer JS. Recommendation to treat continuous variable errors like attribute errors. Clin Chem Lab Med 2006;44(7):797–798. • Krouwer JS and Cembrowski GS. A review of standards and statistics used to describe blood glucose monitor performance. Journal of Diabetes Science and Technology, 2010;4:75-83. • Krouwer JS. Wrong thinking about glucose standards. Clin Chem, 2010;56:874-875.

Leaving 5% unspecified … • In other fields no one would have specifications that imply… • 5% wrong site surgery is ok • 5% of nuclear power plants can have melt downs

Revised glucose meter standards (2013) • CLSI 98% – 2% of results now unspecified • ISO 99% – 1% of results now unspecified • Mitch Scott – It was a compromise • David Sacks – You can’t prove zero defects • In the US, almost 8 billion glucose meter results per year • 2% of 8 billion = 160,000,000; 1% = 80,000,000.

My new responses • Krouwer JS Why specifications for allowable glucose meter errors should include 100% of the data. Clinical Chemistry and Laboratory Medicine, 2013;51:1543-1544. • Krouwer JS The new glucose standard, POCT12-A3 misses the mark. Journal of Diabetes Science and Technology, 2013;7:1400-1402.

And then, something happened . . . • 2005 – 32 meters • 2009 – 36 meters • 2014 – 87 meters

Loss of market share causes big players to act • (Company-sponsored) studies show that many meters’ performance doesn’t meet standards • For FDA-cleared meters, 75% meet ISO 15197-2003, 48% meet ISO 15197-2013 • FDA publishes new glucose standards – Jan. 7, 2014 • Surveillance error grid published – how do meters perform after release for sale – May 1, 2014 • References: Krouwer JS. Biases in clinical trials performed for regulatory approval. Accred Qual Assur 2015;20:437-9. Klonoff DC, Prahalad P. Performance of cleared blood glucose monitors. J Diabetes Sci Technol 2015;9:895-910. Klonoff DC, Lias C, Parkes JL, et al. Development of the Diabetes Technology Society Blood Glucose Monitor System Surveillance Protocol. J Diabetes Sci Technol 2015;9:in press.

FDA draft standard (2014) • FDA regarding ISO – Fuhgeddaboudit = Forget about it • “Although many manufacturers design their BGMS validation studies based on the International Standards Organizations document 15197, FDA believes that the criteria set forth in the ISO 15197 standard do not adequately protect patients using BGMS devices in professional settings, and does not recommend using these criteria for BGMS devices.”

FDA draft standard (2014) • Now 100% of the results are accounted for • “In order to demonstrate that a BGMS device is sufficiently accurate to be used safely by health care professionals, you should demonstrate that 99% of all values are within +/- 10% of the reference method for glucose concentrations > 70 mg/dL, and within +/- 7 mg/dL at glucose concentrations < 70 mg/dL. To avoid critical patient management errors, no individual result should exceed +/- 20% of the reference method for samples >70 mg/dL or +/- 15 mg/dL <70 mg/dL.”
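A sketch of the two-tier rule quoted above, to show how it accounts for every result. The function name and data are hypothetical. The quoted text does not say how a reference value of exactly 70 mg/dL is handled; treating it as part of the low range is an assumption made here.

```python
def meets_fda_draft_2014(pairs):
    """Check (meter, reference) pairs in mg/dL against the quoted draft rule.

    Tier 1: 99% of results within 10% (or 7 mg/dL in the low range).
    Tier 2: no single result beyond 20% (or 15 mg/dL in the low range).
    """
    n_inner = 0
    for meter, ref in pairs:
        error = abs(meter - ref)
        if ref > 70.0:
            inner, outer = 0.10 * ref, 0.20 * ref
        else:
            # Assumption: exactly 70 mg/dL is treated as the low range.
            inner, outer = 7.0, 15.0
        if error > outer:
            return False          # a single gross error fails the meter
        if error <= inner:
            n_inner += 1
    return n_inner * 100 >= 99 * len(pairs)


good = [(100.0, 100.0)] * 99
print(meets_fda_draft_2014(good + [(115.0, 100.0)]))   # one moderate error
print(meets_fda_draft_2014(good + [(400.0, 40.0)]))    # one gross error
```

Unlike a 95% or 99% limit alone, the outer limit applies to every individual result, so one dangerous error is enough to fail.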

Thereafter, things went downhill • DTS published the glucose meter surveillance protocol with acceptance criteria, which when met would allow the meter to have the DTS seal of approval (May 1, 2016) • But the acceptance criteria are based on ISO standards (1) • FDA publishes final version of their glucose meter standards and no longer requires 100% of the data to meet specifications (2) • (1) Krouwer JS. Why the Diabetes Technology Society surveillance protocol for glucose meters needs to be revised. J Diabetes Sci Technol 2017, in press • (2) Krouwer JS. Why the New FDA Glucose Meter POCT Guidance Is Disappointing. J Diabetes Sci Technol, Dec. 15, 2016, in press

So now I pursue my hobbies

Summary The ISO glucose meter standards, whether 2003 or 2013, do not account for all errors. The 2003 committee was dominated by regulatory affairs people. I published several papers critiquing these standards. Two people on the CLSI glucose standard gave inadequate justifications for not including 100% of the data. The number of glucose meters in recent years has increased dramatically. Many of these new meters have two properties: their reagent strips are inexpensive and their performance is poor. Big companies took notice as their market share was threatened. Industry-sponsored studies demonstrated the meters’ poor performance. FDA published draft glucose meter guidance which required 100% of the results to be specified, and a new surveillance glucose error grid was issued. But things went downhill when the surveillance error grid was published, which used ISO as its acceptance criteria, and the new FDA glucose meter guidance no longer required 100% of the data to be specified. I gave up and pursue my hobbies.

How we (Ciba Corning) evaluated our products using one plot • 95% of data must fall within green lines – no data beyond red lines • Simple – no modeling needed • Krouwer JS. Estimating Total Analytical Error and Its Sources: Techniques to Improve Method Evaluation. Arch Pathol Lab Med 1992;116:726-731.
