
  1. …Borrelia diagnostics – statistical aspects. Jørgen Hilden, jh@biostat.ku.dk. Notes have been added in this file. February 2009.

  2. Biostatistical motto: Formalism with a human face. Plan of my talk: clinicometric framework; descriptors of diagnostic power; displays of diagnostic power, including the ROC diagram; simultaneous use of 2 measurements; randomized testing of diagnostic procedures; special topics in supplementary slides.

  3. Topics not mentioned: systematic reviews & meta-analyses.

  4. ”Clinicometrics” … always considers a stream of cases (statisticians say: a population of cases): they are the units of clinical experience and also of clinical decision making. They are instances of a (well-defined?) clinical problem, ”the who-how-where-why of a patient-doctor encounter.” Therefore…

  5. Data collection* In clinical studies the choice of sample, and of the variables on which one bases one's prediction, must match the clinical problem as it presents itself at the time of decision making. In particular, one mustn't discard subgroups (as ‘atypical’ or ‘impurities’) that did not become identifiable until later: ensure prospective recognizability! (*as opposed to the ’engineering’ phases)

  6. Data collection. Purity vs. representativeness: a meticulously filtered case stream ('proven single-agent infections', or 'meeting CDC criteria') may be needed for patho- and pharmaco-physiological research, but is inappropriate as a basis for clinical decision policies [incl. cost studies].

  7. Don’t forget… Your job is to create decision rules that help the clinician decide, e.g., whether to proceed with antibiotics; when to plan clinical & serological follow-up checks; when to apply other tests, e.g. for HSV ► ideally drawing a complete management flowchart, i.e. a bushy tree of action diagnoses, not etiological diagnoses.

  8. Data collection. Consecutivity as a safeguard against selection bias. Standardization: How? Who? Where? When? Gold standard … the big problem!! (with blinding, etc.) Safeguards against change of data after the fact. w3.consort-statement.org/Initiatives/stardClinical_Chemistry_statement.pdf

  9. Focussing on quantitative markers. A quantity holds the result of a diagnostic procedure. Histograms describe its distribution in two subpopulations. We can interpret ordinates and areas under the two humps in terms of true and false decisions … and get a feel for the trade-off involved, provided that the pre-test probability of disease (percentage diseased) is known.

  10. … principle … [Figure: overlapping histograms of the measurement in the ’Diseased’ and ’Healthy / Non-disease’ subpopulations, with a cutoff point separating the negative and positive ranges; the tails caught on the wrong side of the cutoff are the false negatives and false positives. Each area = 1.00 = 100 % of the subpopulation.]

  11. … principle … [Figure: the same two histograms; the diseased area in the positive range = sensitivity (true positive fraction), and the healthy area in the negative range = specificity (true negative fraction). Note: BLACK&WHITE paradigm!]

  12. The ’probability square’. Pre-test ’case mix’: 30 % diseased, 70 % non-diseased; the square is divided by the sensitivity (true positive fraction) on the diseased side and by 1 – specificity (false positive fraction) on the non-diseased side. With specificity = 0.92, say, the true-negative area = 0.70 × 0.92 = 0.644, i.e. 64.4 % of cases are true negatives; the other three areas are analogous.
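A minimal Python sketch of that probability-square arithmetic. The prevalence and specificity are the slide's; the sensitivity of 0.40 is my hypothetical choice, picked so that LRpos = 0.40/0.08 = 5 : 1 as on the later slides.

```python
# Probability-square arithmetic: the four cell areas of the 2x2 table,
# given the pre-test case mix and the test's sensitivity/specificity.
prevalence = 0.30   # pre-test 'case mix': 30 % diseased (from the slide)
specificity = 0.92  # from the slide
sensitivity = 0.40  # hypothetical, chosen so LRpos = 0.40 / 0.08 = 5

tp = prevalence * sensitivity              # true-positive area
fn = prevalence * (1 - sensitivity)        # false-negative area
tn = (1 - prevalence) * specificity        # true-negative area = 0.70 * 0.92 = 0.644
fp = (1 - prevalence) * (1 - specificity)  # false-positive area

print(f"TP={tp:.3f}  FN={fn:.3f}  TN={tn:.3f}  FP={fp:.3f}")  # the four areas sum to 1
```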

  13. Classical terminology. ”Positive” = suggestive of (target) disease; ”negative” = suggestive of its absence; hence ”false / true positive / negative”. Sensitivity = TP/(those diseased); specificity = TN/(those without it). What is meant by PV (”predictive value”)? What is meant by LR (”likelihood ratio”)?

  14. Classical terminology (continued). PVpos = the ”predictive value” of a positive outcome = TP/(all positives) = Pr{ disease | pos } … the chance that the test is right when it says ”positive”.

  15. Classical terminology (continued). PVneg = the ”predictive value” of a negative outcome = TN/(all negatives) = Pr{ non-disease | neg } … the chance that the test is right when its verdict is ”negative”.
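Continuing the same hypothetical numbers, the two predictive values fall straight out of the probability-square areas:

```python
# Predictive values from the probability-square areas (prevalence 0.30 and
# specificity 0.92 from the slides; sensitivity 0.40 is hypothetical).
prevalence, sensitivity, specificity = 0.30, 0.40, 0.92

tp = prevalence * sensitivity
fn = prevalence * (1 - sensitivity)
tn = (1 - prevalence) * specificity
fp = (1 - prevalence) * (1 - specificity)

pv_pos = tp / (tp + fp)   # Pr{ disease | pos }
pv_neg = tn / (tn + fn)   # Pr{ non-disease | neg }
print(f"PVpos = {pv_pos:.2f}, PVneg = {pv_neg:.2f}")  # 0.68 and 0.78 here
```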

  16. ”Likelihood ratio” principle … Pre-test odds = 3 : 7 (30 % diseased, 70 % non-diseased). [Probability-square figure: two red arrows mark the sensitivity and 1 – specificity.] ”LR” = 5 : 1 (the ratio of the red arrows); ergo post-test odds = 15 : 7.

  17. Pre-test odds are low in Lyme problems: specificity is not bad, yet most positives are false positives. [Probability-square figure with a small diseased share.] ”LRpos” = 5 : 1 is fair; but the post-test odds and PVpos are still low.

  18. Not-quite-so-classical terminology. Sensitivity = TP/(those diseased); specificity = TN/(those without it). LRpos = the ”likelihood ratio” occasioned by a positive outcome = (sensitivity) / (1 – specificity) = Pr{ pos | disease } / Pr{ pos | non-disease }.

  19. Not-quite-so-classical terminology. LRneg = the ”likelihood ratio” occasioned by a negative outcome = (1 – sensitivity) / (specificity) = Pr{ neg | disease } / Pr{ neg | non-disease } = 0.1 = 1 : 10, for instance. If the pre-test risk of Lyme disease is low, say p = 2 %, a negative outcome almost eliminates it: (post-test odds) = (pre-test odds) × (LR) = (1 : 49) × (1 : 10) = 1 : 490.
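A small sketch of the odds-updating rule behind these slides; the helper names are mine:

```python
# Bayes' rule in odds form: post-test odds = pre-test odds * LR.
def prob_to_odds(p: float) -> float:
    return p / (1 - p)

def odds_to_prob(odds: float) -> float:
    return odds / (1 + odds)

# Slide 16: pre-test odds 3:7, LRpos = 5  ->  post-test odds 15:7
post = prob_to_odds(0.30) * 5
print(f"P(disease | pos) = {odds_to_prob(post):.3f}")   # 15/22, about 0.68

# Slide 19: pre-test risk 2 %, LRneg = 1:10  ->  post-test odds 1:490
post = prob_to_odds(0.02) * 0.1
print(f"P(disease | neg) = {odds_to_prob(post):.4f}")   # about 0.002
```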

  20. ”LR” principle: it’s the factor by which the observed data will change the odds. [Histogram figure with cutoff: LRpos = the ratio of the two areas in the positive range; LRneg* = the ratio of the two areas in the negative range. *LRneg < 1 (!)]

  21. ”LR” principle: it’s still the factor by which the observed data will change the odds: LRdata, when the data = a measurement value, is the ratio of the two curves at that value, and the cutpoint is now irrelevant. [Histogram figure with the observed measurement value marked on the axis.]

  22. Warning… A 2-gate study (50 diseased and 75 non-diseased recruited separately): ”LRpos” = 5 : 1, but the ”predictive values” and the post-test odds are unavailable, because the case mix is fixed by the design rather than by the clinical problem.

  23. A 2-dim. task. [Scatter plot of IgM against IgG: confirmed infections vs. confirmed non-infected cases.]

  24. A 2-dim. task. [Scatter plot with iso-Likelihood-Ratio lines (uphill arrow); where does a new patient (’?’) belong?]

  25. A 2-dim. task. [Scatter plot: nearest-neighbours classification of a new patient (’?’) among the diagnosed cases.]
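A minimal sketch of the nearest-neighbours idea on (IgG, IgM) pairs; the data and the choice of k are made up for illustration:

```python
import numpy as np

def knn_classify(x_new, X, y, k=5):
    """Classify x_new by majority vote among its k nearest prototypes.
    X: (n, 2) array of (IgG, IgM) values; y: 0/1 labels (1 = infected)."""
    dist = np.linalg.norm(X - x_new, axis=1)   # Euclidean distances to all cases
    nearest = np.argsort(dist)[:k]             # indices of the k closest cases
    return int(y[nearest].mean() >= 0.5)       # majority vote

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 1.0, (100, 2)),    # hypothetical infected cluster
               rng.normal(0.0, 1.0, (100, 2))])   # hypothetical non-infected cluster
y = np.array([1] * 100 + [0] * 100)
print(knn_classify(np.array([1.5, 1.8]), X, y))   # -> 1: nearest the infected cloud
```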

  26. A 2-dim. task. Kernel methods form a weighted average of neighbouring prototypes (diagnosed cases) etc., with decreasing influence the farther away. [Scatter plot of IgM against IgG.]
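The kernel version, sketched under the same hypothetical data; the Gaussian kernel and its bandwidth are my choices, not the talk's:

```python
import numpy as np

def kernel_score(x_new, X, y, bandwidth=1.0):
    """Kernel-weighted probability of infection at x_new: every diagnosed
    case contributes, with Gaussian weight decreasing the farther away it is."""
    dist = np.linalg.norm(X - x_new, axis=1)
    w = np.exp(-0.5 * (dist / bandwidth) ** 2)   # Gaussian kernel weights
    return float(np.sum(w * y) / np.sum(w))      # weighted average of the labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 1.0, (100, 2)),    # hypothetical infected cluster
               rng.normal(0.0, 1.0, (100, 2))])   # hypothetical non-infected cluster
y = np.array([1] * 100 + [0] * 100)
print(f"{kernel_score(np.array([1.5, 1.8]), X, y):.2f}")  # local fraction infected
```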

  27. [Scatter plot of IgM against IgG with iso-density (Danish: iso-tætheds-) lines.]

  28. [Scatter plot of IgM against IgG with iso-Likelihood-Ratio lines (uphill arrows).]

  29. Simulated data (100 + 100): infection vs. non-infected, plotted as IgM against IgG.

  30. A ROC diagram shows the true positive fraction against the false positive fraction as a function of the choice of cutoff point. [Figure: a hypothetical smooth trajectory and two raw empirical ones (sample sizes 17 + 17 and 40 + 40), running from ’everyone negative’ (strict cutoff) to ’everyone treated as positive’ (liberal cutoff).]
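A sketch of how such an empirical trajectory is computed; the marker values are simulated, and higher values are taken to suggest disease:

```python
import numpy as np

def empirical_roc(diseased, healthy):
    """Sweep the cutoff from strict to liberal; at each cutoff report the
    (false positive fraction, true positive fraction) of the rule
    'positive if measurement >= cutoff'."""
    values = np.sort(np.concatenate([diseased, healthy]))[::-1]
    cutoffs = np.concatenate([[np.inf], values])               # include 'everyone negative'
    fpf = np.array([(healthy >= c).mean() for c in cutoffs])   # 1 - specificity
    tpf = np.array([(diseased >= c).mean() for c in cutoffs])  # sensitivity
    return fpf, tpf

rng = np.random.default_rng(1)
diseased = rng.normal(1.0, 1.0, 40)   # hypothetical marker values, 40 + 40
healthy = rng.normal(0.0, 1.0, 40)
fpf, tpf = empirical_roc(diseased, healthy)
# The trajectory runs from (0, 0), 'everyone negative', to (1, 1),
# 'everyone treated as positive'.
```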

  31. The ROC diagram describes the nosographic properties: sens and spec; LRpos and LRneg = slopes of the segments. Y = Youden’s Index = sens + spec – 1 is equivalent to AUC [Area Under Curve] = ½(sens + spec) in this case. [Figure: the two-segment ROC of a single pos/neg test, between the Y = 0 diagonal and the ideal Y = 1 corner; we are within the Black&White paradigm.]
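For a single binary test the two summaries are indeed monotone transforms of one another; a one-line check with the hypothetical numbers used earlier:

```python
sens, spec = 0.40, 0.92                       # hypothetical values from earlier slides
youden = sens + spec - 1                      # Y = Youden's index
auc = 0.5 * (sens + spec)                     # area under the two-segment ROC
assert abs(auc - (youden + 1) / 2) < 1e-12    # AUC = (Y + 1) / 2
print(f"Y = {youden:.2f}, AUC = {auc:.2f}")   # Y = 0.32, AUC = 0.66
```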

  32. The ROC diagram describes the nosographics*: the slope of each outcome line is its LR; e.g. LRpos = (TP fraction of Diseased)/(FP fraction of non-dis.). [Figure: the same two-segment ROC.] (*i.e., the information obtainable from a 2-gate study.)

  33. Three test outcomes. [ROC figure: an ideal test hugs the corner; the three outcome segments run from ’ominous’ through ’almost no evidence either way’ to ’reassuring’.]

  34. Ordered (ordinal) test outcomes: three test outcomes, ’negative’, ’+/-’ (neg.? pos.?), ’positive’. Ordered how? By increasing slope, i.e. LR [concavity!]. [ROC figure with the ideal-test corner.]

  35. The slope reflects the medical trade-off between % sensitivity and % specificity. [ROC figure: ordered outcomes ’negative’, ’+/-’, ’possibly positive’, ’definitely positive’, with a ’constant-benefit’ line.] Those with a ”+/–” test result are best treated as negative in this situation. Trade-off? Constant benefit? … Please take a look at the supplementary figures.

  36. Interpretation of the area under the ROC as a rank statistic (cf. Wilcoxon-Mann-Whitney). E.g., 5 cases of disease D and 10 non-D cases: the ROC square holds 50 small rectangles, 40 of which happen to be below the ROC trajectory, because 40 times (out of 50) it so happens that a non-D finding > a D-group finding [the desired ordering]. For an example, see patient * vs. patient **. Area Under ROC Curve = freq{ (non-D value) > (D value) } = 0.80.
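That pairwise-ordering frequency is easy to compute directly; the 5 + 10 findings below are hypothetical stand-ins (the talk's own values are not reproduced here), chosen to give the same 40/50 = 0.80:

```python
import numpy as np

def auc_rank(d_values, non_d_values):
    """AUC = frequency of (D, non-D) pairs in the desired ordering
    (here: non-D finding > D finding, as on the slide); ties count 1/2."""
    d = np.asarray(d_values, dtype=float)[:, None]
    n = np.asarray(non_d_values, dtype=float)[None, :]
    return float(((n > d) + 0.5 * (n == d)).mean())

d_vals = [1, 2, 3, 9, 10]                      # hypothetical 5 D-group findings
non_d = [4, 5, 6, 7, 8, 11, 12, 13, 14, 15]    # hypothetical 10 non-D findings
print(auc_rank(d_vals, non_d))                 # 40 of the 50 pairs ordered: 0.8
```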

  37. But where does that lead us? The AUC has no definable interpretation in terms of blood, sweat and tears (loss, benefit, utility). It only has a soft association with decision-analytic measures of diagnostic power (separation, discrimination). Its frequent use is purely a matter of being the popular girl in the class.

  38. What!? The primary virtues of the ROC: it allows you (1) to compare tests regardless of scale, units, & transformations; (2) to see oddities [which may point to a technical problem, or call for a revised test-interpretation rule].

  39. Lesions in floating locations. [Image: a suspect area? Red = as the imagist saw it; green = surgical truth.] How do we score diagnostic performance in such situations???

  40. Digression… Randomized trials of diagnostic tests … theory under development. Purpose & design: many variants. Sub(-set-)randomization, depending on the patient’s data so far collected. ”Non-disclosure”: some data are kept under seal until analysis. No parallel in therapeutic trials! Main purposes …

  41. … Randomized trials of diagnostic tests: (1) when the diagnostic intervention is itself potentially therapeutic; (2) when the new test is likely to redefine the disease(s) (cutting the cake in a completely new way); (3) when there is no obvious rule of translation from the outcomes of the new test to existing treatment guidelines; (4) when clinician behaviour is part of the research question… …end of digression

  42. Statistical analysis … in the narrow sense: … is very much standard once you know what aspects to count and compare. To know that, work backwards from (likely) consequences: what would have happened to these patients? And what would have happened in the alternative scenario? Never argue ”It’s customary to calculate … (this or that)” !

  43. Thank you ! Let me add a personal maxim: Never ask ”What can the journal impact factors do for me?” Ask instead ”What can I do for the journal impact factors?”

  44. Supplementary pictures follow here … Vassily Vlassov pixit

  45. The rôle of noise. Pure noise, independent of the patient’s true condition, flattens distributions and hence flattens the ROC → less information. Remedies: technical & procedural standardization, duplicate measurements (averaging over assessors, dominance-free consensus formation) … … may be ineffective if the noise is ”inter-patient”.
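A quick simulation of that flattening (my illustration, not the talk's): adding patient-independent measurement noise pulls the AUC towards the uninformative 0.5.

```python
import numpy as np

def auc_rank(pos, neg):
    """AUC as the Wilcoxon-Mann-Whitney pairwise-ordering frequency
    (here: diseased value > healthy value counts as correctly ordered)."""
    return float((pos[:, None] > neg[None, :]).mean())

rng = np.random.default_rng(2)
diseased = rng.normal(1.5, 1.0, 2000)   # hypothetical marker values
healthy = rng.normal(0.0, 1.0, 2000)
print(f"AUC without noise: {auc_rank(diseased, healthy):.2f}")   # about 0.86

sd_noise = 2.0                          # pure noise, independent of true condition
diseased_n = diseased + rng.normal(0.0, sd_noise, 2000)
healthy_n = healthy + rng.normal(0.0, sd_noise, 2000)
print(f"AUC with noise:    {auc_rank(diseased_n, healthy_n):.2f}")  # nearer 0.5
```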

  46. Its slope reflects the medical trade-off between % sensitivity and % specificity. [ROC figure: ordered outcomes ’negative’, ’+/-’ (neg.?), ’presumably positive’, ’definitely positive’, with an ”iso-benefit” line; ordered by increasing slope, i.e. LR (concavity!).] Slope? Constant benefit? … Let’s first look at a continuous test & the selection of the cutoff that maximizes benefit.

  47. A continuous test: the cutoff at measurement x = c maximizes benefit when the slope is chosen so as to imply constant benefit. How do we find that critical slope? It depends on the pre-test ’disease mix’ – and on the (human) loss associated with wrong or suboptimal treatment – when only two courses of action are available (otherwise there will be more lines, reflecting several trade-offs).
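The talk leaves the formula implicit; here is a sketch of the standard decision-analytic expression for that critical slope, with hypothetical loss values:

```python
# Critical slope of the constant-benefit lines in ROC space:
# slope = ((1 - p) / p) * (net loss of a false positive) / (net loss of a
# false negative). The optimal cutoff c is where the ROC slope, i.e. the LR
# at x = c, equals this value. The loss values below are hypothetical.
p = 0.30          # pre-test probability of disease (the 'disease mix')
loss_fp = 1.0     # extra loss from treating a non-diseased patient
loss_fn = 5.0     # extra loss from withholding treatment in a diseased patient

critical_slope = ((1 - p) / p) * (loss_fp / loss_fn)
print(f"critical slope = {critical_slope:.2f}")   # (7/3) * (1/5), about 0.47 here
```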

  48. A continuous test: the cutoff at measurement x = c maximizes benefit. [ROC figure: the constant-benefit line, with the chosen slope, touches the trajectory between the ’treat no-one’ and ’treat everybody’ corners.]

  49. Without the test, it’s (slightly) better to treat everybody than to treat no-one. With the test available, about 60 % of the ’misdiagnostic burden’ is eliminated; cf. the purple bar. [ROC figure: parallel constant-benefit lines through ’treat no-one’, ’treat everybody’, the cutoff x = c, and the ’no misdiagnoses’ corner.]
