
TEST ACCURACY EVIDENCE & DIAGNOSTIC DECISION MAKING

TEST ACCURACY EVIDENCE & DIAGNOSTIC DECISION MAKING. Clare Davenport Senior Clinical Lecturer Public Health, Epidemiology and Biostatistics University of Birmingham. Chris Hyde Professor of Public Health and Clinical Epidemiology University of Exeter.



  1. TEST ACCURACY EVIDENCE & DIAGNOSTIC DECISION MAKING Clare Davenport Senior Clinical Lecturer Public Health, Epidemiology and Biostatistics University of Birmingham Chris Hyde Professor of Public Health and Clinical Epidemiology University of Exeter

  2. “I consider much less thinking has gone into the theory underlying diagnosis, or possibly one should say less energy has gone into constructing the correct model of diagnostic procedures, than into therapy or prevention where the concept of ‘altering the natural history of the disease’ has been generally accepted and a theory has been evolved for testing hypotheses concerning this.” Archie Cochrane. Effectiveness and efficiency. Random reflections on health services. The Nuffield Provincial Hospitals Trusts, 1972.

  3. Diagnostic accuracy Medical Test Information Test harms/ placebo effects Diagnostic yield Decision Patient Outcome Action Management Diagnostic and Treatment Pathways Accuracy provides information about the hypothetical value of a test in decision making

  4. Phases of Test Evaluation Technical Performance • Does CT produce good quality images, reliably and reproducibly? • Do CT images accurately differentiate diseased from non-diseased patients? DIAGNOSTIC PERFORMANCE • Does CT change how diagnoses are made by doctors? Diagnostic Impact • Does CT change how treatment decisions are made? Therapeutic Impact • Does CT ultimately reduce mortality or morbidity? Patient Health Impact Fryback and Thornbury Med Dec Mak 1991;11:88-94

  5. Why Emphasis on Test Accuracy? • Lax regulatory system for tests: no requirement for evidence about patient impact • Health Technology assessment traditionally concerned with treatments -what if we are treating the wrong patients? -what if test use itself causes harm? • Medical education: concept of EBM relatively new and test evaluation even more so • Difficulties conducting test-treat trials: sample size; blinding; measurement and reporting of diagnostic and treatment decisions

  6. Outcomes Framework: 14 mechanisms by which a test can affect patient outcomes, mapped onto the care pathway (patient given test → test results produced → diagnostic decision made → treatment decision made → treatment implemented → patient outcome): 1. Test process; 2. Timing of test; 3. Feasibility; 4. Timing of results; 5. Interpretability; 6. Accuracy; 7. Timing of diagnosis; 8. Diagnostic confidence; 9. Diagnostic yield; 10. Treatment yield; 11. Treatment confidence; 12. Adherence; 13. Timing of treatment; 14. Patient/clinician perspective. Ferrante di Ruffano et al. BMJ 2012;344:e686

  7. Diagnostic confidence • Do I know where to find trustworthy information on the accuracy of this test? • Do I understand the test accuracy information available? • What are the implications of the accuracy of this test for my patients?

  8. Test Accuracy Measures DOR = (True Positives × True Negatives) / (False Negatives × False Positives). Disease as reference class: Sensitivity = TP / (TP+FN); Specificity = TN / (TN+FP). Test result as reference class: Positive Predictive Value = TP / (TP+FP); Negative Predictive Value = TN / (TN+FN).
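The measures on this slide can be computed directly from the four cells of a diagnostic 2x2 table. The sketch below uses the slide's formulas; the function names and the example counts are illustrative, not taken from the presentation.

```python
# 2x2-table measures from the slide, computed from raw counts.
# tp, fp, fn, tn are the four cells of a diagnostic 2x2 table.

def sensitivity(tp, fp, fn, tn):
    """TP / (TP + FN): proportion of diseased patients who test positive."""
    return tp / (tp + fn)

def specificity(tp, fp, fn, tn):
    """TN / (TN + FP): proportion of non-diseased patients who test negative."""
    return tn / (tn + fp)

def ppv(tp, fp, fn, tn):
    """TP / (TP + FP): probability of disease given a positive result."""
    return tp / (tp + fp)

def npv(tp, fp, fn, tn):
    """TN / (TN + FN): probability of no disease given a negative result."""
    return tn / (tn + fn)

def diagnostic_odds_ratio(tp, fp, fn, tn):
    """(TP x TN) / (FN x FP): a single global summary of accuracy."""
    return (tp * tn) / (fn * fp)

# Illustrative 2x2 table: 80 TP, 10 FP, 20 FN, 90 TN
print(sensitivity(80, 10, 20, 90))             # 0.8
print(specificity(80, 10, 20, 90))             # 0.9
print(diagnostic_odds_ratio(80, 10, 20, 90))   # 36.0
```

Note how sensitivity and specificity condition on disease status while the predictive values condition on the test result; the two reference classes answer different clinical questions.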

  9. Accessibility of test accuracy information for decision making: the metrics a decision maker may encounter include the 2x2 table, sensitivity, specificity, PPV, NPV, error rate, LR+, LR-, DOR, RDOR, relative sensitivity, relative specificity, ROC space, the ROC plot, AUC, summary sensitivity, summary specificity and summary ROC curves.

  10. In most testing situations either false negative or false positive test errors are more important

  11. Ca125 testing for ovarian cancer CA125 is a marker used to identify individuals who have an increased risk of ovarian cancer. NICE endorses the use of the test in primary care to select who should undergo pelvic ultrasound testing and who can safely not be investigated further. Sensitivity OR Specificity?

  12. Ca125 testing for ovarian cancer False Positives: do not have ovarian cancer but test positive, and so receive further investigation with US. False Negatives: have ovarian cancer but test negative, and so do not receive further investigation. Sensitivity determines the false negatives; specificity determines the false positives.

  13. Research rationale (1) Contextual variables cause variation in test accuracy estimates: -Characteristics of the population to be tested -Variation in the conduct of the test itself The proposed role of a test determines the point in a care pathway that it should be evaluated and any comparator tests that should be considered. Context determines the relative value placed on test errors (false positives and false negatives) Systematic reviews of test accuracy offer the potential to mitigate the lack of contextual fit observed in primary studies of test accuracy… are they doing their job?

  14. Research rationale (2) It is commonly held that decision makers have difficulty understanding and applying test accuracy information, BUT: What is the EXTENT of the problem? WHY do decision makers have particular difficulty with test accuracy measures – what is the nature of the difficulty?

  15. To what extent are contextual considerations represented in Test Accuracy Reviews ?

  16. Research methods Aims: -Assess the extent to which testing context is reflected at each stage of the review process: -when formulating review questions -synthesis, including investigation of heterogeneity -discussing results and making recommendations Searches: -3 review databases chosen on the basis of an epidemiological mapping exercise (Bayliss & Davenport 2008 IJTAHC; 24(4):403-411) ARIF, (University of Birmingham) DARE (CRD York) Cochrane Database Systematic Reviews -Cochrane Diagnostic Test Accuracy Working Group & the UK National Research Register: unpublished reviews

  17. Results: study flow Hits (ARIF, DARE, Cochrane Database of Systematic Reviews): N=1215 → Duplicates removed: N=303 → Excluded at abstract stage: N=641 → Excluded at full paper stage: N=34 → Eligible test accuracy reviews: N=237, from which a random 100 reviews had data extracted.

  18. Results: review characteristics Date of publication ranged from 1990 to 2006; 23% of reviews were published before 2000 and 73% in or after 2000. A total of 16 disease topic areas were represented. Between 1 and 50 index tests (median 3) were evaluated by a single review. The largest share of reviews (43/100) were conducted in the USA, with 23 in the UK, 12 in the Netherlands, 8 in the rest of Europe, 6 in Australia, 4 in Canada, 2 in Peru and one each in Colombia and China. 94/100 reviews included a clinician as an author. Using a modified checklist of 9 items taken from the QUOROM and AMSTAR checklists, study quality ranged from 0-9 (median 4.6; inter-quartile range 3 to 6).

  19. Results: question formulation • Only 24/100 reviews clearly specified all of test application, test role and prior tests as part of question formulation; 26% of the 73 reviews published on or after 2000 and 22% of the 23 reviews published before 2000

  20. Only 9/100 reviews reported all of setting, presentation (symptomatic or asymptomatic) and prior tests.

  21. Systematic review of evidence concerned with understanding and application of test accuracy metrics

  22. Contextualisation of review findings

  23. Characteristics of the literature Theoretical perspective papers: -1978 to 2010 (88% after 1990) -34 papers written by 30 unique authors -25/30 clinicians, of which 16/25 affiliated to an academic institution Empirical research: -1978 to 2010 (60% after 1995) -majority of health professional samples were self-selected, convenience samples from medical education courses -only 3/26 health professional samples representative of primary care

  24. Research Methods... Systematic review of evidence concerned with the understanding and application of test accuracy metrics. Survey of use and understanding of test accuracy metrics by general practitioners

  25. Results: study flow Total hits: N=16765 → Duplicates: N=2508 → Excluded: N=14136 → Included papers: N=67, comprising theoretical papers (N=34) and empirical research (N=33); of the empirical studies, 26 used a clinical sample and 7 used other samples.

  26. Results: Theoretical papers Understanding and application of test accuracy information: -Accessibility of test accuracy metrics -Knowledge about accuracy of tests used in practice likely to be limited -Lack of appreciation of variation in pre-test probability across healthcare settings -Use of graphical aids and frequencies (rather than % or proportions) to facilitate probability revision Factors affecting testing behaviour: -Testing viewed as a risk aversive behaviour -Testing context considered important modifier of attitudes to risk

  27. Theoretical: understanding and applying test accuracy information Global test accuracy measures do not distinguish between the 2 dimensions of test accuracy (sensitivity and specificity; likelihood ratios). “A clinician will not start from diseased or not diseased, but from a positive or negative test. Therefore sensitivity and specificity are intuitively not so evident” (Dujardin 1994) “Never in 20 years of teaching clinical logic, have we found a clinician who used the word ‘positive likelihood ratio’.” (Van Den Ende 2005) “The problem that occurs in a meta-analysis of diagnostic studies is the multi-directional performance of the diagnostic instrument regarding its ability to detect (specificity) or exclude (sensitivity) the characteristic of interest is not distinguished. Multi-dimensional outcomes cannot be summarised well by a single estimate.” (Stengel 2003)

  28. Test Accuracy Measures DOR = (True Positives × True Negatives) / (False Negatives × False Positives). Disease as reference class: Sensitivity = TP / (TP+FN); Specificity = TN / (TN+FP). Test result as reference class: Positive Predictive Value = TP / (TP+FP); Negative Predictive Value = TN / (TN+FN).

  29. Theoretical: understanding and applying test accuracy information Pre-test probability and test accuracy estimation; contextual variation in test accuracy. “research has shown that clinicians’ estimates of probability vary widely and are often inaccurate… by itself, clinical experience appears insufficient to guide accurate probability estimation” (Richardson 2003) “We rarely know what the sensitivities, specificities or likelihood ratios are for tests. At best clinicians carry a general impression about their usefulness” (Gill 2005) “Unfortunately, it is often not realised that there can be no generally valid estimates of a test’s sensitivity, specificity or likelihood ratio that apply to all patients of a particular population, nor should such values be sought” (Moons 2003)

  30. Theoretical: factors affecting testing behaviour Clinicians are uncomfortable with uncertainty Testing ‘risk’ is context dependent “...some physicians order all the tests that may be even remotely applicable in a given clinical situation. Such a practice may comfort the patient and enhance the physician’s belief that all diagnostic avenues have been pursued, but more tests do not necessarily produce more certainty..... we continue to test excessively, partly because of our discomfort with uncertainty.” (Kassirer 1989) “.. feelings of uncertainty regarding medical problems can differ depending on the situation, not only because one physician may be faced with more complicated diagnostic puzzles than the other, but also, and primarily because the consequences of a vague and uncertain diagnosis may vary in each situation.” (Zaat (1992)

  31. Empirical research: understanding and applying test accuracy information High levels of error observed for estimates of the accuracy of particular tests. Estimates were based on clinical experience of test use, rather than published evidence. Between-person variation in estimation of disease prevalence (pre-test probability) for any one disease was considerable (25-100%) Confusion with interpretation of test accuracy metrics (for example sensitivity and specificity confused with positive and negative predictive values).

  32. Empirical research: understanding and applying test accuracy information Research was based on the premise that probability revision is necessary for diagnostic decision making: 32/33 empirical studies investigated the ability of respondents to undertake probability revision. The average proportion of respondents able to undertake probability revision was 46% (range 0%-33% for practising clinicians and 33%-73% for academic clinicians). Presentation of test accuracy as frequencies rather than proportions or percentages appears to facilitate probability revision. However, only 3% of respondents reported using probability revision in clinical practice.

  33. Probability revision..... A serum test screens pregnant women for babies with Down’s syndrome. The test is a very good one but not perfect. Roughly 1% of babies have Down’s syndrome. If the baby has Down’s syndrome, there is a 90% chance that the result will be positive. If the baby is unaffected, there is still a 1% chance that the result will be positive. A pregnant woman has been tested and the result is positive. What is the chance that her baby actually has Down’s syndrome? Answer: (0.9 × 0.01) / ((0.9 × 0.01) + (0.01 × 0.99)), the probability of a true positive divided by the probability of a positive of either kind (TP or FP). Bramwell 2006; BMJ 333(7562):284-286
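The slide's calculation is an application of Bayes' theorem; a minimal sketch, with the slide's numbers plugged in (the function name is illustrative):

```python
# Probability revision (Bayes' theorem) for a positive test result:
# posterior = P(disease | positive).

def posterior_positive(prevalence, sensitivity, false_positive_rate):
    tp = sensitivity * prevalence                # probability of a true positive
    fp = false_positive_rate * (1 - prevalence)  # probability of a false positive
    return tp / (tp + fp)

# Slide's numbers: 1% prevalence, 90% sensitivity, 1% false positive rate
p = posterior_positive(0.01, 0.9, 0.01)
print(round(p, 3))  # 0.476
```

So despite a "very good" test, a positive result implies under a 50% chance that the baby is affected, because false positives from the large unaffected group nearly match the true positives.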

  34. Facilitation of probabilistic reasoning: frequencies Probabilistic representation: the prevalence of disease is 10% (0.1); the probability of testing positive if you have disease is 80% (0.8); the probability of testing negative if you do not have disease is 89% (0.89); the probability of testing positive even if you do not have disease is 11% (0.11). Probability of disease if test +ve = (0.8 × 0.1) / ((0.8 × 0.1) + (0.11 × 0.9)). Frequency representation: of 100 patients, 10 have disease and 90 do not; of the 10 with disease, 8 test +ve and 2 test -ve; of the 90 without disease, 10 test +ve and 80 test -ve. Probability of disease if test +ve = 8/18.
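The frequency tree on this slide can be rebuilt mechanically: convert each probability into whole-patient counts out of 100 (rounded, as on the slide), then read the posterior off as positives-with-disease over all positives. A sketch, with an illustrative function name:

```python
# Natural-frequency version of probability revision: counts of true and
# false positives out of n patients, rounded to whole patients.

def frequency_tree(n, prevalence, sensitivity, false_positive_rate):
    diseased = round(n * prevalence)
    healthy = n - diseased
    tp = round(diseased * sensitivity)           # diseased who test positive
    fp = round(healthy * false_positive_rate)    # healthy who test positive
    return tp, fp

# Slide's numbers: 100 patients, 10% prevalence, 80% sensitivity, 11% FP rate
tp, fp = frequency_tree(100, 0.1, 0.8, 0.11)
print(tp, fp)          # 8 10
print(tp / (tp + fp))  # 8/18, about 0.44
```

The 8/18 read directly off the counts is the same quantity as the probabilistic formula, which is why frequencies appear to facilitate probability revision: the denominator (everyone who tests positive) is visible rather than implicit.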

  35. Survey of use and understanding of test accuracy metrics by General Practitioners “It would just confirm what we already know, doctors, on the whole, struggle with these concepts”

  36. Survey Objectives • To identify which sources of test accuracy information are used by primary care clinicians and barriers to their use • To evaluate the utility of existing test accuracy metrics as measured by self-reported familiarity, perceived ability to define metrics and self-reported use of metrics in clinical practice • To investigate whether there is consistency in the application of different test accuracy metrics and graphics across a common scenario

  37. Survey Methods: distribution • Incentivised, electronic survey hosted by a professional network of ~200,000 GMC registered doctors with access to approximately 27,000 of 41,000 general practitioners across the UK (doctors.net.uk) • Sample size of 200 pre-specified

  38. Survey Results: respondent characteristics • 215/224 participants who accessed the survey (95%) completed the survey in full • Number of years since qualification in the specialty ranged from 0-41 (median 14 years) • 11% had work responsibilities that might result in greater knowledge about test accuracy (GP trainer; GP with an academic position; GP involved in policy) • 13% of respondents had undertaken training that included test accuracy interpretation in the last 3 years.

  39. Survey results: test accuracy information sources used by respondents “Please estimate how often you use the following test accuracy information sources as part of your clinical work”

  40. Survey results: familiarity with test accuracy metrics/graphics

  41. Survey results: perceived ability to define metrics/graphics

  42. Use of test accuracy metrics in practice

  43. Survey Methods: Application of nine different test accuracy metrics to a common testing scenario “A new biological marker for ovarian cancer has been identified and is available as a blood test for use in primary care. A 57 year old asymptomatic woman presents to you concerned about her risk of ovarian cancer and you perform the blood test at her request.” • TEST ACCURACY INFORMATION PRESENTED IN ONE OF NINE DIFFERENT FORMATS • “If the test came back positive would you refer the woman for further investigation? If the test came back negative would you be confident not to investigate further at this point in time?”

  44. Survey Methods: Application of nine different test accuracy metrics to a common testing scenario • Sensitivity and Specificity • Sensitivity and Specificity (frequencies) • Predictive values • Predictive Values (frequencies) • Likelihood ratios • Pre to post test probability • Diagnostic Odds Ratio • Annotated 2x2 Diagnostic contingency table • Annotated pictogram

  45. Survey Methods: Application of nine different test accuracy metrics to a common testing scenario • Sensitivity and Specificity: “The marker has a sensitivity of 76% and a specificity of 98%” • Sensitivity and Specificity (frequencies): “Of every 100 women with ovarian cancer, 76 would test positive (be detected by the test) but 24 would test negative (be missed). Of every 100 women without ovarian cancer, 98 would test negative (receive a correct diagnosis) but 2 would test positive (be falsely labelled as having cancer).”
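What neither sensitivity/specificity format tells the respondent directly is how likely cancer is after a positive result, which depends on prevalence. A sketch of that dependence using the slide's 76%/98% figures; the prevalence values below are illustrative assumptions, not from the survey:

```python
# Positive predictive value as a function of prevalence, for the
# scenario's marker (sensitivity 76%, specificity 98%).
# The prevalence values used are illustrative assumptions.

def ppv(prevalence, sensitivity, specificity):
    tp = sensitivity * prevalence                  # true positives per person tested
    fp = (1 - specificity) * (1 - prevalence)      # false positives per person tested
    return tp / (tp + fp)

print(round(ppv(0.01, 0.76, 0.98), 3))  # 0.277: at 1% prevalence most positives are false
print(round(ppv(0.20, 0.76, 0.98), 3))  # 0.905: at 20% prevalence most positives are true
```

For an asymptomatic woman in primary care, where prevalence is low, the same test result carries a very different meaning than in a high-risk referral population, which is the contextual point the scenario is probing.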

  46. Annotated 2x2 table

  47. Pictograph

  48. Survey RESULTS: Scenarios:“If the test came back POSITIVE would you refer the woman for further investigation?”

  49. Survey RESULTS: Scenarios: “If the test came back NEGATIVE would you be confident not to investigate further at this point in time?” (YES = would not investigate further / would not refer; NO = would investigate further / would refer)

  50. Open responses to testing scenarios Obligation to test further: -“Would probably investigate (on the basis of a negative test result) but aware all further tests may be negative” -“I would refer -ve result here even - would be difficult to defend if subsequently turned out to have ovarian carcinoma.” -“Patient choice as well - but if she wanted further referral I would do this”
