Interpreting Diagnostic Tests
Ian McDowell, Department of Epidemiology & Community Medicine
January 2012

Note to readers: you may find the additional notes & explanations in the ppt notes panel helpful.
The Challenge of Clinical Measurement
• Diagnoses are based on information, from formal measurements and/or from clinical judgment.
• This information is seldom perfectly accurate:
  • Random errors can occur (machine needs calibrating?)
  • Biases in judgment or measurement can occur (“this patient seems anxious: is he exaggerating?”)
  • Due to biological variability, this patient may not fit the general rule.
• Diagnosis (e.g., hypertension) involves a categorical judgment; this often requires dividing a continuous score (blood pressure) into categories. How to choose a cutting-point?
Therefore…
You need to be aware:
• That we express these complexities in terms of probabilities
• That using a quantitative approach is better than just guessing!
• That you will gradually become familiar with the typical accuracy of measurements in your chosen clinical field
• That the principles apply to both diagnostic and screening tests
• Of how we describe the accuracy of a measurement.
Test characteristics
• Reliability: consistency or reproducibility; this considers chance or random errors (which sometimes increase, sometimes decrease, scores). “Is it measuring something?”
• Validity: “Is it measuring what it is supposed to measure?” By extension, “What diagnostic conclusion can I draw from a particular score on this test?” Validity may be affected by bias, which refers to systematic errors (these fall in a certain direction).
• Safety, acceptability, cost, etc.
Reliability and Validity
[Figure: a 2 x 2 grid of targets contrasting low and high reliability with low and high validity. High reliability with low validity clusters the shots tightly off-centre: a biased result. Low reliability with high validity scatters the shots evenly around the bullseye; the average of these inaccurate results is not bad. This is probably how screening questionnaires (e.g., for depression) work.]
Ways of Assessing Validity
• Content or “face” validity: does it make clinical or biological sense? Does it include the relevant symptoms?
• Criterion validity: comparison to a “gold standard” definitive measure (e.g., biopsy, autopsy); expressed as sensitivity and specificity.
• Construct validity (this is used with abstract themes, such as “quality of life”, for which there is no definitive standard).
Criterion validation: the “Gold Standard”
The criterion that your clinical observation or simple test is judged against:
• more definitive (but expensive or invasive) tests, such as a complete work-up, or
• the clinical outcome (for screening tests, when work-up of well patients is unethical).
Sensitivity and specificity are calculated from a research study that compares the test to a gold standard.
“2 x 2” table for validating a test

                      Gold standard
                  Disease present    Disease absent
  Test positive       a (TP)             b (FP)
  Test negative       c (FN)             d (TN)

• Validity: Sensitivity = a/(a+c) = TP/diseased; Specificity = d/(b+d) = TN/healthy
• TP = true positive; FP = false positive; FN = false negative; TN = true negative
• Golden rule: always calculate based on the gold standard.
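A minimal sketch of the two formulas in Python, using illustrative counts:

    def sensitivity(tp, fn):
        # a / (a + c): the proportion of diseased patients the test detects
        return tp / (tp + fn)

    def specificity(tn, fp):
        # d / (b + d): the proportion of healthy patients the test clears
        return tn / (tn + fp)

    print(sensitivity(tp=50, fn=5))    # 0.91
    print(specificity(tn=100, fp=10))  # 0.91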
• Sensitivity = the test’s ability to detect disease when it is present: a/(a+c) = TP/(TP+FN) = TP/diseased. (Mnemonic: a sensitive person is one who can perceive your feelings.)
• (1 − seNsitivity) = false Negative rate: how many cases does the test miss?
• Specificity = the precision of the test: it identifies only that type of disease (“nothing else looks like this”). A specific test generates few false positives, so if the result is positive, the patient has this diagnosis.
• (1 − sPecificity) = false Positive rate: how many healthy people are falsely classified as having the disease?
Test Errors
• False positives can arise due to other factors (diet, taking other medications, etc.). They entail the cost and danger of further investigations, labeling, and worry for the patient.
• This is similar to Type I or alpha error in a test of statistical significance (the possibility of falsely concluding that there is an effect of an intervention).
• False negatives imply missed cases, so potentially bad outcomes if untreated: an adverse event.
• Cf. Type II or beta error: the chance of missing a true difference.
Most Tests Provide a Continuous Score: Selecting a Cutting Point
[Figure: overlapping distributions of test scores for a healthy population (healthy scores) and a sick population (pathological scores), with a possible cut-point between them. Moving the cut-point one way increases sensitivity (includes more of the sick group); moving it the other way increases specificity (excludes more healthy people).]
• Crucial issue: changing the cut-point can improve sensitivity or specificity, but never both.
Improving the test
[Figure: the same healthy and sick distributions, now with less overlap. An improved test reduces the overlap between healthy and pathological scores, increasing both sensitivity and specificity.]
Clinical applications
• A specific test can be useful to rule in a disease. Why? Specific tests give few false positives. So, if the result is positive, you can be sure the patient has the condition (‘nothing else would give this result’): “SpPin”.
• A sensitive test can be useful for ruling a disease out: a negative result on a very sensitive test (which detects all true cases) reassures you that the patient does not have the disease: “SnNout”.
Your Patient’s Question: “Doctor, how likely am I to have this disease?” This introduces Predictive Values
• Sensitivity & specificity don’t answer this, because they work from the gold standard.
• The clinician sees the test result, but does not know whether this person is a true positive or a false positive (or a true or false negative). Hmmm…
• How accurately does a positive (or negative) test result predict disease (or health)?
Start from Prevalence
• Before you apply any test, the best guide you have to a diagnosis is based on prevalence: common conditions (in this population) are the more likely diagnosis.
• Prevalence indicates the ‘pre-test probability’ of disease. You will then refine this informed guess in a series of stages:
  • First, consider the patient’s age and sex; use the prevalence for a similar person.
  • Then, based on the patient’s history, you may modify the estimate.
Predictive Values
• Based on the rows of the 2 x 2 table, not the columns.
• PPV = a/(a+b): interprets a positive test; its complement, b/(a+b), is the proportion of positive results that are falsely positive.
• NPV = d/(c+d): interprets a negative test; its complement, c/(c+d), is the proportion of negative results that are falsely negative.
• Immediately useful to the clinician: they tell us about the test in this population, and thus this patient.
• They vary with the prevalence of disease, so must be determined for each clinical setting: as prevalence goes down, PPV goes down and NPV rises.
Prevalence and Predictive Values

  A. Specialist referral hospital          B. Primary care
            D+      D-                               D+      D-
   T+       50      10                      T+       50     100
   T-        5     100                      T-        5    1000

A: Sensitivity = 50/55 = 91%; Specificity = 100/110 = 91%; Prevalence = 55/165 = 33%; PPV = 50/60 = 83%; NPV = 100/105 = 95%
B: Sensitivity = 50/55 = 91%; Specificity = 1000/1100 = 91%; Prevalence = 55/1155 ≈ 5%; PPV = 50/150 = 33%; NPV = 1000/1005 = 99.5%
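These predictive values can also be derived directly from Bayes’ theorem; a minimal sketch, assuming the 91% (10/11) sensitivity and specificity above:

    def ppv(sens, spec, prev):
        # P(disease | positive test)
        return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

    def npv(sens, spec, prev):
        # P(no disease | negative test)
        return spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)

    sens = spec = 10 / 11                    # = 91%, as in both clinics
    print(ppv(sens, spec, prev=55 / 165))    # specialist hospital: 0.83
    print(npv(sens, spec, prev=55 / 165))    # specialist hospital: 0.95
    print(ppv(sens, spec, prev=55 / 1155))   # primary care: 0.33
    print(npv(sens, spec, prev=55 / 1155))   # primary care: 0.995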
Exercise ECG (aka "treadmill test")
• A 22-year-old male with chest pain has a pretest probability of obstructive CAD of roughly 1%. With a "positive" exercise ECG, his post-test probability is still less than 5%; in other words, there is a greater than 95% chance that he doesn’t have important CAD, despite a "positive" test.
• The same applies in the opposite direction for a 72-year-old male with typical anginal chest pain: his pretest probability is 95%; if the exercise ECG is negative, the post-test probability is still probably greater than 80%.
• The overarching guideline is to treat the patient, not the test.
From the literature you can get sensitivity & specificity. To work out PPV and NPV for your practice, you need to estimate prevalence, then work backwards. Fill the cells in the following order:

                  Disease present   Disease absent   Total   Predictive values
   Test positive      4th               7th           8th       10th (PPV)
   Test negative      5th               6th           9th       11th (NPV)
   Total              2nd               3rd           1st

1st: choose a convenient total N; 2nd & 3rd: column totals, from estimated prevalence; 4th: from sensitivity; 5th: by subtraction; 6th: from specificity; 7th: by subtraction; 8th & 9th: row totals; 10th & 11th: the predictive values.
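A sketch of this backward calculation, assuming an illustrative prevalence of 5% and a convenient total of 1,000 patients (the function name is hypothetical):

    def fill_table(sens, spec, prev, n=1000):
        diseased = prev * n          # 2nd: disease-present column total
        healthy  = n - diseased      # 3rd: disease-absent column total
        a = sens * diseased          # 4th: true positives, from sensitivity
        c = diseased - a             # 5th: false negatives, by subtraction
        d = spec * healthy           # 6th: true negatives, from specificity
        b = healthy - d              # 7th: false positives, by subtraction
        ppv = a / (a + b)            # 10th: a / (8th row total)
        npv = d / (c + d)            # 11th: d / (9th row total)
        return ppv, npv

    print(fill_table(sens=0.91, spec=0.91, prev=0.05))  # ≈ (0.35, 0.995)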
Predictive Values
• High specificity means few false positives: Sp = TN/(TN+FP). False positives also drive PPV: PPV = TP/(TP+FP). So, with a high PPV the clinician is more certain that a patient with a positive test has the disease (it rules the disease in).
• The higher the sensitivity, the higher the NPV: Sn = TP/(TP+FN); NPV = TN/(TN+FN). The clinician can be more confident that a patient with a negative score does not have the diagnosis (because there are few false negatives). So, a high NPV can rule a disease out.
Gasp…! Isn’t there an easier way to do all this…? Yes (good!) But first, you need a couple more concepts (less good…)
• We said that before you apply a test, prevalence gives your best guess about the chances that this patient has the disease.
• This is known as the “pretest probability of disease”: (a+c)/N in the 2 x 2 table.
• It can also be expressed as the odds of disease: (a+c)/(b+d). When the disease is rare, the odds and the probability are nearly equal.
This Leads to … Likelihood Ratios
• Defined as the ratio of the probability of a given test result in a patient with the disease to the probability of that result in a patient without: the true positive rate / the false positive rate [TP rate / FP rate].
• Advantages:
  • Combines sensitivity and specificity into one number
  • Can be calculated for many cut-points on the test
  • Can be turned into predictive values
• LR for a positive test = Sensitivity / (1 − Specificity)
• LR for a negative test = (1 − Sensitivity) / Specificity
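A minimal sketch of both ratios, reusing the illustrative 91% figures from the clinic example:

    def lr_positive(sens, spec):
        return sens / (1 - spec)    # true positive rate / false positive rate

    def lr_negative(sens, spec):
        return (1 - sens) / spec    # false negative rate / true negative rate

    print(lr_positive(0.91, 0.91))  # ≈ 10.1: a positive result raises the odds
    print(lr_negative(0.91, 0.91))  # ≈ 0.099: a negative result lowers them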
Practical application: a Nomogram
1) You need the LR for this test.
2) Plot the likelihood ratio on the center axis (e.g., LR+ = 20).
3) Select the pretest probability (prevalence) on the left axis (e.g., prevalence = 30%).
4) Draw a line through these points to the right axis to indicate the post-test probability of disease. Example: post-test probability ≈ 90%.
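The nomogram is a graphical shortcut for simple odds arithmetic: convert the pretest probability to odds, multiply by the LR, and convert back. A sketch that reproduces the example above:

    def post_test_probability(pretest_prob, lr):
        pretest_odds = pretest_prob / (1 - pretest_prob)
        post_odds = pretest_odds * lr
        return post_odds / (1 + post_odds)

    print(post_test_probability(0.30, 20))   # ≈ 0.90, as read off the nomogram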
There is another way to combine sensitivity and specificity: meet Receiver Operating Characteristic (ROC) curves
[Figure: ROC curve, sensitivity (y-axis, 0 to 1) plotted against 1 − specificity (the false positive rate, x-axis, 0 to 1).]
• Work out sensitivity and specificity for every possible cut-point, then plot these.
• The area under the curve indicates the information provided by the test.
• In an ideal test, the curve would reach the top left corner. For a useless test it would lie along the diagonal: no better than guessing.
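A sketch of the underlying computation, sweeping every possible cut-point over made-up scores and summing trapezoids for the area under the curve:

    # Illustrative data: test scores and gold-standard status (1 = disease).
    scores   = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    diseased = [0, 0, 0, 1, 0, 1, 1, 1, 1]

    points = []
    for cut in sorted(set(scores)) + [max(scores) + 1]:
        tp = sum(1 for s, d in zip(scores, diseased) if s >= cut and d == 1)
        fp = sum(1 for s, d in zip(scores, diseased) if s >= cut and d == 0)
        sens = tp / sum(diseased)                    # true positive rate
        fpr  = fp / (len(diseased) - sum(diseased))  # 1 - specificity
        points.append((fpr, sens))

    points.sort()                                    # order by false positive rate
    auc = sum((x2 - x1) * (y1 + y2) / 2              # trapezoidal rule
              for (x1, y1), (x2, y2) in zip(points, points[1:]))
    print(auc)                                       # 0.95 for this toy data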
Chaining LRs Together (1)
• Example: a 45-year-old woman presents with “chest pain”.
• Based on her age, the pretest probability that a vague chest pain indicates CAD is about 1%.
• Take a fuller history. She reports a 1-month history of intermittent chest pain, suggesting angina (substernal pain; radiating down the arm; induced by effort; relieved by rest…).
• The LR of this history for angina is about 100.
The previous example, 1. From the History:
[Nomogram: she is young, so the pretest probability is about 1%; drawing a line through LR = 100, the probability rises to 50% based on the history.]
Chaining LRs Together (2)
• The 45-year-old woman with a 1-month history of intermittent chest pain… After the history, the post-test probability is now about 50%. What will you do? A more precise (but also more costly) test:
• Record an ECG. Result = 2.2 mm ST-segment depression; the LR for this ECG result is 10.
• This raises the post-test probability to > 90% for coronary artery disease (see next slide).
The previous example, 2. ECG Results:
[Nomogram: now start from a pretest probability of 50% (prior to the ECG, based on the history); drawing a line through LR = 10, the post-test probability rises to about 90%.]
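A sketch that chains the two LRs of this example; each post-test probability becomes the pretest probability for the next step:

    def update(prob, lr):
        odds = prob / (1 - prob) * lr   # pretest odds, scaled by the LR
        return odds / (1 + odds)        # back to a probability

    p = 0.01            # pretest: 45-year-old woman, vague chest pain
    p = update(p, 100)  # LR ≈ 100 for the anginal history
    print(p)            # ≈ 0.50
    p = update(p, 10)   # LR = 10 for 2.2 mm ST-segment depression
    print(p)            # ≈ 0.91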