Interpreting Diagnostic Tests
Ian McDowell, Department of Epidemiology & Community Medicine, January 2010
Note to users: you may find the additional notes & explanations in the ppt notes panel helpful.
Objectives • To understand sources of error in typical measurements • To understand sensitivity and specificity • To explain the implications of false positives and false negatives • To understand predictive values and likelihood ratios
Road map to date This session considers the interpretation of diagnostic tests, a daily issue in clinical practice. It builds on some of the ideas introduced last term: • Measurements: validity, bias, and the determinants of bias • Applying conclusions from a study sample to an individual patient • Contrasts between research on hospital patients and community practice • Evidence-based practice
The Challenge of Clinical Measurement • Diagnoses are based on information from formal measurements and/or from your clinical judgment. • This information is seldom perfectly accurate: • Random errors can occur (machine not working?) • Biases in judgment or measurement can occur (“this kid doesn’t look sick”) • Due to biological variability, this patient may not fit the general rule • Diagnosis (e.g., hypertension) involves a categorical judgment; this often requires dividing a continuous score (blood pressure) into categories. Choosing the cut-point is challenging.
Therefore… You need to be aware: • That diagnostic judgments are based on probabilities; • That using a quantitative approach is better than just guessing; • That you will gradually become familiar with the typical accuracy of measurements in your chosen clinical field; • That the principles apply to both diagnostic and screening tests; • Of some of the ways to describe the accuracy of a measurement.
Why choose one test and not another? • Reliability: consistency or reproducibility; this considers chance or random errors (which sometimes increase and sometimes decrease scores). “Is it measuring something?” • Validity: “Is it measuring what it is supposed to measure?” By extension, “what diagnostic conclusion can I draw from a particular score on this test?” Validity may be affected by bias, which refers to systematic errors (these fall in a certain direction) • Safety, Acceptability, Cost, etc.
Reliability and Validity [Figure: target diagrams contrasting low and high reliability with low and high validity. High reliability with low validity gives a tightly clustered but biased result; low reliability with high validity scatters results around the bullseye.] Note that the average of these scattered, individually inaccurate results is not bad. This is probably how screening questionnaires (e.g., for depression) work.
Ways of Assessing Validity • Content or “Face” validity: does it make clinical or biological sense? Does it include the relevant symptoms? • Criterion: comparison to a “gold standard” definitive measure (e.g., biopsy, autopsy) • Expressed as sensitivity and specificity • Construct validity (this is used with abstract themes, such as “quality of life”, for which there is no definitive standard)
Criterion validation: “Gold Standard” The criterion that your clinical observation or simple test is judged against: • more definitive (but expensive or invasive) tests, such as a complete work-up, or • the clinical outcome (for screening tests, when workup of well patients is unethical). Sensitivity and specificity are calculated from a research study comparing the test to a gold standard.
“2 x 2” table for validating a test

                     Gold standard
                Disease Present   Disease Absent
Test positive       a (TP)            b (FP)
Test negative       c (FN)            d (TN)

Validity: Sensitivity = a/(a+c) = TP/Diseased; Specificity = d/(b+d) = TN/Healthy
TP = true positive; FP = false positive…
Golden Rule: always calculate based on the gold standard
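To make the arithmetic concrete, here is a minimal Python sketch; the cell counts are invented purely for illustration.

```python
# A minimal sketch of the sensitivity/specificity arithmetic.
# The cell counts below are hypothetical, purely for illustration.

a, b = 90, 50    # a = true positives (TP), b = false positives (FP)
c, d = 10, 850   # c = false negatives (FN), d = true negatives (TN)

sensitivity = a / (a + c)   # TP / all with disease (per the gold standard)
specificity = d / (b + d)   # TN / all without disease

print(f"Sensitivity = {sensitivity:.2f}")              # 0.90
print(f"Specificity = {specificity:.2f}")              # 0.94
print(f"False negative rate = {1 - sensitivity:.2f}")  # 0.10
print(f"False positive rate = {1 - specificity:.2f}")  # 0.06
```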
A Bit More on Sensitivity = the test’s ability to detect disease when it is present: a/(a+c) = TP/(TP+FN) = TP/Diseased • Mnemonics: a seNsitive person is one who is aware of your feelings; (1 – seNsitivity) = false Negative rate = how many cases are missed by the screening test?
…and More on Specificity = the precision of the test: • a specific test would identify only that type of disease. “Nothing else looks like this” • a highly specific test generates few false positives. So, if the result is positive, you can be confident the patient has this diagnosis. • Mnemonic: (1 – sPecificity) = false Positive rate (how many are falsely classified as having the disease?)
Problems Resulting from Test Errors • False positives can arise due to other factors (such as taking other medications, diet, etc.). They entail the cost and danger of further investigations, labeling, and worry for the patient. • This is similar to Type I or alpha error in a test of statistical significance (the possibility of falsely concluding that there is an effect of an intervention). • False negatives imply missed cases, and so potentially bad outcomes if left untreated • Cf. Type II or beta error: the chance of missing a true difference
Most Tests Provide a Continuous Score: Selecting a Cut-Point [Figure: overlapping distributions of test scores for a healthy population and a sick population, with a possible cut-point dividing healthy from pathological scores. Moving the cut-point toward the healthy scores increases sensitivity (includes more of the sick group); moving it toward the pathological scores increases specificity (excludes more healthy people).] Crucial issue: changing the cut-point can improve sensitivity or specificity, but never both.
Clinical applications • A specific test can be useful to rule in a disease. Why? Very specific tests give few false positives. So, if the result is positive, you can be sure the patient has the condition (‘nothing else would give this result’): “SpPin” • A sensitive test can be useful for ruling a disease out: a negative result on a very sensitive test (which detects all true cases) reassures you that the patient does not have the disease: “SnNout”
Your Patient’s Question: “Doctor, how likely am I to have this disease?” This introduces Predictive Values • Sensitivity & specificity don’t answer this, because they work from the gold standard. • Now you need to work from the test result, but you won’t know whether this person is a true positive or a false positive (or a true or false negative). Hmmm… How accurately does a positive (or negative) result predict disease (or health)?
Start from Prevalence • Before you do any test, the best guide you have to a diagnosis is based on prevalence: • Common conditions (in this population) are the more likely diagnosis • Prevalence indicates the ‘pre-test probability of disease’
Positive and Negative Predictive Values • Calculated along the rows of the 2 x 2 table, not down the columns • Positive Predictive Value (PPV) = a/(a+b) = probability that a positive score is a true positive • Negative Predictive Value (NPV) = d/(c+d) = probability that a negative score is a true negative • BUT… there’s a big catch: each row combines cells from both columns, so PPV & NPV depend on how many cases of disease there are (prevalence). • As prevalence goes down, PPV goes down (it is harder to find the smaller number of cases) and NPV rises. • So, PPV and NPV must be determined for each clinical setting, • But they are immediately useful to the clinician: they reflect this population, so they tell us about this patient
Prevalence and Predictive Values

A. Specialist referral hospital        B. Primary care
          D +    D -                            D +     D -
T +        50     10                   T +       50     100
T -         5    100                   T -        5    1000

A: Sensitivity = 50/55 = 91%; Specificity = 100/110 = 91%; Prevalence = 55/165 = 33%; PPV = 50/60 = 83%; NPV = 100/105 = 95%
B: Sensitivity = 50/55 = 91%; Specificity = 1000/1100 = 91%; Prevalence = 55/1155 ≈ 5%; PPV = 50/150 = 33%; NPV = 1000/1005 = 99.5%
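A small Python sketch recomputing the two settings above shows how the same sensitivity and specificity yield very different predictive values once prevalence changes.

```python
# Recomputing the two settings above: identical sensitivity and
# specificity, but prevalence drives the predictive values.

def predictive_values(tp, fp, fn, tn):
    ppv = tp / (tp + fp)   # along the 'test positive' row
    npv = tn / (tn + fn)   # along the 'test negative' row
    return ppv, npv

# A. Specialist referral hospital (prevalence = 33%)
ppv, npv = predictive_values(tp=50, fp=10, fn=5, tn=100)
print(f"A: PPV = {ppv:.0%}, NPV = {npv:.0%}")   # PPV = 83%, NPV = 95%

# B. Primary care (prevalence = 5%)
ppv, npv = predictive_values(tp=50, fp=100, fn=5, tn=1000)
print(f"B: PPV = {ppv:.0%}, NPV = {npv:.1%}")   # PPV = 33%, NPV = 99.5%
```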
Predictive Values • High specificity means few FPs: Sp = TN/(TN+FP); FPs also drive PPV: PPV = TP/(TP+FP). So the clinician is more certain that a patient with a positive test has the disease (it rules in the disease) • The higher the sensitivity, the higher the NPV: Sn = TP/(TP+FN); NPV = TN/(TN+FN); the clinician can be more confident that a patient with a negative score does not have the diagnosis (because there are few false negatives). So, a high NPV can rule out a disease.
From the literature you can get sensitivity & specificity. To work out PPV and NPV for your practice, you need to estimate prevalence, then work backwards. Fill the cells of the 2 x 2 table in the following order:

                 Disease Present   Disease Absent   Total   Predictive Values
Test Pos             4th (a)           5th (b)       7th       10th (PPV)
Test Neg             6th (c)           8th (d)       9th       11th (NPV)
Total                2nd               3rd           1st

1st: choose a total N; 2nd & 3rd: the column totals, from your estimated prevalence; a and c from sensitivity; b and d from specificity; then the row totals and the predictive values, as sketched below.
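Here is a minimal Python sketch of that back-calculation; the input values (91% sensitivity and specificity, 5% prevalence, N = 1000) are assumptions chosen only for illustration.

```python
# A sketch of the 'work backwards' calculation: rebuild the 2 x 2 cells
# from published sensitivity and specificity plus a guessed prevalence.

def fill_table(sensitivity, specificity, prevalence, n):
    diseased = prevalence * n        # column total, from prevalence
    healthy = n - diseased           # remaining column total
    a = sensitivity * diseased       # true positives
    b = (1 - specificity) * healthy  # false positives
    c = diseased - a                 # false negatives
    d = healthy - b                  # true negatives
    return a / (a + b), d / (c + d)  # PPV, NPV

# e.g., sensitivity 91%, specificity 91%, estimated prevalence 5%, N = 1000
ppv, npv = fill_table(0.91, 0.91, 0.05, 1000)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")   # PPV = 0.35, NPV = 0.99
```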
Gasp…! Isn’t there an easier way to do all this…? Yes (good!) But first, you need a couple more concepts (less good…) • We said that before you apply a test, prevalence gives your best guess about the chances that this patient has the disease. • This is known as the “Pretest Probability of Disease”: (a+c) / N in the 2 x 2 table • It can also be expressed as the pretest odds of disease: (a+c) / (b+d). When the disease is rare, the odds and the probability are nearly equal.
This Leads to … Likelihood Ratios • Defined as the odds that a given level of a diagnostic test result would be expected in a patient with the disease, as opposed to a patient without: true positive rate / false positive rate [TP / FP] • Advantages: • Combines sensitivity and specificity into one number • Can be calculated for many levels of the test • Can be turned into predictive values • LR for positive test = Sensitivity / (1-Specificity) • LR for negative test = (1-Sensitivity) / Specificity
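As a worked illustration, here is a minimal Python sketch deriving both LRs from the 91% sensitivity and 91% specificity used in the earlier example.

```python
# A minimal sketch: deriving LR+ and LR- from sensitivity and specificity,
# using the 91% / 91% figures from the earlier slides.

sensitivity, specificity = 0.91, 0.91

lr_positive = sensitivity / (1 - specificity)   # TP rate / FP rate
lr_negative = (1 - sensitivity) / specificity   # FN rate / TN rate

print(f"LR+ = {lr_positive:.1f}")   # 10.1
print(f"LR- = {lr_negative:.2f}")   # 0.10
```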
Practical application: a Nomogram
1) You need the LR for this test
2) Plot the likelihood ratio on the center axis (e.g., LR+ = 20)
3) Select the pretest probability (prevalence) on the left axis (e.g., prevalence = 30%)
4) Draw a line through these points to the right axis to indicate the post-test probability of disease
Example: post-test probability = 91%
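The nomogram is simply a graphical shortcut for the odds arithmetic. Checking this example by hand: pretest odds = 0.30/0.70 ≈ 0.43; multiplying by LR+ = 20 gives post-test odds ≈ 8.6; post-test probability = 8.6/9.6 ≈ 90%, consistent with the ≈91% read off the chart (graphical readings are approximate).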
There is another way to combine sensitivity and specificity: meet Receiver Operating Characteristic (ROC) curves [Figure: ROC curve plotting sensitivity (y-axis, 0 to 1) against 1 – specificity, i.e., the false positive rate (x-axis, 0 to 1).] Work out sensitivity and specificity for every possible cut-point, then plot these. The area under the curve indicates the information provided by the test. For an ideal test, the curve would reach the top left corner; for a useless test it would lie along the diagonal: no better than guessing.
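The construction is easy to sketch in code; the two score lists below are invented purely for illustration.

```python
# A sketch of how an ROC curve is built: sensitivity and 1 - specificity
# are computed at every possible cut-point of a continuous test score.

sick_scores = [6.1, 7.3, 7.8, 8.2, 9.0, 9.5]      # made-up data
healthy_scores = [3.2, 4.1, 4.8, 5.5, 6.0, 6.9]   # made-up data

for cut in sorted(set(sick_scores + healthy_scores)):
    sens = sum(s >= cut for s in sick_scores) / len(sick_scores)
    fpr = sum(s >= cut for s in healthy_scores) / len(healthy_scores)
    print(f"cut-point {cut}: sensitivity = {sens:.2f}, 1 - specificity = {fpr:.2f}")

# Plotting sensitivity (y) against 1 - specificity (x) across these
# cut-points traces the ROC curve; the area under it summarizes the
# information the test provides.
```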
Chaining LRs Together (1) • Example: a 45-year-old woman presents with “chest pain” • Based on her age, the pretest probability that a vague chest pain indicates CAD is about 1% • Take a fuller history. She reports a 1-month history of intermittent chest pain, suggesting angina (substernal pain; radiating down arm; induced by effort; relieved by rest…) • The LR of this history for angina is about 100
The previous example: 1. From the History [Nomogram: she’s young, so the pretest probability is about 1%; with an LR of 100 for the history, the post-test probability rises to about 50%.]
Chaining LRs Together (2) 45-year-old woman with a 1-month history of intermittent chest pain… After the history, the post-test probability is now about 50%. What will you do? A more precise (but also more costly) test: • Record an ECG • Result = 2.2 mm ST-segment depression. The LR for a 2.2 mm ECG result = 10. • This raises the post-test probability to > 90% for coronary artery disease (see next slide)
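A quick check of the chained arithmetic in Python, assuming the LRs quoted above:

```python
# Chaining the two steps of the chest-pain example via odds:
# 1% pretest -> history (LR ~ 100) -> ECG finding (LR ~ 10).

def apply_lr(prob, lr):
    odds = prob / (1 - prob) * lr   # pretest odds x likelihood ratio
    return odds / (1 + odds)        # back to a probability

p = 0.01               # pretest probability, from age alone
p = apply_lr(p, 100)   # after the history suggesting angina: ~0.50
p = apply_lr(p, 10)    # after the 2.2 mm ST-segment depression: ~0.91
print(f"Post-test probability = {p:.2f}")   # 0.91
```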
The previous example: ECG Results [Nomogram: now start from the pretest probability of 50% (prior to the ECG, based on the history); with an LR of 10, the post-test probability rises to just over 90%.]