EPI 5240: Introduction to Epidemiology
Screening and diagnostic test evaluation
November 2, 2009
Dr. N. Birkett, Department of Epidemiology & Community Medicine, University of Ottawa
Session Overview • Review key features of tests for disease. • Diagnostic test evaluation • Study designs • Key biases • Screening programmes • Overview • Criteria for utility • Issues in evaluation and implementation • Regression to the mean
Scenario (1) A 54 year old female teacher visited her FP for an ‘annual checkup’. She reported no illnesses in the previous year, felt well and had no complaints. Hot flashes related to menopause had resolved. A detailed physical examination, included breast palpation, was unremarkable. A screening mammogram was recommended as per current guidelines.
Scenario (2) The mammogram results were ‘not normal’ and a follow-up breast biopsy was recommended. The surgeon confirmed the negative clinical exam. But, based on the abnormal mammogram, a fine-needle aspiration biopsy of the abnormal breast under radiological guidance was recommended. Pathological review of the biopsy revealed the presence of a malignant breast tumor. Further surgery was scheduled to pursue this abnormal finding.
FNA positive risk • 100% vs 64% • Depends on definition of a ‘positive’ FNA. • Must be clear carcinoma • 100% positive (0% false positives) • Abnormal cells, may not be cancer • 64% positive (36% false positive) • Why use second approach? • Reduces the risk that you will miss someone who has a true cancer • Tradeoff of sensitivity and specificity • More later
Test Properties (1) • Most common situation (for teaching at least) assumes: • Dichotomous outcome (ill/not ill) • Dichotomous test results (positive/negative) • Represented as a 2x2 table (yet another variant!). • Advanced methods can consider tests with multiple outcomes • advanced; moderate; minimal; no disease
Test Properties (2)

                  Disease present    Disease absent
  Test positive   True positives     False positives
  Test negative   False negatives    True negatives
Test Properties (4) Sensitivity = 0.90 Specificity = 0.95
Test Properties (5)
• Sensitivity = a/(a+c) = TP/(TP + FN)
• Specificity = d/(b+d) = TN/(TN + FP)
Test Properties (6) • Sensitivity = Pr(test positive in a person with disease) • Specificity = Pr(test negative in a person without disease) • Range: 0 to 1 • > 0.9: Excellent • 0.8-0.9: Not bad • 0.7-0.8: So-so • < 0.7: Poor
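The definitions above can be sketched in a few lines of Python. The 2x2 counts here are hypothetical, chosen only to reproduce the 0.90/0.95 values used in the slides' example:

```python
# Hypothetical 2x2 counts (not from the lecture) chosen to give
# sensitivity 0.90 and specificity 0.95.
tp, fp = 90, 5      # test positive: with disease / without disease
fn, tn = 10, 95     # test negative: with disease / without disease

sensitivity = tp / (tp + fn)   # Pr(test + | disease present)
specificity = tn / (tn + fp)   # Pr(test - | disease absent)

print(sensitivity)  # 0.9
print(specificity)  # 0.95
```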
Test Properties (7) • Generally, high sensitivity is associated with low specificity and vice-versa (more later). • Do you want a test with high sensitivity or high specificity? • Depends on the cost of a 'false positive' versus a 'false negative'. • PKU screening: one false negative is a disaster (a missed case means irreversible harm), so sensitivity is paramount. • Ottawa Ankle Rules: designed for very high sensitivity, so a negative result can safely rule out fracture.
Test Properties (8) • Patients don't ask: "If I've got the disease, how likely is it that the test will be positive?" • They ask: "My test is positive. Does that mean I have the disease?" • Predictive values answer the second question.
Test Properties (9) PPV = 0.95 NPV = 0.90
Test Properties (10)
• PPV = a/(a+b) = TP/(TP + FP)
• NPV = d/(c+d) = TN/(FN + TN)
Test Properties (11) • PPV = Pr(subject has disease given that their test was positive) • NPV = Pr(subject doesn’t have disease given that their test was negative) • Range: 0 to 1 • PPV is affected by the prevalence of the disease in the target population. Sensitivity & specificity are not affected by prevalence. • To use test in new population, you need to ‘calibrate’ the PPV/NPV. • Example: sens = 0.85; spec = 0.9
Test Properties (12) Tertiary care: research study. Prevalence=0.5 PPV = 0.89
Test Properties (13) Calibration by hypothetical table
Fill cells in the following order:

                         "Truth"
                  Disease present    Disease absent    Total     PV
  Test positive   4th                7th               8th       10th
  Test negative   5th                6th               9th       11th
  Total           2nd                3rd               1st (10,000)
Test Properties (14) Primary care: Prevalence = 0.01

                  Disease present       Disease absent        Total
  Test positive   85 (0.85 × 100)       990                   1,075
  Test negative   15                    8,910 (0.9 × 9,900)   8,925
  Total           100 (0.01 × 10,000)   9,900                 10,000

PPV = 85/1,075 = 0.08
Test Properties (15) Likelihood ratio
• Pre-test odds = 1.00 (prevalence = 0.5)
• Post-test odds = 18.0
• Likelihood ratio (+ve) = LR(+) = 18.0/1.0 = 18.0

Test Properties (16) Likelihood Ratio

              post-test odds
  LR(+ve) = ------------------
              pre-test odds
Test Properties (17)
• LR(+ve) gives the amount by which the odds of disease increase if the test is positive.
• Big values are good. Need at least 8-10 to have an acceptable test.

              a × (b+d)        sensitivity
  LR(+ve) = ------------- = -------------------
              (a+c) × b      (1 − specificity)

• LR(+ve) is not affected by disease prevalence.
• Can be used to adjust PPV/NPV for differences in prevalence.
Test Properties (18) • Adjusting PPV/NPV using LR(+ve) • Compute LR (+ve) from your test sample (LRtest) • Convert the new disease prevalence into odds (pre-test odds): • pre-test odds = p/(1-p) • Multiply pre-test odds by LRtest to give post-test odds (oddspost) • Convert oddspost to PPV: • PPV = oddspost/(1 + oddspost)
Test Properties (19)PPV via LR(+ve) • Previous example • Prevalence = 1%; sens = 85%; spec = 90% • Pretest odds = .01/.99 = 0.0101 • LR+ = .85/.1 = 8.5 (>1, but not that great) • Post-test odds (+ve) = .0101*8.5 = .0859 • PPV = .0859/1.0859 = 0.079 = 7.9% • Compare to the ‘hypothetical table’ method (PPV=8%)
Test Properties (20) • Most tests give continuous readings • Serum hemoglobin • PSA • X-rays • How to determine ‘cut-point’ for normal vs diseased (negative vs positive)? • ↑ sensitivity ↓specificity • Receiver Operating Characteristic (ROC) curves
[Two figures: overlapping distributions of test values for people without disease ("Negative") and with disease ("Positive"). A vertical cut-point divides the overlap into a false-negative region and a false-positive region; moving the cut-point in one direction shrinks one region but enlarges the other.]
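The cut-point trade-off can be illustrated by sweeping a threshold over simulated test scores, as in building an ROC curve. The score distributions and cut-points below are hypothetical, not from the lecture:

```python
import random

random.seed(1)
# Hypothetical continuous test scores: diseased people tend to score higher.
healthy  = [random.gauss(50, 10) for _ in range(1000)]
diseased = [random.gauss(65, 10) for _ in range(1000)]

# As the cut-point rises, sensitivity falls while specificity rises.
for cut in (45, 55, 65, 75):
    sens = sum(x >= cut for x in diseased) / len(diseased)
    spec = sum(x <  cut for x in healthy)  / len(healthy)
    print(f"cut-point {cut}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Plotting sensitivity against (1 − specificity) across all cut-points gives the ROC curve.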
Diagnostic test study issues (1) • How do you select the subjects for a study to evaluate the properties of a diagnostic test? • Most test evaluations are done in tertiary care settings, which raises the PPV/NPV issues discussed above. • Three main methods of choosing subjects: • Take 'all comers' • Select a group of people with disease and a group without disease • Select a group who are test positive and a group who are test negative.
Diagnostic test study issues (3) • Method 1: • Inefficient – most people won’t have disease. • Method 2: • Hard to implement if test must be administered before outcome is known (e.g. a measure of reactive arterial narrowing and diagnosis of a heart attack) • Method 3: • Gives biased estimates of sensitivity/specificity (Work-up Bias)
Diagnostic test study issues (4) • Spectrum Bias • It’s easy to diagnose a broken leg in a person with a compound fracture. • It’s much harder to distinguish someone with a hairline fracture from a person with a deep bruise or ligament injury. • Study must include subjects with the relevant spectrum of disease states. • Spectrum needed depends on purpose of the test.
Diagnostic test study issues (5) • Work-up bias • The study selects patients based on the result of the diagnostic test (e.g. 100 test +ve and 100 test –ve). • sens/spec will be biased. • Example: • Evaluate a new method to screen men with chest pain. It’s hard to get men with known CHD (can’t be done in ED alone). You might try to select men based on results of the screening test.
Work-up Bias: TRUE TEST PERFORMANCE

                  Disease present    Disease absent    Total
  Test positive   150                50                200
  Test negative   100                900               1,000
  Total           250                950               1,200

Sensitivity = 150/250 = 60%
Specificity = 900/950 = 95%

NOW, suppose we only studied 100 people with a negative test but everyone with a positive test?
Work-up Bias (2): TEST PERFORMANCE FROM STUDY

                  Disease present    Disease absent    Total
  Test positive   150                50                200
  Test negative   10 (0.1 × 100)     90 (0.1 × 900)    100 (0.1 × 1,000)
  Total           160                140               300

Sensitivity = 150/160 = 94%, not 60%. BIAS!
Specificity = 90/140 = 64%, not 95%.
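The biased estimates follow directly from the sampling fractions. A minimal Python check using the slide's counts:

```python
# True 2x2 counts from the slide (sensitivity 60%, specificity 95%).
tp, fp = 150, 50      # everyone with a positive test is studied
fn, tn = 100, 900     # 1,000 test-negatives in the source population

f = 0.10              # but only 10% of test-negatives are studied
study_fn = fn * f     # 10 diseased test-negatives in the study
study_tn = tn * f     # 90 non-diseased test-negatives in the study

biased_sens = tp / (tp + study_fn)        # 150/160
biased_spec = study_tn / (fp + study_tn)  # 90/140
print(round(biased_sens, 2), round(biased_spec, 2))  # 0.94 0.64
```

Under-sampling test-negatives removes false negatives (inflating sensitivity) and true negatives (deflating specificity), exactly as the slide shows.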
Screening (1) • Screening • The presumptive identification of an unrecognized disease or defect by the application of tests, examinations or other procedures • Can be applied to an unselected population or to a high risk group. • Examples • Pap smears (cervical cancer) • Mammography (breast cancer) • Early childhood development • PKU
Screening (2) • Levels of prevention: • Primary prevention: prevent disease from occurring at all • Secondary prevention: detect and treat disease early, before symptoms appear (screening operates here) • Tertiary prevention: limit disability and complications of established disease
Screening (3)
[Figure: natural history of disease timeline, highlighting the DPCP§ between the point where disease becomes detectable and the onset of clinical symptoms]
§ Detectable Pre-Clinical Phase
Screening (5) Criteria to determine if a screening programme should be implemented • Disease Factors • Severity • Presence of a lengthy DPCP • Evidence that earlier treatment improves prognosis
Screening (6) • Test Factors • Valid - sensitive and specific with respect to DPCP • Reliable and reproducible (omitted from most lists, but shouldn't be) • Acceptable - cf. sigmoidoscopy • Easy • Cheap • Safe
Screening (7) • Test Factors (cont) • Test must reach high-risk groups - cf Pap smears • Sequential vs parallel tests • Sequential higher specificity • Parallel higher sensitivity • System Factors • Follow-up provided and available to all • Treatment resources adequate
Screening (8) • Evaluation of Screening • Can it work? • Does it work in the real world? • Case-control vs. cohort vs. RCT • Are we evaluating • Screening alone • Mammography and breast cancer detection • Screening plus therapy • Mammography and survival
Screening (9) • Biases in interpreting evaluations of screening programmes. • Lead-time Bias • Detecting disease early gives more years of ‘illness’ but doesn’t prolong life • Length Bias • Slowly progressive cases are more likely to be detected than rapidly progressive cases
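Lead-time bias can be shown with a toy timeline (all ages below are hypothetical):

```python
# Toy illustration of lead-time bias: screening detects the disease earlier
# but, in this scenario, does not change when the patient dies.
onset           = 50   # age at biological onset (start of DPCP)
screen_detect   = 55   # age at detection by screening
clinical_detect = 58   # age at clinical diagnosis without screening
death           = 62   # age at death, identical in both scenarios

survival_without_screening = death - clinical_detect  # 4 years
survival_with_screening    = death - screen_detect    # 7 years

# Survival "improves" by 3 years of lead time, yet death occurs at the
# same age: the screened patient simply knows about the disease longer.
print(survival_without_screening, survival_with_screening)  # 4 7
```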
Screening (12) • Study proposes to evaluate a screening programme in an RCT by comparing survival (adjusted for lead-time bias) in people who were screened to those who were not screened. • Will give a biased estimate of effectiveness (screening will look 'too good'), because length bias remains: screen-detected cases are disproportionately slow-growing, better-prognosis cases.