Statistical Aspects of Diagnostic Tests
R. M. Pandey
Appearances to the mind are of four kinds:
1. Things either are what they appear to be (TP)
2. Or they neither are, nor appear to be (TN)
3. Or they are, and do not appear to be (FN)
4. Or they are not, yet appear to be (FP)
Rightly to aim in all these cases is the wise man's task. – Epictetus, 2nd century AD (1)
• Diagnostic tests: to predict the presence of a disease (condition)
• Prognostic tests: to predict the outcome of a disease (condition)
SIMPLIFYING DATA:
• Clinical measurements: nominal, ordinal, interval scales
THE ACCURACY OF A TEST RESULT:
• Establishing a diagnosis is an imperfect process: a matter of probability rather than certainty
The "Gold Standard":
• What is a gold standard?
• Tissue diagnosis, radiological contrast procedures, prolonged follow-up, autopsies
• Almost always more costly and less feasible
• Lack of objective standards for some diseases (e.g. angina pectoris: the gold standard is careful history taking)
• Consequences of imperfect standards
Diagnostic Characteristics
This is not a hypothesis-testing situation; instead we ask:
• How well does the test identify patients with the disease?
• How well does the test identify patients without the disease?
Evaluation of the Diagnostic Test
• Give a group of people (with and without the disease) both tests (the candidate test and the "gold standard" test), then cross-classify the results and report the diagnostic characteristics of the candidate test.
• Spectrum of patients
• Bias (e.g. X-rays)
• Chance (compute an adequate sample size as in prevalence studies, separately for cases and non-cases): n = 4 p (100 − p) / d² (a sketch of this calculation follows below)
PREDICTIVE VALUE
• Definitions
• Determinants of predictive value
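A minimal sketch of that sample-size formula, assuming p is the expected proportion (e.g. the anticipated sensitivity) and d the desired absolute precision, both in percent; the factor 4 approximates z² at the 95% confidence level. The function name and example values are illustrative, not from the source.

import math

def sample_size(p_percent: float, d_percent: float) -> int:
    """n = 4 p (100 - p) / d^2, where p is the expected proportion (%)
    and d the desired absolute precision (%); 4 ~ z^2 at 95% confidence."""
    return math.ceil(4 * p_percent * (100 - p_percent) / d_percent ** 2)

# e.g. anticipated sensitivity 90%, precision +/- 5 percentage points
print(sample_size(90, 5))   # -> 144 cases needed (repeat for non-cases with the expected specificity)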
Probability
• The concept: if a trial (or experiment) is independently repeated a large number of times (N) and an outcome (A) occurs n times, then P(A) = n/N.
• Interpretation: if the trial is repeated again in the future, the probability that A will be the outcome is approximately n/N.
Diagnostic Characteristics
In the 2×2 table of test result against gold standard, let a = true positives, b = false positives, c = false negatives, d = true negatives.
Sensitivity: the probability that a diseased individual will be identified as such by the test = P(T+ / D+) = a/(a+c)
Specificity: the probability that an individual without the disease will be identified as such by the test = P(T- / D-) = d/(b+d)
• A perfect test would have b and c equal to 0.
Diagnostic Characteristics • False positive rate = P(T+ / D-) = b/(b+d) = 1 – Specificity • False negative rate = P(T- / D+) = c/(a+c) = 1 – Sensitivity
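A minimal sketch of these four quantities computed from the 2×2 cell counts defined above (a = TP, b = FP, c = FN, d = TN); the function name is illustrative, not from the source.

def diagnostic_characteristics(a: int, b: int, c: int, d: int) -> dict:
    """a = true positives, b = false positives, c = false negatives, d = true negatives."""
    sensitivity = a / (a + c)   # P(T+ / D+)
    specificity = d / (b + d)   # P(T- / D-)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": 1 - specificity,   # b / (b + d)
        "false_negative_rate": 1 - sensitivity,   # c / (a + c)
    }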
Predictive Values of Diagnostic Tests
• More informative from the patient's or physician's perspective
• Special applications of Bayes' theorem
Predictive Values of Diagnostic Tests • Positive Predictive Value = P(D+ / T+) = a/(a+b) if the prevalence of disease in the general population is the same as the prevalence of disease in the study
Predictive Values of Diagnostic Tests • Negative Predictive Value = P(D- / T-) = d/(c+d) if the prevalence of disease in the general population is the same as the prevalence of disease in the study
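A sketch of the direct 2×2 calculation of both predictive values; as noted above, it is valid only when the study prevalence matches the prevalence in the population where the test will be used. The function name is illustrative.

def predictive_values(a: int, b: int, c: int, d: int) -> dict:
    """a = TP, b = FP, c = FN, d = TN, from a study whose disease prevalence
    matches that of the target population."""
    return {
        "ppv": a / (a + b),   # P(D+ / T+)
        "npv": d / (c + d),   # P(D- / T-)
    }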
Example: A researcher develops a new saliva pregnancy test. She collects samples from 100 women known to be pregnant by blood test (the gold standard) and 100 women known not to be pregnant, also based on the same blood test. The saliva test is "positive" in 95 of the pregnant women. It is also "positive" in 15 of the 100 non-pregnant women. What are the sensitivity and specificity?
                 Gold standard
            Pregnant   Non-pregnant   Totals
Saliva +       95           15          110
Saliva -        5           85           90
Totals        100          100          200

Sensitivity = TP/(TP+FN) = 95/100 = 95%
Specificity = TN/(TN+FP) = 85/100 = 85%
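A quick self-contained check of the arithmetic above, using the cell counts from the table:

a, b, c, d = 95, 15, 5, 85        # TP, FP, FN, TN from the saliva-test table
print(a / (a + c))                # sensitivity = 0.95
print(d / (b + d))                # specificity = 0.85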
Is it more important that a test be sensitive or specific? • It depends on its purpose. A cheap mass-screening test should be sensitive (few cases missed). A test designed to confirm the presence of disease should be specific (few cases wrongly diagnosed). • Note that sensitivity and specificity are two distinct properties. Where classification is based on a cutpoint along a continuum, there is a tradeoff between the two.
Example: The saliva pregnancy test detects progesterone (a pregnancy-related hormone). A refined version is developed. Suppose you add a drop of indicator solution to the saliva sample. It can stay clear (0 reaction) or turn green (1+), red (2+), or black (3+).
The researcher conducts a validation study and finds the following:

            Pregnant   Non-pregnant   Totals
Saliva 3+      85            5           90
Saliva 2+      10           10           20
Saliva 1+       3           17           20
Saliva 0        2           68           70
Totals        100          100          200
The sensitivity and specificity of the saliva test will depend on the definition of "positive" and "negative" used (see the sketch below).
• If "positive" = 1+ or greater: sensitivity = (85+10+3)/100 = 98%, specificity = 68/100 = 68%
• If "positive" = 2+ or greater: sensitivity = (85+10)/100 = 95%, specificity = (68+17)/100 = 85%
• If "positive" = 3+: sensitivity = 85/100 = 85%, specificity = (68+17+10)/100 = 95%
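The tradeoff can be tabulated directly from the ordinal counts; a minimal sketch, with the category counts copied from the validation table above:

# Counts by test grade, from the validation study above
pregnant     = {"3+": 85, "2+": 10, "1+": 3, "0": 2}
non_pregnant = {"3+": 5,  "2+": 10, "1+": 17, "0": 68}

tp = fp = 0
for cut in ["3+", "2+", "1+"]:        # candidate "positive" thresholds, strictest first
    tp += pregnant[cut]
    fp += non_pregnant[cut]
    sens = tp / 100                   # 100 pregnant women in the study
    spec = (100 - fp) / 100           # 100 non-pregnant women in the study
    print(f"positive >= {cut}: sensitivity {sens:.0%}, specificity {spec:.0%}")
# positive >= 3+: sensitivity 85%, specificity 95%
# positive >= 2+: sensitivity 95%, specificity 85%
# positive >= 1+: sensitivity 98%, specificity 68%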
The choice of cutpoint depends on the relative adverse consequences of false negatives vs. false positives. If it is most important not to miss anyone, choose a cutpoint that favors sensitivity (here, calling 1+ or greater "positive"). If it is most important that people not be erroneously labeled as having the condition, choose a cutpoint that favors specificity (here, calling only 3+ "positive").
Key points: • The positive and negative predictive values depend on the pretest probability of the condition of interest - in addition to the sensitivity and specificity of the test. • This pretest probability is often the prevalence of the condition in the population of interest. • But it can also reflect restriction of this population based on clinical features and/or other test results. • For example, the pretest probability of pregnancy will be very different among young women using oral contraceptives from that among sexually active young women using no form of contraception.
Example: The saliva pregnancy test is administered 30 days after the first day of the last menstrual period to two groups of women who have thus far "missed" a period.
Group 1: 1000 sexually active young women using no contraception. Pretest probability of pregnancy = 40% (hypothetical).
Based on sensitivity of 95%: expected TP = 400 x 0.95 = 380; expected FN = 400 - 380 = 20
Based on specificity of 85%: expected TN = 600 x 0.85 = 510; expected FP = 600 - 510 = 90

           Pregnant   Non-pregnant   Totals
Test +        380           90          470
Test -         20          510          530
Totals        400          600         1000
Positive predictive value = TP/(TP+FP) = 380/470 = 81%
In this context, a woman with a positive saliva test has an 81% chance of being pregnant.
Negative predictive value = TN/(TN+FN) = 510/530 = 96%
In this context, a woman with a negative saliva test has a 96% chance of not being pregnant (and a 4% chance of being pregnant).
Group 2: 1000 oral contraceptive users. Pretest probability of pregnancy = 10% (hypothetical).

           Pregnant   Non-pregnant   Totals
Test +         95          135          230
Test -          5          765          770
Totals        100          900         1000

Using sensitivity = 95%: expected TP = 0.95 x 100 = 95; expected FN = 100 - 95 = 5
Using specificity = 85%: expected TN = 0.85 x 900 = 765; expected FP = 900 - 765 = 135
In this context, positive predictive value [TP/(TP+FP)] is only 95/230 = 41%
Negative predictive value [TN/(TN+FN)] = 765/770 = 99%
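Both groups follow from a single Bayes-theorem calculation on sensitivity, specificity and pretest probability; a minimal sketch (function name illustrative, pretest probabilities hypothetical as above):

def predictive_values_from_prevalence(sens: float, spec: float, pretest: float) -> tuple:
    """Bayes' theorem: PPV and NPV from sensitivity, specificity and pretest probability."""
    ppv = sens * pretest / (sens * pretest + (1 - spec) * (1 - pretest))
    npv = spec * (1 - pretest) / (spec * (1 - pretest) + (1 - sens) * pretest)
    return ppv, npv

print(predictive_values_from_prevalence(0.95, 0.85, 0.40))   # ~(0.81, 0.96) - Group 1
print(predictive_values_from_prevalence(0.95, 0.85, 0.10))   # ~(0.41, 0.99) - Group 2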
In which situation is the saliva test more helpful?
Group 1 (pretest probability 40%): PPV: 81% probability of pregnancy given a positive test; NPV: 96% probability of no pregnancy given a negative test
Group 2 (pretest probability 10%): PPV: 41% probability of pregnancy given a positive test; NPV: 99% probability of no pregnancy given a negative test
Note that the same test would likely be used and interpreted very differently in these two contexts. • This does not imply any difference in the characteristics of the test itself, i.e. sensitivity and specificity are not altered by the pretest probability of the condition of interest. • Tests are most useful when the pretest probability is in a middle range. They are unlikely to be useful when the pretest probability is already very high or very low.
Likelihood Ratio
LRs are an alternative way of describing the performance of a diagnostic test.
Generic formula for the LR of a given test result:
LR = Probability of the test result in people with the disease / Probability of the test result in people without the disease
Or:
LR = Probability of the test result in the disease of interest / Probability of the test result in other disease(s)
Probability & Odds
D+ = disease, D- = no disease
Probability of disease = D+ / (D+ + D-)
Odds of disease = D+ / D-
Example: P = 0.3 = 0.3/(0.3 + 0.7); Odds = 0.3/0.7 ≈ 0.43
Odds and Probability

Probability      Odds
50%              1
1/3 (33%)        0.5
0.2 (20%)        0.25
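A minimal sketch of the two conversions implied by these definitions (odds = p / (1 − p), and p = odds / (1 + odds)); the function names are illustrative.

def prob_to_odds(p: float) -> float:
    return p / (1 - p)

def odds_to_prob(odds: float) -> float:
    return odds / (1 + odds)

print(prob_to_odds(0.5))      # -> 1.0
print(prob_to_odds(1 / 3))    # -> ~0.5
print(prob_to_odds(0.2))      # -> ~0.25
print(odds_to_prob(0.25))     # -> ~0.2, back to the original probability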
Uses of the Likelihood Ratio
Post-test odds = Pre-test odds × Likelihood ratio (a worked sketch follows the LR+ / LR- formulas below)
Likelihood Ratios: only two outcomes for a test
When a test has only two outcomes (+ or -), the LRs can be calculated from the test's sensitivity and specificity:
LR+ = Probability of a + test in people with the disease / Probability of a + test in people without the disease = Sensitivity / (1 - Specificity)
LR- = Probability of a - test in people with the disease / Probability of a - test in people without the disease = (1 - Sensitivity) / Specificity
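A minimal sketch combining these formulas with the post-test odds rule above: convert the pretest probability to odds, multiply by the LR, and convert back. The numbers reuse the saliva-test example (sensitivity 95%, specificity 85%, pretest probability 40%) purely for illustration.

def likelihood_ratios(sens: float, spec: float) -> tuple:
    return sens / (1 - spec), (1 - sens) / spec          # LR+, LR-

def post_test_probability(pretest_prob: float, lr: float) -> float:
    pretest_odds = pretest_prob / (1 - pretest_prob)
    post_odds = pretest_odds * lr                        # post-test odds = pre-test odds x LR
    return post_odds / (1 + post_odds)

lr_pos, lr_neg = likelihood_ratios(0.95, 0.85)           # ~6.3 and ~0.06
print(post_test_probability(0.40, lr_pos))               # ~0.81, matches the Group 1 PPV
print(post_test_probability(0.40, lr_neg))               # ~0.04, i.e. NPV ~0.96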
Likelihood Ratios: Advantages
• Sensitivity and specificity are concepts that often waste information.
• Likelihood ratios best describe the value of diagnostic tests.
• Likelihood ratios are best for combinations of tests:
(a) easier to calculate for combinations of independent tests;
(b) better estimates of probability when tests are not independent (an LR for each combination of test results).
Limitations and strategies: • Studies on diagnostic tests are susceptible to errors due to chance and bias • Design phase strategies should deal with these errors • Random errors (compute 95% CI) • Systematic errors • Sampling bias • Measurement bias • Reporting bias
Sampling Bias
A. Selection of cases and non-cases
• Selection of cases from a referral center leads to overestimation of sensitivity
• Selection of non-cases from volunteers leads to overestimation of specificity
• Therefore, the study sample should be representative of the target population in which the test would eventually be used
B. Prevalence (prior probability) of disease in the patients being studied
• A higher prevalence in the study sample than in the target population leads to overestimation of the positive predictive value (and underestimation of the negative predictive value)
• Therefore, report predictive values at various prevalences
Measurement Bias
• Non-blinding of raters
• Borderline or unsatisfactory results
Reporting Bias (publication bias)
Steps in Planning a Study of a Diagnostic Test
• Determine whether there is a need for a new diagnostic test
• Describe the way in which the subjects will be selected
• Have a reasonable gold standard
• Ensure application of the gold standard and the diagnostic test in a standardized and blinded manner
• Estimate the sample size required to give acceptable 95% CIs for sensitivity and specificity (see the sketch below)
• Find a sufficient number of willing subjects to satisfy 'n' and the sampling criteria
• Finally, report the results in terms of sensitivity and specificity, as well as PPV and NPV at different prior probabilities of disease
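A minimal sketch of the confidence-interval step, using the normal approximation for a proportion; the same call applies to sensitivity (TP out of TP+FN) and specificity (TN out of TN+FP). The values below reuse the saliva example purely for illustration.

import math

def proportion_ci(successes: int, n: int, z: float = 1.96) -> tuple:
    """Approximate 95% CI for a proportion via the normal approximation;
    reasonable when n*p and n*(1-p) are both not too small."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

print(proportion_ci(95, 100))   # 95% CI for a sensitivity of 0.95 observed in 100 cases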
Diagnostic Tests and Screening
Readings:
• Jean Bourbeau, Dick Menzies, Kevin Schwartzman. Epidemiology and Biostatistics 679: Clinical Epidemiology, May 31 - June 25, 2004. Respiratory Epidemiology and Clinical Research Unit, Montreal Chest Institute, K1, 3650 St. Urbain.
• Fletcher, chapters 1 (Introduction), 3 (Diagnosis), 8 (Prevention).
• Barry MJ. Prostate-specific antigen testing for early diagnosis of prostate cancer. N Engl J Med 2001; 344:1373-1377 [Clinical Practice].
• Hamm CW et al. Emergency room triage of patients with acute chest pain by means of rapid testing for cardiac troponin T or troponin I. N Engl J Med 1997; 337:1648-53.
• Scott Evans. Evaluation of Screening and Diagnostic Tests. Introduction to Biostatistics, Harvard Extension School, Fall 2004.