Chapter 7 – Prognostic Tests Chapter 8 – Combining Tests and Multivariable Decision Rules Michael A. Kohn, MD, MPP 10/30/2008
Outline of Topics • Prognostic Tests • Differences from diagnostic tests • Quantifying prediction: calibration and discrimination • Comparing predictions • Value of prognostic information • Combining Tests/Diagnostic Models • Importance of test non-independence • Recursive Partitioning • Logistic Regression • Variable (Test) Selection • Importance of validation separate from derivation
Prognostic Tests • Differences from diagnostic tests • Validation/Quantifying Accuracy (calibration and discrimination) • Comparing predictions by different people or different models • Assessing the value of prognostic information
Difference from Diagnostic Tests • Diagnostic tests are for prevalent disease; prognostic tests are for incident outcomes. • Studies of prognostic tests have a longitudinal rather than cross-sectional time dimension.* (Fix a future time point and determine whether the dichotomous outcome has occurred at that point, e.g., death or recurrence at 5 years.) • The prognostic test “result” is often a probability of having the outcome by the future time point (e.g., risk of death or recurrence by 5 years). *But studies of diagnostic tests that use clinical follow-up as a gold standard are also longitudinal.
Problems with estimating risk of outcome by a fixed future time point • Treats all outcomes before the time point as equivalent, and likewise all outcomes after it. (Death at 1 month is the same as death at 4 years and 11 months; 5-year-1-month survival is the same as > 10-year survival.) • Cannot analyze subjects lost to follow-up prior to the time point. Time-to-event analysis (proportional hazards) is often important or necessary, but it’s covered elsewhere in your curriculum.
Predicting Continuous Outcomes • Time to death/recurrence • Birth weight • Weight loss/gain
Predicting Continuous Outcomes Glare, P., K. Virik, et al. (2003). "A systematic review of physicians' survival predictions in terminally ill cancer patients." BMJ 327(7408): 195-8.
Predicting Continuous Outcomes • Can calculate $\text{Outcome}_{\text{actual}} - \text{Outcome}_{\text{predicted}}$ for each individual.* • Summarize with the mean and SD of the individual differences. • Plot individual differences vs. actual outcome; this looks like a Bland-Altman plot. (And that’s all I’m going to say about predicting continuous outcomes.) *This does not make sense for dichotomous outcomes.
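A minimal sketch of the summary described on this slide, in Python. The survival times are hypothetical values invented for illustration, not data from Glare et al. or any other source; only the recipe (difference per individual, then mean and SD) comes from the slide.

```python
from statistics import mean, stdev

# Hypothetical data (NOT from the source): survival in weeks.
actual = [10, 4, 26, 8, 12]
predicted = [12, 8, 20, 10, 15]

# Individual differences: actual minus predicted, as on the slide.
differences = [a - p for a, p in zip(actual, predicted)]

mean_diff = mean(differences)  # systematic over- or under-prediction
sd_diff = stdev(differences)   # spread of the individual errors (sample SD)

print("differences:", differences)
print("mean difference:", mean_diff)
print("SD of differences:", sd_diff)
```

With these made-up numbers the mean difference is negative, meaning the predictions were on average longer than the actual survival. Plotting `differences` against `actual` would give the Bland-Altman-style picture the slide mentions.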
Prognostic Tests and Multivariable Diagnostic Models Commonly express results in terms of a probability -- risk of the outcome by a fixed time point (prognostic test) -- posterior probability of disease (diagnostic model) Need to assess both calibration and discrimination.
Example* Oncologists estimated the probability of “cure” (5-year disease-free survival) in each of 96 cancer patients. After 5 years, 70 (of the 96) died or had recurrence, and 26 (27%) were “cured.” *Mackillop, W. J. and C. F. Quirt (1997). "Measuring the accuracy of prognostic judgments in oncology." J Clin Epidemiol 50(1): 21-9.
Calibration* • How accurate are the predicted probabilities? • Break the population into groups. • Compare actual and predicted probabilities for each group. *Related to goodness-of-fit and diagnostic model validation, which will be discussed shortly.
Calibration Mackillop, W. J. and C. F. Quirt (1997). "Measuring the accuracy of prognostic judgments in oncology." J Clin Epidemiol 50(1): 21-9.
Discrimination • How well can the test separate subjects in the population from the mean probability to values closer to zero or 1? • May be more generalizable • Often measured with C-statistic (AUROC)
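One way to make the C-statistic concrete is to compute it directly from its pairwise definition: the proportion of (outcome, no-outcome) pairs in which the subject with the outcome received the higher predicted probability. The sketch below assumes that definition with ties counted as one half. The function name `c_statistic` and the six predictions are hypothetical, chosen only for illustration.

```python
def c_statistic(probs, outcomes):
    """Proportion of (event, non-event) pairs in which the event subject
    has the higher predicted probability; ties count as one half."""
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    non_events = [p for p, y in zip(probs, outcomes) if y == 0]
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical predicted probabilities and observed outcomes (1 = event).
probs = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
outcomes = [1, 1, 0, 1, 0, 0]

c = c_statistic(probs, outcomes)
print(f"C-statistic: {c:.3f}")
```

Here 8 of the 9 event/non-event pairs are ordered correctly, so the C-statistic is 8/9. The pairwise loop is quadratic in sample size, which is fine for a small illustration but not how one would compute it for large datasets.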
Discrimination Mackillop, W. J. and C. F. Quirt (1997). "Measuring the accuracy of prognostic judgments in oncology." J Clin Epidemiol 50(1): 21-9.
Calibration vs. Discrimination • Perfect calibration, no discrimination: the oncologist assigned a 27% probability of cure to each of the 96 patients. • Perfect discrimination, poor calibration: the mean* of the oncologist-assigned “cure” probabilities was 50%, but every patient who died or had a recurrence was assigned a cure probability ≤ 40% and every patient who survived was assigned a probability ≥ 60%. *Mean = $\sum_i p_i n_i / N$, where $n_i$ is the number of patients assigned probability $p_i$ and $N$ is the total number of patients. (It was actually 30% in the study.)
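The two extreme scenarios on this slide can be checked numerically. The sketch below uses the counts from the example (26 cured, 70 not cured). The specific probabilities in the second scenario (0.80 for every cured patient, 0.40 for every other patient) are assumptions chosen to satisfy the slide's constraints (≥ 60%, ≤ 40%, mean near 50%), not values from the study, and `auroc` is a hypothetical helper.

```python
def auroc(probs, outcomes):
    """Probability that a randomly chosen cured patient was assigned a higher
    cure probability than a randomly chosen non-cured patient (ties = 1/2)."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0
                for a in pos for b in neg)
    return score / (len(pos) * len(neg))


# From the slide: 26 of 96 patients were cured, 70 were not.
outcomes = [1] * 26 + [0] * 70
observed = sum(outcomes) / len(outcomes)  # about 0.27

# Scenario 1: everyone is assigned 27%.
flat = [0.27] * 96
# Scenario 2 (assumed values): cured patients 0.80, all others 0.40.
split = [0.80] * 26 + [0.40] * 70

mean_flat = sum(flat) / len(flat)
mean_split = sum(split) / len(split)
auroc_flat = auroc(flat, outcomes)
auroc_split = auroc(split, outcomes)

print(f"observed cure rate: {observed:.3f}")
print(f"flat : mean predicted {mean_flat:.3f}, AUROC {auroc_flat:.2f}")
print(f"split: mean predicted {mean_split:.3f}, AUROC {auroc_split:.2f}")
```

The flat assignment matches the observed cure rate but has an AUROC of 0.5, while the split assignment separates the groups perfectly (AUROC of 1) even though its mean prediction, about 51%, is far from the observed 27%.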
Discrimination [Figure not reproduced; only the axis labels 40%, 60%, 80%, 100% survive.]
Comparing Predictions • Compare ROC Curves and AUROCs • Reclassification Tables*, Net Reclassification Improvement (NRI), Integrated Discrimination Improvement (IDI) • See Jan. 2008 Issue of Statistics in Medicine** (? and EBD Edition 2 ?) *Problem 8-1 has a reclassification table. **Pencina et al. Stat Med. 2008 Jan 30;27(2):157-72;
Value of Prognostic Information Why do you want to know prognosis? -- ALS, slow vs rapid progression -- GBM, expected survival -- Na-MELD Score vs. Na-MELD + Ascites
Value of Prognostic Information • To inform treatment or other clinical decisions • To inform (prepare) patients and their families • To stratify by disease severity in clinical trials Altman, D. G. and P. Royston (2000). "What do we mean by validating a prognostic model?" Stat Med19(4): 453-73.
Value of Prognostic Information • Doctors and patients like prognostic information, but it is hard to assess its value. • The most objective approach is decision-analytic. Consider: What decision is to be made? Costs of errors? Cost of test?
Common Problems with Studies of Prognostic Tests See Chapter 7
Combining Tests/Diagnostic Models • Importance of test non-independence • Recursive Partitioning • Logistic Regression • Variable (Test) Selection • Importance of validation separate from derivation (calibration and discrimination revisited)
Combining Tests: Example Prenatal sonographic Nuchal Translucency (NT) and Nasal Bone Exam as dichotomous tests for Trisomy 21* *Cicero, S., G. Rembouskos, et al. (2004). "Likelihood ratio for trisomy 21 in fetuses with absent nasal bone at the 11-14-week scan." Ultrasound Obstet Gynecol 23(3): 218-23.
If NT ≥ 3.5 mm: Positive for Trisomy 21* *What’s wrong with this definition?
In general, don’t make multi-level tests like NT into dichotomous tests by choosing a fixed cutoff • I did it here to make the discussion of multiple tests easier • I arbitrarily chose to call ≥ 3.5 mm positive
One Dichotomous Test
                            Trisomy 21
Nuchal Translucency      D+      D-      LR
≥ 3.5 mm                212     478     7.0
< 3.5 mm                121    4745     0.4
Total                   333    5223
Do you see that this (7.0) is (212/333)/(478/5223)?
Review of Chapter 3: What are the sensitivity, specificity, PPV, and NPV of this test? (Be careful.)
Nuchal Translucency • Sensitivity = 212/333 = 64% • Specificity = 4745/5223 = 91% • Prevalence = 333/(333+5223) = 6% (Study population: pregnant women about to undergo CVS, so high prevalence of Trisomy 21) • PPV = 212/(212 + 478) = 31% • NPV = 4745/(121 + 4745) = 97.5%* *Not that great; prior to test P(D-) = 94%
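The figures on this slide can be reproduced from the four cells of the NT table. The sketch below is one way to do it; the function name `two_by_two` and its argument names are illustrative choices, and the counts are the ones given for NT ≥ 3.5 mm vs. < 3.5 mm.

```python
def two_by_two(tp, fp, fn, tn):
    """Standard accuracy measures from a 2x2 table of test result vs. disease."""
    sens = tp / (tp + fn)
    spec = tn / (fp + tn)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "prevalence": (tp + fn) / (tp + fp + fn + tn),
        "ppv": tp / (tp + fp),
        "npv": tn / (fn + tn),
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
    }


# NT >= 3.5 mm counted as positive: 212 D+, 478 D-; NT < 3.5 mm: 121 D+, 4745 D-.
nt = two_by_two(tp=212, fp=478, fn=121, tn=4745)

for name, value in nt.items():
    print(f"{name:12s} {value:.3f}")
```

Note that PPV and NPV computed this way apply only at the study's prevalence of about 6%, which is why the slide asks the reader to be careful.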
Clinical Scenario – One Test
Pre-test probability of Down’s = 6%; NT positive
Pre-test prob: 0.06
Pre-test odds: 0.06/0.94 = 0.064
LR(+) = 7.0
Post-test odds = pre-test odds × LR(+) = 0.064 × 7.0 = 0.44
Post-test prob = 0.44/(0.44 + 1) = 0.31
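The odds calculation on this slide can be written as three small functions. This is a sketch; the function names are hypothetical, and the inputs are the slide's pre-test probability (0.06) and LR(+) (7.0).

```python
def prob_to_odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


def odds_to_prob(odds):
    """Convert odds back to a probability."""
    return odds / (1 + odds)


def post_test_prob(pre_test_prob, lr):
    """Post-test probability = odds_to_prob(pre-test odds x likelihood ratio)."""
    return odds_to_prob(prob_to_odds(pre_test_prob) * lr)


pre_odds = prob_to_odds(0.06)
post = post_test_prob(0.06, 7.0)

print(f"pre-test odds: {pre_odds:.3f}")
print(f"post-test probability: {post:.2f}")
```

The result agrees with the PPV of 31% from the 2x2 table, as it should when the pre-test probability equals the study prevalence.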
NT Positive • Pre-test Prob = 0.06 • P(Result|Trisomy 21) = 0.64 • P(Result|No Trisomy 21) = 0.09 • Post-Test Prob = ? http://www.quesgen.com/Calculators/PostProdOfDisease/PostProdOfDisease.html Slide Rule
Nasal Bone Seen: NBA = “No”, negative for Trisomy 21. Nasal Bone Absent: NBA = “Yes”, positive for Trisomy 21.
Second Dichotomous Test
Nasal Bone Absent     Tri21+   Tri21-     LR
Yes                     229      129    27.8
No                      104     5094    0.32
Total                   333     5223
Do you see that this (27.8) is (229/333)/(129/5223)?
Clinical Scenario – Two Tests Using Probabilities
Pre-test probability of Trisomy 21 = 6%
NT positive for Trisomy 21 (≥ 3.5 mm)
Post-NT probability of Trisomy 21 = 31%
NBA positive (no bone seen)
Post-NBA probability of Trisomy 21 = ?
Clinical Scenario – Two Tests Using Odds
Pre-test odds of Tri21 = 0.064
NT positive (LR = 7.0)
Post-test odds of Tri21 = 0.44
NBA positive (LR = 27.8?)
Post-test odds of Tri21 = 0.44 × 27.8? = 12.4?
(P = 12.4/(1 + 12.4) = 92.5%?)
Clinical Scenario – Two Tests: Pre-test probability of Trisomy 21 = 6%; NT ≥ 3.5 mm AND nasal bone absent
Question: Can we use the post-test odds after a positive Nuchal Translucency as the pre-test odds for the positive Nasal Bone Examination? I.e., can we combine the positive results by multiplying their LRs? LR(NT+, NBE+) = LR(NT+) × LR(NBE+)? = 7.0 × 27.8? = 194?
Answer = No. The combined LR is not 194. The post-test probability is 158/(158 + 36) = 81%, not 92.5%.
Non-Independence Absence of the nasal bone does not tell you as much if you already know that the nuchal translucency is ≥ 3.5 mm.
Clinical Scenario Using Odds
Pre-test odds of Tri21 = 0.064
NT+/NBE+ (LR = 68.8)
Post-test odds = 0.064 × 68.8 = 4.40
(P = 4.40/(1 + 4.40) = 81%, not 92.5%)
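The gap between 194 and 68.8 can be reproduced from the counts. The sketch below compares the naive product of the two single-test LRs with the LR computed from the joint result. It assumes that the 158 and 36 in the expression 158/(158 + 36) are the Trisomy 21 and chromosomally normal fetuses, respectively, with both NT ≥ 3.5 mm and absent nasal bone. That reading is inferred from the slides (it reproduces the stated LR of 68.8) rather than taken from a table shown here. The pre-test odds use the study prevalence, 333/5223, instead of the rounded 0.064.

```python
D_POS, D_NEG = 333, 5223  # Trisomy 21 and normal fetuses in the study


def lr(pos_with_disease, pos_without_disease):
    """Likelihood ratio of a result: P(result | D+) / P(result | D-)."""
    return (pos_with_disease / D_POS) / (pos_without_disease / D_NEG)


def odds_to_prob(odds):
    return odds / (1 + odds)


lr_nt = lr(212, 478)    # NT >= 3.5 mm alone
lr_nba = lr(229, 129)   # nasal bone absent alone
lr_naive = lr_nt * lr_nba   # valid only if the tests were independent
lr_joint = lr(158, 36)      # both positive (cell counts inferred, see lead-in)

pre_odds = D_POS / D_NEG
naive_prob = odds_to_prob(pre_odds * lr_naive)
joint_prob = odds_to_prob(pre_odds * lr_joint)

print(f"naive LR {lr_naive:.0f} -> post-test probability {naive_prob:.1%}")
print(f"joint LR {lr_joint:.1f} -> post-test probability {joint_prob:.1%}")
```

Multiplying the LRs overstates the evidence because it treats the second positive result as if it carried the same information regardless of the first.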
Non-Independence of NT and NBA Apparently, even in chromosomally normal fetuses, enlarged NT and absence of the nasal bone are associated. A false positive on the NT makes a false positive on the NBE more likely. Of normal (D-) fetuses with NT < 3.5 mm only 2.0% had nasal bone absent. Of normal (D-) fetuses with NT ≥ 3.5 mm, 7.5% had nasal bone absent. Some (but not all) of this may have to do with ethnicity. In this London study, chromosomally normal fetuses of “Afro-Caribbean” ethnicity had both larger NTs and more frequent absence of the nasal bone. In Trisomy 21 (D+) fetuses, normal NT was associated with the presence of the nasal bone, so a false negative on the NT was associated with a false negative on the NBE.
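The 2.0% and 7.5% figures can be recovered from counts already given, which makes the dependence among normal fetuses explicit. In this sketch the count of 36 normal fetuses with both findings is inferred from the earlier 158/(158 + 36) expression, and the count with absent nasal bone but NT < 3.5 mm is obtained by subtraction (129 − 36); neither is quoted directly from a table in these slides.

```python
# Chromosomally normal (D-) fetuses only.
normal_nt_pos = 478        # NT >= 3.5 mm
normal_nt_neg = 4745       # NT < 3.5 mm
normal_nba_total = 129     # nasal bone absent, any NT
normal_both = 36           # NT >= 3.5 mm AND nasal bone absent (inferred)

# Nasal-bone false-positive rate, conditional on the NT result.
fp_given_nt_pos = normal_both / normal_nt_pos
fp_given_nt_neg = (normal_nba_total - normal_both) / normal_nt_neg

# Under independence, both would equal the overall false-positive rate.
fp_overall = normal_nba_total / (normal_nt_pos + normal_nt_neg)

print(f"P(NBA+ | D-, NT+) = {fp_given_nt_pos:.1%}")
print(f"P(NBA+ | D-, NT-) = {fp_given_nt_neg:.1%}")
print(f"P(NBA+ | D-)      = {fp_overall:.1%}")
```

If the two tests were conditionally independent in normal fetuses, the first two rates would match the third; instead the false-positive rate is several times higher after a false-positive NT.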
Non-Independence Instead of looking for the nasal bone, what if the second test were just a repeat measurement of the nuchal translucency? A second positive NT would do little to increase your certainty of Trisomy 21. If it was false positive the first time around, it is likely to be false positive the second time.
Reasons for Non-Independence Tests measure the same aspect of disease. One aspect of Down’s syndrome is slower fetal development; the NT decreases more slowly and the nasal bone ossifies later. Chromosomally NORMAL fetuses that develop slowly will tend to have false positives on BOTH the NT Exam and the Nasal Bone Exam.
Reasons for Non-Independence Tests measure the same aspect of disease. Consider exercise ECG (EECG) and radionuclide scan as tests for coronary artery disease (CAD) with the gold standard being anatomic narrowing of the arteries on angiogram. Both EECG and nuclide scan measure functional narrowing. In a patient without anatomic narrowing (a D- patient), coronary artery spasm could cause false positives on both tests.
Reasons for Non-Independence Spectrum of disease severity. In the EECG/nuclide scan example, CAD is defined as ≥70% stenosis on angiogram. A D+ patient with 71% stenosis is much more likely to have a false negative on both the EECG and the nuclide scan than a D+ patient with 99% stenosis.
Reasons for Non-Independence Spectrum of non-disease severity. In this example, CAD is defined as ≥70% stenosis on angiogram. A D- patient with 69% stenosis is much more likely to have a false positive on both the EECG and the nuclide scan than a D- patient with 33% stenosis.