Options for Summarizing Medical Test Performance in the Absence of a “Gold Standard”
Prepared for: The Agency for Healthcare Research and Quality (AHRQ)
Training Modules for Medical Test Reviews Methods Guide
www.ahrq.gov

Learning Objectives
• Recognize settings where the reference standard may be imperfect (i.e., no “gold standard”).
• Describe sources of potential bias resulting from the use of an imperfect reference standard when estimating the sensitivity and specificity of a medical test.
• Understand the options for analyzing data, their advantages and justification, and potential assumptions.
Trikalinos TA, Balion CM. Options for summarizing medical test performance in the absence of a “gold standard.” In: Methods guide for medical test reviews. Available at www.effectivehealthcare.ahrq.gov/medtestsguide.cfm.

Introduction: Classical Paradigm

Introduction: Reference Standard Issues
• In some cases, the “true” status is directly observable (e.g., for tests predicting short-term mortality after a procedure).
• More commonly, the “true” status is based on a reference standard (test), which is considered a “gold standard” if it actually reflects the “true” status.
• “Reference standard bias” arises when the reference test does not mirror the truth well.
• The further the reference test deviates from the truth, the less accurate the estimate of the index test’s performance.
• An “imperfect reference standard” is a reference standard test that misclassifies “true” status at a rate that cannot be ignored.

Imperfect Reference Standard Scenario (1 of 2)
• The simplest case is an index test and a reference standard that give dichotomous results (e.g., positive or negative for disease).
• Both the index and reference tests can err by not reflecting the true status.
• The example in the following slide shows true 2-by-2 table probabilities in relation to the eight combinations of index test result, reference test result, and true status.
• These eight probabilities (α1, β1, γ1, δ1, α2, β2, γ2, and δ2) need to be estimated from the accuracy data.
• The “perfect” reference standard is the “gold standard.”

Imperfect Reference Standard Scenario (2 of 2)

Imperfect Reference Standard Bias (1 of 2)

Imperfect Reference Standard Bias (2 of 2)
• “Naïve” estimates are underestimates of the true values when the index and reference test results are independent among those with and among those without the condition of interest (“conditional independence”).
Abbreviations:
• Seindex = index test sensitivity
• Spindex = index test specificity
• P = disease prevalence
Figure legend:
• Solid red line = true sensitivity
• Dashed red line = true specificity
• Solid black line = naïve sensitivity
• Dashed black line = naïve specificity

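The downward bias under conditional independence can be checked numerically. The sketch below is illustrative only: the function name and the input values (index and reference tests both with sensitivity 0.80 and specificity 0.90, prevalence 0.30) are assumptions chosen for the example, not figures from the module. It computes the sensitivity and specificity one would expect to observe when the index test is scored against the imperfect reference.

```python
def naive_accuracy(se_index, sp_index, se_ref, sp_ref, prevalence):
    """Expected naive Se/Sp of the index test scored against an imperfect
    reference, assuming conditional independence of the two tests."""
    p = prevalence
    # Probability that both tests are positive (diseased + non-diseased strata).
    both_pos = p * se_index * se_ref + (1 - p) * (1 - sp_index) * (1 - sp_ref)
    # Probability that the reference test is positive.
    ref_pos = p * se_ref + (1 - p) * (1 - sp_ref)
    # Probability that both tests are negative.
    both_neg = p * (1 - se_index) * (1 - se_ref) + (1 - p) * sp_index * sp_ref
    ref_neg = 1 - ref_pos
    naive_se = both_pos / ref_pos   # P(index+ | reference+)
    naive_sp = both_neg / ref_neg   # P(index- | reference-)
    return naive_se, naive_sp


# Hypothetical inputs, chosen only for illustration.
naive_se, naive_sp = naive_accuracy(0.80, 0.90, 0.80, 0.90, 0.30)
print(f"naive Se = {naive_se:.3f} (true 0.80), naive Sp = {naive_sp:.3f} (true 0.90)")
```

With these inputs both naïve values fall below the true ones, which matches the direction of bias stated on the slide.
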
Reference Standard Validity
• Only rarely are we absolutely sure that the reference standard is a perfect reflection of the truth.
• Often, we are comfortable with overlooking small or moderate misclassifications by the reference standard.
• Hard-and-fast rules for judging the (in)adequacy of the reference standard do not exist.
• Consult content experts on a case-by-case basis to make judgments.
• There are three settings in which one might question the validity of the reference standard:
  • The reference method yields different measurements over time or across settings.
  • The condition of interest is variably defined.
  • The new method is an improved version of a usually applied test.

Imperfect Reference Standard: Setting 1
• Situation: The reference method yields different measurements over time or across settings.
• Example: Diagnosis of obstructive sleep apnea typically requires a high Apnea-Hypopnea Index (AHI; an objective measurement) and the presence of suggestive symptoms and signs.
• Problem: There is large night-to-night variability in measured AHI and substantial between-rater and between-laboratory variability.

Imperfect Reference Standard: Setting 2
• Situation: The condition of interest is variably defined.
• Example: The disease, such as psoriatic arthritis, is complex.
• Problem: There is no single symptom, sign, or measurement that suffices to make a diagnosis of the disease with certainty. Instead, a set of diagnostic criteria (symptoms, signs, imaging results, and laboratory measures) is used to identify the disease, and these criteria will unavoidably be applied differently across studies.

Imperfect Reference Standard: Setting 3
• Situation: The new method is an improved version of a usually applied test.
• Example: Measurement of parathyroid hormone (PTH).
• Problem: Older measurement methodologies are being replaced by newer, more specific ones.
  • Measurements with the new and old methodologies do not agree very well.
  • It is incorrect to use the older method as the reference standard.

Analytic Options for a Systematic Review
• Analytic options 1 and 2 below are preferred when possible to summarize data from two fallible tests; option 3 is also suitable.
1. Forgo the classical paradigm, which focuses on test accuracy; assess the ability of the index test to predict patient outcomes (using the index test as a predictive instrument).
2. Forgo the classical paradigm; assess agreement of the index and reference test results, that is, treat index and reference tests as two alternative measurement methods.
3. Using the classical paradigm, calculate “naïve” estimates of the index test’s sensitivity and specificity, but qualify study findings to avoid misinterpretation.
4. Mathematically adjust the “naïve” estimates of the index test’s sensitivity and specificity to account for the imperfect reference standard.

Analysis Option 1: Focus on Prediction of Patient Outcomes
• Forgo the classical paradigm, which compares the index test to a reference standard (test “accuracy”).
• Accuracy estimates are not informative or interpretable when the reference standard is “imperfect.”
• Instead, assess the ability of the index test to predict patient outcomes such as history, future clinical events, and response to therapy.
• This option follows a well-known paradigm in systematic reviews for evaluating prognostic tests (more information is available in Module 12).

Analysis Option 2: Focus on the Agreement of Index and Reference Tests (1 of 2)
• Forgo the classical paradigm (test “accuracy”).
• Instead, assess agreement (concordance) of the index and reference test results.
• Simply treat the index and reference tests as two alternative measurement methods.
• How to do this depends on whether the results are categorical or continuous.
• For categorical test results:
  • Cohen’s kappa statistic is a measure of categorical agreement that accounts for agreement by chance.
  • Meta-analyses of kappa statistics are not common in the medical literature; they will need to be explained and interpreted in detail.

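As a minimal sketch of the categorical case, Cohen’s kappa for two dichotomous tests can be computed from the four cells of a single study’s 2-by-2 table. The function name and the cell counts below are hypothetical and serve only to illustrate the chance correction.

```python
def cohens_kappa(a, b, c, d):
    """Cohen's kappa for a 2-by-2 agreement table.

    a = index+ / reference+    b = index+ / reference-
    c = index- / reference+    d = index- / reference-
    """
    n = a + b + c + d
    observed = (a + d) / n  # proportion of results on which the tests agree
    # Agreement expected by chance, from the marginal totals of each test.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical counts: the tests agree on 85 of 100 patients.
kappa = cohens_kappa(40, 10, 5, 45)
print(f"kappa = {kappa:.2f}")
```

Here the observed agreement is 0.85 and the chance-expected agreement is 0.50, so kappa is well below the raw agreement proportion.
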
Analysis Option 2: Focus on the Agreement of Index and Reference Tests (2 of 2)
• When test results are continuous and individual data points are available, the researcher can:
  • Directly compare measurements between tests
  • Pool data from all available studies and:
    (1) Perform a regression of one test on the other, using a method that accounts for measurement error
    (2) Conduct a Bland-Altman analysis (difference vs. the average of the two test results)
• When test results are continuous but individual data points are not available, the researcher can:
  • Summarize study-level information from (1) or (2) above

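A Bland-Altman analysis of paired continuous measurements reduces to a mean difference (bias) and limits of agreement around it. The sketch below assumes the common convention of bias ± 1.96 standard deviations of the differences for the 95-percent limits; the function name and the paired values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(index_values, reference_values):
    """Mean bias and 95% limits of agreement for paired measurements."""
    diffs = [i - r for i, r in zip(index_values, reference_values)]
    averages = [(i + r) / 2 for i, r in zip(index_values, reference_values)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return {
        "bias": bias,
        "lower_limit": bias - 1.96 * sd,
        "upper_limit": bias + 1.96 * sd,
        # (average, difference) pairs are what a Bland-Altman plot displays.
        "points": list(zip(averages, diffs)),
    }


# Hypothetical paired measurements from an index and a reference method.
index_values = [4, 12, 18, 33, 41]
reference_values = [5, 10, 21, 30, 47]
result = bland_altman(index_values, reference_values)
print(f"bias = {result['bias']:.2f}, "
      f"limits = ({result['lower_limit']:.2f}, {result['upper_limit']:.2f})")
```

Wide limits relative to the clinically relevant range would suggest that the two methods are not interchangeable, even when the mean bias is small.
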
Analysis Option 3: Calculate “Naïve” Estimates and Discuss Bias
• Calculate “naïve” estimates of the index test sensitivity (Se) and specificity (Sp), ignoring the imperfection of the reference standard, but make qualitative judgments on the direction of bias of these “naïve” estimates.
• If the index and reference tests are independent within strata of disease (conditional independence), naïve estimates of index test Se and Sp are biased downward (underestimated).
• If the index and reference tests are correlated within strata of disease, naïve estimates of Se and Sp can be:
  • Overestimates, if the tests agree more than expected by chance
  • Underestimates, if the tests disagree more than expected by chance
• Problem: The researcher cannot assume conditional independence without justification; external data are needed.

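The effect of conditional dependence on the direction of bias can be sketched by adding a covariance term between the two tests within each disease stratum. This parameterization (one covariance among the diseased, one among the non-diseased) is a common way to express dependence between two dichotomous tests and is an assumption of the example, as are the function name and all numeric inputs.

```python
def naive_accuracy_dependent(se_index, sp_index, se_ref, sp_ref, prevalence,
                             cov_diseased=0.0, cov_healthy=0.0):
    """Expected naive Se/Sp of the index test when the index and reference
    results may be correlated within disease strata.

    cov_diseased / cov_healthy are the covariances between the two test
    results among people with / without the condition; zero means
    conditional independence.
    """
    p = prevalence
    both_pos = (p * (se_index * se_ref + cov_diseased)
                + (1 - p) * ((1 - sp_index) * (1 - sp_ref) + cov_healthy))
    both_neg = (p * ((1 - se_index) * (1 - se_ref) + cov_diseased)
                + (1 - p) * (sp_index * sp_ref + cov_healthy))
    ref_pos = p * se_ref + (1 - p) * (1 - sp_ref)
    return both_pos / ref_pos, both_neg / (1 - ref_pos)


# Hypothetical true values: Se = 0.80 and Sp = 0.90 for both tests, prevalence 0.30.
args = (0.80, 0.90, 0.80, 0.90, 0.30)
independent = naive_accuracy_dependent(*args)
agree_more = naive_accuracy_dependent(*args, cov_diseased=0.15, cov_healthy=0.08)
disagree_more = naive_accuracy_dependent(*args, cov_diseased=-0.04, cov_healthy=-0.01)
for label, (se, sp) in [("independent", independent),
                        ("positive dependence", agree_more),
                        ("negative dependence", disagree_more)]:
    print(f"{label}: naive Se = {se:.3f}, naive Sp = {sp:.3f}")
```

With these inputs, strong positive dependence pushes the naïve estimates above the true values, while independence and negative dependence leave them below, which is why the direction of bias cannot be judged without external information about the dependence.
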
Analysis Option 3: Example
• The prostate-specific antigen (PSA) test is used to detect prostate cancer.
• Numerous methods have been developed to test PSA levels.
• These tests are prone to false-negative misclassification: PSA levels are not elevated in up to 15 percent of prostate cancer cases.
• Obesity can reduce serum PSA.
• Obesity will likely affect all PSA-detection methods, old and new (“conditional dependence”).
• Conditional dependence of PSA tests results in overestimation of the accuracy of a new (index) test.
• When the index test is compared to a non-PSA reference (e.g., a prostate biopsy), there is no conditional dependence; misclassification results in underestimation.

Analysis Option 4: Mathematically Adjust “Naïve” Estimates
• Mathematically adjust (correct) the “naïve” estimates of the index test sensitivity and specificity to account for the imperfect reference standard.
• Data from 2 × 2 tables are not enough; additional information is needed from the literature.
• The task is easiest if conditional independence can be assumed and either:
  • The sensitivity and specificity of the imperfect reference test are known from other studies.
  • The specificities of both the index test and the imperfect reference standard are known from other studies, but the sensitivities are unknown.
• Bayesian inference can incorporate information from other studies as prior distributions rather than fixed values. It provides estimates of sensitivity, specificity, and disease prevalence.
• Alternative sets of assumptions are possible.
• Problem: Model misspecification can result in biased estimates.

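For the first case above (conditional independence, with the reference test’s sensitivity and specificity taken as known), the adjustment has a closed form that follows from the marginal and joint probabilities of the 2 × 2 table. The sketch below is one classical algebraic correction of this kind, not necessarily the approach any given review used; the function name and counts are hypothetical, and sampling error is ignored.

```python
def adjust_for_imperfect_reference(a, b, c, d, se_ref, sp_ref):
    """Adjusted index-test Se/Sp and prevalence from a 2-by-2 table,
    assuming conditional independence and known reference Se/Sp.

    a = index+ / reference+    b = index+ / reference-
    c = index- / reference+    d = index- / reference-
    """
    n = a + b + c + d
    youden_ref = se_ref + sp_ref - 1
    if youden_ref <= 0:
        raise ValueError("reference test must be better than chance")
    p_ref_pos = (a + c) / n
    p_index_pos = (a + b) / n
    p_both_pos = a / n
    # P(ref+) = prev*Se_ref + (1 - prev)*(1 - Sp_ref), solved for prevalence.
    prevalence = (p_ref_pos + sp_ref - 1) / youden_ref
    # true_pos_mass = prevalence * Se_index, from the joint and marginal equations.
    true_pos_mass = (p_both_pos - (1 - sp_ref) * p_index_pos) / youden_ref
    se_index = true_pos_mass / prevalence
    sp_index = 1 - (p_index_pos - true_pos_mass) / (1 - prevalence)
    return se_index, sp_index, prevalence


# Hypothetical table whose naive estimates are Se = 199/310 and Sp = 579/690.
se_adj, sp_adj, prev_adj = adjust_for_imperfect_reference(
    199, 111, 111, 579, se_ref=0.80, sp_ref=0.90)
print(f"adjusted Se = {se_adj:.2f}, adjusted Sp = {sp_adj:.2f}, prevalence = {prev_adj:.2f}")
```

The counts were constructed from an index test with sensitivity 0.80 and specificity 0.90 at prevalence 0.30, so the adjustment recovers those values exactly; with real data the result depends heavily on how well the assumed reference-test values and the independence assumption hold.
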
Example: Performing a Systematic Review on Obstructive Sleep Apnea
• Obstructive sleep apnea (OSA) is characterized by sleep disturbances secondary to upper airway obstruction.
• OSA has a prevalence of 2 to 4 percent in middle-aged adults.
• It is associated with daytime somnolence, cardiovascular morbidity, diabetes, and other adverse outcomes.
• Treatment includes continuous positive airway pressure.
• A systematic review on the diagnosis of OSA in the home setting used:
  • Portable monitors as the index diagnostic test
  • Facility-based polysomnography as the reference standard
• The reviewers first attempted analysis option 3, then moved on to analysis option 2.
Trikalinos TA, Ip S, Raman G, et al. Home diagnosis of obstructive sleep apnea-hypopnea syndrome. Technology Assessment. Available at www.cms.gov/Medicare/Coverage/DeterminationProcess/downloads/id48TA.pdf.

Systematic Review Example: Choice of Reference Standard and Cutoff
• There is no “perfect” or accepted reference standard for obstructive sleep apnea (OSA).
• A diagnosis of OSA is based on suggestive signs and symptoms and objective assessment of breathing patterns during sleep with facility-based polysomnography (PSG).
• PSG quantifies the Apnea-Hypopnea Index (AHI).
• Portable monitors can also measure AHI.
• A high AHI (usually ≥15 events per hour of sleep) is suggestive of OSA; alternative cutoffs range from 5 to 40 events/hour.
• The main analysis in the systematic review used a cutoff of AHI ≥15, but cutoffs of 10 and 20 were also analyzed (there were too few data to analyze other cutoffs).

Systematic Review Example: Analysis Option 3 (Naïve Estimates)
• The reviewers calculated “naïve” estimates of the sensitivity (Se) and specificity (Sp) of the Apnea-Hypopnea Index by comparing portable monitors with polysomnography and qualified the results.
• “Naïve” estimates of sensitivity and specificity were displayed in the receiver operating characteristic space.
• High Se and Sp levels were suggested.
• However, there was considerable variability in the measurements.
• It was not possible to deduce whether the “naïve” estimates overestimate or underestimate the “true” Se and Sp.

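To make the naïve calculation concrete, the sketch below dichotomizes paired AHI measurements at a cutoff and computes naïve sensitivity and specificity with polysomnography as the reference. The paired values are invented for illustration and are not data from the review; applying the same cutoff to both methods is also an assumption of the example.

```python
def naive_se_sp(index_ahi, reference_ahi, cutoff):
    """Naive Se/Sp of the index method after dichotomizing both AHI
    measurements at the same cutoff (events per hour)."""
    a = b = c = d = 0
    for idx, ref in zip(index_ahi, reference_ahi):
        idx_pos, ref_pos = idx >= cutoff, ref >= cutoff
        if idx_pos and ref_pos:
            a += 1  # both positive
        elif idx_pos:
            b += 1  # index positive, reference negative
        elif ref_pos:
            c += 1  # index negative, reference positive
        else:
            d += 1  # both negative
    se = a / (a + c) if (a + c) else float("nan")
    sp = d / (b + d) if (b + d) else float("nan")
    return se, sp


# Hypothetical paired AHI values (portable monitor vs. polysomnography).
portable = [3, 8, 12, 16, 22, 35, 9, 18]
psg = [4, 11, 9, 19, 25, 30, 14, 16]
for cutoff in (10, 15, 20):
    se, sp = naive_se_sp(portable, psg, cutoff)
    print(f"AHI >= {cutoff}: naive Se = {se:.2f}, naive Sp = {sp:.2f}")
```

Even in this toy data set the naïve estimates shift with the cutoff, which is one reason to justify the chosen cutoff or report several.
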
Systematic Review Example: Analysis Option 2 (Pooled Data Analysis)
• The reviewers also described the concordance between the Apnea-Hypopnea Index (AHI) measured by portable monitors (“index” test) and by polysomnography (“reference” test) using Bland-Altman analysis (continuous data with individual data points available), asking whether the two tests are interchangeable.
• They found better agreement for lower AHI levels.
Figure legend:
• Dashed line = line of perfect agreement
• Broad limits = suboptimal agreement

Systematic Review Example: Analysis Option 2 (Study-Specific Results)
• The reviewers summarized Bland-Altman plots across studies.
• The mean difference in the two measurements of the Apnea-Hypopnea Index (mean bias) and the 95-percent limits of agreement are shown for each study.
• The 95-percent limits of agreement are very wide in most studies, suggesting great variability in the measurements with the two methods.

Systematic Review Example: Conclusions and a Recommendation
• Measurements of the Apnea-Hypopnea Index (AHI) with the two methods generally agree on which patients have 15 or fewer events per hour of sleep (low AHI).
• The methods disagree on the exact measurement among people who have higher AHIs on average.
• The reviewers identified a gap in the literature.
• The reviewers recommended undertaking studies that perform clinical validation of portable monitors, i.e., their ability to predict patients’ history, risk propensity, or clinical profile (analysis option 1).

Overall Recommendations
• When multiple reference standard tests, or multiple cutoffs for the same reference test, are available:
  • Justify the choice of test and/or cutoff, or
  • Consider analyzing multiple options
• Decide on the most appropriate analysis options to synthesize test performance.
• The four analysis options presented in this module are largely complementary approaches and are not mutually exclusive.
  • Analysis options 1, 2, and 3 are recommended.
  • Analysis option 4 requires expert statistical help.
• There are no empirical data on the merits and pitfalls of the mathematical adjustments in option 4 for an imperfect reference standard.

Practice Question 1 (1 of 2)
• The validity of the reference standard should be questioned when the new test being evaluated is an improved version of the usually applied test.
  • True
  • False

Practice Question 1 (2 of 2) Explanation for Question 1: The statement is true. There are several situations when the validity of the reference standard should be questioned. These include when a new method of testing is an improved version of the usually applied test. Measurements using the different methods may not agree well.
Practice Question 2 (1 of 2)
• Which of the following options is considered most preferable for evaluating information on a diagnostic test when there is no perfect reference test (gold standard)?
  a. Assess the test’s ability to predict patient-relevant outcomes instead of test accuracy.
  b. Assess whether the results of the two tests agree or disagree and treat them as two alternative measurement methods.
  c. Calculate estimates of the index test’s sensitivity and specificity from each study, but qualify the study findings.
  d. Adjust the estimates of sensitivity and specificity of the index test to account for the imperfect reference standard.

Practice Question 2 (2 of 2) Explanation for Question 2: The correct answer is a. All of the options listed are suggested methods for synthesizing information on medical tests when there is no gold standard. The preferred method involves assessing the test’s ability to predict patient-relevant outcomes instead of calculating test accuracy when compared with an imperfect standard. This way, the index test is treated as a predictive instrument.
Practice Question 3 (1 of 2)
• When considering imperfect reference standard bias, which of the following applies to naïve estimates of sensitivity and specificity when there is conditional independence of the results?
  a. They are overestimates compared to the true values.
  b. They are underestimates compared to the true values.
  c. They are always equal to the true values.
  d. They cannot be compared to the true values.

Practice Question 3 (2 of 2) Explanation for Question 3: The correct answer is b. Conditional independence implies that the results of the index and reference tests are independent among people with and without the condition of interest. In this case, estimates of sensitivity and specificity from the standard formulas will usually be smaller than the true values. In other words, the naïve estimates of sensitivity and specificity for the index test will be underestimates of the true values.
Practice Question 4 (1 of 2)
• When evaluating a medical test with no gold standard, one can mathematically calculate accurate sensitivity and specificity of the index test using standard 2 × 2 cross-tabulation of test results.
  • True
  • False

Practice Question 4 (2 of 2) Explanation for Question 4: The statement is false. The estimates of sensitivity and specificity will have to be adjusted to account for the imperfect reference standard. This may require expert statistical help.
Authors
• This presentation was prepared by Brooke Heidenfelder, Andrzej Kosinski, Rachael Posey, Lorraine Sease, Remy Coeytaux, Gillian Sanders, and Alex Vaz, members of the Duke University Evidence-based Practice Center.
• The module is based on Trikalinos TA, Balion CM. Options for summarizing medical test performance in the absence of a “gold standard.” In: Chang SM and Matchar DB, eds. Methods guide for medical test reviews. Rockville, MD: Agency for Healthcare Research and Quality; June 2012. p. 9.1-16. AHRQ Publication No. 12-EHC017. Available at www.effectivehealthcare.ahrq.gov/medtestsguide.cfm.

References (1 of 8) • Albert PS, Dodd LE. A cautionary note on the robustness of latent class models for estimating diagnostic error without a gold standard. Biometrics 2004 Jun;60(2):427-35. PMID: 15180668. • Altman DG, Bland JM. Absence of evidence is not evidence of absence. BMJ 1995 Aug;311(7003):485. PMID: 7647644. • Bablok W, Passing H, Bender R, et al. A general regression procedure for method transformation. Application of linear regression procedures for method comparison studies in clinical chemistry, Part III. J Clin Chem Clin Biochem 1988 Nov;26(11):783-90. PMID: 3235954. • Black MA, Craig BA. Estimating disease prevalence in the absence of a gold standard. Stat Med 2002 Sep 30;21(18):2653-69. PMID: 12228883. • Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res 1999 Jun;8(2):135-60. PMID: 10501650.
References (2 of 8) • Bland JM, Altman DG. Applying the right statistics: analyses of measurement studies. Ultrasound Obstet Gynecol 2003 Jul;22(1):85-93. PMID: 12858311. • Bossuyt PM. Interpreting diagnostic test accuracy studies. Semin Hematol 2008 Jul;45(3):189-95. PMID: 18582626. • Dendukuri N, Hadgu A, Wang L. Modeling conditional dependence between diagnostic tests: a multiple latent variable model. Stat Med 2009 Feb 1;28(3):441-61. PMID: 19067379. • Dendukuri N, Joseph L. Bayesian approaches to modeling the conditional dependence between multiple diagnostic tests. Biometrics 2001 Mar;57(1):158-67. PMID: 11252592. • Garrett ES, Eaton WW, Zeger S. Methods for evaluating the performance of diagnostic tests in the absence of a gold standard: a latent class model approach. Stat Med 2002 May 15;21(9):1289-307. PMID: 12111879.
References (3 of 8) • Gart JJ, Buck AA. Comparison of a screening test and a reference test in epidemiologic studies. II. A probabilistic model for the comparison of diagnostic tests. Am J Epidemiol 1966 May;83(3):593-602. PMID: 5932703. • Goldberg JD, Wittes JT. The estimation of false negatives in medical screening. Biometrics 1978 Mar;34(1):77-86. PMID: 630038. • Gyorkos TW, Genta RM, Viens P, et al. Seroepidemiology of Strongyloides infection in the Southeast Asian refugee population in Canada. Am J Epidemiol 1990 Aug;132(2):257-64. PMID: 2196791. • Hui SL, Zhou XH. Evaluation of diagnostic tests without gold standards. Stat Methods Med Res 1998 Dec;7(4):354-70. PMID: 9871952. • Joseph L, Gyorkos TW. Inferences for likelihood ratios in the absence of a "gold standard". Med Decis Making 1996 Oct-Dec;16(4):412-7. PMID: 8912303.
References (4 of 8) • Jonas DE, Wilt TJ, Taylor BC, et al. Chapter 11: challenges in and principles for conducting systematic reviews of genetic tests used as predictive indicators. J Gen Intern Med 2012 Jun;27 Suppl 1:S83-93. PMID: 22648679. • Linnet K. Estimation of the linear relationship between the measurements of two methods with proportional errors. Stat Med 1990 Dec;9(12):1463-73. PMID: 2281234. • Linnet K. Performance of Deming regression analysis in case of misspecified analytical error ratio in method comparison studies. Clin Chem 1998 May;44(5):1024-31. PMID: 9590376. • Qu Y, Tan M, Kutner MH. Random effects models in latent class analysis for evaluating accuracy of diagnostic tests. Biometrics 1996 Sep;52(3):797-810. PMID: 8805757.
References (5 of 8) • Reitsma JB, Rutjes AW, Khan KS, et al. A review of solutions for diagnostic accuracy studies with an imperfect or missing reference standard. J Clin Epidemiol 2009 Aug;62(8):797-806. PMID: 19447581. • Rutjes AW, Reitsma JB, Coomarasamy A, et al. Evaluation of diagnostic tests when there is no gold standard. A review of methods. Health Technol Assess 2007 Dec;11(50):iii, ix-51. PMID: 18021577. • Sokal RR, Rohlf EF. Biometry. New York, NY: Freeman; 1981. • Sun S. Meta-analysis of Cohen's kappa. Health Serv Outcomes Res Method 2011;11:145-163. • Thompson IM, Pauler DK, Goodman PJ, et al. Prevalence of prostate cancer among men with a prostate-specific antigen level < or =4.0 ng per milliliter. N Engl J Med 2004 May 27;350(22):2239-46. PMID: 15163773.
References (6 of 8) • Toft N, Jorgensen E, Hojsgaard S. Diagnosing diagnostic tests: evaluating the assumptions underlying the estimation of sensitivity and specificity in the absence of a gold standard. Prev Vet Med 2005 Apr;68(1):19-33. PMID: 15795013. • Torrance-Rynard VL, Walter SD. Effects of dependent errors in the assessment of diagnostic test performance. Stat Med 1997 Oct 15;16(19):2157-75. PMID: 9330426. • Trikalinos TA, Balion CM. Options for summarizing medical test performance in the absence of a “gold standard.” In: Chang SM and Matchar DB, eds. Methods guide for medical test reviews. Rockville, MD: Agency for Healthcare Research and Quality; June 2012. p. 9.1-16. AHRQ Publication No. 12-EHC017. Available at www.effectivehealthcare.ahrq.gov/medtestsguide.cfm.
References (7 of 8) • Trikalinos TA, Balion CM, Coleman CI, et al. Chapter 8: meta-analysis of medical test performance when there is a “gold standard.” J Gen Intern Med 2012 Jun;27 Suppl 1:S56-66. PMID: 22648676. • Trikalinos TA, Ip S, Raman G, et al. Home diagnosis of obstructive sleep apnea-hypopnea syndrome. Technology Assessment (Prepared by the Tufts–New England Medical Center Evidence-based Practice Center). Rockville, MD: Agency for Healthcare Research and Quality; August 2007. Available at www.cms.gov/Medicare/Coverage/DeterminationProcess/downloads/id48TA.pdf. • Vacek PM. The effect of conditional dependence on the evaluation of diagnostic tests. Biometrics 1985 Dec;41(4):959-68. PMID: 3830260. • Walter SD, Irwig LM. Estimation of test error rates, disease prevalence and relative risk from misclassified data: a review. J Clin Epidemiol 1988;41(9):923-37. PMID: 3054000.
References (8 of 8) • Walter SD, Irwig L, Glasziou PP. Meta-analysis of diagnostic tests with imperfect reference standards. J Clin Epidemiol 1999 Oct;52(10):943-51. PMID: 10513757. • Whiting P, Rutjes AW, Reitsma JB, et al. Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med 2004 Feb 3;140(3):189-202. PMID: 14757617.