Assessing Risk of Bias as a Domain of Quality in Medical Test Studies

Prepared for: The Agency for Healthcare Research and Quality (AHRQ)
Training Modules for Medical Test Reviews Methods Guide
www.ahrq.gov

Overview of a Medical Test Review
• Prepare Topic
  • Develop the Topic and Structure the Review
  • Choose the Important Outcomes
• Search for and Select Studies for Inclusion
  • Search for Studies
  • Research Sources
• Extract Data from Studies
• Analyze and Synthesize Studies
  • Assess Risk of Bias as a Domain of Quality
  • Assess Applicability
  • Grade the Body of Evidence
  • Meta-analysis of Test Performance Evidence With a “Gold Standard”, or
  • Meta-analysis of Test Performance Evidence With an Imperfect Reference Standard
  • Decision Modeling
• Report Medical Test Review
Santaguida PL, Riley CM, Matchar DB. Assessing risk of bias as a domain of quality in medical test studies. In: Methods guide for medical test reviews. Available at www.effectivehealthcare.ahrq.gov/medtestsguide.cfm.

Learning Objectives
• Identify sources of bias that may affect the internal validity of results.
• Identify validated criteria and tools, and select risk-of-bias items that can be used to assess specific biases in studies of medical tests.
• Explain why standardizing the application of criteria is important.
• Recognize that methods for assessing study limitations and risk of bias should be established in advance and documented clearly.

Background (1 of 3)
• Quality assessment is the evaluation of study features that may influence the relative importance placed on a study.
• The evaluation process examines the following factors:
  • Systematic error
  • Random error
  • Adequacy of reporting
  • Aspects of data analysis
  • Applicability
  • Whether ethics approval is specified
  • Whether sample size estimates are detailed
• No consensus has yet been reached on the optimal criteria for quality assessment in systematic reviews.

Background (2 of 3)
• When quality is understood as “value for judgment making,” two overarching questions arise:
  • Are the results for the population and test in the study accurate and precise?
    • This relates to both systematic error (lack of accuracy, or bias) and random error (lack of precision).
  • Is the study applicable to the patients targeted by the review?
    • This concerns relevance both to the population of interest in the study itself (which relates to the potential for selection bias) and to the population represented by the Key Questions (i.e., applicability).
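
To illustrate the difference between precision and accuracy, the following Python sketch computes a 95% confidence interval for an observed sensitivity of 0.80 at two sample sizes. The counts are hypothetical, and the Wilson score interval is our choice of method for illustration, not one prescribed by the guide. Random error shrinks as the number of diseased patients grows, so the interval narrows; a systematic error would shift the estimate itself, and enlarging the sample would not remove it.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion such as sensitivity."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical studies: same observed sensitivity (0.80), different numbers of diseased patients.
for true_pos, diseased in [(16, 20), (160, 200)]:
    low, high = wilson_interval(true_pos, diseased)
    print(f"n={diseased}: sensitivity={true_pos / diseased:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

With 20 diseased patients the interval runs from about 0.58 to 0.92; with 200 it narrows to about 0.74 to 0.85.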

Background (3 of 3)
• This module highlights key issues in evaluating risk of bias related to the first overarching question: Are the results for the population and test in the study accurate and precise?
• In particular, it focuses on systematic errors resulting from:
  • Study design
  • Conduct of the study
  • Reporting of study findings
• These systematic errors can lead to overestimation or underestimation of test performance.
• Module 6 deals with the second overarching question: Is the study applicable to the patients targeted by the review?

Evidence for Biases Affecting Medical Test Studies (1 of 2)
• Whiting et al. (2004) conducted a systematic review of bias in diagnostic test studies.
• The results did not permit conclusions about the direction or relative magnitude of the effects of individual biases, but they showed that bias does occur.
• Some sources of bias are particularly common in diagnostic accuracy studies:
  • Spectrum bias
  • Partial verification bias
  • Clinical review bias
  • Observer or instrument variation
Whiting P, Rutjes AW, Reitsma JB, et al. Ann Intern Med 2004 Feb 3;140(3):189-202. PMID: 14757617.
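
Partial verification bias can be made concrete with a worked example. The Python sketch below uses a hypothetical 2×2 table (true sensitivity 0.80, specificity 0.90) and assumes that every test-positive patient but only 10% of test-negative patients receive the reference standard. Computing accuracy from the verified patients alone inflates sensitivity and deflates specificity. The inverse-probability reweighting step is an illustration added here, not a method from the module, and it is valid only if the decision to verify depends on the index test result alone.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: float  # index test positive, disease present
    fp: float  # index test positive, disease absent
    fn: float  # index test negative, disease present
    tn: float  # index test negative, disease absent

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def verified_subset(full: TwoByTwo, frac_pos: float, frac_neg: float) -> TwoByTwo:
    """Expected cells when only a fraction of test positives/negatives get the reference standard."""
    return TwoByTwo(full.tp * frac_pos, full.fp * frac_pos,
                    full.fn * frac_neg, full.tn * frac_neg)


def reweighted(verified: TwoByTwo, frac_pos: float, frac_neg: float) -> TwoByTwo:
    """Inverse-probability reweighting; assumes verification depends only on the test result."""
    return TwoByTwo(verified.tp / frac_pos, verified.fp / frac_pos,
                    verified.fn / frac_neg, verified.tn / frac_neg)


full = TwoByTwo(tp=160, fp=80, fn=40, tn=720)  # hypothetical cohort of 1000 patients
partial = verified_subset(full, frac_pos=1.0, frac_neg=0.1)
corrected = reweighted(partial, frac_pos=1.0, frac_neg=0.1)

for label, table in [("true", full), ("partial", partial), ("corrected", corrected)]:
    print(f"{label:<10} sens={table.sensitivity():.3f} spec={table.specificity():.3f}")
```

The partially verified table gives a sensitivity of about 0.976 and a specificity of about 0.474, versus 0.800 and 0.900 after reweighting.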

Commonly Reported Sources of Systematic Error in Studies of Medical Test Performance (1 of 2)
Whiting P, Rutjes AW, Reitsma JB, et al. Ann Intern Med 2004 Feb 3;140(3):189-202. PMID: 14757617.

Commonly Reported Sources of Systematic Error in Studies of Medical Test Performance (2 of 2)
Whiting P, Rutjes AW, Reitsma JB, et al. Ann Intern Med 2004 Feb 3;140(3):189-202. PMID: 14757617.

Evidence for Biases Affecting Medical Test Studies (2 of 2)
• Elements of study design and conduct that may increase risk of bias vary by study type.
• Criteria for rating the quality of trials of tests with clinical outcomes should not differ greatly from those used for intervention studies.
• The main difference is that medical test performance studies are typically cohort studies rather than randomized controlled trials.
• Issues specific to this study type that can introduce bias must be considered:
  • Complete ascertainment of true disease status
  • Adequacy of the reference standard
  • Spectrum effect
Agency for Healthcare Research and Quality Methods Guide for Effectiveness and Comparative Effectiveness Reviews. Available at www.effectivehealthcare.ahrq.gov/methodsguide.cfm.
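
The spectrum effect can be sketched the same way. In this hypothetical Python example, an index test is assumed to have a sensitivity of 0.95 in severe disease and 0.60 in mild disease; the stratum names, values, and case mix are illustrative assumptions. A study that enrolls only severe cases reports the severe-case sensitivity, whereas the expected sensitivity in a clinical population with 30% severe and 70% mild cases is the mix-weighted average.

```python
def pooled_sensitivity(mix: dict[str, float], sens_by_stratum: dict[str, float]) -> float:
    """Expected sensitivity when diseased patients fall into strata in the proportions `mix`."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("stratum proportions must sum to 1")
    return sum(share * sens_by_stratum[stratum] for stratum, share in mix.items())


# Hypothetical stratum-specific sensitivities of an index test
sens = {"severe": 0.95, "mild": 0.60}

study_spectrum = {"severe": 1.0, "mild": 0.0}     # e.g., only advanced cases enrolled
clinical_spectrum = {"severe": 0.3, "mild": 0.7}  # assumed mix in the intended-use setting

print(pooled_sensitivity(study_spectrum, sens))
print(pooled_sensitivity(clinical_spectrum, sens))
```

The first call returns 0.95 and the second about 0.705, showing how an unrepresentative spectrum can overstate test performance.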

Challenges Specific to Assessing Individual Study Limitations as a Domain of Quality
• How to identify appropriate criteria:
  • Many instruments are available to assess different aspects of study quality.
  • Deciding which one to use can be difficult.
• How to apply criteria:
  • Criteria developed for laboratory studies may not be appropriate for studies of medical history.
  • However, the review should remain true to the essence of the chosen criteria while applying them clearly enough for others to reproduce the assessment.
• How to deal with inadequate reporting:
  • Inadequate reporting does not itself lead to systematic bias, but it limits assessment of the risk of bias.
  • If a poorly reported study appears to make an important contribution, the reviewers may need to send questions to the study authors.

Principles for Addressing the Challenges Specific to Assessing Risk of Bias in Studies of Medical Tests
• Use validated criteria to address relevant sources of bias.
• Standardize the application of criteria.
• Decide when inadequate reporting constitutes a fatal flaw.

Principle 1: Use Validated Criteria To Address Relevant Sources of Bias (1 of 4)
Two systematic reviews have evaluated the many available bias-assessment instruments in the context of diagnostic accuracy.
• The first systematic review, conducted by West et al. (2002), evaluated 18 tools.
  • The authors noted that all the tools were intended for use in conjunction with other design-specific tools.
  • Three scales met all six criteria that the authors considered important:
    • Cochrane Methods Working Group checklist
    • Tools used by Lijmer et al. (1999)
    • National Health and Medical Research Council (Australia) checklist
West S, King V, Carey TS, et al. Evid Rep Technol Assess (Summ) 2002 Mar;(47):1-11. PMID: 11979732.

Principle 1: Use Validated Criteria To Address Relevant Sources of Bias (2 of 4)
• The second systematic review, conducted by Whiting et al. (2005), evaluated 91 tools. The majority of these tools:
  • Did not explicitly state a rationale for including or excluding items.
  • Had not been subjected to test-retest reliability evaluation.
  • Did not define the quality components they considered.
• These variations reflect an inconsistent understanding of quality assessment in evidence-based medicine.
• The authors did not recommend a particular tool; instead, they developed their own, the Quality Assessment of Diagnostic Accuracy Studies (QUADAS).
Whiting P, Rutjes AW, Dinnes J, et al. J Clin Epidemiol 2005 Jan;58(1):1-12. PMID: 15649665.

Principle 1: Use Validated Criteria To Address Relevant Sources of Bias (3 of 4)
Quality Assessment of Diagnostic Accuracy Studies (QUADAS) checklist
• Its authors aimed to include sources of bias and error that have an empirical basis and validity.
• It contains elements beyond systematic bias, such as questions related to reporting.
• An updated version (QUADAS-2) identifies four key domains, each rated in terms of risk of bias:
  • Patient selection
  • Index test(s)
  • Reference standard
  • Flow and timing
Whiting P, Rutjes AW, Reitsma JB, et al. BMC Med Res Methodol 2003 Nov 10;3:25. PMID: 14606960.
Whiting PF, Rutjes AWS, Westwood ME, et al. Ann Intern Med 2011 Oct 18;155(8):529-36. PMID: 22007046.
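
A review team might record QUADAS-2 judgments for each study in a small data structure. The Python sketch below is hypothetical: the field names and the summary rule (low overall only when all four domains are rated low, high if any domain is rated high, otherwise unclear) are assumptions for illustration, and any such rule should be pre-specified in the review protocol.

```python
from enum import Enum


class Judgment(Enum):
    LOW = "low"
    HIGH = "high"
    UNCLEAR = "unclear"


# The four QUADAS-2 risk-of-bias domains named in the module (keys are hypothetical labels)
DOMAINS = ("patient_selection", "index_test", "reference_standard", "flow_and_timing")


def overall_risk(ratings: dict[str, Judgment]) -> str:
    """Hypothetical summary rule: 'low' only when every domain is rated low."""
    missing = [d for d in DOMAINS if d not in ratings]
    if missing:
        raise ValueError(f"unrated domains: {missing}")
    if all(ratings[d] is Judgment.LOW for d in DOMAINS):
        return "low risk of bias"
    if any(ratings[d] is Judgment.HIGH for d in DOMAINS):
        return "high risk of bias"
    return "unclear risk of bias"


study = {
    "patient_selection": Judgment.LOW,
    "index_test": Judgment.LOW,
    "reference_standard": Judgment.UNCLEAR,
    "flow_and_timing": Judgment.LOW,
}
print(overall_risk(study))
```

For this example study, a single unclear domain yields “unclear risk of bias.”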

QUADAS-2 Questions for Assessing Risk of Bias in Diagnostic Accuracy Studies*
* Questions related to the assessment of applicability will be addressed in Module 6: Assessing Applicability of Medical Test Studies in Systematic Reviews.
QUADAS-2 = revised Quality Assessment of Diagnostic Accuracy Studies checklist
Whiting PF, Rutjes AWS, Westwood ME, et al. Ann Intern Med 2011 Oct 18;155(8):529-36. PMID: 22007046.

Principle 1: Use Validated Criteria To Address Relevant Sources of Bias (4 of 4)
• Using criteria validated in an instrument such as QUADAS-2 is recommended for assessing the risk of systematic error.
• Items that assess applicability or random error are considered at a different stage of the review (see Modules 6 and 8).
• Systematic reviewers may need to add criteria from other standardized checklists, for example:
  • Standards for Reporting of Diagnostic Accuracy (STARD)
  • Strengthening the Reporting of Genetic Association Studies (STREGA), an extension of Strengthening the Reporting of Observational Studies in Epidemiology (STROBE)

Principle 2: Standardize the Application of Criteria
• Standardizing the application of criteria maintains objectivity in an otherwise subjective process.
• Because empirical evidence to inform these decisions is lacking, review teams should establish clear definitions for each criterion.
• It can be useful to pilot test the definitions with two or more reviewers:
  • Unreliable items can be revised.
  • Reliability of the final criteria can be measured.
• Summarize the limitations across multiple items from a single study into one of three simple categories:
  • Use the terms “good,” “fair,” or “poor.”
  • Definitions should be decided in advance and clearly reported.
• It is useful to have two independent reviewers categorize the studies and then resolve disagreements by discussion.
Agency for Healthcare Research and Quality Methods Guide for Effectiveness and Comparative Effectiveness Reviews. Available at www.effectivehealthcare.ahrq.gov/methodsguide.cfm.
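
Reliability of the piloted criteria can be measured with an agreement statistic. As one common option (an assumption here; the module does not name a statistic), the Python sketch below computes Cohen's kappa for two reviewers who rated the same hypothetical set of ten studies “yes,” “no,” or “unclear” on one criterion.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two reviewers rating the same studies on a nominal scale."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("ratings must be non-empty and paired study by study")
    n = len(rater_a)
    # Observed agreement: share of studies given the same rating by both reviewers.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance-expected agreement from each reviewer's marginal rating frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when both reviewers use one identical category")
    return (observed - expected) / (1 - expected)


# Hypothetical pilot ratings of ten studies on a single criterion
reviewer_1 = ["yes", "yes", "no", "unclear", "yes", "no", "yes", "yes", "unclear", "no"]
reviewer_2 = ["yes", "no", "no", "unclear", "yes", "no", "yes", "yes", "yes", "no"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 2))
```

The reviewers agree on 8 of 10 studies (observed agreement 0.80), chance-expected agreement is 0.39, and kappa is about 0.67.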

Categorizing Individual Studies Into General Quality Classes
Adapted from Agency for Healthcare Research and Quality Methods Guide for Effectiveness and Comparative Effectiveness Reviews. Available at www.effectivehealthcare.ahrq.gov/methodsguide.cfm.

Principle 3: Decide When Inadequate Reporting Constitutes a Fatal Flaw
• Inadequate reporting does not itself introduce systematic bias, but it limits the ability to assess the risk of bias.
• Some reviewers may assume the worst, while others give the benefit of the doubt.
• When a study makes a potentially important contribution to the review, contacting the study authors may resolve reporting issues.
• When it is not possible to obtain details, standard practice is to document inadequate reporting of particular criteria.
• The reviewers must determine in advance whether failure to report certain criteria constitutes a “fatal flaw” (i.e., makes results uninterpretable or invalid).
  • Example: A review meant to apply to older individuals finds a study in which age was not reported. In this scenario, the study is either excluded, or included and rated as poor in quality with regard to risk of bias.
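
Decision rules of this kind are easiest to apply consistently when they are written down before any study is rated. The Python sketch below encodes one hypothetical protocol: a criterion declared fatal in advance (here, whether the study enrolled older adults, rated “unclear” when age was not reported) leads to exclusion if it is unreported; otherwise the number of criteria rated “no” or “unclear” maps to “good,” “fair,” or “poor.” The criterion names and thresholds are illustrative assumptions, not values from the guide.

```python
from dataclasses import dataclass, field


@dataclass
class Protocol:
    """Hypothetical decision rules, fixed before any study is rated."""
    fatal_if_unreported: set[str] = field(default_factory=set)
    fatal_action: str = "poor"  # the review team chooses "poor" or "exclude" in advance
    max_concerns_for_good: int = 0
    max_concerns_for_fair: int = 2


def categorize(ratings: dict[str, str], protocol: Protocol) -> str:
    """Map per-criterion ratings ("yes" = criterion met, "no", "unclear") to an overall class."""
    # A pre-declared fatal criterion that is unreported ("unclear" or absent) triggers the fatal action.
    for criterion in protocol.fatal_if_unreported:
        if ratings.get(criterion, "unclear") == "unclear":
            return protocol.fatal_action
    concerns = sum(1 for rating in ratings.values() if rating in ("no", "unclear"))
    if concerns <= protocol.max_concerns_for_good:
        return "good"
    if concerns <= protocol.max_concerns_for_fair:
        return "fair"
    return "poor"


protocol = Protocol(fatal_if_unreported={"older_adults_enrolled"}, fatal_action="exclude")
study_a = {"older_adults_enrolled": "unclear", "reference_standard_adequate": "yes"}
study_b = {"older_adults_enrolled": "yes", "reference_standard_adequate": "yes",
           "blinded_interpretation": "no"}
print(categorize(study_a, protocol))  # exclude
print(categorize(study_b, protocol))  # fair
```

Keeping the thresholds and the fatal action as explicit protocol fields makes it easy to report them, as the principle requires, and to rerun the categorization if the team revises a rule during piloting.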

Illustrative Example: Accuracy of Patient Self-Reports of Family History (1 of 2)
• Wilson et al. (2009) undertook a systematic review to evaluate the accuracy of patient self-reports of family history and the relevant factors likely to affect that accuracy.
  • Index test: Patient self-reports of family history
  • Reference standard: Verification of relatives’ status from medical records or a disease/death registry
• The reviewers used Quality Assessment of Diagnostic Accuracy Studies (QUADAS) criteria to evaluate the quality of eligible studies. The reviewers:
  • Excluded 4 of 14 criteria and justified those exclusions in an appendix.
  • Provided contextual examples for each item used.
  • Defined partial verification bias criteria in the context of the index test and reference standard described above.
  • Clearly described decision rules for rating each criterion as “yes,” “no,” or “unclear.”
Qureshi N, Wilson B, Santaguida P, et al. AHRQ Evidence Report/Technology Assessment No. 186. Available at www.ncbi.nlm.nih.gov/books/NBK32554/pdf/TOC.pdf.

Illustrative Example: How Partial Verification Bias Was Interpreted
Qureshi N, Wilson B, Santaguida P, et al. AHRQ Evidence Report/Technology Assessment No. 186. Available at www.ncbi.nlm.nih.gov/books/NBK32554/pdf/TOC.pdf.

Illustrative Example: Accuracy of Patient Self-Reports of Family History (2 of 2)
• Reviewers can choose to present Quality Assessment of Diagnostic Accuracy Studies (QUADAS) ratings in tables as the percentage of studies that scored “yes,” “no,” or “unclear” on each individual item.
• QUADAS developers do not recommend using composite scores.
Qureshi N, Wilson B, Santaguida P, et al. AHRQ Evidence Report/Technology Assessment No. 186. Available at www.ncbi.nlm.nih.gov/books/NBK32554/pdf/TOC.pdf.
Whiting P, Rutjes AW, Reitsma JB, et al. BMC Med Res Methodol 2003 Nov 10;3:25. PMID: 14606960.
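
Per-item tabulation of this kind is straightforward to automate. The Python sketch below, using hypothetical study ratings and shortened item labels rather than official QUADAS wording, reports the percentage of studies rated “yes,” “no,” or “unclear” on each item and deliberately does not combine items into a composite score.

```python
from collections import Counter

ANSWERS = ("yes", "no", "unclear")


def item_summary(ratings_by_study: dict[str, dict[str, str]]) -> dict[str, dict[str, float]]:
    """Percentage of studies rated yes/no/unclear on each checklist item (no composite score)."""
    items = sorted({item for ratings in ratings_by_study.values() for item in ratings})
    summary: dict[str, dict[str, float]] = {}
    for item in items:
        answers = [ratings[item] for ratings in ratings_by_study.values() if item in ratings]
        counts = Counter(answers)
        summary[item] = {ans: 100 * counts[ans] / len(answers) for ans in ANSWERS}
    return summary


ratings = {  # hypothetical ratings; item keys are shortened labels, not official QUADAS wording
    "Study 1": {"representative_spectrum": "yes", "whole_sample_verified": "yes"},
    "Study 2": {"representative_spectrum": "no", "whole_sample_verified": "unclear"},
    "Study 3": {"representative_spectrum": "yes", "whole_sample_verified": "yes"},
    "Study 4": {"representative_spectrum": "unclear", "whole_sample_verified": "yes"},
}

for item, pct in item_summary(ratings).items():
    print(f"{item:<25}" + "  ".join(f"{ans}={share:.0f}%" for ans, share in pct.items()))
```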

Summary
• Assessing methodological quality is necessary.
• Judging the overall quality of a study involves examining:
  • Study size
  • Direction and degree of findings
  • Study relevance
  • Risk of bias
  • Systematic error
  • Random error
  • Other study limitations (e.g., inadequate reporting)
• This module focused on the evaluation of systematic bias as a distinctly important component of quality.

Key Messages
• Reviewers should select validated criteria that examine the risk of systematic error when assessing study limitations.
• Reviewers should categorize individual studies as “good,” “fair,” or “poor” with respect to risk of bias.
• Independent categorization by two reviewers is recommended.
• Methods for determining an overall categorization for the study limitations should be established beforehand and clearly documented.

Practice Question 1 (1 of 2)
• Internal validity refers to:
  a. Applicability of the study to the patients relevant to the review.
  b. Accuracy of the results for the population and test in the study.
  c. Relevance of the study to the population represented in the Key Questions.
  d. Degree of random error.

Practice Question 1 (2 of 2)
Explanation for Question 1: The correct answer is b. Good internal validity means that the results for the population and test in the study are accurate.

Practice Question 2 (1 of 2)
• The term “applicability” generally refers to the feasibility of performing a given test.
  • True
  • False

Practice Question 2 (2 of 2)
Explanation for Question 2: The statement is false. Applicability generally refers to the relevance of an intervention (including a medical test) to a population of interest. Applicability is an important component of “external validity.”

Practice Question 3 (1 of 2)
• What is the QUADAS-2?
  a. A tool used to assess the risk of systematic error in diagnostic accuracy studies
  b. A tool used to assess applicability in diagnostic accuracy studies
  c. A tool used to assess random error in diagnostic accuracy studies
  d. Answers a and b

Practice Question 3 (2 of 2)
Explanation for Question 3: The correct answer is d. QUADAS-2 is an updated version of the Quality Assessment of Diagnostic Accuracy Studies checklist, organized around four key domains: patient selection, index test(s), reference standard, and flow and timing. It is a checklist of potential sources of bias and error used to assess the risk of systematic error, and it can also be used to assess applicability.

Practice Question 4 (1 of 2)
• Inadequate reporting introduces systematic bias.
  • True
  • False

Practice Question 4 (2 of 2)
Explanation for Question 4: The statement is false. Inadequate reporting does not introduce systematic bias, but it does limit the reviewers’ ability to assess the risk of bias.

Authors
• This presentation was prepared by Brooke Heidenfelder, Rachael Posey, Lorraine Sease, Remy Coeytaux, Gillian Sanders, and Alex Vaz, members of the Duke University Evidence-based Practice Center.
• The module is based on Chapter 5, Assessing Risk of Bias as a Domain of Quality in Medical Test Studies. In: Methods Guide for Medical Test Reviews. AHRQ Publication No. 12-EC017. Rockville, MD: Agency for Healthcare Research and Quality; June 2012. www.effectivehealthcare.ahrq.gov/medtestsguide.cfm

References (1 of 5)
• Agency for Healthcare Research and Quality. Methods Guide for Effectiveness and Comparative Effectiveness Reviews. Rockville, MD: Agency for Healthcare Research and Quality; April 2012. AHRQ Publication No. 10(12)-EHC063-EF. Chapters available at www.effectivehealthcare.ahrq.gov/methodsguide.cfm.
• Bossuyt PM, Reitsma JB, Bruns DE, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD Initiative. Ann Intern Med. 2003 Jan 7;138(1):40-4. PMID: 12513043.
• Centre for Reviews and Dissemination. Systematic reviews: CRD's guidance for undertaking reviews in health care. York, England: University of York; 2008. www.york.ac.uk/inst/crd/pdf/Systematic_Reviews.pdf. Accessed September 19, 2011.
• Cochrane Methods Working Group on Systematic Review of Screening and Diagnostic Tests. Recommended methods. London: The Cochrane Collaboration; 1996.

References (2 of 5)
• Higgins JPT, Altman DG, Sterne JAC; the Cochrane Statistical Methods Group and the Cochrane Bias Methods Group. Chapter 8: Assessing risk of bias in included studies. In: Higgins JPT and Green S, eds. Cochrane handbook for systematic reviews of interventions. Version 5.1.0. London: The Cochrane Collaboration; 2011. www.cochrane-handbook.org. Accessed September 19, 2011.
• Leeflang MM, Deeks JJ, Gatsonis C, et al; Cochrane Diagnostic Test Accuracy Working Group. Systematic reviews of diagnostic test accuracy. Ann Intern Med. 2008 Dec 16;149(12):889-97. PMID: 19075208.
• Lijmer JG, Mol BW, Heisterkamp S, et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA. 1999 Sep 15;282(11):1061-6. PMID: 10493205.
• Little J, Higgins JP, Ioannidis JP, et al. STrengthening the REporting of Genetic Association studies (STREGA): an extension of the STROBE statement. Eur J Clin Invest. 2009 Apr;39(4):247-66. PMID: 19297801.

References (3 of 5)
• National Health and Medical Research Council. How to review the evidence: systematic identification and review of the scientific literature. Canberra, Australia: National Health and Medical Research Council; 2000. www.nhmrc.gov.au/_files_nhmrc/publications/attachments/cp65.pdf.
• Qureshi N, Wilson B, Santaguida P, et al. Family History and Improving Health. Evidence Report/Technology Assessment No. 186 (Prepared by the McMaster University Evidence-based Practice Center under Contract No. HHSA 290-2007-10060-I). Rockville, MD: Agency for Healthcare Research and Quality; August 2009. AHRQ Publication No. 09-E016. www.ncbi.nlm.nih.gov/books/NBK32554/pdf/TOC.pdf.
• Santaguida PL, Riley CM, Matchar DB. Assessing risk of bias as a domain of quality in medical test studies. In: Chang SM and Matchar DB, eds. Methods guide for medical test reviews. Rockville, MD: Agency for Healthcare Research and Quality; June 2012. p. 5.1-5.10. AHRQ Publication No. 12-EHC017. www.effectivehealthcare.ahrq.gov/medtestsguide.cfm.

References (4 of 5)
• von Elm E, Altman DG, Egger M, et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet. 2007 Oct 20;370(9596):1453-7. PMID: 18064739.
• West S, King V, Carey TS, et al. Systems to rate the strength of scientific evidence. Evid Rep Technol Assess (Summ). 2002 Mar;(47):1-11. PMID: 11979732.
• Whiting P, Rutjes AW, Dinnes J, et al. A systematic review finds that diagnostic reviews fail to incorporate quality despite available tools. J Clin Epidemiol. 2005 Jan;58(1):1-12. PMID: 15649665.
• Whiting P, Rutjes AW, Reitsma JB, et al. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003 Nov 10;3:25. PMID: 14606960.

References (5 of 5)
• Whiting P, Rutjes AW, Reitsma JB, et al. Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med. 2004 Feb 3;140(3):189-202. PMID: 14757617.
• Wilson BJ, Qureshi N, Santaguida P, et al. Systematic review: family history in risk assessment for common diseases. Ann Intern Med. 2009 Dec 15;151(12):878-85. PMID: 19884616.
