Systematic Review Module 11: Grading Strength of Evidence

Kathleen N. Lohr, PhD, Distinguished Fellow, RTI International

Learning Objectives • What does grading strength of evidence (SOE) mean? • Why is grading SOE important? • How does grading SOE differ from rating quality of articles? • What are the primary and additional domains for grading SOE? • What variables, outcomes, and comparisons do you grade? • How are SOE domains scored? • How do you arrive at an overall SOE grade? • How do you present SOE scores?

Grading SOE: Guidance • Distinct from rating quality of articles/studies • Comparative effectiveness reviews (CERs) as main focus • Content here pertains only to interventional studies, not screening or diagnostic tests • Generally applicable to all Evidence-based Practice Center (EPC) systematic reviews

Aims for Creating Guidance on Grading SOE • Facilitate use of the reports by diverse decisionmakers and stakeholders • Give decisionmakers a more comprehensive evaluation of the evidence than before • Provide explanation of methods to non-EPC readers • Provide authoritative citation for EPCs to use in reviews • Foster transparency and documentation • Especially important in the American Recovery and Reinvestment Act (ARRA) era

Three Steps to Grading SOE • Scoring four “required” domains: risk of bias, consistency, directness, and precision • Considering, possibly scoring, four “additional” domains: dose-response association, plausible confounders, strength of association, and publication bias • Combining scores from required domains into a single SOE score, taking scores on additional domains into account as needed
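As a rough illustration of the first two steps, the domain scores for one outcome could be held in a small record like the one below. This is only a sketch: the class and field names are invented for this example, and the allowed levels are the ones this deck lists for each required domain. The additional domains are optional fields because the deck treats them as discretionary.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class RiskOfBias(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class Consistency(Enum):
    CONSISTENT = "consistent"
    INCONSISTENT = "inconsistent"
    UNKNOWN = "unknown or not applicable"


class Directness(Enum):
    DIRECT = "direct"
    INDIRECT = "indirect"


class Precision(Enum):
    PRECISE = "precise"
    IMPRECISE = "imprecise"


@dataclass(frozen=True)
class DomainScores:
    """Hypothetical record of domain scores for one outcome in one comparison."""
    outcome: str
    # Required domains: always scored.
    risk_of_bias: RiskOfBias
    consistency: Consistency
    directness: Directness
    precision: Precision
    # Additional domains: discretionary, so None means "not scored".
    dose_response: Optional[str] = None
    plausible_confounding: Optional[str] = None
    strength_of_association: Optional[str] = None
    publication_bias_comment: Optional[str] = None

    def additional_domains_used(self) -> List[str]:
        """Names of the additional domains that reviewers chose to score."""
        names = [
            "dose_response",
            "plausible_confounding",
            "strength_of_association",
            "publication_bias_comment",
        ]
        return [name for name in names if getattr(self, name) is not None]
```

One record per outcome and comparison keeps benefits and harms separate, which matches the later advice to assess each major outcome on its own.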

Four Required Domains: Risk of Bias • Concerns both study design and study conduct for individual studies, rated by usual methods • Assesses the aggregate quality of studies within each major study design and integrates those assessments into an overall risk-of-bias score • Scores: high, medium, or low • High risk of bias lowers SOE grade • Low risk of bias raises SOE grade

Four Required Domains: Consistency (I) • Degree of similarity in the effect sizes of different studies within an evidence base • Consistent: same direction of effect (same side of “no effect”) and narrow range of effect sizes • Inconsistent: nonoverlapping confidence intervals, significant unexplained clinical or statistical heterogeneity, etc.

Four Required Domains: Consistency (II) • Scores (levels): consistent (i.e., no inconsistency); inconsistent; unknown or not applicable (a single study cannot be assessed) • Meta-analysis: use appropriate tests of heterogeneity
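A first-pass screen for consistency can be sketched in a few lines. The function below is an assumption-laden illustration, not the guidance's method: it takes hypothetical (point estimate, lower bound, upper bound) triples on a difference scale where `null_value` means "no effect", and it checks only direction of effect and confidence-interval overlap. It does not judge whether the range of effect sizes is narrow or run any heterogeneity test, so reviewers would still need to do both.

```python
from typing import List, Tuple

# (point estimate, CI lower bound, CI upper bound) for one study
Estimate = Tuple[float, float, float]


def score_consistency(estimates: List[Estimate], null_value: float = 0.0) -> str:
    """Rough consistency screen based on direction and CI overlap only."""
    # A single study (or none) cannot be assessed for consistency.
    if len(estimates) < 2:
        return "unknown or not applicable"

    points = [point for point, _, _ in estimates]
    same_direction = (
        all(p > null_value for p in points)
        or all(p < null_value for p in points)
    )

    # Intervals overlap pairwise exactly when the largest lower bound
    # is at most the smallest upper bound.
    highest_lower = max(lower for _, lower, _ in estimates)
    lowest_upper = min(upper for _, _, upper in estimates)
    intervals_overlap = highest_lower <= lowest_upper

    if same_direction and intervals_overlap:
        return "consistent"
    return "inconsistent"
```

For ratio measures such as odds ratios, the same check would apply with `null_value=1.0`.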

Four Required Domains: Directness (I) • Whether evidence reflects a single, direct link between the interventions of interest and the ultimate health outcome under consideration or relies on multiple links • Using analytic frameworks is important • SOE can be only as strong as weakest link if multiple links are involved
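The weakest-link idea can be written as a one-line rule. The sketch below assumes each link in the analytic framework has already been given one of the four SOE grades used later in this deck; the function name and the ordering list are illustrative choices, not part of the guidance.

```python
from typing import List

# Grades ordered from weakest to strongest.
GRADE_ORDER = ["insufficient", "low", "moderate", "high"]


def chain_strength(link_grades: List[str]) -> str:
    """Upper bound on SOE for a chain of evidence: the grade of its weakest link."""
    if not link_grades:
        raise ValueError("at least one link is required")
    # list.index raises ValueError for any grade outside GRADE_ORDER.
    return min(link_grades, key=GRADE_ORDER.index)
```

For example, a chain whose links are graded high, low, and moderate can support at most a low grade overall.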

Four Required Domains: Directness (II) • Scores: • Direct: based on health outcomes • Indirect: relies on surrogate/proxy outcomes (implies more than one body of evidence is needed)

Four Required Domains: Directness in Comparisons • Direct (e.g., A vs. B, A vs. C, and B vs. C): head-to-head studies in the evidence base; generally assumes use of health outcomes, not surrogate/proxy outcomes; better SOE • Indirect (e.g., A vs. B and B vs. C, but not A vs. C): no head-to-head studies that cover all interventions or outcomes of interest; problematic situation; SOE not as strong as with direct evidence

Four Required Domains: Precision (I) • Degree of certainty for estimate of effect with respect to a specific outcome • Complicated concept • What can decisionmakers conclude about whether one treatment is, clinically speaking, inferior, equivalent (neither inferior nor superior), or superior to another? • Does include statistical significance

Four Required Domains: Precision (II) • Scores: separately for each important outcome as presented in “summary estimate” • Precise: estimate allows a clinically useful conclusion • Imprecise: confidence interval so wide it could include clinically distinct (even conflicting) conclusions
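One way to make the precise/imprecise distinction concrete is to compare the summary confidence interval with a threshold for clinical importance. The sketch below is an assumption, not a rule from the guidance: it supposes the summary estimate is a difference where positive values favor the first treatment, and it introduces a hypothetical parameter `mcid` (a minimal clinically important difference) that reviewers would have to justify for each outcome.

```python
from typing import Optional, Tuple


def classify_precision(
    ci_lower: float, ci_upper: float, mcid: float
) -> Tuple[str, Optional[str]]:
    """Return (precision score, clinical conclusion) for one summary estimate.

    The interval is "precise" when it lies entirely inside one clinical
    region: inferior (< -mcid), equivalent ([-mcid, mcid]), or superior (> mcid).
    """
    if mcid <= 0:
        raise ValueError("mcid must be positive")
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")

    if ci_lower > mcid:
        return ("precise", "superior")
    if ci_upper < -mcid:
        return ("precise", "inferior")
    if ci_lower >= -mcid and ci_upper <= mcid:
        return ("precise", "equivalent")
    # The interval spans clinically distinct conclusions.
    return ("imprecise", None)
```

Under these assumptions an interval of 0.3 to 0.8 with a threshold of 0.2 supports "superior", whereas -0.5 to 0.6 is imprecise because it is compatible with all three conclusions.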

Additional Domains: General • Four domains: • Dose-response association • Plausible confounders • Strength of association • Publication bias • Domains are “discretionary”: use when they are • Applicable • Helpful in reaching conclusions about overall grades for SOE

Additional Domains: Dose-response Association (I) • Pattern of a larger effect with greater exposure (dose, duration, adherence) either across or within studies • Rate if studies give levels of exposure

Additional Domains: Dose-response Association (II) • Scores: • Present: dose-response pattern observed • Not present: no dose-response pattern observed • NA: not applicable or not tested
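A very simple screen for a dose-response pattern is to check whether the effect grows at every step up in exposure. The sketch below assumes hypothetical (exposure level, effect size) pairs in which a larger effect value means a larger effect of the intervention; a strictly increasing sequence is only one possible reading of "pattern", and a formal trend test would be a reasonable alternative.

```python
from typing import List, Tuple


def score_dose_response(levels: List[Tuple[float, float]]) -> str:
    """Score the dose-response domain from (exposure, effect) pairs."""
    # With fewer than two exposure levels the domain cannot be tested.
    if len(levels) < 2:
        return "NA"

    ordered = sorted(levels, key=lambda pair: pair[0])
    effects = [effect for _, effect in ordered]

    # Present only if the effect increases at every step up in exposure.
    increasing = all(
        later > earlier for earlier, later in zip(effects, effects[1:])
    )
    return "present" if increasing else "not present"
```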

Additional Domains: Plausible Confounding (I) • In an observational study, sometimes plausible confounding factors work in the direction opposite that of the observed effect • Had such “effect-weakening” confounders not been present, the observed effect would have been even larger than the one observed • In such a case, an EPC may want to upgrade the level of evidence • So, consider whether plausible confounding exists that would decrease the observed effect

Additional Domains: Plausible Confounding (II) • Scores: • Present: confounding factors that would decrease the observed effect may be present • Absent: confounding factors that would decrease the observed effect are not likely to be present

Additional Domains: Strength of Association (I) • Magnitude of effect: likelihood that the observed effect is large enough that it cannot have occurred solely as a result of bias from potential confounding factors • Consider when effect size is particularly large

Additional Domains: Strength of Association (II) • Scoring • Strong: large effect size that is unlikely to have occurred in the absence of a true effect of the intervention • Weak: small enough effect size that it could have occurred solely as a result of bias from confounding factors

Additional Domains: Publication Bias (I) • Studies may have been published selectively (e.g., only a small proportion of relevant trials [or other studies] has been published) • Thus, • Estimated effects of an intervention based on published studies may not reflect the true effect • Publication bias may undermine the overall robustness of a body of evidence

Additional Domains: Publication Bias (II) • Scores: • Need not be formally computed but can influence ratings of required domains • Take these possible publication bias factors into account in • Rating for consistency • Calculating a summary confidence interval for an effect • Comment on publication bias when circumstances suggest that relevant empirical findings, particularly negative or no-difference findings, have not been published or are not otherwise available

Applicability (I) • Evaluate external validity • Judge applicability intended for different decisionmakers and user groups • Take into account how well the evidence maps to a variety of contexts, specifically • Patient populations, diseases or conditions, interventions, comparators, outcomes, and settings

Applicability (II) • Make judgments about applicability explicit and separate from assessments of other domains • Make clear when any statements about evidence are based on applicability rather than on other aspects of the evidence

Procedures for Assessing Domains • Use two or more reviewers with the appropriate clinical and methodological expertise • Assess separately • Each required domain (or each optional domain, as relevant) • For each major outcome, including benefits and harms • Resolve differences by consensus or mediation by an additional expert; consensus scores appear in tables • Record and save each reviewer’s individual judgments about domains as background documentation

Strength of Evidence Grades (I) • Global assessment that • Takes the required domains directly into account • As needed, incorporates judgments about the additional domains as well

Strength of Evidence Grades (II) • For each comparison of interest, rate SOE for • Each major benefit (e.g., positive impact on health outcomes such as physical function or quality of life or effects on laboratory measures or other surrogate variables) • Each major harm (ranging from rare, serious, or life-threatening adverse events to common but bothersome effects) • For both benefits and harms, focus on outcomes most relevant to patients, clinicians, and policymakers

Strength of Evidence Grades and Definitions • High: High confidence that the evidence reflects the true effect. Further research is very unlikely to change our confidence in the estimate of effect. • Moderate: Moderate confidence that the evidence reflects the true effect. Further research may change our confidence in the estimate of effect and may change the estimate. • Low: Low confidence that the evidence reflects the true effect. Further research is likely to change the confidence in the estimate of effect and is likely to change the estimate. • Insufficient: Evidence either is unavailable or does not permit a conclusion.

Strength of Evidence Grades: Additional Points • Using high, moderate, or low SOE • Implies that a body of evidence actually exists • Is intended to convey how confident reviewers are about decisions that might be made based on evidence graded one way or another • Requires using only one designation, not a range (e.g., not “low to moderate”) • Using insufficient • Applies when reviewers truly cannot draw conclusions about an outcome, comparison, or other question • Arises when • No evidence is available at all • Evidence is too feeble or insubstantial to permit drawing conclusions (e.g., opposing results from studies with similar risk-of-bias ratings; wide and overlapping confidence intervals)

Scoring and Reporting: General Guidance • May use different approaches to incorporate multiple domains into an overall strength-of-evidence grade • Specifically, can use • GRADE algorithm itself • EPC’s own weighting system • Some qualitative approach • Use (at least) two reviewers • Assess resulting inter-rater reliability for each domain score; keep records
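The slide does not name a reliability statistic. Cohen's kappa is one common choice for two reviewers assigning categorical scores, and the sketch below computes it from scratch for a single domain. The function name and the input layout (two equal-length lists of scores, one entry per outcome) are assumptions made for this example.

```python
from collections import Counter
from typing import List


def cohens_kappa(rater_a: List[str], rater_b: List[str]) -> float:
    """Cohen's kappa for two reviewers' categorical scores on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both reviewers must score the same non-empty set of items")

    n = len(rater_a)
    # Observed proportion of items on which the reviewers agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each reviewer's category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    # Both reviewers used one identical category throughout: kappa is
    # undefined (0/0); report perfect agreement by convention here.
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)
```

Running this once per domain, on the reviewers' individual scores before consensus, would give the per-domain record the slide asks reviewers to keep.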

Guiding Principles: Risk of Bias (I) • Risk of bias (given design and conduct of available studies) is the essential component in determining a SOE grade • First consider which study design is most appropriate to reduce bias for each question • Next consider the risk of bias from available studies

Guiding Principles: Risk of Bias (II), Example • Drug comparisons: RCTs, with either placebo or active comparator, as the appropriate design • Evidence from well-done RCTs will have less risk of bias than evidence based on observational studies • So may start with a rating of low for risk of bias and change the assessment of this domain if the RCTs have important flaws • Then, observational data may generally start with a rating of high risk of bias, but can change assessment depending on how well studies were conducted

Further Guiding Principles to Scoring • Be explicit about whether the evidence grade will be determined by • A point system for combining ratings of the domains or • A qualitative consideration of the domains • Carefully document procedures • Keep records of procedures and results for each review so that they may contribute to the overall EPC expertise and science-of-grading evidence
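To show what being explicit about a point system might look like, here is a deliberately simple sketch. Every number in it is invented for illustration: the guidance leaves the weighting to each EPC, and neither the penalties, the treatment of "unknown" consistency, nor the `upgrades` count comes from the source. The `permits_conclusion` flag stands in for the reviewers' judgment that evidence exists and is not too feeble to grade.

```python
from typing import Dict

# Hypothetical penalty points per required-domain score (not from the guidance).
PENALTIES: Dict[str, Dict[str, int]] = {
    "risk_of_bias": {"low": 0, "medium": 1, "high": 2},
    "consistency": {"consistent": 0, "inconsistent": 1, "unknown": 1},
    "directness": {"direct": 0, "indirect": 1},
    "precision": {"precise": 0, "imprecise": 1},
}

GRADES = ["low", "moderate", "high"]


def overall_grade(
    required: Dict[str, str], upgrades: int = 0, permits_conclusion: bool = True
) -> str:
    """Combine required-domain scores into one SOE grade under made-up weights.

    `upgrades` counts additional domains judged to strengthen the evidence
    (e.g., a dose-response pattern or a strong association).
    """
    if not permits_conclusion:
        return "insufficient"

    penalty = sum(PENALTIES[domain][required[domain]] for domain in PENALTIES)
    # Start from "high" (index 2), subtract penalties, add upgrades,
    # then clamp to the available grades.
    index = max(0, min(2, 2 - penalty + upgrades))
    return GRADES[index]
```

Because the function always returns a single grade, it also respects the rule against reporting ranges such as "low to moderate". Whatever weights an EPC adopts, writing them down in this form makes the procedure easy to document and audit.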

Further Guiding Principles for Reporting (I) • Explain • Rationale for approach and which domains were important in upgrading or downgrading strength of evidence • Judgments about the degree to which any additional domains altered overall strength-of-evidence grade • Provide enough detail within the report to ensure that users can grasp the methods

Further Guiding Principles for Reporting (II) • Use the terms high, moderate, low, or insufficient, not Roman numerals or other symbols • Use or adapt the illustrative tabular approach to reporting: see the next slide, or see the chapter (AHRQ EHC website) or article (eprint as of late 2009) for an example

Grading SOE: Table for Presentation of Results • Table 4. Treatment 1 vs. Treatment 2: Numbers of Studies and Subjects, Strength of Evidence Domains, Magnitude of Effect, and Strength of Evidence for Key Outcomes

Sources of Information • Cross-EPC authors: Douglas K. Owens, Kathleen N. Lohr, David Atkins, Jonathan R. Treadwell, James T. Reston, Eric B. Bass, Stephanie Chang, Mark Helfand • Article: Grading the strength of a body of evidence when comparing medical interventions: Agency for Healthcare Research and Quality and the Effective Health Care Program. J Clin Epidemiol. 2009 Jul 10. [Epub ahead of print]. • Chapter on AHRQ website: Grading the strength of a body of evidence when comparing medical interventions. http://effectivehealthcare.ahrq.gov/repFiles/2009_0805_grading.pdf.

Summary: Grading Strength of Evidence • Is a critical last step in analysis and presentation • Is done after rating quality of articles and by at least two independent reviewers • Helps users of systematic reviews understand the body of evidence and how much confidence they can have in making decisions based on that evidence • Uses scores on four primary (mandatory) domains and four additional (discretionary) domains • Focuses on major outcomes and comparisons • Is denoted in terms of high, moderate, or low strength or insufficient evidence • Presents SOE grades in tabular form
