Systems for Grading Evidence of Medical Effectiveness

David Atkins, MD, MPH, QUERI Director

Different Rating Schemes: Recommendation for use of oral anticoagulation in patients with atrial fibrillation and rheumatic mitral valve disease • AHA: Evidence B, Recommendation Class I • ACCP: Evidence C+, Recommendation 1 • SIGN: Evidence IV, Recommendation C

What Makes A Good System of Producing Guidelines? • Objective – Process is free of bias • Reliable – Conclusions are reproducible across different investigators and topics • Transparent – Process for arriving at the conclusions is clear • Useful – Understandable, persuasive, makes decision-making easier • Usable – Practical, efficient • Valid – System produces the RIGHT results

Steps in Going From Evidence to Guidelines • Identify the important questions • Search for evidence that is relevant • Evaluate strengths and weaknesses of INDIVIDUAL STUDIES • Evaluate quality of a BODY OF EVIDENCE • Weigh important benefits and harms • Translate evidence into recommendations

Learning From Our Mistakes • Anti-arrhythmic therapy: relied on intermediate outcomes, ignored harms • Hormone therapy: selection bias, confounding in observational studies • High-dose chemo/BMT for breast cancer: selection bias in uncontrolled case series • Vitamin E for CHD: confounding in observational studies; selective use of trial evidence • Drug-eluting stents? Short- vs. long-term outcomes, overlooked rare harms • Erythropoietin/darbepoetin in cancer and chronic kidney disease: ? quality-of-life outcomes, neglect of harms, off-label uses • Ezetimibe for lipid control? Reliance on intermediate vs. clinical endpoint

Assess quality of evidence • What do we mean by quality? • “Extent to which a study’s design, conduct, and analysis has minimized selection, measurement, and confounding biases.” (Lohr, J Qual Improvement, 1999) • “Extent to which one can be confident that an estimate of effect is correct.” Inversely related to likelihood that new evidence will change our confidence in the estimate. (GRADE, BMJ 2004)

Disclaimer • GRADE does not focus on quality of individual studies • USPSTF has general guidance by study design • Variety of sources for more explicit guidance on assessing design and execution of trials and observational studies • Cochrane Handbook, AHRQ RTI report

Why Assess Quality of a Body of Evidence? • Assess whether evidence sufficient to make a recommendation • Higher quality evidence allows stronger recommendation • Lower quality evidence points to: • Need for more research • Weaker recommendation • Recommendations that may change

Why Grade Recommendations? • Strong recommendations more persuasive • Strong recommendations more appropriate for quality measures, reminders • Weak recommendations identify areas where: • Clinical judgment is more important • Patient preference may be important • Identify need for research

Historical Perspective: Canadian and U.S. Task Forces on Preventive Services 1984-1996, levels of evidence • I: At least one well-conducted RCT • II-1: Controlled trials without randomization • II-2: Well-designed cohort or case-control studies, preferably from multiple sites • II-3: Multiple time-series with or without intervention; dramatic before-after results (e.g. penicillin) • III: Expert opinion

Canadian and U.S. Task Forces on Preventive Services 1984-1996, grades of recommendation • A: Good evidence to recommend • B: Fair evidence to recommend • C: Insufficient evidence to recommend for or against • D: Fair evidence to recommend against • E: Good evidence to recommend against
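
As a rough illustration of how the two 1984-1996 scales pair up, the evidence levels and recommendation grades can be held in two lookup tables. This is a minimal Python sketch: the descriptions are taken from the slides, while the names `EVIDENCE_LEVELS`, `RECOMMENDATION_GRADES`, and `describe` are hypothetical and not part of either Task Force's materials.

```python
# Evidence hierarchy (study-design based), strongest first.
EVIDENCE_LEVELS = {
    "I": "At least one well-conducted RCT",
    "II-1": "Controlled trials without randomization",
    "II-2": "Well-designed cohort or case-control studies",
    "II-3": "Multiple time-series or dramatic before-after results",
    "III": "Expert opinion",
}

# Recommendation grades, from strongest "for" to strongest "against".
RECOMMENDATION_GRADES = {
    "A": "Good evidence to recommend",
    "B": "Fair evidence to recommend",
    "C": "Insufficient evidence to recommend for or against",
    "D": "Fair evidence to recommend against",
    "E": "Good evidence to recommend against",
}


def describe(level: str, grade: str) -> str:
    """Spell out a (level, grade) pair in words; raises KeyError on unknown codes."""
    return (
        f"{grade} ({RECOMMENDATION_GRADES[grade]}); "
        f"level {level} evidence ({EVIDENCE_LEVELS[level]})"
    )


if __name__ == "__main__":
    print(describe("I", "A"))
```

Keeping the two scales in separate tables reflects that the level describes study design while the grade describes the recommendation; the sketch does not attempt to encode how the Task Forces moved from one to the other.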

Evolution of USPSTF Rating Systems (1998 - 2008) • Separate out quality of evidence from magnitude of benefit • Acknowledge that quality is not simply a function of study design • Specifies other factors related to quality

What’s Unique About USPSTF? • Focus on screening and behavioral interventions • Routine use of “analytic frameworks” to address issues without direct evidence • Audience familiar with its 20 yr hierarchy: Good, Fair, Poor and A, B, C.

Expanding Understanding of Quality USPSTF - Body of Evidence • Internal validity – Is answer “true”? • External validity – Is answer relevant? • Coherence – Does it fit with everything else we know?

6 Questions for USPSTF • Internal validity: (1) Is research design appropriate? (2) Are studies high quality (well-executed, free of bias)? • External validity: (3) Are results generalizable to primary care practice? • Coherence: (4) How many and how large are studies? (5) How consistent are results? (6) Are there additional factors that assist conclusions (e.g., dose-response, biological model)?

USPSTF “Certainty” • High: Consistent results from well-designed and well-conducted studies in representative populations; directly assess important outcomes. Unlikely to change… • Moderate: Evidence sufficient but limited by number, quality or consistency of studies; generalizability to practice; indirect nature • Low: Limited number or power, flaws in design, gaps in evidence, lack of important outcomes

Linking Certainty and Net Benefit to Recommendations (USPSTF)

How sure are we about balance between benefits and the harms? • the estimated size of the effect for each main outcome • the precision of these estimates • the relative value attached to the expected benefits and harms • important factors that could be expected to modify the size of the expected effects in different settings; e.g. setting of care, patient population, etc.

Similarities of USPSTF/GRADE • Separate out quality and net benefits • Always consider benefits and harms • Similar attention to issues affecting quality • Emphasize health outcomes vs. intermediate outcomes

Similarities Between USPSTF and GRADE

Distinctions between GRADE and USPSTF: Quality or Certainty • USPSTF relies on global judgment to integrate factors • GRADE considers each factor individually and moves quality up or down • GRADE explicitly downgrades non-RCT evidence but allows upgrading • USPSTF allows non-RCT evidence to start higher based on subjective issues
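
The factor-by-factor approach attributed to GRADE above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four quality levels (high, moderate, low, very low) and the starting points (randomized trials start high, other designs start low) follow the published GRADE scheme, but the class and function names, the factor labels in the usage lines, and the simple step arithmetic are hypothetical simplifications, not the working group's procedure.

```python
from dataclasses import dataclass, field

# GRADE quality levels, lowest to highest.
LEVELS = ["very low", "low", "moderate", "high"]


@dataclass
class BodyOfEvidence:
    randomized: bool  # True if the evidence comes from RCTs
    # Factor name -> number of steps; the names are free text in this sketch.
    downgrades: dict = field(default_factory=dict)
    upgrades: dict = field(default_factory=dict)


def grade_quality(body: BodyOfEvidence) -> str:
    """Start from the design-based level, then move down or up one factor at a time."""
    start = LEVELS.index("high") if body.randomized else LEVELS.index("low")
    steps = sum(body.upgrades.values()) - sum(body.downgrades.values())
    # Clamp so the result stays within the four defined levels.
    index = max(0, min(start + steps, len(LEVELS) - 1))
    return LEVELS[index]


if __name__ == "__main__":
    trials = BodyOfEvidence(randomized=True, downgrades={"inconsistency": 1})
    cohort = BodyOfEvidence(randomized=False, upgrades={"large effect": 1})
    print(grade_quality(trials))
    print(grade_quality(cohort))
```

By contrast, the global judgment described for USPSTF has no per-factor arithmetic to encode, which is one way to see why the GRADE route is more explicit but also more laborious.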

Limitations: USPSTF and GRADE • Limitations: USPSTF process less transparent • Limitations: GRADE more labor intensive? • Reliability: Needs further assessment for both; GRADEpro process offers better chance to check and resolve differences

Differences Between USPSTF and GRADE: Recommendations • USPSTF: Has C and I recommendations for “close calls” and “Insufficient Evidence” • GRADE would probably default to weak against • USPSTF explicitly calculates “net benefit” • GRADE considers benefits and harms among other factors • More direct link between quality of evidence and recommendations in USPSTF

Conclusions • Substantial overlap between GRADE, USPSTF • AHRQ now piloting GRADE; ACP, AUA, many international groups using GRADE • USPSTF may be a special case due to emphasis on screening • GRADE has advantage of greater explicitness • GRADE may end up giving fewer “High” quality grades to evidence
