Systems for Grading Evidence of Medical Effectiveness
David Atkins, MD, MPH
QUERI Director
Different Rating Schemes
Recommendation for use of oral anticoagulation in patients with atrial fibrillation and rheumatic mitral valve disease

Organization   Evidence   Recommendation
AHA            B          Class I
ACCP           C+         1
SIGN           IV         C
What Makes A Good System of Producing Guidelines?
• Objective: Process is free of bias
• Reliable: Conclusions are reproducible across different investigators and topics
• Transparent: Process for arriving at the conclusions is clear
• Useful: Understandable, persuasive, makes decision-making easier
• Usable: Practical, efficient
• Valid: System produces the RIGHT results
Steps in Going From Evidence to Guidelines
• Identify the important questions
• Search for evidence that is relevant
• Evaluate strengths and weaknesses of INDIVIDUAL STUDIES
• Evaluate quality of a BODY OF EVIDENCE
• Weigh important benefits and harms
• Translate evidence into recommendations
Learning From Our Mistakes
• Anti-arrhythmic therapy
  • Relied on intermediate outcomes, ignored harms
• Hormone therapy
  • Selection bias, confounding in observational studies
• High-dose chemo/BMT for breast cancer
  • Selection bias in uncontrolled case series
• Vitamin E for CHD
  • Confounding in observational studies; selective use of trial evidence
• Drug-eluting stents?
  • Short- vs. long-term outcomes, overlooked rare harms
• Erythropoietin/darbepoetin in cancer and chronic kidney disease
  • ? Quality-of-life outcomes / neglect of harms / off-label uses
• Ezetimibe for lipid control?
  • Reliance on intermediate vs. clinical endpoint
Assess quality of evidence
• What do we mean by quality?
  • “Extent to which a study’s design, conduct, and analysis has minimized selection, measurement, and confounding biases.” (Lohr, J Qual Improvement, 1999)
  • “Extent to which one can be confident that an estimate of effect is correct.” Inversely related to likelihood that new evidence will change our confidence in the estimate. (GRADE, BMJ 2004)
Disclaimer
• GRADE does not focus on quality of individual studies
• USPSTF has general guidance by study design
• Variety of sources for more explicit guidance on assessing design and execution of trials and observational studies
  • Cochrane Handbook, AHRQ RTI report
Why Assess Quality of a Body of Evidence?
• Assess whether evidence is sufficient to make a recommendation
• Higher quality evidence allows stronger recommendation
• Lower quality evidence points to:
  • Need for more research
  • Weaker recommendation
  • Recommendations that may change
Why Grade Recommendations?
• Strong recommendations more persuasive
• Strong recommendations more appropriate for quality measures, reminders
• Weak recommendations identify areas where:
  • Clinical judgment is more important
  • Patient preference may be important
• Identify need for research
Historical Perspective: Canadian and U.S. Task Forces on Preventive Services 1984-1996

I      At least one well-conducted RCT
II-1   Controlled trials without randomization
II-2   Well-designed cohort or case-control studies, preferably from multiple sites
II-3   Multiple time-series with or without intervention; dramatic before-after results (e.g. penicillin)
III    Expert opinion
Canadian and U.S. Task Forces on Preventive Services 1984-1996

A   Good evidence to recommend
B   Fair evidence to recommend
C   Insufficient evidence to recommend for or against
D   Fair evidence to recommend against
E   Good evidence to recommend against
Evolution of USPSTF Rating Systems (1998 - 2008)
• Separate out quality of evidence from magnitude of benefit
• Acknowledge that quality is not simply a function of study design
• Specify other factors related to quality
What’s Unique About USPSTF?
• Focus on screening and behavioral interventions
• Routine use of “analytic frameworks” to address issues without direct evidence
• Audience familiar with its 20-year-old hierarchy: Good, Fair, Poor and A, B, C
Expanding Understanding of Quality
USPSTF: Body of Evidence
• Internal validity: Is the answer “true”?
• External validity: Is the answer relevant?
• Coherence: Does it fit with everything else we know?
6 Questions for USPSTF
• Internal validity
  • Is research design appropriate?
  • Are studies high quality (well-executed, free of bias)?
• External validity
  • Are results generalizable to primary care practice?
• Coherence
  • How many and how large are studies?
  • How consistent are results?
  • Are there additional factors that assist conclusions (e.g., dose-response, biological model)?
USPSTF “Certainty”
• High: Consistent results from well-designed and well-conducted studies in representative populations; directly assess important outcomes. Unlikely to change…
• Moderate: Evidence sufficient but limited by number, quality or consistency of studies; generalizability to practice; indirect nature
• Low: Limited number or power, flaws in design, gaps in evidence, lack of important outcomes
Linking Certainty and Net Benefit to Recommendations (USPSTF)
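One way to read this linkage is as a lookup table. The sketch below encodes the grid from the USPSTF grade definitions adopted in 2007, in which certainty (high, moderate, low) and magnitude of net benefit jointly determine the letter grade. The function name `uspstf_grade` and the string labels are assumptions made for illustration; the published USPSTF grade definitions remain the authority.

```python
# Illustrative sketch only: the grid follows the USPSTF grade definitions
# adopted in 2007; names and labels here are hypothetical.
GRADE_GRID = {
    ("high", "substantial"): "A",
    ("high", "moderate"): "B",
    ("high", "small"): "C",
    ("high", "zero/negative"): "D",
    ("moderate", "substantial"): "B",
    ("moderate", "moderate"): "B",
    ("moderate", "small"): "C",
    ("moderate", "zero/negative"): "D",
}


def uspstf_grade(certainty: str, net_benefit: str) -> str:
    """Return the letter grade for a certainty / net-benefit pair."""
    certainty = certainty.lower()
    net_benefit = net_benefit.lower()
    # Low certainty yields an "I" statement (insufficient evidence),
    # whatever the estimated net benefit.
    if certainty == "low":
        return "I"
    try:
        return GRADE_GRID[(certainty, net_benefit)]
    except KeyError:
        raise ValueError(
            f"unrecognized combination: {certainty!r}, {net_benefit!r}"
        ) from None


if __name__ == "__main__":
    print(uspstf_grade("high", "substantial"))
    print(uspstf_grade("moderate", "small"))
    print(uspstf_grade("low", "substantial"))
```

Note that moderate certainty caps the grade at B even when the estimated net benefit is substantial, which is how the grid keeps quality of evidence separate from magnitude of benefit.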
How sure are we about the balance between benefits and harms?
• The estimated size of the effect for each main outcome
• The precision of these estimates
• The relative value attached to the expected benefits and harms
• Important factors that could be expected to modify the size of the expected effects in different settings, e.g. setting of care, patient population, etc.
Similarities of USPSTF/GRADE
• Separate out quality and net benefits
• Always consider benefits and harms
• Similar attention to issues affecting quality
• Emphasize health outcomes vs. intermediate outcomes
Distinctions between GRADE and USPSTF: Quality or Certainty
• USPSTF relies on global judgment to integrate factors
• GRADE considers each factor individually and moves quality up or down
• GRADE explicitly downgrades non-RCT evidence but allows upgrading
• USPSTF allows non-RCT evidence to start higher based on subjective issues
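The factor-by-factor GRADE approach described above can be sketched as simple arithmetic on a four-step scale. This is a minimal illustration, assuming the usual GRADE conventions: randomized evidence starts at "high", observational evidence starts at "low", and each named factor moves the rating by one or two steps. The function `grade_quality` and the factor labels are hypothetical names chosen for this sketch; real GRADE assessments involve judgment that a tally cannot capture.

```python
# Illustrative sketch only: function and factor names are hypothetical.
LEVELS = ["very low", "low", "moderate", "high"]

# Factors that can lower or raise the rating under GRADE.
DOWNGRADE_FACTORS = {
    "risk of bias", "inconsistency", "indirectness",
    "imprecision", "publication bias",
}
UPGRADE_FACTORS = {"large effect", "dose-response", "plausible confounding"}


def grade_quality(randomized, downgrades=None, upgrades=None):
    """Rate a body of evidence.

    downgrades / upgrades map a factor name to the number of steps
    (typically 1 or 2) by which that factor moves the rating.
    """
    downgrades = downgrades or {}
    upgrades = upgrades or {}
    unknown = (set(downgrades) - DOWNGRADE_FACTORS) | (
        set(upgrades) - UPGRADE_FACTORS
    )
    if unknown:
        raise ValueError(f"unknown factors: {sorted(unknown)}")

    # RCT evidence starts at "high"; non-RCT evidence starts at "low".
    start = LEVELS.index("high") if randomized else LEVELS.index("low")
    score = start - sum(downgrades.values()) + sum(upgrades.values())
    # Clamp to the ends of the scale.
    return LEVELS[max(0, min(score, len(LEVELS) - 1))]


if __name__ == "__main__":
    print(grade_quality(True, downgrades={"imprecision": 1}))
    print(grade_quality(False, upgrades={"large effect": 1}))
```

By contrast, the USPSTF global judgment has no comparable step-by-step tally, which is the source of the transparency difference noted in the comparison.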
Limitations: USPSTF and GRADE
• Limitations: USPSTF process less transparent
• Limitations: GRADE more labor intensive?
• Reliability: Needs further assessment for both
  • GRADEpro process offers better chance to check and resolve differences
Differences Between USPSTF and GRADE: Recommendations
• USPSTF has C and I recommendations for “close calls” and “Insufficient Evidence”
  • GRADE would probably default to weak against
• USPSTF explicitly calculates “net benefit”
  • GRADE considers benefits and harms among other factors
• More direct link between quality of evidence and recommendations in USPSTF
Conclusions
• Substantial overlap between GRADE and USPSTF
• AHRQ now piloting GRADE; ACP, AUA, many international groups using GRADE
• USPSTF may be a special case due to emphasis on screening
• GRADE has advantage of greater explicitness
• GRADE may end up giving fewer “High” quality grades to evidence