Evidence & Recommendations – Seeking a Standard

The New York Academy of Medicine
Teaching Evidence Assimilation for Collaborative Healthcare
New York, August 8, 2012
Yngve Falck-Ytter, MD, AGAF, for the GRADE team
Associate Professor, Case Western Reserve University, Case & VA Medical Center
Chief of Gastroenterology, VA Medical Center, Cleveland

It’s evident – or is it?

Question to the audience
Decisions in your medical practice are based on:
• Training, experience and knowledge of respected colleagues
• Patient preferences
• Convincing evidence (non-experimental) from case reports, case series, disease mechanism
• RCTs, systematic reviews of RCTs and meta-analyses
• All of the above

Evidence-based clinical decisions
Integrating clinical circumstances, patient values and preferences, research evidence, and expertise (Haynes et al. 2002)

Are guidelines evidence-based?
• 1,275 recommendations from the National Guideline Clearinghouse (NGC) evaluated
• Recommendations not reliably identifiable in 32%
• Not executable as written
• Common problem: statement of fact only
• Variability in recommendation strength: absent in 53%, inaccurate in 7%
• Why is it so hard?
Hussain T, Michel G, Shiffman RN. Int J Med Inform 2009

Before GRADE
Level of evidence and source → grade of recommendation
• I: SR, RCTs → A
• II: Cohort studies; III: Case-control studies → B
• IV: Case series → C
• V: Expert opinion → D

Before GRADE
Level of evidence and source → grade of recommendation
• Ia: Meta-analysis; Ib: RCTs → A
• II: Cohort studies; III: Case-control studies → B
• IV: Case series → C
• V: Expert opinion → D

Is there any guidance here?
P: In patients with acute hepatitis C…
I: Should antiviral treatment be used…
C: Compared to no treatment…
O: To achieve viral clearance?
Evidence / Recommendation / Organization:
• B / Class I / AASLD (2009)
• II-1 / -/- / VA (2006)
• 1+ / A / SIGN (2006)
• IIb -/- “Most authorities…” B (firm evidence) AGA (2006) UK (2008)

Question to the audience
By now…
• …you are thoroughly confused
• …you start treatment because treatment is recommended
• …you don’t start treatment because guidelines don’t recommend it
• …you look at the evidence yourself because past experience tells you that guidelines don’t help

Until recently… grading systems used by four societies (AGA, AASLD, ACG, ASGE):
• 1. Multiple published, well-controlled (?) randomized trials or a well designed systemic (?) meta-analysis; 2. One quality-published (?) RCT, published well-designed cohort/case-control studies; 3. Consensus of authoritative (?) expert opinions based on clinical evidence or from well designed, but uncontrolled or non-randomized clinical trials
• A. Multiple RCTs or meta-analysis; B. Single randomized trial, or non-randomized studies; C. Only consensus opinion of experts, case studies, or standard-of-care
• Good: Consistent, well-designed, well conducted studies […]; Fair: Limited by the number, quality or consistency of individual studies […]; Poor: … important flaws, gaps in chain of evidence…
• A. RCTs; B. RCT with important limitations; C. Observational studies; D. Expert opinion

Institute of Medicine
March 2011 report: “Clinical Practice Guidelines We Can Trust”
• Establishing transparency
• Management of conflict of interest
• Guideline development group composition
• Evidence based on systematic reviews
• Method for rating strength of recommendations
• Articulation of recommendations
• External review
• Updating

Grading of Recommendations Assessment, Development and Evaluation (GRADE)

60+ organizations endorsing or using GRADE (timeline, 2005–2011)

Where GRADE fits in
• Prioritize problems, establish panel
• Find/appraise or prepare a systematic review: searches, selection of studies, data collection and analysis
• (Re-)assess the relative importance of outcomes
• Prepare evidence profile: quality of evidence for each outcome and summary of findings
• GRADE guidelines: assess overall quality of evidence; decide direction and strength of recommendation
• Draft guideline
• Consult with stakeholders and/or external peer reviewers
• Disseminate guideline
• Implement the guideline and evaluate

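GRADE is outcome-centric
• Old system: quality rated for each study (e.g., level I, II, III or V; grade B)
• GRADE: quality rated for each outcome across studies (Outcome #1: quality; Outcome #2: quality; Outcome #3: quality)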

Importance of outcomes
Question (PICO): Should health care workers receive booster vaccination vs. not?
• Final health outcomes: mortality, liver cancer, liver cirrhosis, chronic hepatitis B infection, acute symptomatic infection
• Intermediate outcomes: positive hepatitis B core antibody, amnestic response to re-challenge, loss of protective surface antibody

GRADE expands quality of evidence determinants
Risk of bias, methodological limitations, failure of blinding, allocation concealment, losses to follow-up, incomplete reporting, inconsistency of results, indirectness of evidence, imprecision of results, publication bias

GRADE: Quality of evidence
For guidelines: the extent to which our confidence in an estimate of the treatment effect is adequate to support a particular recommendation.
Although quality of evidence is a continuum, we suggest using 4 categories:
• High
• Moderate
• Low
• Very low

Determinants of quality
• RCTs start high
• Observational studies start low

Quality of evidence: beyond risk of bias
Definition: the extent to which our confidence in an estimate of the treatment effect is adequate to support a particular recommendation
Factors: methodological limitations, inconsistency of results, indirectness of evidence, imprecision of results, publication bias
• Risk of bias: allocation concealment, blinding, intention-to-treat, follow-up, stopped early
• Sources of indirectness: indirect comparisons, patients, interventions, comparators, outcomes

All phase II and III licensing trials for antidepressant drugs between 1987 and 2004 (74 trials; 23 were not published)

Quality assessment criteria
• Study design sets the starting quality: RCTs start High; observational studies start Low
• Lower if…: study limitations (design and execution), inconsistency, indirectness, imprecision, publication bias
• Higher if…: what can raise the quality of evidence?
• Quality of evidence: High, Moderate, Low, Very low

BMJ 2003;327:1459–61

Question to the audience
You review all colonoscopies for average-risk screening in your health system and document the percentage of patients who developed a perforation after the procedure (evidence of free air on imaging). No comparison group without colonoscopy is available.
Rate the quality of evidence for the outcome perforation:
• High
• Moderate
• Low
• Very low

Question to the audience
A systematic review of observational studies showed a relationship between front sleeping position (versus back position) and sudden infant death syndrome (SIDS): OR 2.93 (1.15, 7.47).
Rate the quality of evidence for the outcome SIDS:
• High
• Moderate
• Low
• Very low

Quality assessment criteria
• Study design sets the starting quality: RCTs start High; observational studies start Low
• Lower if…: study limitations (design and execution), inconsistency, indirectness, imprecision, publication bias
• Higher if…: large effect (e.g., RR 0.5); very large effect (e.g., RR 0.2); evidence of dose-response gradient; all plausible confounding would reduce a demonstrated effect, or would suggest a spurious effect when results show no effect
• Quality of evidence: High, Moderate, Low, Very low
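The criteria above can be sketched as ordinal bookkeeping: start at High for RCTs or Low for observational studies, move down for each level of concern, move up for each upgrading factor, and stay within the four categories. This is a minimal illustration under stated assumptions, not part of GRADE or GRADEpro: the names `Quality`, `effect_size_upgrade` and `rate_outcome_quality` are hypothetical, the large-effect thresholds extend the slide's examples (RR 0.5 and 0.2) to their reciprocals (RR 2 and 5), and the caller still has to judge how many levels each concern is worth, since GRADE itself requires no scoring.

```python
from enum import IntEnum


class Quality(IntEnum):
    """The four quality-of-evidence categories, ordered from lowest to highest."""
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


def effect_size_upgrade(rr: float) -> int:
    """Hypothetical helper: levels to rate up for the magnitude of a ratio effect.

    Assumes RR <= 0.5 (or >= 2) counts as a large effect and
    RR <= 0.2 (or >= 5) as a very large effect.
    """
    if rr <= 0:
        raise ValueError("a risk ratio must be positive")
    # Put protective and harmful effects on one scale: RR 0.5 is as large as RR 2.
    magnitude = rr if rr >= 1 else 1 / rr
    if magnitude >= 5:
        return 2
    if magnitude >= 2:
        return 1
    return 0


def rate_outcome_quality(randomized: bool, downgrades: int = 0, upgrades: int = 0) -> Quality:
    """Hypothetical helper: quality of evidence for one outcome across studies.

    randomized: True if the body of evidence is RCTs (start High),
                False if it is observational (start Low).
    downgrades: total levels rated down for risk of bias, inconsistency,
                indirectness, imprecision and publication bias.
    upgrades:   total levels rated up for large effect, dose-response
                gradient and plausible confounding.
    """
    if downgrades < 0 or upgrades < 0:
        raise ValueError("level counts must be non-negative")
    start = Quality.HIGH if randomized else Quality.LOW
    level = start - downgrades + upgrades
    # Clamp: nothing is rated above High or below Very low.
    return Quality(max(Quality.VERY_LOW, min(Quality.HIGH, level)))


if __name__ == "__main__":
    # RCT evidence rated down two levels (e.g., serious risk of bias and imprecision)
    print(rate_outcome_quality(True, downgrades=2).name)  # LOW
    # Observational evidence with RR 0.4 rated up one level for a large effect
    print(rate_outcome_quality(False, upgrades=effect_size_upgrade(0.4)).name)  # MODERATE
```

The sketch deliberately leaves out GRADE's further guidance on when rating up is appropriate; it only shows how the starting level, the downgrades and the upgrades combine.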

Conceptualizing quality
• High: We are very confident that the true effect lies close to that of the estimate of the effect.
• Moderate: We are moderately confident in the estimate of effect: the true effect is likely to be close to the estimate of effect, but there is a possibility that it is substantially different.
• Low: Our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect.
• Very low: We have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect.

GRADE Evidence Profile
Columns: design; limitations; inconsistency; indirectness; imprecision; publication bias; relative and absolute risk; overall quality; importance

Quality rating outcomes across studies
• Clinical question (PICO) → rate importance (critical, important, less important) → select outcomes
• Rate quality for each outcome across studies: grade down or up (High, Moderate, Low, Very low)
• Overall quality of evidence
Panel
• Formulate recommendations: for or against (direction); strong or weak (strength)
• By considering: quality of evidence; balance of benefits/harms; values and preferences
• Revise if necessary by considering: resource use (cost)
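The step from outcome-level ratings to an overall quality of evidence can be sketched as follows, applying the rule used in this talk that overall quality is the lowest quality among the critical outcomes. All names are hypothetical, and the ratings in the example are invented purely for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Quality(IntEnum):
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


class Importance(IntEnum):
    LESS_IMPORTANT = 1
    IMPORTANT = 2
    CRITICAL = 3


@dataclass(frozen=True)
class OutcomeRating:
    """Hypothetical record: one outcome with its importance and quality rating."""
    name: str
    importance: Importance
    quality: Quality


def overall_quality(outcomes: list[OutcomeRating]) -> Quality:
    """Overall quality across outcomes: the lowest quality among critical outcomes."""
    critical = [o.quality for o in outcomes if o.importance is Importance.CRITICAL]
    if not critical:
        raise ValueError("at least one outcome must be rated critical")
    return min(critical)


if __name__ == "__main__":
    # Illustrative ratings only; not taken from any real evidence profile.
    profile = [
        OutcomeRating("Outcome #1", Importance.CRITICAL, Quality.MODERATE),
        OutcomeRating("Outcome #2", Importance.CRITICAL, Quality.LOW),
        OutcomeRating("Outcome #3", Importance.IMPORTANT, Quality.VERY_LOW),
    ]
    print(overall_quality(profile).name)  # LOW
```

In the example the overall rating is Low: the important but non-critical outcome rated Very low does not pull the overall quality down.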

From evidence to recommendations
• Old system: an RCT leads to a high-level recommendation; an observational study leads to a lower-level recommendation
• GRADE: the recommendation reflects the quality of evidence, the balance between benefits, harms and burdens, and patients’ values and preferences

Strength of recommendation
“The strength of a recommendation reflects the extent to which we can, across the range of patients for whom the recommendations are intended, be confident that desirable effects of a management strategy outweigh undesirable effects.”
Although the strength of recommendation is a continuum, we suggest using two categories: “Strong” and “Weak”.

4 determinants of the strength of recommendation
Factors that can weaken the strength of a recommendation, with explanation:
• Lower quality evidence: The higher the quality of evidence, the more likely a strong recommendation is warranted.
• Uncertainty about the balance of benefits versus harms and burdens: The larger the difference between the desirable and undesirable consequences, the more likely a strong recommendation is warranted. The smaller the net benefit and the lower the certainty for that benefit, the more likely a weak recommendation is warranted.
• Uncertainty or differences in patients’ values: The greater the variability or uncertainty in values and preferences, the more likely a weak recommendation is warranted.
• Uncertainty about whether the net benefits are worth the costs: The higher the costs of an intervention (that is, the more resources consumed), the less likely a strong recommendation is warranted.

Developing recommendations

Implications of a strong recommendation
• Population: Most people in this situation would want the recommended course of action and only a small proportion would not
• Health care workers: Most people should receive the recommended course of action
• Policy makers: The recommendation can be adopted as a policy in most situations

Implications of a conditional (weak) recommendation
• Population: The majority of people in this situation would want the recommended course of action, but many would not
• Health care workers: Be prepared to help people make a decision that is consistent with their own values; decision aids and shared decision making can help
• Policy makers: There is a need for substantial debate and involvement of stakeholders

Systematic review
• Formulate question (PICO) → rate importance of outcomes (critical, important, less important) → select outcomes
• Rate quality of evidence for each outcome across studies: RCTs start high, observational data start low
  Grade down: risk of bias, inconsistency, indirectness, imprecision, publication bias
  Grade up: large effect, dose response, confounders
• Create evidence profile with GRADEpro: summary of findings & estimate of effect for each outcome
Guideline development
• Rate overall quality of evidence across outcomes based on lowest quality of critical outcomes (High, Moderate, Low, Very low)
• Panel formulates recommendations: for or against (direction); strong or weak (strength)
• By considering: quality of evidence; balance of benefits/harms; values and preferences
• Revise if necessary by considering: resource use (cost)
• Wording: “We recommend using…”, “We suggest using…”, “We recommend against using…”, “We suggest against using…”
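The four standard phrasings follow from two choices the panel makes: direction (for or against) and strength (strong or weak). A minimal sketch of that mapping; the enum and function names are hypothetical, and `<intervention>` is a placeholder for the actual management strategy.

```python
from enum import Enum


class Direction(Enum):
    FOR = "for"
    AGAINST = "against"


class Strength(Enum):
    STRONG = "strong"
    WEAK = "weak"  # also called "conditional"


def recommendation_wording(direction: Direction, strength: Strength, intervention: str) -> str:
    """Hypothetical helper: GRADE-style wording for a recommendation.

    Strong recommendations use "recommend", weak ones "suggest";
    recommendations against an intervention add "against".
    """
    verb = "recommend" if strength is Strength.STRONG else "suggest"
    against = " against" if direction is Direction.AGAINST else ""
    return f"We {verb}{against} using {intervention}."


if __name__ == "__main__":
    # Prints the four phrasings in the same order as listed above.
    for d in Direction:
        for s in Strength:
            print(recommendation_wording(d, s, "<intervention>"))
```

Keeping the verb tied to strength means a reader can tell a strong from a weak recommendation from the wording alone.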

GRADE’s limitations
• Evidence rating for alternative management strategies, not risk or prognosis per se
• Does not eliminate disagreements in interpreting the evidence: judgments on thresholds continue to be necessary
• Requires some training in methodology to be applied optimally

What GRADE isn’t
• Not another “risk of bias” tool
• Not a quantitative system (no scoring required)
• Does not eliminate conflicts of interest (COI), but can help minimize them
• Not “expensive”
• Builds on well-established principles of EBM
• Some degree of training is needed for any system
• Proportionally adds a minimal amount of extra time to a systematic review

Evidence review stage
What format of evidence do you use? (Decision diagram, with options ranging in cost from $$$ to $.)
• Branches: using mainly systematic reviews (SR) vs. mainly single-study data; having vs. not having the resources; ready-to-use SR vs. SR not ready to use
• Options: do it in-house, search for SR, out-source, update SR, ad hoc reviews
• Use of GRADE: utilize the full GRADE framework (± evidence profiles) or use GRADE without evidence profiles

Conclusion
Using an internationally accepted, standardized rating system for evidence and recommendations (such as GRADE) adds value:
• Criteria for evidence assessment across a range of questions, settings and outcomes
• Sensible, transparent, systematic
• Balance between simplicity and methodological rigor
