
GRADE


Presentation Transcript


  1. GRADE Toni Tan, Centre for Clinical Practice

  2. GRADE The Grading of Recommendations Assessment, Development and Evaluation

  3. GRADE “A systematic and explicit approach to making judgements about the quality of evidence, and the strength of recommendations can help to prevent errors, facilitate critical appraisal of these judgements, and can help to improve communication of this information.”

  4. Organisations that have adopted GRADE methodology

  5. ‘Traditional’ approach: checklist system • Selection bias: randomisation, concealment of allocation, comparable at baseline • Performance bias: blinding (patients & care providers); the comparison groups received the same care apart from the intervention studied • Attrition bias: systematic differences between the comparison groups with respect to participants lost • Detection bias: appropriate length of follow-up, definition of outcome, blinding (investigators)

  6. ‘Traditional’ approach: narrative summary For example, the AIP guideline, mortality rates: One cluster RCT from the UK investigated the effectiveness of CCOT on hospital mortality using the PAR score … found a significant reduction in hospital mortality in patients in the intervention wards at cluster level (OR = 0.523, 95% CI 0.322 to 0.849). The cluster RCT from Australia found no difference in unexpected death (without do-not-resuscitate order; a secondary outcome) between the control and intervention groups (per 1000 admissions: control = 1.18, intervention = 1.06, difference = −0.093, 95% CI −0.423 to 0.237; adjusted p = 0.752, adjusted OR = 1.03, 95% CI 0.84 to 1.28). Evidence statement: (1+) There were conflicting findings in the two included studies on mortality rates: the Priestley and coworkers study found a significant reduction in mortality (but failed to report do-not-resuscitate orders), but MERIT found no difference between the two arms of the study for this outcome.

  7. GRADE • Applies to interventional studies of effectiveness; currently in development for diagnostic accuracy, prognostic and qualitative studies • Makes sequential judgements about: the quality of evidence across studies for each critical/important outcome (instead of for each individual study); which outcomes are critical to a decision; the overall quality of evidence across these critical outcomes; the balance between benefits and harms • The result is an assessment of: the quality of the evidence for an outcome, and the strength of the recommendations • Takes the perspective of guideline developers

  8. GRADE profile

  9. Why do we use GRADE in NICE clinical guidelines? • Concerns about the sometimes inappropriate direct link between study design and recommendation strength • Anecdotal evidence that recommendations not based on evidence from trials were being ignored • WHO evaluation of the NICE clinical guidelines programme • Just being explicit about what we had been doing anyway!

  10. How does GRADE work?

  11. How GRADE works (flow diagram). Evidence synthesis (systematic review): formulate the question (P I C O); select outcomes and rate their importance (critical / important / not important); rate the quality of evidence for each outcome across studies: RCTs start high, observational data start low; grade down for risk of bias, inconsistency, indirectness, imprecision or other considerations; grade up for a large effect, a dose response or confounders; this yields a rating of High, Moderate, Low or Very low for each critical or important outcome; create an evidence profile with GRADEpro (summary of findings & estimate of effect for each outcome). Making recommendations (guidelines): present the evidence profile(s) to the GDG (panel); develop recommendations, for or against (direction) and strong or weak (strength), by considering the relative value of different outcomes, the quality of evidence, the trade-off between benefits and harms, health economics and other considerations; the wording reflects strength: “Offer xyz…”, “Consider xyz…”, “Do not use xyz…”

  12. GRADE concept of quality of evidence • The quality of evidence reflects the extent to which our confidence (certainty) in an estimate of the effect is adequate to support a particular recommendation. • Guideline panels must make judgements about the quality of evidence relative to the specific context for which they are using the evidence.

  13. How is this achieved? • A transparent framework for considering confidence (certainty) in an effect estimate by assessing systematic errors (bias) and chance errors (random error) • Using criteria: for systematic errors (bias): limitations, indirectness, inconsistency; for chance errors (random error): imprecision; plus other considerations (any other factors)

  14. GRADE Definitions

  15. GRADE diagram

  16. Determining the quality of evidence • Grade down for: limitations, inconsistent results, indirectness, imprecision, other considerations • Grade up for: a large or very large effect, plausible biases that would underestimate the true effect, or a dose-response gradient • Each of these upgrading factors can raise the rating by 1 level (2 for a very large magnitude of effect)

  17. Limitations or ‘risk of bias’ - RCTs

  18. Risk of bias – observational studies

  19. Inconsistency • When heterogeneity exists, but no plausible explanation is identified, the quality of evidence should be downgraded by one or two levels, depending on the magnitude of the inconsistency in the results. • Inconsistency may arise from differences in: • populations (e.g. drugs may have larger relative effects in sicker populations) • interventions (e.g. larger effects with higher drug doses) • outcomes (e.g. diminishing treatment effect with time). • Account for this where possible
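A sketch of how such inconsistency can be quantified: the standard inverse-variance Cochran's Q and I² statistics, illustrated with the two mortality ORs from the narrative-summary slide earlier in this deck, with standard errors back-calculated from the published 95% CIs (an illustrative assumption, not the GDG's actual analysis).

```python
import math

def i_squared(estimates):
    """Cochran's Q and the I^2 heterogeneity statistic from
    (log_effect, standard_error) pairs, inverse-variance weighted."""
    w = [1 / se ** 2 for _, se in estimates]
    y = [est for est, _ in estimates]
    pooled = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, y))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

def log_or_and_se(or_, lo, hi):
    """Back-calculate a log-OR and its SE from a 95% CI."""
    return math.log(or_), (math.log(hi) - math.log(lo)) / (2 * 1.96)

# The two mortality ORs from the narrative-summary slide:
studies = [log_or_and_se(0.523, 0.322, 0.849),  # Priestley (cluster RCT, UK)
           log_or_and_se(1.03, 0.84, 1.28)]     # MERIT (cluster RCT, Australia)
q, i2 = i_squared(studies)
print(f"Q = {q:.2f}, I^2 = {i2:.0f}%")  # high I^2 -> consider downgrading
```

With these two conflicting studies the I² comes out high, consistent with the "conflicting findings" evidence statement and with downgrading for inconsistency.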

  20. Indirectness

  21. Indirectness

  22. Imprecision • Our estimates of the population value are uncertain/imprecise because we use samples • GRADE extends this notion of uncertainty to ask whether the effect estimate reaches a clinical ‘minimal important difference’ (MID) • Example of an MID: drug X compared with placebo to reduce severe migraine; migraine pain measured on a 10-point scale; mean baseline = 9.5; mean reduction from baseline = 1.7 (95% CI: 1.2 to 2.3); but a survey of migraine patients said a pain reduction of less than 3 points is meaningless because it does not improve their overall QoL and daily function.
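The slide's logic can be sketched as a small classifier: given a 95% CI for a benefit (here, points of pain reduction) and an MID, decide whether the evidence shows an important effect, precisely no important effect, or is imprecise. Function and label names are illustrative, not GRADE terminology.

```python
def classify_against_mid(ci_low, ci_high, mid):
    """Compare a 95% CI for a beneficial effect with the MID.
    All values on the same scale; larger = more benefit."""
    if ci_low >= mid:
        return "important effect (precise)"     # whole CI clears the MID
    if ci_high < mid:
        return "no important effect (precise)"  # whole CI falls short of MID
    return "imprecise: CI crosses the MID"      # consider downgrading

# Migraine example: mean reduction 1.7 (95% CI 1.2 to 2.3), MID = 3 points
print(classify_against_mid(1.2, 2.3, mid=3))
```

For the migraine example the whole interval sits below the 3-point MID, so the evidence is precise but shows no clinically important effect.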

  23. Confidence intervals: summary • The easiest way to approach the effect of random error on evidence quality • In the frequentist approach, a 95% CI is a range constructed so that, in repeated experiments, 95% of such intervals would include the population value • It is often loosely interpreted as a 0.95 probability that the population value is in the CI, although strictly that is not what a frequentist CI means

  24. Confidence interval width • Wide confidence intervals imply uncertainty over whether our observed effect is close to or far away from the real effect • Examples • An RCT of supervised exercise for patellofemoral pain • Self reported recovery at 12 months • T: 9/500 vs SC: 2/500 RR=4.50 (1.00 to 20.77) • We’d probably agree that’s imprecise • An RCT of drug A for patellofemoral pain • Self reported recovery at 12 months • T: 350/500 vs PC: 150/500 RR=2.33 (2.20 to 2.72) • We’d probably agree that’s precise
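The two RRs and intervals above can be roughly reproduced with the standard log-scale (Katz) approximation; the slide's exact figures were presumably computed slightly differently, so the bounds differ marginally (e.g. 0.98 rather than 1.00 for the lower bound of the first trial).

```python
import math

def rr_with_ci(ev_t, n_t, ev_c, n_c, z=1.96):
    """Relative risk and 95% CI via the log-RR (Katz) approximation."""
    rr = (ev_t / n_t) / (ev_c / n_c)
    se = math.sqrt(1 / ev_t - 1 / n_t + 1 / ev_c - 1 / n_c)
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)
    return rr, lo, hi

print(rr_with_ci(9, 500, 2, 500))      # RR 4.50, wide interval -> imprecise
print(rr_with_ci(350, 500, 150, 500))  # RR 2.33, narrow interval -> precise
```

Note how few events (9 vs 2) produce a very wide interval despite 1,000 participants, while many events (350 vs 150) in the same sample size give a narrow one: it is the information, not just the sample size, that drives precision.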

  25. What affects imprecision? • Larger samples help, but particularly where there is more ‘information’ (for binary outcomes, more events) • The relationship between sample size, number of events and precision is complex • Easiest to explore with an example

  26. Remember CIs can mislead • The true value will lie outside a 95% CI 5 times in 100 • CIs based on small numbers of events are unstable • Early trials tend to be more positive • Trials stopped early are likely to be biased • So, if you have small trials with a positive effect and an apparently narrow CI, be sceptical • It would be helpful to have an objective idea of when we have ‘enough’ information

  27. Optimal information size (OIS) [Figure 4: optimal information size (total sample size required) given alpha of 0.05 and beta of 0.2, for varying control group event rates and RRRs of 20%, 25% and 30%; for any chosen line, evidence meets the OIS criterion if the total sample size is above the line] • We want at least as many observations in a trial as we would calculate in a sample size calculation • Warning: a ‘power-based’ sample size calculation is for ‘hypothesis testing’ using a p-value, not for estimation of the true effect
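A sketch of the sample-size calculation behind curves like those in the figure, assuming the usual two-proportion normal approximation with two-sided alpha 0.05 and power 0.8; this is illustrative, not necessarily the exact formula used to plot them.

```python
import math
from statistics import NormalDist

def ois_total(control_rate, rrr, alpha=0.05, power=0.80):
    """Total sample size (two equal arms) to detect a relative risk
    reduction `rrr` from a given control group event rate."""
    p1 = control_rate
    p2 = control_rate * (1 - rrr)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96
    z_b = NormalDist().inv_cdf(power)          # ~0.84
    var = p1 * (1 - p1) + p2 * (1 - p2)        # sum of binomial variances
    n_per_arm = (z_a + z_b) ** 2 * var / (p1 - p2) ** 2
    return 2 * math.ceil(n_per_arm)

# e.g. control event rate 0.2, RRR 25% -> total of roughly 1,800 participants
print(ois_total(0.2, 0.25))
```

As the figure suggests, the required total falls sharply as either the control event rate or the targeted RRR increases.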

  28. OIS continued • Thinking in terms of numbers of events may be easier, and one could just use an arbitrary number if there are no resources to calculate the OIS

  29. Summary of suggested approach to imprecision (three example CIs, 1–3, judged against different MIDs) • Red (MID = a mean difference of −1): 1 = ‘no effect’ and precise; 2 = ‘no effect’ but not precise; 3 = ‘effective’ and precise • Green (MID = −2): 1 = ‘no effect’ and precise; 2 = ‘no effect’ and precise; 3 = ‘effective’ and precise • Blue (MID = −3): 1 = ‘no effect’ and precise; 2 = ‘no effect’ and precise; 3 = ‘effective’ and not precise

  30. What if we don’t know a threshold? • Can use an arbitrary threshold • For example, GRADE suggests RRR or RRI of 25% • Often used in NICE guidelines
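Applied to relative risks, an arbitrary RRR/RRI threshold of 25% gives appreciable-benefit and appreciable-harm lines at RR 0.75 and 1.25. A simplified sketch of that check (names illustrative, and ignoring the separate OIS criterion):

```python
def downgrade_for_imprecision(ci_low, ci_high,
                              benefit_line=0.75, harm_line=1.25):
    """Simplified default GRADE rule for a relative risk: downgrade if
    the 95% CI crosses the appreciable-benefit or appreciable-harm line."""
    return (ci_low < benefit_line < ci_high) or \
           (ci_low < harm_line < ci_high)

# The two patellofemoral-pain examples from the CI-width slide:
print(downgrade_for_imprecision(1.00, 20.77))  # exercise trial: crosses 1.25
print(downgrade_for_imprecision(2.20, 2.72))   # drug A trial: crosses neither
```

An interval lying entirely between the two lines also counts as precise under this rule: it precisely excludes any appreciable benefit or harm.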

  31. Two things to remember about GRADE • Many judgements are made in appraising evidence, and there will always be disagreement; the important thing is to make the areas of disagreement transparent • The concepts being judged (e.g. imprecision) are continuous, and dichotomising them (downgrade or not) can be a close call; where it is, the evidence-to-recommendations section should discuss it

  32. PDE-5 inhibitor vs. placebo

  33. How GRADE works (flow diagram, recapping slide 11). Evidence synthesis (systematic review): formulate the question (P I C O); select outcomes and rate their importance (critical / important / not important); rate the quality of evidence for each outcome across studies: RCTs start high, observational data start low; grade down for risk of bias, inconsistency, indirectness, imprecision or other considerations; grade up for a large effect, a dose response or confounders; this yields a rating of High, Moderate, Low or Very low for each critical or important outcome; create an evidence profile with GRADEpro (summary of findings & estimate of effect for each outcome). Making recommendations (guidelines): present the evidence profile(s) to the GDG (panel); develop recommendations, for or against (direction) and strong or weak (strength), by considering the relative value of different outcomes, the quality of evidence, the trade-off between benefits and harms, health economics and other considerations; the wording reflects strength: “Offer xyz…”, “Consider xyz…”, “Do not use xyz…”

  34. Evidence to recommendations • Structured discussion of • Relative value placed on outcomes • Trade off between clinical benefits and harms • Trade off between net health benefits and resource use • Quality of the evidence • Other considerations • Place within pathway of care • Equalities issues • Practicalities of implementation e.g. need for training

  35. Strength of recommendation • Stronger: ‘the GDG is confident that the desirable effects of adherence to a recommendation outweigh the undesirable effects’ • ‘Should do ...’ • Weaker: ‘the GDG concludes that the desirable effects of adherence to a recommendation probably outweigh the undesirable effects, but is not confident’ • ‘Should consider ...’

  36. Further information • http://www.gradeworkinggroup.org/ • Ongoing series of papers in Journal of Clinical Epidemiology addressing all of these issues • Toni.tan@nice.org.uk
