Sampling for an Effectiveness Study or “How to reject your most hated hypothesis”

This article discusses the importance of sampling and statistical power for policymakers. It explores the role of sampling in evaluating efficacy and effectiveness and the impact of clustering on power and costs. The article concludes with recommendations for designing a rollout to gather evidence to reject skeptics' claims.

Presentation Transcript


  1. Sampling for an Effectiveness Study or “How to reject your most hated hypothesis” Mead Over, Center for Global Development, and Sergio Bautista, INSP. Male Circumcision Evaluation Workshop and Operations Meeting, January 18-23, 2010, Johannesburg, South Africa

  2. Outline • Why are sampling and statistical power important to policymakers? • Sampling and power for efficacy evaluation • Sampling and power for effectiveness evaluation • The impact of clustering on power and costs • Major cost drivers • Conclusions

  3. Why are sampling and statistical power important to policymakers? Because they are the tools you can use to reject the claims of skeptics

  4. What claims will skeptics make about MC rollout? They might say: • ‘Circumcision has no impact’ • ‘Circumcision has too little impact’ • ‘Intensive Circumcision Program has no more impact than Routine Circumcision Program’ • ‘Circumcision has no benefit for women’ Which of these do you hate the most?

  5. So make sure the researchers design the MC rollout so that you will have the evidence to reject your most hated hypothesis when it is false. If it turns out to be true, you will get the news before the skeptics and can alter the program accordingly.

  6. Hypotheses to reject • Circumcision has no impact • Circumcision has too little impact • Intensive Circumcision Program has no more impact than Routine Circumcision Program • Circumcision has no benefit for women

  7. Efficacy Evaluation

  8. Hypothesis to reject • Circumcision has no impact • Circumcision has too little impact • Intensive Circumcision Program has no more impact than Routine Circumcision Program • Circumcision has no benefit for women

  9. Statistical power in the context of efficacy evaluation • Objective: To reject the hypothesis of “no impact” in a relatively pure setting, where the intervention has the best chance of succeeding – to show “proof of concept”. • In this context, statistical power can be loosely defined as the probability that you find a benefit of male circumcision when there really is a benefit.

  10. Statistical power is the ability to reject the hated hypothesis that MC doesn’t work when it really does. H0 = MC does not reduce HIV incidence. [2x2 table: this column represents MC really working]

  11. Statistical power is the ability to reject the hated hypothesis that MC doesn’t work when it really does. H0 = MC does not reduce HIV incidence. [2x2 table: this row represents the evaluation finding that MC is working]

  12. Statistical power is the ability to reject the hated hypothesis that MC doesn’t work when it really does. H0 = MC does not reduce HIV incidence. We believe MC works. We hope the evaluation will confirm that it works. If MC works, we want to maximize the chance that the evaluation says it works.

  13. Statistical power is the ability to reject the hated hypothesis that MC doesn’t work when it really does. H0 = MC does not reduce HIV incidence. But we’re willing to accept bad news, if it’s true.

  14. There are two types of error that we want to avoid. H0 = MC does not reduce HIV incidence. Evaluation says MC works when it doesn’t (a type-I error, or false positive).

  15. There are two types of error that we want to avoid. H0 = MC does not reduce HIV incidence. Evaluation says MC doesn’t work when it really does (a type-II error, or false negative).

  16. Statistical power is the chance that we reject the hated hypothesis when it is false. H0 = MC does not reduce HIV incidence. Power: the probability that you reject “no impact” when there really is impact.

  17. Confidence, power, and two types of mistakes • Confidence describes the test’s ability to minimize type-I errors (false positives) • Power describes the test’s ability to minimize type-II errors (false negatives) • The convention is to be more concerned with type-I than type-II errors • (i.e., more willing to mistakenly say that something didn’t work when it actually did than to say that something worked when it actually didn’t) • We usually want confidence to be 90–95%, but will settle for power of 80–90%

  18. Power • As power increases, the chance of saying “no impact” when in reality there is a positive impact declines • Power analysis can be used to calculate the minimum sample size required to accept the outcome of a statistical test with a particular level of confidence (see the sketch below)
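A minimal sketch of that power-and-sample-size logic, using a one-sided two-proportion z-test in Python. The control-group incidence, effect size, significance level, and target power are illustrative assumptions, not figures from the workshop.

```python
# Sketch: power and minimum sample size for comparing HIV incidence
# between a control and a circumcision arm (one-sided z-test).
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p_control, effectiveness, n_per_arm, alpha=0.05):
    """Probability of rejecting 'no impact' when MC truly reduces incidence."""
    p_treat = p_control * (1 - effectiveness)        # incidence under treatment
    z_alpha = NormalDist().inv_cdf(1 - alpha)        # one-sided critical value
    p_bar = (p_control + p_treat) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_arm)
    se_alt = sqrt((p_control * (1 - p_control) + p_treat * (1 - p_treat)) / n_per_arm)
    z = ((p_control - p_treat) - z_alpha * se_null) / se_alt
    return NormalDist().cdf(z)

def min_n_per_arm(p_control, effectiveness, target_power=0.80, alpha=0.05):
    """Smallest n per arm (in steps of 50) that reaches the target power."""
    n = 50
    while power_two_proportions(p_control, effectiveness, n, alpha) < target_power:
        n += 50
    return n

# Illustration: 2% annual incidence in controls, 60% reduction, 80% power.
print(power_two_proportions(p_control=0.02, effectiveness=0.60, n_per_arm=500))
print(min_n_per_arm(p_control=0.02, effectiveness=0.60, target_power=0.80))
```

Raising the assumed control-group incidence or the assumed effect size shrinks the required sample sharply, which is the point the later slides make graphically.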

  19. The problem [Chart: time of experiment vs. sample size. At one extreme, all men in the country followed for 20 years: impact! At the other, 1 person followed for 1 year: impact?]

  20. The problem [Chart: increasing the sample size and the time of the experiment increases the power to detect a difference, but also increases the costs of the evaluation.]

  21. The problem • In principle, we would like: • The minimum sample size • The minimum observational time • The maximum power • So we are confident enough about the impact we find, at minimum cost

  22. The problem [Chart: a small sample and a short experiment do not give enough confidence.]

  23. The problem [Chart: a larger sample and a longer experiment give enough confidence.]

  24. The problem [Chart: the comfort / credibility / persuasion / confidence frontier in the time-of-experiment vs. sample-size plane.]

  25. The problem [Chart: the time constraint (usually external), combined with the confidence frontier, determines the minimum sample size.]

  26. Things that increase power • More person-years • More persons • More years • Greater difference between control and treatment • Control group has large HIV incidence • Intervention greatly reduces HIV incidence • Good cluster design: Get the most information out of each observed person-year • Increase number of clusters • Minimize intra-cluster correlation
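A minimal sketch of the last two bullets: the usual design effect, DEFF = 1 + (m - 1) * ICC (m = cluster size, ICC = intra-cluster correlation), converts a clustered sample into its “independent-equivalent” size. The cluster sizes and ICC are illustrative assumptions, not study values.

```python
# Design effect for a clustered sample.
def design_effect(cluster_size, icc):
    """DEFF = 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n_total, cluster_size, icc):
    """Independent-equivalent observations after accounting for clustering."""
    return n_total / design_effect(cluster_size, icc)

# 4,000 men: 40 clusters of 100 vs. 200 clusters of 20, at ICC = 0.01.
for m in (100, 20):
    print(m, round(effective_sample_size(4_000, m, icc=0.01)))
# More, smaller clusters recover more information per observed person-year.
```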

  27. Power is higher with larger incidence in the control group or greater effect

  28. Gaining Precision [Chart: effectiveness (% reduction in HIV incidence) vs. person-years. At the N of the efficacy trial, the estimated average is around 60–66, and the precision we got extends down to about 38.]

  29. With more person-years, we can narrow in to find the “real” effect [Chart: effectiveness (% reduction in HIV incidence) vs. person-years. With only the N of the efficacy trial, the real effect could be anywhere from about 15 to 80.]

  30. The “real” effect might be higher than in efficacy trials … [Chart: the real effect shown near 80, above the efficacy-trial estimate of 60–66.]

  31. … or the “real” effect might be lower [Chart: the real effect shown between about 15 and 38, below the efficacy-trial estimate.]
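The narrowing sketched in slides 28–31 can be put in rough numbers with a normal approximation to the log incidence-rate ratio; the incidence and true effectiveness below are illustrative assumptions.

```python
# Confidence interval around estimated effectiveness as person-years grow.
from math import sqrt, log, exp
from statistics import NormalDist

def effectiveness_ci(p_control, true_effectiveness, person_years_per_arm, conf=0.95):
    events_control = p_control * person_years_per_arm                    # expected infections, control
    events_treat = p_control * (1 - true_effectiveness) * person_years_per_arm
    irr = events_treat / events_control                                  # incidence rate ratio
    se_log_irr = sqrt(1 / events_treat + 1 / events_control)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    irr_lo, irr_hi = exp(log(irr) - z * se_log_irr), exp(log(irr) + z * se_log_irr)
    return 1 - irr_hi, 1 - irr_lo          # effectiveness = 1 - IRR, so bounds swap

for py in (5_000, 20_000, 80_000):
    lo, hi = effectiveness_ci(p_control=0.02, true_effectiveness=0.60, person_years_per_arm=py)
    print(py, f"{100 * lo:.0f}%-{100 * hi:.0f}%")
```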

  32. 1,000 efficacy studies when the control group incidence is 1%: power is 68%

  33. 1,000 efficacy studies when the control group incidence is 5%: power is 85%
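A sketch of the kind of simulation behind these two slides: generate many hypothetical trials and count how often the one-sided test rejects “no impact”. The sample size and effect size are illustrative assumptions, so the printed power need not match the slides’ 68% and 85%.

```python
# Monte Carlo power estimate over 1,000 simulated efficacy studies.
import random
from math import sqrt
from statistics import NormalDist

def simulated_power(p_control, effectiveness, n_per_arm, trials=1_000, alpha=0.05):
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(trials):
        inf_c = sum(random.random() < p_control for _ in range(n_per_arm))
        inf_t = sum(random.random() < p_control * (1 - effectiveness) for _ in range(n_per_arm))
        p_c, p_t = inf_c / n_per_arm, inf_t / n_per_arm
        p_bar = (inf_c + inf_t) / (2 * n_per_arm)
        se = sqrt(2 * p_bar * (1 - p_bar) / n_per_arm)
        if se > 0 and (p_c - p_t) / se > z_alpha:
            rejections += 1
    return rejections / trials

for incidence in (0.01, 0.05):
    print(incidence, simulated_power(incidence, effectiveness=0.60, n_per_arm=1_000))
```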

  34. Sampling for efficacy [Diagram: from the population of interest (HIV-negative men), a sample of respondents is drawn using inclusion criteria and allocated to treatment and control groups; relevant characteristics are recorded.]

  35. Effectiveness Evaluation

  36. Hypothesis to reject • Circumcision has no impact • Circumcision has too little impact • Intensive Circumcision Program has no more impact than Routine Circumcision Program • Circumcision has no benefit for women

  37. What level of impact do you want to reject in an effectiveness study? • For a national rollout, you want the impact to be a lot better than zero! • What’s the minimum impact that your constituency will accept? • What’s the minimum impact that will make the intervention cost-effective?

  38. The “Male Circumcision Decisionmaker’s Tool” is available online at: http://www.healthpolicyinitiative.com/index.cfm?id=software&get=MaleCircumcision

  39. Suppose the effect is 60%? Using the “MC Decisionmaker’s Tool,” let’s compare a 60% effect …

  40. Suppose the effect is only 20%? … to a 20% effect.

  41. Less effective intervention means less reduction in incidence [Chart: projected reduction in incidence at 60% effectiveness vs. 20% effectiveness.]

  42. Less effective intervention means less cost-effective. At 20% effectiveness, MC costs about $5,000 per HIV infection averted in the example country. [Chart: cost per infection averted at 60% effectiveness vs. 20% effectiveness.]
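The arithmetic behind that comparison can be sketched as follows. The unit cost, incidence, number circumcised, and time horizon are illustrative assumptions rather than the Decisionmaker’s Tool inputs, and discounting and indirect (herd) effects are ignored.

```python
# Back-of-the-envelope cost per HIV infection averted.
def cost_per_infection_averted(unit_cost, n_circumcised, p_control, effectiveness, years):
    total_cost = unit_cost * n_circumcised
    infections_averted = n_circumcised * p_control * effectiveness * years
    return total_cost / infections_averted

for eff in (0.60, 0.20):
    cost = cost_per_infection_averted(unit_cost=100, n_circumcised=10_000,
                                      p_control=0.01, effectiveness=eff, years=10)
    print(f"{eff:.0%} effectiveness: ${cost:,.0f} per infection averted")
```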

  43. Hypothesis to reject in effectiveness evaluation of MC • Circumcision has no impact • Circumcision has too little impact • Intensive Circumcision Program has no more impact than Routine Circumcision Program • Circumcision has no benefit for women

  44. Differences between effectiveness and efficacy that affect sampling • Main effect on HIV incidence in HIV- men • Null hypothesis: impact > 0 (+) • Effect size because of standard of care (+) • Investigate determinants of effectiveness • Supply side (+ / -) • Demand side (+ / -) • Investigate impact on secondary outcomes and their determinants (+ / -) • Seek “external validity” on effectiveness issues

  45. Sample size must be larger to show that the effect is at least 20% [Chart: effectiveness (% reduction in HIV incidence) vs. person-years, showing the N needed to be able to reject an effect below 20%.]
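A sketch of why that bar is higher: moving the null hypothesis from “no impact” (incidence-rate ratio 1.0) to “impact below 20%” (ratio 0.8) shrinks the gap the data must bridge, so more person-years are required for the same power. Incidence, true effectiveness, alpha, and target power are illustrative assumptions.

```python
# Person-years per arm needed to reject a null about the incidence-rate ratio.
from math import log
from statistics import NormalDist

def person_years_per_arm(p_control, true_eff, null_eff=0.0, power=0.80, alpha=0.05):
    irr_true, irr_null = 1 - true_eff, 1 - null_eff
    z = NormalDist().inv_cdf(1 - alpha) + NormalDist().inv_cdf(power)
    needed_se = (log(irr_null) - log(irr_true)) / z      # largest se(log IRR) allowed
    # se(log IRR)^2 ~ 1/E_t + 1/E_c, with expected events E = rate * person-years
    events_factor = 1 / irr_true + 1
    return events_factor / (p_control * needed_se ** 2)

print("to reject 'no impact':   ", round(person_years_per_arm(0.02, 0.60)))
print("to reject 'impact < 20%':", round(person_years_per_arm(0.02, 0.60, null_eff=0.20)))
```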

  46. Sampling for effectiveness [Diagram: the sampling frame is all men (HIV-positive and HIV-negative); from the population of interest (HIV-negative men), a sample of respondents is drawn and allocated to treatment and control; relevant characteristics are recorded.]

  47. Two levels of effectiveness [Chart: effectiveness (% reduction in HIV incidence) vs. person-years, with two real effects, REAL 1 (around 66–80) and REAL 2 (around 15–38), and the N needed to detect the difference between 1 and 2.]

  48. Sampling for effectiveness [Diagram: as in slide 46, but respondents are allocated to a control group and two treatment intensities, Intensity 1 and Intensity 2.]

  49. Sampling methods for effectiveness evaluation • Probability sampling • Simple random: each unit in the sampling frame has the same probability of being selected into the sample • Stratified: first divide the sampling frame into strata (groups, blocks), then do a simple random sample within each stratum • Clustered: sample clusters of units, e.g., villages with all the persons that live there • One stage: random sample of villages, then survey all men in selected villages • Two stage: random sample of villages, then random sample of men in selected villages (see the sketch below)
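A minimal sketch of the two-stage design in the last bullet: sample villages at random, then sample men at random within each selected village. The sampling frame below is a toy, purely hypothetical structure.

```python
# Two-stage cluster sampling: villages first, then men within villages.
import random

def two_stage_sample(frame, n_villages, n_men_per_village, seed=1):
    """frame: {village: [man_id, ...]} -> list of (village, man_id) pairs."""
    rng = random.Random(seed)
    selected_villages = rng.sample(sorted(frame), n_villages)     # stage 1: clusters
    chosen = []
    for village in selected_villages:                             # stage 2: men within cluster
        k = min(n_men_per_village, len(frame[village]))
        chosen.extend((village, man) for man in rng.sample(frame[village], k))
    return chosen

# Toy frame: 100 villages with 40-200 men each.
rng = random.Random(0)
frame = {f"village_{i}": [f"man_{i}_{j}" for j in range(rng.randint(40, 200))]
         for i in range(100)}
sample = two_stage_sample(frame, n_villages=10, n_men_per_village=20)
print(len(sample), sample[:3])
```

Note that with unequal village sizes, taking a fixed number of men per village does not give every man the same selection probability; real designs handle this with probability-proportional-to-size selection at stage one or with sampling weights.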

  50. Sampling (→ Representative data) • Representative surveys • Goal: learning about an entire population • e.g., LSMS / national household survey • Sample: representative of the national population • Impact evaluation • Goal: measuring changes in key indicators for the target population that are caused by an intervention • In practice: measuring the difference in indicators between treatment and control groups • We sample strategically in order to have a representative sample in the treatment and control groups • Which is not necessarily the same as a representative sample of the national population
