Monitoring and Evaluation: Evaluation Designs

Objectives of the Session By the end of this session, participants will be able to: • Understand the purpose, strengths, and shortcomings of different study designs • Distinguish between study designs that enable us to causally link program activities to observed changes and study designs that do not • Link evaluation designs to the types of decisions that need to be made

Causality Requirements • A precedes B. • B is present only when A is present. • We can rule out all other possible causes of B.

The Basic Experimental Principle • The intervention is the only difference between two groups • This is achieved by random assignment

Class Activity Can you name situations in which random assignment can be used in evaluation?

An Experimental Design
RA    Experimental group    O1    X    O2
RA    Control group         O3         O4
(RA = random assignment; O = observation; X = intervention)

An Experimental Design-Cont’d. • In this design, there are two groups, an experimental group and a control group. Participants are randomly assigned to the two groups, and both groups complete the pre-test. Only the experimental group gets the intervention; then both groups complete the post-test.

An Experimental Design-Cont’d. Steps • Identify people or groups, some of which could get the intervention. • Pre-test everyone. • Randomly assign each participant to either the control group or the experimental group. • Deliver the intervention to the experimental group. The control group may receive an alternative intervention or nothing at all. • Post-test both groups with the same instrument under the same conditions.
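
The random-assignment step above can be sketched in a few lines of Python. This is only an illustration, not a prescribed procedure: the helper name `randomly_assign`, the participant IDs, and the fixed seed are assumptions made for the example.

```python
import random


def randomly_assign(participants, seed=None):
    """Split participants at random into (experimental, control) groups.

    Hypothetical helper for illustration; the seed only makes the
    example reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Made-up participant IDs 1..20.
experimental, control = randomly_assign(range(1, 21), seed=42)
print("Experimental group:", sorted(experimental))
print("Control group:", sorted(control))
```

Because chance alone decides who lands in which group, the two groups should differ, on average, only in their exposure to the intervention.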

Factors that May Lead Us to Make Invalid Conclusions • Dropout: There may be loss to follow-up. • Instrumentation effects: Occur when a questionnaire is changed between pre-test and post-test. • Testing effects: Occur because study participants remember questions that were asked of them at pre-test and perform better at post-test because they are familiar with the questions.

A Second Experimental Design
RA    Experimental group    X    O2
RA    Control group              O4

A Second Experimental Design-Cont’d • In this design, experimental and control groups are formed; however, there is no pre-test. Instead, the experimental group gets the intervention and then both groups are measured at the end of the program.

A Non-Experimental Design
Time →
Experimental group    O1    X    O2

A Non-Experimental Design-Cont’d • In this method of evaluation, only people who are participating in the program get the pre- and post-test. Steps • Pre-test everyone in the program. • Deliver the intervention. • Post-test the same individuals. This design does not provide any information about what kinds of results might have occurred without the program and is the weakest in terms of scientific rigor.

Another Factor that May Lead to Invalid Conclusions • History effects: These occur when extraneous events (events that occur outside the study) influence study-measured outcomes.

A Second Non-Experimental Design
Time →
Experimental group    O1    O2    O3    X    O4    O5    O6

A Second Non-Experimental Design-Cont’d • For this design, a survey is administered multiple times - before, during, and after a program

A Second Non-Experimental Design-Cont’d Steps • Select a program-outcome measure that can be used repeatedly. • Decide who will be in the experimental group. Will it be the same group of people measured many times, or will it be successive groups of different people? • Collect at least three measurements prior to the intervention that were made at regular intervals. • Check the implementation of the intervention. • Continue to collect measurements, at least through the duration of the program.
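
One way to read the repeated measurements from this design is to fit a trend to the pre-intervention observations, project it forward, and see how far the later observations depart from that projection. The sketch below assumes a simple straight-line trend and equally spaced measurements; the function names and the data are hypothetical, and a real analysis would also need to consider history effects and seasonal patterns.

```python
def fit_trend(values):
    """Least-squares line through points (0, values[0]), (1, values[1]), ..."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def deviations_from_projection(pre, post):
    """Observed post-intervention values minus the projected pre-trend."""
    if len(pre) < 3:
        raise ValueError("need at least three pre-intervention measurements")
    slope, intercept = fit_trend(pre)
    return [
        y - (intercept + slope * (len(pre) + i))
        for i, y in enumerate(post)
    ]


# Made-up outcome values: O1-O3 before the intervention, O4-O6 after it.
pre = [10, 12, 14]
post = [20, 22, 24]
print(deviations_from_projection(pre, post))
```

Deviations that are consistently above (or below) zero suggest a shift from the earlier trend, although this design alone cannot attribute that shift to the program.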

A Quasi-Experimental Design
Time →
Experimental group    O1    X    O2
---------------------------------
Comparison group      O3         O4

A Quasi-Experimental Design-Cont’d. • In this design, two groups which are similar, but which were not formed by random assignment, are measured both before and after one of the groups gets the program intervention.

A Quasi-Experimental Design-Cont’d. Steps • Identify people who will be getting the program. • Identify people who are not getting the program, but are in other ways very similar. • Pre-test both groups. • Deliver the intervention to the experimental group. The comparison group may receive an alternative intervention or nothing at all. • Post-test both groups.
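
With pre- and post-tests for both groups, one common summary is the change in the experimental group minus the change in the comparison group, often called a difference-in-differences. The sketch below uses the O1 to O4 labels from the design diagram; the function name and the scores are made up for illustration, and the slides do not prescribe this particular calculation.

```python
from statistics import mean


def difference_in_differences(o1, o2, o3, o4):
    """(O2 - O1) - (O4 - O3), using the group mean at each observation."""
    change_experimental = mean(o2) - mean(o1)
    change_comparison = mean(o4) - mean(o3)
    return change_experimental - change_comparison


# Made-up knowledge scores (0-10), for illustration only.
o1 = [4, 5, 3, 4]  # experimental group, pre-test
o2 = [7, 8, 6, 7]  # experimental group, post-test
o3 = [4, 4, 5, 3]  # comparison group, pre-test
o4 = [5, 5, 6, 4]  # comparison group, post-test

estimate = difference_in_differences(o1, o2, o3, o4)
print(estimate)
```

Subtracting the comparison group's change removes influences that affected both groups equally, but it does not remove selection effects if the groups differed in ways that also shaped how they changed.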

Threat to Validity • Selection effects: Occur when people selected for a comparison group differ from the experimental group in ways that affect the outcome.

Summary Features of Different Study Designs

Summary Features of Different Study Designs-Cont’d.
I. Non-experimental (One-Group, Post-Only): IMPLEMENT PROGRAM → ASSESS TARGET GROUP AFTER PROGRAM
II. Non-experimental (One-Group, Pre- and Post-Program): ASSESS TARGET GROUP BEFORE PROGRAM → IMPLEMENT PROGRAM → ASSESS TARGET GROUP AFTER PROGRAM

Summary Features of Different Study Designs-Cont’d.
III. Experimental (Pre- and Post-Program with Control Group)
RANDOMLY ASSIGN PEOPLE FROM THE SAME TARGET POPULATION TO GROUP A OR GROUP B
TARGET GROUP A: ASSESS TARGET GROUP A → IMPLEMENT PROGRAM WITH TARGET GROUP A → ASSESS TARGET GROUP A
CONTROL GROUP B: ASSESS CONTROL GROUP B → ASSESS CONTROL GROUP B

Summary Features of Different Study Designs-Cont’d.
IV. Quasi-Experimental (Pre- and Post-Program with Non-Randomized Comparison Group)
TARGET GROUP: ASSESS TARGET GROUP BEFORE PROGRAM → IMPLEMENT PROGRAM → ASSESS TARGET GROUP AFTER PROGRAM
COMPARISON GROUP: ASSESS COMPARISON GROUP BEFORE PROGRAM → ASSESS COMPARISON GROUP AFTER PROGRAM

Summary Features of Different Study Designs-Cont’d. • The different designs vary in their capacity to produce information that allows you to link program outcomes to program activities. • The more confident you want to be about making these connections, the more rigorous the design must be and the more costly the evaluation. • Your evaluator will help determine which design will make the best use of your program’s resources and answer your team’s evaluation questions with the greatest degree of certainty.

Important Issues to Consider When Choosing a Design • Complex evaluation designs are more costly, but allow for greater confidence in a study’s findings. • Complex evaluation designs are more difficult to implement, and so require higher levels of expertise in research methods and analysis. • Be prepared to encounter stakeholder resistance to the use of comparison or control groups, such as a parent wondering why his or her child will not receive a potentially beneficial intervention. • No evaluation design is immune to threats to its validity; there is a long list of possible complications associated with any evaluation study. However, your evaluator will help you maximize the quality of your evaluation study.

Exercise • A maternity hospital wishes to determine if the offer of postpartum family-planning methods will increase contraceptive use among women who deliver at the hospital. • What study design would you recommend to test the hypothesis that women who are offered postpartum family-planning services are more likely to use family planning than women who are not offered services?

Exercise • You have been asked to evaluate the impact of a national mass-media AIDS-prevention campaign on condom use. • What study design would you choose and why?

Linking Evaluation Design to Decision-Making

Deciding Upon an Appropriate Evaluation Design • Indicators: What do you want to measure? (provision, utilization, coverage, impact) • Type of inference: How sure do you want to be? (adequacy, plausibility, probability) • Other factors Source: Habicht, Victora, and Vaughan (1999)

Clarification of Terms

Adequacy Assessment • Adequacy studies only describe whether a condition is met. • They typically address provision, utilization, or coverage; no control group or pre/post data are needed in such cases. • Hypothesis tested: Are expected levels achieved? • They can also answer questions of impact (magnitude of change), provided pre/post data are available. • Hypothesis tested: Is the difference equal to or greater than expected?

Features of Adequacy Assessment • Simplest (and cheapest) of evaluation models, as it does not try to control for external effects. Data are needed only for outcomes. • If only input or output results are needed, then the lack of controls is not a problem. • When measuring impact, however, the lack of controls makes it impossible to infer that the change is due to the program. • Also, if there is no change, it will not be possible to say whether the program was ineffective or whether it prevented a further deterioration.

Class Activity • For each of the following outcomes of interest, provide indicators that would be useful in the evaluation of a program for control of diarrheal diseases aimed at young children with emphasis on the promotion of oral rehydration salts (ORS): • Provision: Are the services available? Are services accessible? Is their quality adequate? • Utilization: Are the services being used? • Coverage: Is the target population being reached? • Impact: Were there improvements in disease patterns or health behaviors?

Adequacy Assessment Inferences • Are objectives being met? • Compares program performance with previously established adequacy criteria, e.g., an 80% ORT-use rate • No control group • 2+ measurements to assess adequacy of change over time • Provision, utilization, coverage: Are activities being performed as planned? • Impact: Are observed changes in health or behavior of the expected direction and magnitude? • Cross-sectional or longitudinal Source: Habicht, Victora and Vaughan (1999)
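
An adequacy check of this kind amounts to comparing each measured rate with the criterion set in advance. The sketch below uses the 80% ORT-use criterion mentioned above; the function name, the survey rounds, and the counts are invented for illustration.

```python
def assess_adequacy(measurements, criterion):
    """Return (label, rate, meets_criterion) for each survey round.

    measurements: list of (label, users, surveyed) tuples.
    criterion: previously established target rate, e.g. 0.80.
    """
    results = []
    for label, users, surveyed in measurements:
        rate = users / surveyed
        results.append((label, rate, rate >= criterion))
    return results


# Made-up survey rounds: children with diarrhea who received ORT.
rounds = [("baseline", 120, 200), ("follow-up", 170, 200)]
for label, rate, ok in assess_adequacy(rounds, criterion=0.80):
    print(f"{label}: {rate:.0%} ORT use, criterion met: {ok}")
```

Since there is no control group, a result like this shows only whether the criterion was reached, not whether the program caused the change.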

Class Activity • What are the advantages of adequacy evaluations? • What are the limitations of adequacy evaluations? • If an adequacy evaluation shows a lack of change in indicators, how can this be interpreted? • Which of the study designs discussed earlier can be used for adequacy evaluations?

Plausibility Assessment Inferences (1) • Program appears to have an effect above and beyond the impact of non-program influences • Includes control group • Historical control group • Compares changes in community before & after program and attempts to rule out external factors • Same target population • Internal control group • Compares groups/individuals with different intensities of exposure to program (dose-response) • Compares previous exposure to program between individuals with and without the disease (case-control) • External control group • Compares communities/geographic areas with and without the program • Populations that were never targeted by the intervention, but that share key characteristics with the beneficiaries Source: Habicht, Victora and Vaughan (1999)

Plausibility Assessment Inferences (2) • Provision, utilization, coverage • Intervention group appears to have better performance than control • Cross-sectional, longitudinal, longitudinal-control • Impact • Changes in health/behavior appear to be more beneficial in intervention than control group • Cross-sectional, longitudinal, longitudinal-control, case-control Source: Habicht, Victora and Vaughan (1999)

Controls and Confounding Factors • For all types of controls, the groups being compared should be similar in all respects except their exposure to the intervention. • That is almost never possible, however. There is almost always some factor that affects one group more than the other (a confounding factor). E.g., a decline in diarrhea mortality may be due to better access to drinking water, not to the ORS program. • To reduce this problem, confounding factors must be measured and handled statistically, via matching, standardization, or multivariate analysis.
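
Of the statistical treatments listed above, standardization is the easiest to show by hand. The sketch below applies direct standardization to the drinking-water example: rates are computed within each water-access stratum and then weighted by a common standard population. All numbers, names, and the 50/50 weights are assumptions chosen to make the point visible; they are not data from any study.

```python
def crude_rate(strata):
    """Overall rate, ignoring strata. strata: name -> (events, people)."""
    events = sum(e for e, _ in strata.values())
    people = sum(n for _, n in strata.values())
    return events / people


def standardized_rate(strata, weights):
    """Stratum-specific rates weighted by a shared standard population."""
    return sum(weights[name] * e / n for name, (e, n) in strata.items())


# Made-up data: (diarrhea deaths, children) by drinking-water access.
program_area = {"safe water": (8, 800), "unsafe water": (10, 200)}
comparison_area = {"safe water": (2, 200), "unsafe water": (40, 800)}
weights = {"safe water": 0.5, "unsafe water": 0.5}  # assumed standard population

print("Crude:", crude_rate(program_area), crude_rate(comparison_area))
print("Standardized:",
      standardized_rate(program_area, weights),
      standardized_rate(comparison_area, weights))
```

In this invented example the crude death rate looks much lower in the program area, yet the standardized rates are the same: the apparent difference comes entirely from the program area having better water access.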

Probability Assessment Inferences • There is only a small probability that the differences between program and control areas were due to chance (P < .05) • Requires control group • Requires randomization • Often not feasible for assessing program effectiveness, because of: randomization needed before program starts; political factors; scale-up; inability to generalize results; known efficacy of intervention Source: Habicht, Victora and Vaughan (1999)
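
To make "due to chance (P < .05)" concrete, the sketch below estimates a P value with a permutation test, which fits naturally with randomized designs: it reshuffles the group labels many times and counts how often a difference at least as large as the observed one appears. The permutation test is a technique chosen here for illustration, not one named in the source, and the function name, cluster counts, and seed are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(program, control, n_permutations=10_000, seed=0):
    """Two-sided permutation P value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(program) - mean(control))
    pooled = list(program) + list(control)
    n_program = len(program)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_program]) - mean(pooled[n_program:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up counts of ORS users per 50 households in each randomized cluster.
program_clusters = [38, 41, 35, 40, 37, 42]
control_clusters = [30, 28, 33, 29, 31, 27]
p_value = permutation_p_value(program_clusters, control_clusters)
print(f"P = {p_value:.4f}")
```

A small P value says only that chance is an unlikely explanation for the difference; it is the randomization itself that rules out the other explanations.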

Summary

Discuss with Decision-Makers Before Choosing Evaluation Design

Possible Areas of Concern to Different Decision-Makers Source: Habicht, Victora and Vaughan (1999)

Evaluation Flow from Simpler to More Complex Designs Source: Habicht, Victora and Vaughan (1999)

Key Issues to Discuss with Decision Makers Before Choosing a Design • Is there a need for collecting new data? If so, at what level? • Does the design include an intervention-control or a before-after comparison? • How rare is the event to be measured? • How small is the difference to be detected? • How complex will the data analysis be? • How much will alternative designs cost? Source: Habicht, Victora and Vaughan (1999)
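
The questions about rare events and small differences matter because both drive up the number of people that must be studied. The sketch below uses the standard normal-approximation formula for comparing two proportions, n per group = (z_{1-α/2} + z_{power})² · [p1(1-p1) + p2(1-p2)] / (p1 - p2)². The function name and example proportions are assumptions, and the formula ignores clustering, dropout, and other design effects that a real study would need to account for.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate people needed per group to detect p1 versus p2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical scenarios, for illustration only.
print(sample_size_per_group(0.50, 0.60))  # common outcome, 10-point difference
print(sample_size_per_group(0.50, 0.55))  # same outcome, smaller difference
print(sample_size_per_group(0.01, 0.02))  # rare event
```

Halving the difference to be detected roughly quadruples the required sample, which is one reason these questions belong in the discussion with decision makers before a design is chosen.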

References • Adamchak S et al. (2000). A Guide to Monitoring and Evaluating Adolescent Reproductive Health Programs. Focus on Young Adults, Tool Series 5. Washington, D.C.: Focus on Young Adults. • Fisher A et al. (2002). Designing HIV/AIDS Intervention Studies. An Operations Research Handbook. New York: The Population Council. • Habicht JP et al. (1999). Evaluation Designs for Adequacy, Plausibility, and Probability of Public Health Programme Performance and Impact. International Journal of Epidemiology, 28: 10-18. • Rossi P et al. (1999). Evaluation. A Systematic Approach. Thousand Oaks: Sage Publications.
