1. The Bumps and Bruises of the Evaluation Mine Field
Presented by Olivia Silber Ashley, Dr.P.H.
3. Background on Core Evaluation Instruments
Office of Management and Budget (OMB) recently examined the AFL program using its Program Assessment Rating Tool (PART)
Identified program strengths
Program purpose
Design
Management
Identified areas for improvement
Strategic planning
Program results/accountability
In response, OPA
Developed baseline and follow-up core evaluation instruments
Developed performance measures to track demonstration project effectiveness
4. Staff and Client Advisory Committee
Anne Badgley
Leisa Bishop
Doreen Brown
Carl Christopher
Cheri Christopher
Audra Cummings
Christina Diaz
Amy Lewin
David MacPhee
Janet Mapp
Ruben Martinez
Mary Lou McCloud
Charnese McPherson
Alice Skenandore
Jared Stangenberg
Cherie Wooden
5. Capacity Assessment Methods
Review of grant applications, annual reports, and other information from the 28 most recently funded programs
Qualitative assessment involving program directors, evaluators, and staff in:
14 Title XX Prevention programs
14 Title XX Care programs
Telephone interviews
Site visit
Observations of data collection activities
Document review
Conducted between January 26, 2006, and March 16, 2006
31 interviews involving 73 interviewees across 28 programs
100% response rate
6. Selected Title XX Prevention and Care Programs
Baptist Children’s Home Ministries
Boston Medical Center
Emory University
Freedom Foundation of New Jersey, Inc.
Heritage Community Services
Ingham County Health Department
James Madison University
Kings Community Action
National Organization of Concerned Black Men
Our Lady of Lourdes
Red Cliff Band of Chippewas
St. Vincent Mercy Medical Center
Switchboard of Miami, Inc.
Youth Opportunities Unlimited
Children’s Home Society of Washington
Children’s Hospital
Choctaw Nation of Oklahoma
Congreso de Latinos Unidos
Hidalgo Medical Services
Illinois Department of Human Services
Metro Atlanta Youth for Christ
Roca, Inc.
Rosalie Manor Community & Family Services
San Mateo County Health Services Agency
Truman Medical Services
University of Utah
Youth and Family Alliance/Lifeworks
YWCA of Rochester and Monroe
7. Capacity Assessment Research Questions
How and to what extent have AFL projects used the core evaluation instruments?
What problems have AFL projects encountered with the instruments?
8. Difficulties with Core Evaluation Instruments among Care Programs
9. Difficulties with Core Evaluation Instruments among Prevention Programs
10. Expert Work Group
Elaine Borawski
Claire Brindis
Meredith Kelsey
Doug Kirby
Lisa Lieberman
Dennis McBride
Jeff Tanner
Lynne Tingle
Amy Tsui
Gina Wingood
11. Draft Revision of Core Evaluation Instruments
Confidentiality statement
5th grade reading level
Instructions for adolescent respondents
Re-ordering of questions
Improved formatting
Sensitivity to diverse family structures
Consistency in response options
Improved fidelity to original source items
Eliminated birth control question for pregnant adolescents
Modified birth control question for parenting adolescents
Clarified reference child
Separated questions about counseling/testing and treatment for STD
Modified living situation question
Improved race question
Added pneumococcal vaccine (PCV) item
12. Why is a Rigorous Evaluation Design Important?
Attribute changes to the program
Reduce likelihood of spurious results
OMB performance measure to improve evaluation quality
Peer-reviewed publication
Continued funding for your project and for the AFL program
Ensure that program services are helpful to pregnant and parenting adolescents
13. Evaluation Design
Appropriate to answer evaluation research questions
Begin with most rigorous design possible
Randomized experimental design is the gold standard to answer research questions about program effectiveness
Units for study (such as individuals, schools, clinics, or geographical areas) are randomly allocated to groups exposed to different treatment conditions
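A minimal sketch of that kind of random allocation, assuming a hypothetical roster of recruited adolescents (the IDs, seed, and two-condition design are illustrative only, not part of the AFL instruments):

```python
import random

# Hypothetical roster of recruited adolescents (IDs are illustrative)
participants = ["A01", "A02", "A03", "A04", "A05", "A06", "A07", "A08"]

random.seed(42)  # fixed seed so the allocation can be reproduced and audited
random.shuffle(participants)

# Split the shuffled roster evenly between the two conditions; for group
# randomization, shuffle a list of schools, clinics, or areas instead
half = len(participants) // 2
assignment = {pid: "intervention" for pid in participants[:half]}
assignment.update({pid: "control" for pid in participants[half:]})

for pid, condition in sorted(assignment.items()):
    print(pid, condition)
```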
14. Barriers to Randomized Experimental Design
Costs:
Consume a great deal of real resources
Costly in terms of time
Involve significant political costs
Ethical issues raised by experimentation with human beings
Limited in duration
High attrition in either the treatment or control groups
Population enrolled in the treatment and control groups not representative of the population that would be affected by the treatment
Possible program contamination across treatment groups
Lack of experience using this design
(Bauman, Viadro, & Tsui, 1994; Burtless, 1995)
15. Benefits of Randomized Experimental Design
Able to infer causality
Assures the direction of causality between treatment and outcome
Removes any systematic correlation between treatment status and both observed and unobserved participant characteristics
Permits measurement of the effects of conditions that have not previously been observed
Offers advantages in making results convincing and understandable to policy makers
Policymakers can concentrate on the implications of the results for changing public policy
The small number of qualifications to experimental findings can be explained in lay terms
(Bauman, Viadro, & Tsui, 1994; Burtless, 1995)
16. Strategies for Implementing Randomized Experimental Design
Read methods sections from evaluations using randomized experimental design
Ask for evaluation technical assistance to implement this design
Recruit all interested adolescents
Ask parents/adolescents for permission to randomly assign to one of two conditions
Divide program components into two conditions
Overlay one component on top of others
Focus outcome evaluation efforts on randomly assigned adolescents
Include all adolescents in process evaluation
17. An Example
Study examined whether
Home-based mentoring intervention prevented second birth within 2 years of first birth
Increased participation in the intervention reduced likelihood of second birth
Randomized controlled trial involving first-time black adolescent mothers (n=181) younger than age 18
Intervention based on social cognitive theory, focused on interpersonal negotiation skills, adolescent development, and parenting
Delivered bi-weekly until infant’s first birthday
Mentors were black, college-educated single mothers
Control group received usual care
No differences in baseline contraceptive use or other measures of risk or family formation
Follow-up at 6, 13, and 24 months after recruitment at first delivery
Response rate 82% at 24 months
Intent-to-treat analysis showed that intervention mothers were less likely than control mothers to have a second infant
Two or more intervention visits increased odds of avoiding second birth more than threefold
Source: Black et al. (2006). Delaying second births among adolescent mothers: A randomized, controlled trial of a home-based mentoring program. Pediatrics, 118, e1087-e1099.
18. Obtaining and Maintaining a Comparison Group
Emphasize the value of research
Explain exactly what the responsibilities of the comparison group will be
Minimize burden to comparison group
Ask for commitment in writing
Provide incentives for data collection
Provide non-related service/materials
Meet frequently with people from participating community organizations and schools
Provide school-level data to each participating school (after data are cleaned and de-identified)
Work with organizations to help them obtain resources for other health problems they are concerned about
Add questions that other organizations are interested in
Explain the relationship of this project to the efforts of OAPP
Adapted from Foshee, V.A., Linder, G.F., Bauman, K.E., Langwick, S.A., Arriaga, X.B., Heath, J.L., McMahon, P.M., & Bangdiwala, S. (1996). The Safe Dates Project: Theoretical basis, evaluation design, and selected baseline findings. American Journal of Preventive Medicine, 12, 39-47.
19. Analysis
Include process measures in outcome analysis
Attrition analysis
Missing data
Assessment of baseline differences between treatment groups
Intent-to-treat analysis
Multivariate analysis controlling for variables associated with baseline differences and attrition
20. Incorporate Process Evaluation Measures in Outcome Analysis
Process evaluation measures assess qualitative and quantitative parameters of program implementation
Attendance data
Participant feedback
Program-delivery adherence to implementation guidelines
Facilitate replication, understanding of outcome evaluation findings, and program improvement
Avoid Type III error: concluding that a program is not effective when it was not implemented as intended
Source: USDHHS. (2002). Science-based prevention programs and principles, 2002. Rockville, MD: Author.
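One way to fold a process measure such as attendance into the outcome analysis is to merge it onto the analysis file and include it in the outcome model. A minimal sketch, with a hypothetical data frame and variable names:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical analysis file: one row per adolescent, with sessions
# attended (a process measure) merged onto the outcome data
df = pd.DataFrame({
    "outcome":  [3.1, 2.8, 3.9, 4.2, 2.5, 3.6, 4.0, 2.9],
    "treat":    [1, 0, 1, 1, 0, 1, 0, 0],
    "sessions": [10, 0, 7, 12, 0, 3, 0, 0],
})

# Outcome model that carries the process measure (dosage) alongside
# treatment status, so weak implementation is visible in the results
model = smf.ols("outcome ~ treat + sessions", data=df).fit()
print(model.summary())
```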
21. Attrition Analysis
Number of participants lost over the course of a program evaluation
Some participant loss is inevitable due to transitions among program recipients
Unusually high attrition rates generally lower the confidence reviewers can place in outcome findings
Not needed if data are imputed for all respondent missingness
Evaluate the relationship of study variables to dropout status from baseline to follow-up (see the sketch below)
Report findings from attrition analysis, including direction of findings
Control for variables associated with dropout in all multivariate outcome analyses
Source: USDHHS. (2002). Science-based prevention programs and principles, 2002. Rockville, MD: Author.
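A sketch of one common approach: regress dropout status on baseline variables, then carry significant predictors into the outcome models (the data frame and variable names here are hypothetical):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical baseline file with a flag for adolescents lost to follow-up
df = pd.DataFrame({
    "dropout": [0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0],
    "age":     [15, 17, 16, 15, 17, 14, 16, 15, 17, 16, 14, 15],
    "treat":   [1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0],
})

# Logistic regression of dropout on baseline variables; report direction
# as well as significance, and control for significant predictors in all
# multivariate outcome analyses
model = smf.logit("dropout ~ age + treat", data=df).fit()
print(model.summary())
```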
22. Missing Data
Not the same as attrition (rate at which participants prematurely leave an evaluation)
Absence of or gaps in information from participants who remain involved
A large amount of missing data can threaten the integrity of an evaluation
Item-level missingness
Run frequency distributions for all items
Consider logical skips
Report missingness
Address more than 10% missingness
Imputation procedures (see the sketch below)
Imputed single values
Multiple imputation (SAS Proc MI) replaces missing values in a dataset with a set of “plausible” values
Full Information Maximum Likelihood (FIML) estimation in a multilevel structural equation modeling (SEM) framework in Mplus 4.1 (Muthén & Muthén, 1998-2006)
Source: USDHHS. (2002). Science-based prevention programs and principles, 2002. Rockville, MD: Author.
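A sketch of the item-level checks and a multiple-imputation pass. Here statsmodels' MICE routine stands in for SAS Proc MI, and the data frame and variable names are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.imputation import mice

# Hypothetical item-level data; np.nan marks a missing response
df = pd.DataFrame({
    "outcome": [3.0, 2.5, np.nan, 4.1, 3.3, np.nan, 2.9, 3.8],
    "treat":   [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0],
    "age":     [15.0, np.nan, 16.0, 15.0, 17.0, 14.0, np.nan, 15.0],
})

# Item-level missingness: run the frequencies, report them, and flag
# any item above the 10% threshold
missing = df.isna().mean()
print(missing)
print(missing[missing > 0.10])

# Multiple imputation: each missing value is replaced with a set of
# plausible values, and the analysis model is pooled across imputations
imp = mice.MICEData(df)
fit = mice.MICE("outcome ~ treat + age", sm.OLS, imp).fit(n_burnin=5, n_imputations=5)
print(fit.summary())
```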
23. Analysis
Appropriateness of data analytic techniques for determining the success of a program
Employ state-of-the-art data analysis techniques to assess program effectiveness by participant subgroup
Use the most suitable current methods to measure outcome change
Subgroup (moderation) analyses allow evaluation of outcomes by participant age and ethnicity, for example
Okay to start with descriptive statistics
Report baseline and follow-up results for both treatment and comparison groups
Conduct multivariate analysis of treatment condition predicting difference of differences (see the sketch below)
Control for variables associated with attrition
Control for variables associated with differences at baseline
Source: USDHHS. (2002). Science-based prevention programs and principles, 2002. Rockville, MD: Author.
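A sketch of that difference-of-differences model as a treatment-by-time interaction in long format, with attrition- and baseline-related covariates appended to the formula as needed (data and names hypothetical):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format file: one row per adolescent per wave
# (time 0 = baseline, time 1 = follow-up)
df = pd.DataFrame({
    "score": [2.9, 3.1, 3.0, 3.8, 2.8, 2.9, 3.2, 4.3, 3.0, 3.1, 2.7, 3.6],
    "treat": [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
    "time":  [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
})

# The treat:time coefficient is the difference-of-differences estimate:
# the change in the treatment group beyond the change in the comparison group
model = smf.ols("score ~ treat * time", data=df).fit()
print(model.params["treat:time"])
```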
24. Assessment of Baseline Differences between Treatment and Comparison Groups
Address the following research questions:
Are treatment and comparison group adolescents similar in terms of
Baseline levels of outcome variables (e.g., educational achievement, current school status)
Key demographic characteristics, such as
Age
Race/ethnicity
Pregnancy stage
Marital status
Living arrangements
SES
25. Test for Baseline Differences
Test for statistically significant differences in the proportions of adolescents in each category (see the sketch below)
If you decide to analyze potential mediators as short-term program outcomes, test for baseline differences on these mediators
Report results from these tests in the end-of-year evaluation report for each year that baseline data are collected
Important for peer-reviewed publication
Control for variables associated with treatment condition in outcome analyses
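A sketch of that categorical comparison using a chi-square test on a hypothetical treatment-by-category cross-tabulation:

```python
from scipy.stats import chi2_contingency

# Hypothetical cross-tabulation of a baseline characteristic
# (rows = categories, columns = treatment vs. comparison group)
table = [
    [34, 30],
    [22, 27],
    [14, 13],
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.3f}")
# A non-significant p suggests similar group proportions; significant
# differences should be controlled for in outcome analyses
```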
26. An Example: Children’s Hospital Boston
Study to increase parenting skills and improve attitudes about parenting among parenting teens through a structured psychoeducational group model
All parenting teens (n=91) were offered a 12-week group parenting curriculum
Comparison group (n=54) declined the curriculum but agreed to participate in evaluation
Pre-test, post-test measures included Adult-Adolescent Parenting Inventory (AAPI), the Maternal Self-Report Inventory (MSRI), and the Parenting Daily Hassles Scale
Analyses controlled for mother’s age, baby’s age, and race
Results showed that program participants, and those who attended more sessions, improved in mothering role, perception of childbearing, developmental expectations of the child, and empathy for the baby, and reported fewer hassles in child and family events
Source: Woods et al. (2003). The parenting project for teen mothers: The impact of a nurturing curriculum on adolescent parenting skills and life hassles. Ambulatory Pediatrics, 3, 240-245.
27. Moderation and Mediation Analyses
Test for moderation
Assess interaction between treatment and demographic/baseline risk variables
When interaction term is significant, stratify by levels of the moderator variable and re-run analyses for subgroups
Test for mediation
Standard z-test based on the multivariate delta standard error for the estimate of the mediated effect (MacKinnon, Lockwood, Hoffman, West, & Sheets, 2002; Sobel, 1982)
Treatment condition beta value is attenuated by 20% or more after controlling for proposed mediators (Baron & Kenny, 1986)
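A sketch of the Sobel z-test from those two paths, where a is the treatment-to-mediator coefficient and b the mediator-to-outcome coefficient (the numeric estimates below are hypothetical):

```python
import math

def sobel_z(a, se_a, b, se_b):
    """Sobel (1982) z-test for the mediated effect a*b, using the
    multivariate delta standard error."""
    se_ab = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    return (a * b) / se_ab

# Hypothetical paths: a = treatment -> mediator,
# b = mediator -> outcome (with treatment controlled)
z = sobel_z(a=0.40, se_a=0.12, b=0.35, se_b=0.10)
print(f"Sobel z = {z:.2f}")  # |z| > 1.96 suggests a significant mediated effect
```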
28. An Example
29. Intent-to-Treat Analysis
Requires that all respondents initially enrolled in a given program condition be included in the first pass of an analysis strategy, regardless of whether respondents subsequently received program “treatment” (Hollis & Campbell, 1999)
Report findings from the intent-to-treat analysis
Important for peer-reviewed publication
Okay to re-run analyses, recoding respondents as not receiving the program or dropping them from analyses
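A sketch of the first-pass intent-to-treat contrast, keeping every enrolled respondent in the condition they were assigned to, regardless of services actually received (data and names hypothetical):

```python
import pandas as pd

# Hypothetical file: assigned condition, whether program services were
# actually received, and the outcome
df = pd.DataFrame({
    "assigned": ["treat", "treat", "treat", "control", "control", "control"],
    "received": [1, 0, 1, 0, 1, 0],
    "outcome":  [3.8, 3.0, 4.1, 2.9, 3.5, 3.1],
})

# First pass, intent-to-treat: analyze everyone as assigned
print(df.groupby("assigned")["outcome"].mean())

# Permissible second pass (as the slide notes): drop respondents who did
# not receive their assigned condition
on_protocol = df[df["received"] == (df["assigned"] == "treat")]
print(on_protocol.groupby("assigned")["outcome"].mean())
```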