Standards of Evidence for Program Efficacy, Effectiveness and Readiness for Dissemination. Society for Prevention Research (SPR). Brian R. Flay, D.Phil., Oregon State University. For the SPR Standards Committee
Three Main Sections to the Talk • Presentation of The SPR Standards • Elements of an Ideal Trial • Discussion of Particular Methodological Issues
Members of the SPR Committee • Brian R. Flay (Chair), D.Phil., U of Illinois at Chicago • Anthony Biglan, Ph.D., Oregon Research Institute • Robert F. Boruch, Ph.D., U of Pennsylvania • Felipe G. Castro, Ph.D., MPH, Arizona State U • Denise Gottfredson, Ph.D., U of Maryland • Sheppard Kellam, M.D., AIR • Eve K. Moscicki, Sc.D., MPH, NIMH • Steven Schinke, Ph.D., Columbia U • Jeff Valentine, Ph.D., Duke U • With help from Peter Ji, Ph.D., U of Illinois at Chicago
Pressure for programs of “proven effectiveness” • The Federal Government increasingly requires that Federal money be spent only on programs of “proven effectiveness” • Substance Abuse and Mental Health Services Administration (SAMHSA) and Center for Substance Abuse Prevention (CSAP) • U.S. Department of Education (DE) • Office of Juvenile Justice and Delinquency Prevention (OJJDP)
What is “Proven Effectiveness”? • Requires rigorous research methods to determine that observed effects were caused by the program being tested, rather than some other cause. • Randomized controlled trials (RCTs) are the gold standard. • E.g., the Food and Drug Administration (FDA) requires at least two randomized clinical trials before approving a new drug. • But RCTs are expensive, and there are many challenges to conducting them in schools.
Multiple Approaches to Standards of Evidence • Each government agency and academic group that has reviewed programs for lists has come up with its own set of standards. • Some examples: • CDC - Guide to Preventive Services • SAMHSA - National Registry of Evidence-based Programs and Practices • US Dept of Education - What Works Clearinghouse • Justice Department – Multiple lists • Many others ….
Why So Many Lists of Evidence-Based Programs and Practices? • They are all similar but not identical • E.g., CSAP admits more programs to its list than the Dept of Education does • All concern the rigor of the research • The Society for Prevention Research recently created standards for the field • Our innovation was to consider nested sets of standards for efficacy, effectiveness and readiness for dissemination
Four Kinds of Validity (Cook & Campbell, 1979; Shadish, Cook & Campbell, 2002) • Construct validity • Program description and measures of outcomes • Internal validity • Was the intervention the cause of the change in the outcomes? • External validity (Generalizability) • Was the intervention tested on relevant participants and in relevant settings? • Statistical validity • Can accurate effect sizes be derived from the study?
Standards for 3 Levels • Efficacy • What effects can the intervention have under ideal conditions? • Effectiveness • What effects does the intervention have under real-world conditions? • Dissemination • Is an effective intervention ready for broad application or distribution? • Desirable • Additional criteria that provide added value to evaluated interventions
Phases of Research in Prevention Program Development (Flay, 1986) 1. Basic Research 2. Hypothesis Development 3. Component Development and Pilot Studies 4. Prototype Studies of Complete Programs 5. Efficacy Trials of Refined Programs 6. Treatment Effectiveness Trials • Generalizability of effects under standardized delivery 7. Implementation Effectiveness Trials • Effectiveness with real-world variations in implementation 8. Demonstration Studies • Implementation and evaluation in multiple systems
Specificity of Efficacy Statement • “Program X is efficacious for producing Y outcomes for Z population.” • The program (or policy, treatment, strategy) is named and described • The outcomes for which proven effects are claimed are clearly stated • The population to which the claim can be generalized is clearly defined
Program Description • Efficacy • Intervention must be described at a level that would allow others to implement or replicate it • Effectiveness • Manuals, training and technical support must be available • The intervention should be delivered under the same kinds of conditions as one would expect in the real world • A clear theory of causal mechanisms should be stated • Clear statement of “for whom?” and “under what conditions?” the intervention is expected to work • Dissemination • Provider must have the ability to “go-to-scale”
Outcomes • ALL claimed public health or behavioral outcome(s) must be measured • Attitudes or intentions cannot substitute for actual behavior • At least one long-term follow-up • The appropriate interval may vary by type of intervention and state-of-the-field
Measures • Efficacy • Psychometrically sound • Valid • Reliable (internal consistency, test-retest or inter-rater reliability) • Data collectors independent of the intervention • Effectiveness • Level of exposure also should be measured • Integrity and level of implementation • Acceptance/compliance/adherence/involvement of target audience in the intervention • Dissemination • Monitoring and evaluation tools available
Desirable Measures • For ALL • Multiple measures • Mediating variables (or immediate effects) • Moderating variables • Potential side-effects • Potential iatrogenic effects
Design – for Causal Clarity • At least one comparison group • No-treatment, usual care, placebo or wait-list • Assignment to conditions must maximize causal clarity • Random assignment is “the gold standard” • Other acceptable designs • Multiple baseline or • Repeated time-series designs • Regression-discontinuity • Well-done matched controls • Demonstrated pretest equivalence on multiple measures • Known selection mechanism
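For the “well-done matched controls” option, demonstrated pretest equivalence on multiple measures is the key check. A minimal Python sketch of that check; the column names and data are simulated for illustration, not from any actual trial:

```python
# Sketch: checking pretest equivalence across multiple baseline measures
# for a matched-control design. Data and column names are made up.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "condition": rng.integers(0, 2, n),          # 1 = program school (hypothetical)
    "smoking_30day": rng.normal(0.2, 1.0, n),
    "attitudes": rng.normal(0.0, 1.0, n),
    "achievement": rng.normal(0.0, 1.0, n),
})

def standardized_mean_diff(data, col):
    """Standardized difference between conditions at pretest (Cohen's d style)."""
    g1 = data.loc[data["condition"] == 1, col]
    g0 = data.loc[data["condition"] == 0, col]
    pooled_sd = np.sqrt((g1.var(ddof=1) + g0.var(ddof=1)) / 2)
    return (g1.mean() - g0.mean()) / pooled_sd

for m in ["smoking_30day", "attitudes", "achievement"]:
    print(f"{m}: SMD = {standardized_mean_diff(df, m):+.2f}")
# |SMD| well under ~0.1 on every baseline measure is a common benchmark
# for "demonstrated pretest equivalence".
```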
Generalizability of Findings • Efficacy • Sample is defined • Who it is (from what “defined” population) • How it was obtained (sampling methods) • Effectiveness • Description of real-world target population and sampling methods • Degree of generalizability should be evaluated • Desirable • Subgroup analyses • Dosage studies/analyses • Replication with different populations • Replication with different program providers
Precision of Outcomes: Statistical Analysis • Statistical analysis allows unambiguous causal statements • At same level as randomization and includes all cases assigned to conditions • Tests for pretest differences • Adjustments for multiple comparisons • Analyses of (and adjustments for) attrition • Rates, patterns and types • Desirable • Report extent and patterns of missing data
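A minimal sketch of the first bullets in Python: analyze at the level that was randomized (school means, not students) and adjust across multiple outcomes. The data and outcome names are simulated for illustration:

```python
# Sketch: intention-to-treat comparison at the unit of randomization
# (schools), with a false-discovery-rate adjustment across outcomes.
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
n_schools, outcomes = 16, ["smoking", "alcohol", "violence"]
df = pd.DataFrame({
    "school": np.repeat(np.arange(n_schools), 25),   # 25 students per school
    "condition": np.repeat(rng.permutation([0, 1] * (n_schools // 2)), 25),
})
for y in outcomes:
    df[y] = rng.normal(size=len(df)) - 0.3 * df["condition"]

# Aggregate to the unit that was randomized: the school.
school_means = df.groupby(["school", "condition"])[outcomes].mean().reset_index()

pvals = []
for y in outcomes:
    t = school_means.loc[school_means.condition == 1, y]
    c = school_means.loc[school_means.condition == 0, y]
    pvals.append(ttest_ind(t, c).pvalue)

reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
for y, p, r in zip(outcomes, p_adj, reject):
    print(f"{y}: adjusted p = {p:.3f}, significant = {r}")
```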
Precision of Outcomes: Statistical Significance • Statistically significant effects • Results must be reported for all measured outcomes • Efficacy can be claimed only for constructs with a consistent pattern of statistically significant positive effects • There must be no statistically significant negative (iatrogenic) effects on important outcomes
Precision of Outcomes: Practical Value • Efficacy • Demonstrated practical significance in terms of public health (or other relevant) impact • Report of effects for at least one follow-up • Effectiveness • Report empirical evidence of practical importance • Dissemination • Clear cost information available • Desirable • Cost-effectiveness or cost-benefit analyses
Precision of Outcomes: Replication • Consistent findings from at least two different high-quality studies/replicates that meet all of the other criteria for efficacy and each of which has adequate statistical power • Flexibility may be required in the application of this standard in some substantive areas • When more than 2 studies are available, the preponderance of evidence must be consistent with that from the 2 most rigorous studies • Desirable • The more replications the better
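One standard way to weigh evidence across two replications is inverse-variance (fixed-effect) pooling of their effect sizes. A sketch with made-up numbers, not results from any actual trials:

```python
# Sketch: pooling standardized effects from two replications with
# inverse-variance weighting. Effects and SEs below are illustrative.
import numpy as np
from scipy.stats import norm

effects = np.array([0.25, 0.18])   # hypothetical effect sizes from two trials
ses     = np.array([0.08, 0.10])   # their standard errors

w = 1 / ses**2
pooled = np.sum(w * effects) / np.sum(w)
pooled_se = np.sqrt(1 / np.sum(w))
z = pooled / pooled_se
p = 2 * (1 - norm.cdf(abs(z)))
print(f"pooled effect = {pooled:.3f} (SE {pooled_se:.3f}), p = {p:.4f}")
```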
Additional Desirable Criteria for Dissemination • Organizations that choose to adopt a prevention program that barely or not quite meets all criteria should seriously consider undertaking a replication study as part of the adoption effort so as to add to the body of knowledge. • A clear statement of the factors that are expected to assure the sustainability of the program once it is implemented.
Nested Standards (nested-sets diagram) • Efficacy (20 standards) • Effectiveness (28) • Dissemination (31) • Desirable (43)
How you can use the Standards when Questioning Public Officials • Has the program been evaluated in a RCT? • Were units randomized to program and control (no program or alternative program) conditions? • Has the program been evaluated on populations like yours? • Have the findings been replicated? • Were any evaluations independent from the program developers?
School- or Community-Based Prevention/ Promotion Studies are Large and Complex • Large randomized trials • With multiple schools or other units per condition • Comparisons with “treatment as usual” • Measurement of implementation process and program integrity • Assessment of effects on presumed mediators • Helps test theories • Multiple measures/sources of data • Surveys of students, parents, teachers, staff, community • Teacher and parent reports of behavior • School records for behavior and achievement • Multiple, independent trials of promising programs • At both efficacy and effectiveness levels • Cost-effectiveness analyses
Example Programs that Come Close to Meeting The SPR Standards? • Life Skills Training (Botvin) • Multiple RCTs with different populations, implementers and types of training • Only one long-term follow-up • Independent replications of short-term effects are now appearing (as well as some failures) • No independent replications of long-term effects yet • Positive Action • Two high-quality matched control studies • Two randomized trials • Others?
Programs for Which the Research Meets the Standards, But Which Do Not Work • DARE • Many quasi-experimental and non-experimental studies suggested effectiveness • Multiple RCTs found no effects (Ennett meta-analysis) • Hutchinson • Well-designed RCT • But no long-term effects • And no published information on the program or its short-term effects – short-term effects must be demonstrated before anything meaningful can be said about long-term effects • So the trial cannot be interpreted because of the lack of information
Much of Prevention is STUCK … We’re Spinning our Wheels • At the Efficacy Trial phase • How can we get more programs into effectiveness trials? • At the Effectiveness Trial phase • How can we get more proven programs adopted? • At the “Model Program” phase • How can we ensure the ongoing effectiveness of model programs? • Lots more prevention research is needed – at all levels!
II. Example of a well-designed evaluation • RCT: Randomized Controlled (Community) Trial • Community-Based Tobacco Control Program
Comprehensive Intervention • Mass Media Campaign • Legacy “truth” Campaign • School • Curriculum • School-wide Policy • Family Involvement • Community Mobilization • Youth Access • Smoke-free environments • Other policy changes
RCT with Matched Pairs • 6-12 matched pairs (need power analysis) • Community = Defined Media Markets • Matched pairs or strata: • Use existing data to match • Population, smoking rates, media reach • Possible use of baseline data to match • Youth smoking survey
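A back-of-the-envelope version of the power analysis this slide calls for, using a standard two-arm cluster-trial approximation. The ICC, community size, and alpha/power inputs are illustrative, and ignoring the gain from matching makes the answer conservative:

```python
# Sketch: minimum detectable effect size (MDES) for a community-randomized
# trial with J matched pairs (J communities per arm). Inputs are made up.
import numpy as np
from scipy.stats import norm

def mdes(pairs, n_per_community, icc, alpha=0.05, power=0.80):
    """MDES in standard-deviation units, ignoring gains from matching
    (so this is a conservative estimate)."""
    var_cluster_mean = icc + (1 - icc) / n_per_community  # with sigma^2 = 1
    se_diff = np.sqrt(2 * var_cluster_mean / pairs)       # pairs = J per arm
    multiplier = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return multiplier * se_diff

for pairs in (6, 9, 12):
    print(f"{pairs} pairs: MDES = {mdes(pairs, 500, 0.02):.3f} SD")
```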
Randomization • Each member of each pair agrees to randomization after baseline data collected • OR • Match on basis of archival data, recruit to study, then randomize • OR • Randomize to conditions, then recruit
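A sketch of the “match on archival data, then randomize within pairs” option. The community names, covariates, and the crude one-dimensional pairing rule are all hypothetical; a real trial might match on a richer distance measure:

```python
# Sketch: form matched pairs of media markets from archival data,
# then flip a coin within each pair. Everything here is simulated.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
communities = pd.DataFrame({
    "name": [f"market_{i}" for i in range(12)],
    "population": rng.integers(50_000, 500_000, 12),
    "youth_smoking_rate": rng.uniform(0.10, 0.30, 12),
})

# Crude pairing: standardize the covariates, rank on their sum,
# and pair adjacent communities.
z = (communities[["population", "youth_smoking_rate"]]
     .apply(lambda s: (s - s.mean()) / s.std()))
communities["score"] = z.sum(axis=1)
ordered = communities.sort_values("score").reset_index(drop=True)
ordered["pair"] = ordered.index // 2

# Randomize one member of each pair to the intervention condition.
ordered["condition"] = 0
for pair, idx in ordered.groupby("pair").groups.items():
    treated = rng.choice(idx)
    ordered.loc[treated, "condition"] = 1
print(ordered[["name", "pair", "condition"]])
```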
Design Elements (R = random assignment, O = observation wave, x = intervention activity):
R  O O xxxx O xxxx O xxxx O xxxx O xxxx O O
R  O O      O      O      O      O      O O
Improve Power: • Multiple baseline measures • Multiple control groups per intervention group • Multiple waves of data
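A small simulation of why the extra baseline waves help: averaging several pre-intervention observations gives a more reliable covariate, which shrinks the residual variance in a covariate-adjusted comparison. All numbers are simulated:

```python
# Sketch: more baseline waves -> more reliable baseline covariate ->
# smaller residual SD for a covariate-adjusted outcome comparison.
import numpy as np

rng = np.random.default_rng(5)
n = 2000
true_score = rng.normal(size=n)

def residual_sd(n_baselines):
    baseline = true_score[:, None] + rng.normal(0, 1, (n, n_baselines))
    covariate = baseline.mean(axis=1)          # average of baseline waves
    posttest = true_score + rng.normal(0, 1, n)
    cov = np.cov(covariate, posttest)
    slope = cov[0, 1] / cov[0, 0]
    resid = posttest - posttest.mean() - slope * (covariate - covariate.mean())
    return resid.std()

for k in (1, 2, 5):
    print(f"{k} baseline wave(s): residual SD = {residual_sd(k):.3f}")
```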
Steps • Baseline studies • Tobacco use prevalence and survey • Qualitative eval of appropriateness of messages • Second or more baselines • Implement all components of intervention • Monitoring of intervention implementation • Follow-up surveys • Awareness of campaign components • Of intermediate changes - e.g., prices, availability • Youth tobacco use attitudes and behaviors
Specific Measures: Process • Pre-intervention measure of readiness • Implementation • Exposure to media campaign or number of lessons taught • Fidelity of implementation • # youth involved in empowerment programs • Community organizational structures/involvement • Post-study sustainability of intervention
Measures: Intermediate • Policy Change • School tobacco-free policies • Tobacco prices (and change in prices) • Community smoke-free policies • Enforcement of tobacco-free policies • School • Community
Specific Measures: Outcomes • Tobacco use by adolescents 13-15 • % ever used • % use currently • % of youth “susceptible” to initiation • Exposure to 2nd hand smoke • % children, nonsmoking adolescents/young adults • Per capita sales of tobacco products • Cigarettes • Other tobacco products
Appropriate Analyses • Take account of unit of assignment • Take account of nesting • times within subjects within settings • Take account of subject mobility • E.g., stayers, leavers and joiners • Take account of missing data • Make full use of longitudinal data • E.g., growth curves
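A minimal growth-curve sketch in the spirit of “times within subjects within settings,” using statsmodels’ MixedLM. This two-level version models occasions nested within students; a third (school) level could be added with variance components. Data are simulated and variable names are hypothetical:

```python
# Sketch: random-intercept, random-slope growth model; the
# time x condition term tests whether growth in use differs by condition.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_students, n_waves = 200, 4
df = pd.DataFrame({
    "student": np.repeat(np.arange(n_students), n_waves),
    "time": np.tile(np.arange(n_waves), n_students),
})
df["condition"] = (df["student"] % 2).astype(int)
student_intercept = rng.normal(0, 1, n_students)[df["student"]]
df["use"] = (student_intercept + 0.5 * df["time"]
             - 0.2 * df["time"] * df["condition"]
             + rng.normal(0, 1, len(df)))

model = smf.mixedlm("use ~ time * condition", df,
                    groups=df["student"], re_formula="~time")
print(model.fit().summary())
```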
III. Methodological Issues – Some More Detail • What would an ideal trial look like? • Issues re Randomization • Sample sizes • Where are the control groups? • Intensive measurement • Unit of Analysis • Nature of the target population • Moderation and Mediation
Why Random Assignment of Schools • Intervention is delivered to intact classrooms • Random assignment of classes is subject to contamination across classes within schools • Program includes school-wide components • Credible causal statements require group equivalence at both the group and individual levels • On the outcome variable • On presumed mediating variables • On motivation or desire to change
Randomized Prevention Research Studies • The first Waterloo study was the first with a sufficient N of schools for randomization to be “real” • Some earlier studies were claimed as “randomized” with only one or two schools per condition • Other smoking prevention studies in the early-to-mid 1980s • McAlister, Hansen/Evans, Murray/Luepker, Perry/Murray, Biglan/Ary, Dielman • Other substance abuse prevention studies in the late 1980s and 1990s • Johnson/Hansen/Flay/Pentz, Botvin and colleagues • Extended to sexual behavior, AIDS, and violence in the 1990s • Also the MacArthur Network initiated trials of the Comer program • Character education in 2002 • New DoE funding
Issues re Randomization • Ethical resistance to the idea of randomization needs to be addressed, though it’s becoming rare • Control schools like to have a program too • Use usual Health Education (treatment as usual) • Offer special, but unrelated, program • E.g., Aban Aya Health Enhancement Curriculum as control for Social Development (violence, sex, drug prevention) Curriculum • Pay schools for access to collect data from students, parents and teachers -- $500-$2,000 per year • Currently, because of NCLB demands, many schools are too busy to want to be in an intervention condition • Too many teaching and testing demands • Too many other special programs already • Pay them for staff support and special activities
Approaches to Randomization • Pure randomization from a large population • Obtain agreement first • Even prior agreements can break down (Waterloo) • Then randomize from matched sets defined by • Presumed predictors of the outcome (Graham et al., Aban Aya) • Actual predictors of the outcome (Hawaii Positive Action trial) • Individual-level pretest levels of the outcome (has anyone ever achieved this?) • If schools refuse or drop out, replace from the same set • Only one school of 15 initial selections/assignments for Aban Aya refused and was replaced • But this compromises randomization – different probabilities • Drop the set • We had to drop multiple sets in the Hawaii Positive Action trial because of refusals by schools assigned to the program
Breakdown of Randomization/Design • Failure of randomization • Don’t use posttest-only designs (to my knowledge none have) • Schools drop out during the course of the study • Use signed agreements (none dropped out of Aban Aya or Hawaii) • Configuration of schools is changed during the course of the study • E.g., a school is closed, or two schools are combined • Drop the paired school as well (and add a replacement set if it’s soon enough) • A program school refuses to deliver the program, or delivers it poorly • Try to avoid this with incentives and technical support • E.g., in the Schaps Child Development Project, only 5 of 12 schools implemented the program well – and reports emphasize results from these 5 • Botvin often reported results only for students who received more than 60% of the lessons • “Intention to treat” analysis should be reported first; reporting results for the high-implementation group is appropriate only as a secondary level of analysis
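The last bullet in code form: report intention-to-treat first, then the high-implementation subgroup only as a secondary analysis. The data are simulated; the 60%-of-lessons cutoff mirrors the slide:

```python
# Sketch: ITT analysis first, high-implementation subgroup second.
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

rng = np.random.default_rng(7)
n = 400
df = pd.DataFrame({
    "condition": rng.integers(0, 2, n),
    "lessons_received": rng.uniform(0, 1, n),  # fraction of lessons delivered
})
# Simulated outcome: benefit only accrues with delivered lessons.
df["outcome"] = rng.normal(size=n) - 0.25 * df["condition"] * df["lessons_received"]

def compare(d, label):
    t = d.loc[d.condition == 1, "outcome"]
    c = d.loc[d.condition == 0, "outcome"]
    res = ttest_ind(t, c)
    print(f"{label}: diff = {t.mean() - c.mean():+.3f}, p = {res.pvalue:.3f}")

compare(df, "Intention to treat (all assigned)")
compare(df[(df.condition == 0) | (df.lessons_received > 0.6)],
        "Secondary: >60% of lessons received (plus all controls)")
```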
Expense --> Small Ns? • Yes, in many cases • The average efficacy trial to date (where research funds support the intervention) had 4-8 schools per condition and cost ~$500,000-$900,000 per year • Effectiveness trials (where the intervention is less costly) have 10-20 schools per condition for ~$500,000 per year • Limit costs by using more small schools • Raises questions about generalizability of results to large schools • Limit costs by limiting variability between schools • Also limits generalizability of results
The Changing Nature of Control Groups • The medical model suggests use of a placebo and double blinding, neither of which is possible for educational programs • Subjects (both students and schools) should have equal expectations of what they will get from the program • Few studies have used alternative programs to control for Hawthorne effect or student expectancies • TVSFP, Sussman, Aban Aya • It is not possible to have pure controls in schools today – they all have multiple programs • Must monitor other programs in both sets of schools
Implications of no blinding • Requires careful monitoring of program delivery • Assessment of acceptance of, involvement in, and expectations of program by target audience • Monitoring of what happens in control schools • Data collectors blinded to conditions • Or at least to comparisons being made • This condition has rarely been met in prevention research • Data collectors not known to students • To ensure greater confidentiality and more honest reports of behavior • Classroom teachers should not be present (or be unobtrusive) during student surveys • Use unobtrusive measures -- rarely used so far • Use of archival data and playground observations are possibilities • Though they have their own problems
Parental Consent Issues • Historical use of “passive” consent • Parents are informed, but respond only if they want to “opt out” their child or themselves • More and more IRBs are requiring active signed consent • When is active consent required? • If asking “sensitive” questions • Drug use, sexual behavior, illegal behavior, family relationships • If students are “required” to participate • Protection of Pupil Rights Act (PPRA) • If data are not anonymous (or totally confidential) • If there is more than minimal risk should data become non-confidential • Thus, passive consent should be allowed if: • Not asking about sensitive issues • Allows surveys of young students (K-3/4) • Students are not required to participate • By NIH rules, students already must be given the opportunity to opt out of complete surveys or to skip questions • Requires careful “assent” procedures • Strict non-disclosure protocols are followed • Multiple levels of ID numbers for tracking • No individual- (or classroom- or school-) level data are ever released
Changes in Student Body During a Study • Transfers out and in • Students who transfer out of or into a study school are, on average, at higher risk than other students • Are transfers out replaced by transfers in, or do the rates differ? • Are rates the same across experimental conditions? • Absenteeism • Students with higher rates of absenteeism are also, on average, at higher risk than others • Are rates the same across experimental conditions? • Rates of transfers in/out, absenteeism, or dropout that are differential by condition present the most serious problem • Requires careful assessment and analysis • Missing-data techniques are of limited value when rates are differential, because the data are then not missing completely at random (MCAR) • But they may be useful when data are missing at random (MAR), that is, when missingness is predictable • We do not follow students who leave study schools, and we add students who enter during the study
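A quick check for the “most serious problem” above, differential attrition by condition. The data and dropout rates are simulated for illustration:

```python
# Sketch: test whether attrition differs by condition. If it does,
# missingness is unlikely to be MCAR; report rates, patterns, and types,
# and model dropout under an MAR assumption where defensible.
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

rng = np.random.default_rng(3)
n = 1000
df = pd.DataFrame({"condition": rng.integers(0, 2, n)})
# Simulate slightly higher dropout in control schools.
df["dropped"] = rng.random(n) < np.where(df["condition"] == 1, 0.15, 0.20)

table = pd.crosstab(df["condition"], df["dropped"])
chi2, p, _, _ = chi2_contingency(table)
print(table)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```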
Complex Interventions • Usually conceived of as whole curricula or programs, not separate components • Few field-based tests of the efficacy of separate components to date • But curricula/programs are based on basic and hypothesis-driven research • Programs have grown more complex over the years • Multiple outcomes are the norm • Achievement + multiple Behaviors + Character (ABCs) • Multiple ecologies are also involved – moderators are likely • School-wide • Involvement of parents/families • Involvement of community (e.g., Aban Aya) • Therefore, multiple mediators, both distal and proximal • Distal: family patterns, school climate, community involvement • Proximal: attitudes, normative beliefs, self-efficacy, intentions