Assessing Interventions and Control Conditions in RCTs: Concepts and Methods
David S. Cordray, PhD, Vanderbilt University
Presentation for the IES/NCER Summer Research Training Institute: Cluster-Randomized Trials
Northwestern University, Evanston, Illinois
July 7, 2008
Overview
• Fidelity of intervention implementation: definitions and distinctions
• Conceptual foundation for assessing fidelity in RCTs, a special case
• Model-based assessment of implementation fidelity
  • Models of change
    • "Logic model"
    • Program model
    • Research model
• Indexing fidelity
• Methods of assessment
  • Sample-based fidelity assessment
  • Regression-based fidelity assessment
• Questions and discussion
Intervention Fidelity: Definitions, Distinctions, and Some Examples
Dimensions of Intervention Fidelity
• There is little consensus on what is meant by the term "intervention fidelity."
• But Dane & Schneider (1998) identify five aspects:
  • Adherence/compliance – program components are delivered/used/received as prescribed;
  • Exposure – the amount of program content delivered to/received by participants;
  • Quality of delivery – a theory-based ideal in terms of processes and content;
  • Participant responsiveness – the engagement of the participants; and
  • Program differentiation – unique features of the intervention are distinguishable from other programs (including the counterfactual).
Distinguishing Implementation Assessment from Implementation Fidelity Assessment
• Two models of intervention implementation, based on:
  • A purely descriptive model: answering the question "What transpired as the intervention was put in place (implemented)?"
  • An a priori intervention model, with explicit expectations about implementation of core program components.
• Fidelity is the extent to which the realized intervention (t_Tx) is "faithful" to the pre-stated intervention model (T_Tx); it is indexed by the discrepancy T_Tx − t_Tx.
• We emphasize this model, but both are important.
Some Examples
• The following examples are from an 8-year, NSF-supported project involving biomedical engineering education at Vanderbilt, Northwestern, Texas, and Harvard/MIT (VaNTH; Thomas Harris, MD, PhD, Director).
• The goal was to change the curriculum to incorporate principles of "How People Learn" (Bransford et al. and the National Academy of Sciences, 1999).
• We'll start with a descriptive question, then move to model-based examples.
Descriptive Assessment: Expectations about Organizational Change [Figure] (From: Cordray, Pion & Harris, 2008)
Macro-Implementation [Figure] (From: Cordray, Pion & Harris, 2008)
Changes in Learning Orientation [Figure] (From: Cordray, Pion & Harris, 2008)
Model-Based Fidelity Assessment: What to Measure?
• Adherence to the intervention model:
  • (1) Essential or core components (activities, processes);
  • (2) Necessary, but not unique to the theory/model, activities, processes, and structures (supporting the essential components of T); and
  • (3) Ordinary features of the setting (shared with the counterfactual group, C).
• Essential/core and necessary components are the priority parts of fidelity assessment.
An Example of Core Components: Bransford's HPL Model of Learning and Instruction
• John Bransford et al. (1999) postulate that a strong learning environment entails a combination of:
  • Knowledge-centered;
  • Learner-centered;
  • Assessment-centered; and
  • Community-centered components.
• Alene Harris developed an observation system (the VOS) that registered both novel (the components above) and traditional pedagogy in classes.
• The next slide focuses on the prevalence of Bransford's recommended pedagogy.
Challenge-based Instruction in HPL-based Intervention Courses: The VaNTH Observation System (VOS) [Figure: percentage of course time using challenge-based instructional strategies] (Adapted from Cox & Cordray, in press)
Challenge-based Instruction in "Treatment" and Control Courses: The VaNTH Observation System (VOS) [Figure: percentage of course time using challenge-based instructional strategies] (Adapted from Cox & Cordray, in press)
Student-based Ratings of HPL Instruction in HPL and non-HPL Courses
We also examined the same question from the students' point of view through surveys (n = 1,441). [Figure] (From: Cordray, Pion & Harris, 2008)
Implications
• Descriptive assessments involve:
  • Expectations
  • Multiple data sources
  • They can assist in explaining outcomes
• Model-based assessments involve:
  • Benchmarks for success (e.g., the optimal fraction of time devoted to HPL-based instruction)
  • With comparative evidence, fidelity can be assessed even when there is no known benchmark (e.g., the 10 Commandments)
  • In practice, interventions can be a mixture of components with strong, weak, or no benchmarks
• Control conditions can include core intervention components due to:
  • Contamination
  • Business as usual (BAU) containing shared components at different levels
  • Similar theories or models of action
• To index fidelity, we need to measure, at a minimum, intervention components within the control condition.
Conceptual Foundations for Fidelity Assessment within Cluster Randomized Controlled Trials
Linking Intervention Fidelity Assessment to Contemporary Models of Causality
• Rubin's Causal Model:
  • The true causal effect of X for unit i is (Y_i^Tx − Y_i^C)
  • RCT methodology is the best approximation to the true effect
• Fidelity assessment within RCT-based causal analysis entails examining the difference between causal components in the intervention and counterfactual conditions.
• Differencing causal conditions can be characterized as the "achieved relative strength" of the contrast:
  • Achieved Relative Strength (ARS) = t_Tx − t_C
• ARS is a default index of fidelity.
Infidelity and Relevant Threats to Validity
• Statistical conclusion validity:
  • Unreliability of treatment implementation (T_Tx − t_Tx): variations across participants in the delivery/receipt of the causal variable (e.g., the treatment) increase error and reduce the size of the effect, decreasing the chances of detecting covariation.
• Construct validity – cause: [(T_Tx − t_Tx) − (T_C − t_C)]
  • Forms of contamination:
    • Compensatory rivalry: members of the control condition attempt to out-perform participants in the intervention condition (the classic example is the "John Henry effect").
    • Treatment diffusion: the essential elements of the treatment group are found in the other conditions (to varying degrees).
• External validity – generalization is about (t_Tx − t_C)
  • Variation across settings; cohort-by-treatment interactions
[Figure: Treatment strength (x-axis, 50–100) plotted against outcome (y-axis, .00–.45). The realized treatment (t_Tx = 85) falls short of the planned treatment (T_Tx), and the realized control (t_C = 70) exceeds the planned control (T_C); both gaps represent "infidelity." Achieved relative strength = t_Tx − t_C = 85 − 70 = 15 points (.15), versus an expected relative strength of .25.]
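Restated as a worked computation (a sketch assuming the treatment-strength axis is rescaled to a 0–1 metric, as the figure suggests; the .25 expectation is read from the figure):

$$\text{Expected RS} = T_{Tx} - T_{C} = .25, \qquad \text{Achieved RS} = t_{Tx} - t_{C} = .85 - .70 = .15$$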
In Practice…
• Identify core components in both groups
  • e.g., via a model of change
• Establish benchmarks for T_Tx and T_C
• Measure core components to derive t_Tx and t_C
  • e.g., via a "logic model" based on the model of change
  • Research methods
• With multiple components and multiple methods of assessment, achieved relative strength needs to be:
  • Standardized via indices of fidelity (absolute, average, binary);
  • Converted to achieved relative strength; and
  • Combined across:
    • Multiple indicators
    • Multiple components
    • Multiple levels (HLM-wise)
Indexing Fidelity
• Absolute: compare observed fidelity (t_Tx) to the absolute or maximum level of fidelity (T_Tx)
• Average: mean levels of observed fidelity (t_Tx and t_C)
• Binary: yes/no treatment receipt based on fidelity scores (both groups); requires selection of a cut-off value
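To make the three indices concrete, here is a minimal computational sketch (not from the original presentation; the component scores and the 0.60 cut-off are hypothetical illustrations):

```python
import statistics

tx_scores = [0.85, 0.70, 0.90, 0.65]   # observed fidelity (t_Tx) per unit
c_scores  = [0.40, 0.55, 0.35, 0.50]   # observed fidelity (t_C) per unit
T_TX_MAX  = 1.0                        # benchmark / maximum fidelity (T_Tx)
CUTOFF    = 0.60                       # hypothetical cut-off for "treatment received"

# Absolute: observed fidelity relative to the benchmark
absolute = statistics.mean(tx_scores) / T_TX_MAX

# Average: mean observed fidelity in each condition
avg_tx = statistics.mean(tx_scores)
avg_c  = statistics.mean(c_scores)

# Binary: proportion of units classified as having received the treatment
binary_tx = sum(s >= CUTOFF for s in tx_scores) / len(tx_scores)
binary_c  = sum(s >= CUTOFF for s in c_scores) / len(c_scores)

print(f"absolute={absolute:.2f}, avg_tx={avg_tx:.2f}, avg_c={avg_c:.2f}")
print(f"binary_tx={binary_tx:.2f}, binary_c={binary_c:.2f}")
```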
Indexing Fidelity as Achieved Relative Strength
• Intervention strength = treatment − control
• Achieved Relative Strength (ARS) index:
  • The standardized difference in a fidelity index across Tx and C
  • Based on Hedges' g (Hedges, 2007)
  • Corrected for clustering in the classroom
Average ARS Index

$$\mathrm{ARS} = \underbrace{\frac{\bar{t}_{Tx} - \bar{t}_{C}}{S_T}}_{\text{group difference}} \times \underbrace{\left(1 - \frac{3}{4N - 9}\right)}_{\text{sample-size adjustment}} \times \underbrace{\sqrt{1 - \frac{2(\bar{n} - 1)\rho}{N - 2}}}_{\text{clustering adjustment}}$$

where:
• t̄_Tx = mean fidelity for group 1 (treatment)
• t̄_C = mean fidelity for group 2 (control)
• S_T = pooled within-groups standard deviation
• n_Tx = treatment sample size; n_C = control sample size
• n̄ = average cluster size
• ρ = intra-class correlation (ICC)
• N = total sample size
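A minimal sketch of this computation (not from the original presentation; the function name and illustrative values are assumptions, and the formula follows the cluster-adjusted Hedges' g reconstruction above):

```python
import math

def achieved_relative_strength(mean_tx, mean_c, sd_pooled, n_cluster, N, icc):
    """Cluster-adjusted standardized mean difference (after Hedges, 2007).

    mean_tx, mean_c : mean fidelity in treatment (t_Tx) and control (t_C)
    sd_pooled       : pooled within-groups standard deviation (S_T)
    n_cluster       : average cluster size (n)
    N               : total sample size
    icc             : intra-class correlation (rho)
    """
    group_diff = (mean_tx - mean_c) / sd_pooled
    small_sample = 1 - 3 / (4 * N - 9)  # sample-size adjustment
    clustering = math.sqrt(1 - (2 * (n_cluster - 1) * icc) / (N - 2))
    return group_diff * small_sample * clustering

# Hypothetical values, for illustration only
print(achieved_relative_strength(0.85, 0.70, 0.20, 22, 440, 0.10))  # ~0.75
```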
Example – The Measures of Academic Progress (MAP) RCT
• The Northwest Evaluation Association (NWEA) developed the Measures of Academic Progress (MAP) program to enhance student achievement.
• It is used in 2,000+ school districts and 17,500 schools.
• There is no evidence yet of its efficacy or effectiveness.
• The upcoming example presents heuristics for translating conceptual variables into operational form.
MAP's Simple Model of Change [Diagram linking Professional Development, Feedback, and Differentiated Instruction to Achievement]
Conceptual Model for the Measures of Academic Progress (MAP) Program [Figure]
Translating the Model of Change into Activities: the "Logic Model" [Figure] (From: W.K. Kellogg Foundation, 2004)
Moving from Logic Model Components to Measurement
The MAP model's components, with associated resources, activities, and outcomes & measures:
• Professional Development
  • Resources: four training sessions; on-line resources
  • Outcomes & measures: attendance; knowledge acquisition
• Feedback
  • Resources: three computer-adaptive testing administrations; DesCarte system
  • Outcomes & measures: testing completed; access to DesCarte
• Differentiated Instruction
  • Activities: grouping of students; continuous assessment
  • Outcomes & measures: changes in pedagogy
• Achievement
  • Outcomes & measures: state tests; MAP assessments
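One way to carry this mapping into an analysis plan is as a simple data structure. The sketch below is an illustrative assumption (not part of the MAP program materials) that mirrors the component-to-measure mapping above:

```python
# Hypothetical logic-model registry: component -> inputs and measures.
# Names are taken from the slide; the structure itself is an assumption.
map_logic_model = {
    "Professional Development": {
        "resources": ["Four training sessions", "On-line resources"],
        "outcomes_measures": ["Attendance", "Knowledge acquisition"],
    },
    "Feedback": {
        "resources": ["Three computer-adaptive test administrations", "DesCarte system"],
        "outcomes_measures": ["Testing completed", "Access to DesCarte"],
    },
    "Differentiated Instruction": {
        "activities": ["Grouping of students", "Continuous assessment"],
        "outcomes_measures": ["Changes in pedagogy"],
    },
    "Achievement": {
        "outcomes_measures": ["State tests", "MAP assessments"],
    },
}

# Enumerate every fidelity indicator the design must measure
for component, spec in map_logic_model.items():
    for measure in spec["outcomes_measures"]:
        print(f"{component}: {measure}")
```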
Measuring Resources, Activities and Outputs
• Observations
  • Structured
  • Unstructured
• Interviews
  • Structured
  • Unstructured
• Surveys
• Existing scales/instruments
• Teacher logs
• Administrative records
Sampling Strategies
• Census
• Sampling
  • Probabilistic
    • Persons (units)
    • Institutions
    • Time
  • Non-probability
    • Modal instance
    • Heterogeneity
    • Key events
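As a small illustration of probabilistic sampling of time (all names and parameters hypothetical), one might draw a random subset of class sessions for fidelity observation rather than attempting a census of every session:

```python
import random

random.seed(42)  # reproducible draw

# Hypothetical frame of observable occasions: 15 weeks, two class days each
sessions = [f"week{w}-day{d}" for w in range(1, 16) for d in (1, 3)]

# Simple random sample of occasions to observe for fidelity
observed = random.sample(sessions, k=6)
print(sorted(observed))
```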
Key Points and Future Issues
• Fidelity assessment should, at a minimum, identify and measure model-based core and necessary components.
• Collaboration among researchers, developers, and implementers is essential for specifying:
  • Intervention models;
  • Core and essential components;
  • Benchmarks for T_Tx (e.g., an educationally meaningful dose; what level of X is needed to instigate change); and
  • Tolerable adaptation.
Points and Issues
• Fidelity assessment serves two roles:
  • Indexing the average causal difference between conditions; and
  • Using fidelity measures to assess the effects of variation in implementation on outcomes.
• We should minimize "infidelity" and weak ARS through:
  • Pre-experimental assessment of T_Tx in the counterfactual condition: is T_Tx > T_C?
  • Building operational models with positive implementation drivers
  • Post-experimental (re)specification of the intervention, for example…
Intervention and Control Components [Figure illustrating infidelity in the intervention group and augmentation of the control group. PD = professional development; Asmt = formative assessment; Diff Inst = differentiated instruction]
Overview
• Logistics:
  • Rationale for the group project
  • Group assignments
  • Resources: ExpERT (Experimental Education Research Training) Fellows
  • Parameters for the group project
• Small group discussions
Rationale for the Project
• The purpose of this training is to enhance skills in planning, executing, and reporting cluster RCTs.
• The various components of RCTs are, by necessity, presented serially.
• The ultimate design for an RCT is the product of:
  • Tailoring design, measurement, and analytic strategies to a given problem; and
  • Successive iterations as we attempt to optimize all features of the design.
• The project will provide a chance to engage in these practices, with guidance from your colleagues.
About Group Assignments…
• We are assuming that RCTs need to be grounded in specific topical areas.
• There is a diversity of topical interests represented, so the group assignments may not be optimal.
• To manage the guidance and reporting functions, we need to have a small number of groups.
Resources
• ExpERT Fellows:
  • Laura Williams – Quantitative Methods and Evaluation
  • Chuck Munter – Teaching and Learning
  • David Stuit – Leadership and Policy
Parameters of the Proposal
• The IES goal is to support research that contributes to the solution of education problems.
• RFA IES-NCER-2008-1 provides extensive information about the proposal application and review process.
• Proposals are reviewed in four areas:
  • Significance
  • Research plan
  • Personnel
  • Resources
• For our purposes, we'll focus on Significance and the Research Plan.
Group Project Report
• Each group will present its proposal on Thursday (60 minutes each):
  • 45 minutes for the proposal
  • 15 minutes for discussion
• Ideally, each report will contain:
  • A problem statement, intervention description, and rationale for why it should work (10–15 minutes)
  • An overview of the research plan:
    • Samples
    • Groups and assignment
    • Power
    • Fidelity assessment
    • Outcomes
    • Impact analysis plan
• Use tables, figures, and bullet points in your presentation.
Expectations
• You will produce a rough plan; some details will be guesses.
• The planning process is often iterative, with the need to revisit earlier steps and specifications.
• Flexibility helps…
Initial Group Interactions
• Meet with your assigned group (45 minutes) to assess "common ground"
• Group discussion of "common issues"