Missing Data Issues in RCTs: What to Do When Data Are Missing?

Missing Data Issues in RCTs: What to Do When Data Are Missing? REL Directors Meeting February 8, 2008 Analytic and Technical Support for Advancing Education Evaluations

What Are We Doing? • Purpose: To provide guidance on how to deal with missing baseline and longitudinal follow-up data (including attrition from data collection) • Audience: Researchers measuring the impact of an educational intervention on student achievement using a randomized controlled trial (RCT) design • Of particular benefit to study teams just entering the impact analysis phase of their projects

Scope of the Monograph • Restricted to Missing Data Issues in RCTs: • Missing data on outcome variables, control variables, and subgroup variables • Excludes quasi-experimental studies, does not address attrition from the intervention • Focus on Analysis Strategies—We take the design and study sample as already fixed • Provide Practical Guidance—Intent is to provide solutions that RELs can actually implement (not cutting edge or overly costly methods)

Today’s Meeting • Status of the Monograph: • We are just writing the first draft, due March 1st • No solutions today – just an informal discussion • What do we want to accomplish today? • Let you know where the paper is heading • Get your feedback • What’s not here? • What help do you need?

General Approach to ITT Impact Estimation in the REL Studies • The RCTs are multi-site cluster randomized controlled trials with random assignment at the classroom or school level • Given this design, RELs are using 2- or 3-level HLM models to estimate impacts, e.g., with 3 levels, students at level 1, classrooms at level 2, and schools at level 3 • Covariates are usually included at the student level and sometimes at the class or school level
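
To make the 3-level setup concrete, one common way to write such a model is sketched below. The notation is an assumption for illustration, not taken from the REL studies: Y is the outcome for student i in classroom j in school k, X is a student-level covariate with a fixed slope, and T is a school-level treatment indicator (assuming random assignment at the school level).

```latex
% Illustrative three-level model; symbols are assumptions, not the studies' own notation.
\begin{align*}
\text{Level 1 (students):}\quad & Y_{ijk} = \pi_{0jk} + \pi_{1} X_{ijk} + e_{ijk} \\
\text{Level 2 (classrooms):}\quad & \pi_{0jk} = \beta_{00k} + r_{0jk} \\
\text{Level 3 (schools):}\quad & \beta_{00k} = \gamma_{000} + \gamma_{001} T_{k} + u_{00k} \\
\text{Combined:}\quad & Y_{ijk} = \gamma_{000} + \gamma_{001} T_{k} + \pi_{1} X_{ijk} + u_{00k} + r_{0jk} + e_{ijk}
\end{align*}
```

Under this sketch the ITT impact is the coefficient on T (here \gamma_{001}). With classroom-level random assignment, the treatment indicator would instead enter at level 2.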

Data Used in RCTs Conducted by the RELs • Outcome Measures: • Most common outcomes are student test score measures • But other outcomes include student attitudes, teacher practice, and teacher knowledge • Baseline Measures: • Most RCTs are collecting baseline data to define subgroups and improve the precision of the impact estimates • These data include pre-intervention test scores and demographic characteristics, e.g., age, gender, race, ethnicity, Limited English Proficiency, and eligibility for Free or Reduced Price Lunches

Most Common Types of Data Collection for Creating Outcome Measures • Student tests conducted for the study (n = 20) • Teacher surveys (n = 16) • Student tests required by the state (n = 11) • Student surveys (n = 6) • Teacher tests (n = 5)

Missing Data and Measurement Issues • Missing Data (focus of this presentation): • Studies are likely to encounter missing values for one or more outcome variables, subgroup variables, or control variables • Other Measurement Issues (included in paper): • Baseline measures collected after random assignment • Outcome measures collected over time, and perhaps later on average for one experimental group than for the other • Others? Pooling state test scores across states?

Student Tests for the Study • Reasons for Missing Data: • The student’s parent did not give consent • The student failed to attend school on testing day • The student transferred to another school • The student’s classroom was unavailable for testing (e.g., because of a fire drill) • The student refused to take the test • Missing data rates should be low if tests are administered in the usual classroom setting (most will be)

Student Tests Required by the State • Reasons for Missing Data: • The student was absent and there was no follow-up testing • The student was exempt from taking the state test • The student’s school failed to provide test score data • The student transferred to another school • Missing data rates should be low since data can be collected at district or state level, and only a small fraction of the sample will transfer outside of the district

Student Surveys • Reasons for Missing Data: • The student’s parent did not give consent • The student’s teacher failed to administer the survey • The student did not attend school on the day of the survey • The student chose not to complete the survey • The student transferred to another school • Missing data rates should be low if the survey is administered in school, but could be high if the survey is administered outside of school

Teacher Surveys • Reasons for Missing Data: • The teacher refused to complete the survey • The teacher was on temporary leave • The teacher left the school and never received the survey • Missing data rates can be high because teachers are busy and may have little incentive to complete the survey (unless required by the school or district)

Teacher Tests • Some REL RCTs are testing teachers on their knowledge using tests conducted online or in school • Reasons for Missing Data: • The teacher refused to complete the test • The teacher was on temporary leave • The teacher left the school and never received the test • Missing data rates can be high because teachers may find such tests offensive or onerous (unless required by the school or district)

Why Worry About Missing Data? • Standard software typically drops cases with missing values on any analysis variable, but this can lead to biased impact estimates • To avoid this problem, researchers may choose to drop the variables for which some cases are missing, but this can have negative consequences

Dropping Cases with Missing Data • Non-Response Bias: Dropping cases with missing data can lead to “non-response” bias if there is a relationship between the outcome and “missingness” (e.g., if student achievement is lower for students exempted from the state test) • Biased Impact Estimates: Dropping cases with missing data can lead to biased impact estimates if the rate of missing data or the mechanism behind the “missingness” differs between treatment and control (give example)
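
The two kinds of bias can be illustrated with a small, deterministic sketch. All scores below are hypothetical and chosen so that the true impact is zero; `impact_estimate` is an illustrative helper, not a function from any package, and missing outcomes are represented as `None`.

```python
from statistics import mean


def impact_estimate(treatment, control):
    """Difference in mean outcomes using only non-missing (non-None) cases."""
    t = [y for y in treatment if y is not None]
    c = [y for y in control if y is not None]
    return mean(t) - mean(c)


# Hypothetical scores: both groups are identical, so the true impact is 0.
full_treatment = [40, 50, 60, 70, 80, 90]
full_control = [40, 50, 60, 70, 80, 90]
true_impact = impact_estimate(full_treatment, full_control)

# Case 1: the two lowest scorers are missing in BOTH groups.
# Each group mean is overstated (non-response bias), but the difference is not.
same_missing_t = [None, None, 60, 70, 80, 90]
same_missing_c = [None, None, 60, 70, 80, 90]
respondent_mean = mean(y for y in same_missing_c if y is not None)
impact_same_missingness = impact_estimate(same_missing_t, same_missing_c)

# Case 2: the two lowest scorers are missing in the treatment group only.
# Dropping them makes the treatment group look better than it is.
impact_differential = impact_estimate(same_missing_t, full_control)

print(true_impact, respondent_mean, impact_same_missingness, impact_differential)
```

In this toy case the complete-case analysis reports a positive impact even though none exists, purely because missingness differs between the two groups.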

Dropping Variables from the Analysis • Dropping either outcome variables (dependent variables) or subgroup variables because the data are missing for some cases is like throwing the baby out with the bath water! • Dropping control variables (independent variables) because data are missing will reduce the precision of the impact estimates (since controls are included to increase precision)

Assessing the Size of the Problem • Learn why certain data are missing—In some cases this may shed light on whether non-response bias is likely to be severe • Compare respondents to nonrespondents using data available for both—For student outcomes, it is especially important to compare the two groups in their pre-intervention test scores
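
A minimal sketch of the respondent versus nonrespondent comparison, assuming each student is stored as a dictionary with hypothetical keys `treatment`, `pretest`, and `posttest` (with `posttest` set to `None` when the outcome is missing). The data are made up for illustration.

```python
from statistics import mean, stdev


def compare_on_baseline(records, baseline="pretest", outcome="posttest"):
    """Compare baseline scores of outcome respondents and nonrespondents."""
    resp = [r[baseline] for r in records if r[outcome] is not None]
    nonresp = [r[baseline] for r in records if r[outcome] is None]
    diff = mean(resp) - mean(nonresp)
    return {
        "n_respondents": len(resp),
        "n_nonrespondents": len(nonresp),
        "mean_diff": diff,
        # Difference expressed in full-sample standard deviation units.
        "std_diff": diff / stdev(resp + nonresp),
    }


def response_rate_by_arm(records, outcome="posttest"):
    """Share of cases with a non-missing outcome in each experimental group."""
    rates = {}
    for arm in {r["treatment"] for r in records}:
        group = [r for r in records if r["treatment"] == arm]
        rates[arm] = sum(r[outcome] is not None for r in group) / len(group)
    return rates


# Hypothetical sample: one nonrespondent per arm, both with low pretest scores.
records = [
    {"treatment": 1, "pretest": 45, "posttest": None},
    {"treatment": 1, "pretest": 60, "posttest": 66},
    {"treatment": 1, "pretest": 70, "posttest": 78},
    {"treatment": 1, "pretest": 80, "posttest": 85},
    {"treatment": 0, "pretest": 50, "posttest": None},
    {"treatment": 0, "pretest": 55, "posttest": 58},
    {"treatment": 0, "pretest": 65, "posttest": 67},
    {"treatment": 0, "pretest": 75, "posttest": 74},
]

summary = compare_on_baseline(records)
rates = response_rate_by_arm(records)
print(summary, rates)
```

Here the response rates are equal across arms, but respondents score higher at baseline than nonrespondents, which is the pattern that would raise concern about non-response bias.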

Addressing the Problem: The Toolbox • All these methods rely on the information that we do have for the sample, but they vary in their assumptions and technical approach • Some methods work only for baseline covariates, some work only for outcome measures, and some work for both

Different Methods for Baseline Covariates and Outcome Variables • Baseline Covariates (Control Variables) Only: • Dummy variable indicators for cases with missing data • Outcome Variables Only: • Weighting methods—Re-weight respondents to better represent the population of interest (e.g., weight “up” groups with high rates of missing data) • Bounding the impacts—Make assumptions about nonrespondents that maximize and minimize the estimated impacts (e.g., make best and worst case assumptions for the true values of the outcome when data are missing)
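
The three methods on this slide can each be sketched in a few lines. The function names, the fill value of 0, the weighting classes, and the score range of 0 to 100 are all assumptions made for illustration; missing values are represented as `None`.

```python
from statistics import mean


def add_missing_indicator(values, fill=0.0):
    """Baseline covariate: fill missing values with a constant and flag them."""
    filled = [fill if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]
    return filled, flags


def nonresponse_weights(groups, outcomes):
    """Weighting classes: each respondent gets 1 / (response rate of its group)."""
    totals, responders = {}, {}
    for g, y in zip(groups, outcomes):
        totals[g] = totals.get(g, 0) + 1
        if y is not None:
            responders[g] = responders.get(g, 0) + 1
    return [totals[g] / responders[g] if y is not None else 0.0
            for g, y in zip(groups, outcomes)]


def impact_bounds(treatment, control, low, high):
    """Worst-case and best-case impacts, filling missing outcomes with low/high."""
    def filled_mean(ys, value):
        return mean([value if y is None else y for y in ys])
    lower = filled_mean(treatment, low) - filled_mean(control, high)
    upper = filled_mean(treatment, high) - filled_mean(control, low)
    return lower, upper


# Hypothetical data.
filled, flags = add_missing_indicator([None, 55, 65])

groups = ["low", "low", "high", "high"]  # e.g., pretest bands
outcomes = [None, 60, 70, 80]
weights = nonresponse_weights(groups, outcomes)
weighted_mean = (sum(w * y for w, y in zip(weights, outcomes) if y is not None)
                 / sum(weights))

bounds = impact_bounds([None, 60, 70, 80], [50, None, 60, 70], low=0, high=100)
print(filled, flags, weights, weighted_mean, bounds)
```

In a regression, both the filled covariate and its flag would be entered as controls. The bounds are wide even with one missing case per group, which shows why bounding is most informative when missing data rates are low.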

Methods for Both Types of Variables • Imputation-based Procedures: • Mean value imputation: Replace missing values with the mean of the observed values for the variable • Regression imputation: Replace missing values with a predicted value from a regression model • Stochastic regression imputation: Add a residual to the predicted value to maintain the correct variance. Stochastic regression imputation can be implemented as single or multiple imputation • Model-based Procedures: This is a broad class of procedures that includes maximum likelihood based methods, the EM algorithm, and pattern-mixture models (we are still investigating)
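
The three imputation procedures can be compared in a short sketch. It assumes a single predictor (a hypothetical pretest score) and, for the stochastic version, draws the added residual at random from the observed residuals; drawing from a fitted normal distribution is another common choice. All data and function names are illustrative.

```python
import random
from statistics import mean, pvariance


def fit_line(xs, ys):
    """Ordinary least squares intercept and slope for one predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def mean_impute(ys):
    """Replace each missing value with the mean of the observed values."""
    m = mean([y for y in ys if y is not None])
    return [m if y is None else y for y in ys]


def regression_impute(xs, ys, rng=None):
    """Replace missing y with a prediction from x; add a residual if rng is given."""
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b = fit_line([x for x, _ in pairs], [y for _, y in pairs])
    residuals = [y - (a + b * x) for x, y in pairs]
    imputed = []
    for x, y in zip(xs, ys):
        if y is None:
            y = a + b * x
            if rng is not None:  # stochastic regression imputation
                y += rng.choice(residuals)
        imputed.append(y)
    return imputed


# Hypothetical pretest (always observed) and posttest (two values missing).
pretest = [40, 50, 60, 70, 80, 90]
posttest = [45, 52, None, 74, None, 95]

by_mean = mean_impute(posttest)
by_regression = regression_impute(pretest, posttest)
by_stochastic = regression_impute(pretest, posttest, rng=random.Random(0))

# Multiple imputation repeats the stochastic step and averages the estimates.
mi_means = [mean(regression_impute(pretest, posttest, rng=random.Random(seed)))
            for seed in range(5)]
mi_estimate = mean(mi_means)
print(by_mean, by_regression, by_stochastic, mi_estimate)
```

Mean imputation shrinks the variance of the completed variable because every filled value sits at the center of the distribution. The multiple imputation step above combines point estimates only; valid standard errors also require combining within- and between-imputation variance (Rubin's rules), which is not shown here.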

How Do We Plan To Recommend Strategies? • Our current thinking is to assess alternatives on the basis of four criteria: • Accessibility: Is this “tool” something that can be done with standard software? • Bias reduction: How effective is the method likely to be in addressing non-response bias? • Correct inference: Will the tool produce standard errors that are at least “not too biased”? • Power: Does this method generate estimates that are reasonably precise (relative to alternative options)?

Discussion • Time for your feedback: • What’s not here? • What help do you need?
