An Empirical Study to Examine Whether Monetary Incentives Improve 12th Grade Reading Performance
Henry Braun (Boston College), Irwin Kirsch (ETS), Kentaro Yamamoto (ETS)
Presented at the PDII Conference, Princeton, NJ, October 3, 2008
Overview
• What is NAEP?
• Why this study?
• What were the design criteria?
• How were they operationalized?
• Do monetary incentives make a difference? If so, which ones, how much, and for whom?
• How robust are the findings?
• What are the implications?
The National Assessment of Educational Progress
• Large-scale national/state surveys of academic achievement, begun in 1969
• Tests students in grades 4, 8, and 12
• Subjects: Reading, Mathematics, Science, Geography, Civics, History, etc.
• NAEP ("The Nation's Report Card") provides a snapshot of student achievement overall, by state, and by various subgroups
12th Grade NAEP
• National sample only
• Lower participation rates than grades 4 and 8
• Concerns about levels of motivation/effort
• Undergoing expansion to the state level
Study Rationale
• Increasing reliance on low-stakes large-scale assessment surveys (LSASs) for education policy
• Issues relate to both national and international LSASs
• In the U.S., NAEP is the only source of nationally comparable data on student achievement that can be used for state-level comparisons
• Under NCLB, 4th and 8th grade NAEP play an expanded role in monitoring state-level results
• Strong interest in expanding the role of 12th grade NAEP
• National Commission on 12th Grade NAEP (2004) recommendations:
  – Redesign to report on student readiness
  – Expand to the state level
  – Increase participation and motivation
Design Criteria
• Goal: to estimate the effects of different monetary incentives on student performance on 12th grade NAEP
• Internal validity
• External validity
• Adequate power
Literature
• Experiments (focus on mathematics): O'Neil et al. (NAEP items); Baumert et al. (PISA items)
• Psychology: intrinsic vs. extrinsic motivation
• Behavioral economics: monetary incentives can work, but participants must be cognizant of the incentives
Study Features
• Focus on NAEP reading
• Randomized trial for internal validity
  – Prepared a detailed implementation protocol
  – Employed experienced administrative staff
• External validity (i.e., link directly to NAEP)
  – Used released NAEP materials
  – Followed NAEP administrative and data-processing procedures
  – Carried out NAEP-like psychometric and statistical analyses
  – Heterogeneous school sample
• Large study for sufficient power to detect effects
Study Design: Incentives
• Control: standard NAEP instructions
• Incentive 1: standard NAEP instructions + promise of a $20 gift card at the conclusion of the session
• Incentive 2: standard NAEP instructions + $5 gift card + $15 for a correct answer to each of two questions chosen at random at the conclusion of the session
Study Design: Incentives (2)
• All students in both incentive conditions were asked to select Target or Barnes & Noble for the gift card and to indicate their preference on a sign-up sheet
• Students in all three conditions actually received $35 gift cards at the end of the sessions
• Students were informally debriefed before leaving
Study Design: Instrumentation
• Mapping to the NAEP Reading Framework (3 contexts):
  – Reading for literary experience (35%)
  – Reading for information (45%)
  – Reading to perform a task (20%)
• Assembling test booklets: 2 reading blocks + background questionnaire
• Each reading block consists of a passage and a set of associated questions and is expected to take 25 minutes
• Blocks vary in the total number of questions and in the proportions of multiple-choice, short-answer, and extended-response questions
Study Design: Background Questionnaire
• Items drawn from the operational questionnaire
• Two sets of items:
  – Set I: demographics and parental education; home environment; school absences
  – Set II: reading practices; future educational expectations; level of effort
Study Design: Sample Selection
• Power analysis indicated the need for a sample of 60 schools with 60 students per school (20 per condition in each school)
• Worked with NAEP state coordinators and Westat to obtain a (final) convenience sample of 59 schools
• Student recruitment was carried out using standard NAEP methods (but no special incentives)
• The number of participating students was lower than the target
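The slides do not report the effect size, significance level, or power assumed in the power analysis, so the sketch below is only a generic illustration of how such a per-group sample size is computed (normal approximation for a two-sample comparison of means, with an optional design-effect inflation for clustered school samples); the function name and all default parameters are assumptions, not the study's actual values.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80, design_effect=1.0):
    """Normal-approximation sample size per group for detecting a
    standardized mean difference (Cohen's d) between two groups,
    optionally inflated by a design effect for cluster sampling."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # power quantile
    n = 2 * ((z_alpha + z_beta) / effect_size) ** 2
    return math.ceil(n * design_effect)
```

For example, detecting a small effect (d = 0.2) at alpha = 0.05 with 80% power requires roughly 390+ students per condition under simple random sampling; a school-clustered design pushes this higher, consistent with the large school sample sought here.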
Administration
• Random samples of 12th graders invited to participate
• In each school, students randomly allocated to the three conditions
• Fall (not spring) administration
• Sessions in a school were simultaneous or consecutive to eliminate the possibility of contamination
• Limited accommodations
• No make-up sessions
Data Preparation: Scoring and Item Analysis
• Scoring was conducted by NCS/Pearson
• Preliminary item analysis revealed no surprises; differences by condition in:
  – Proportions correct
  – Percentage of omitted items (highest for extended constructed-response items)
  – Percentage of off-task responses (generally very small, << 1%)
  – Percentage of items not reached (particularly high for the last constructed-response item)
Average Item Proportions Correct by Item Type and Incentive Condition
Data Preparation: Scaling, Conditioning, and Linking
• Scaling (by subscale):
  – Fit item characteristic curves to the data
  – Compare to archival results
  – Estimate the three-group model (reasonable fit)
• Conditioning:
  – Combine cognitive data with ancillary data from the questionnaires
  – Obtain a posterior score distribution for each student
  – Generate "plausible values"
• Linking:
  – Linear transformation to the NAEP scale
  – Construct the composite reporting scale
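The linking step above places provisional scores on the NAEP reporting scale via a linear transformation. The slides do not spell out the transformation, so the sketch below shows only the generic mean/sigma form y = A·x + B, where A and B are chosen so the transformed scores match a target mean and standard deviation; the function name and target values are illustrative assumptions, not NAEP's operational constants.

```python
from statistics import mean, stdev

def linear_link(scores, target_mean, target_sd):
    """Map provisional (theta-scale) scores onto a reporting scale
    with the given mean and SD via a linear transformation
    y = A*x + B (mean/sigma linking)."""
    m, s = mean(scores), stdev(scores)
    a = target_sd / s          # slope matches the spread
    b = target_mean - a * m    # intercept matches the center
    return [a * x + b for x in scores]
```

After linking, the transformed scores have exactly the target mean and SD, so study results can be read in NAEP scale-score points (as in the 3-to-5-point effects reported later).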
Effect Sizes by Subscale, Item Parameters Based on Study Data Only
Effect Sizes by Subscale, Item Parameters Based on Archival Data
Selected Results
• Effects of incentives range from 3 to 5 points on the NAEP scale overall
• Male-female differences relatively stable
• White-Black and White-Hispanic differences grow somewhat larger under incentives
• Effects of incentives generally positive for subgroups
• Estimates reasonably robust
Study Statistics by Incentive and Gender
Study Statistics by Incentive and Race/Ethnicity
Study Statistics by Condition, Gender, and Race/Ethnicity
Study Statistics by Condition, Gender, and Mother’s Education Level
Study Statistics by Condition, Gender, and Number of Days Absent From School Last Month
Study Statistics by Condition, Gender, and Frequency of Reading for Fun on Own Time
Sensitivity Analysis (1)
• Although treatment groups were determined randomly, there were differences in various characteristics that might have contributed to the estimated treatment effects
• We ran ANOVAs adjusting NAEP scores for a number of demographic and home-environment characteristics, as well as students' reading habits
• The ANOVAs were run separately for males and females; they yield adjusted least-squares means that can be compared to the raw means
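The adjusted least-squares means described above come from regressing scores on treatment-group indicators plus covariates and then evaluating each group's fitted value at the overall covariate means. The slides name neither the software nor the exact model, so the sketch below is a minimal ANCOVA-style illustration of that idea; the function name and inputs are hypothetical.

```python
import numpy as np

def adjusted_means(y, group, covariates):
    """Least-squares-adjusted group means: regress y on group indicator
    columns (no separate intercept) plus covariates, then evaluate each
    group's prediction at the overall covariate means."""
    y = np.asarray(y, dtype=float)
    X_cov = np.asarray(covariates, dtype=float)
    if X_cov.ndim == 1:
        X_cov = X_cov[:, None]
    labels = sorted(set(group))
    # One indicator column per group (cell-means coding).
    D = np.array([[1.0 if g == k else 0.0 for k in labels] for g in group])
    X = np.hstack([D, X_cov])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Adjusted mean = group coefficient + covariate effects at grand means.
    adj = beta[:len(labels)] + X_cov.mean(axis=0) @ beta[len(labels):]
    return dict(zip(labels, adj))
```

When the covariates fully explain a between-group difference, the adjusted means coincide, which is exactly the comparison against raw means that the slide describes.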
Sensitivity Analysis (2)
• The impact of "leverage" groups was examined by identifying subgroups with the largest positive effect (Incentive 2) and a sample size large enough to rule out sampling fluctuations:
  – (i) Male, White, absent more than 3 days in the last month [effect ~3x larger than the overall effect for males] [95/802]: removing this group would reduce the effect of Incentive 2 by ~25%
  – (ii) Female, Hispanic, not ELL [effect ~3x larger than the overall effect for females] [82/919]: removing this group would reduce the effect of Incentive 2 by ~13%
Summary
• Data clearly indicate that the design criteria for this study were met
• Monetary incentives improve NAEP reading performance
• Type of incentive makes a difference:
  – by reporting subgroup
  – by quantile
Caveats
• Fall rather than spring administration
• Represented two of the three NAEP subscales
• Lower student participation rate than in operational NAEP
• Subgroup sample sizes
• Relationship of the sample to the NAEP population
Implications
• 12th grade NAEP results should be interpreted cautiously
• Expansion of 12th grade NAEP ought to wait on policy action on incentives
• Measuring reading as NAEP does may be problematic in the current context
• In modifying NAEP cognitive instruments (e.g., for readiness), the administrative setting should be taken into account