Testing and Spacing: Keys to Enhancing Learning and Retention

Sean Kang
Department of Psychology, UCSD
TDLC Bootcamp, Aug 10, 2009

Purpose of Tests / Quizzes
• Traditionally, an assessment tool
• But testing does not merely measure the contents of memory
• Taking a test can serve as a learning opportunity, enhancing memory retention to a greater extent than additional studying: the testing effect (also referred to as retrieval practice)

Spitzer (1939)
• 3,605 sixth-graders in Iowa
• Students read a ~600-word article on the bamboo plant
• 25-item multiple-choice test (no feedback)
• Varied the retention interval and the frequency of testing

Spitzer (1939)

The Testing Effect
Journal of Educational Psychology, 1989:
Dempster, F. N. (1992). Using tests to promote learning: A neglected classroom resource. Journal of Research & Development in Education, 25, 213–217.
Resurgence of interest in the testing effect in recent years

Roediger & Karpicke (2006)
Stimuli: 2 prose passages from a TOEFL prep book (~260 words each)
Learning condition (within-subjects): Restudy (two 7-min periods of study) vs. Test (7-min period of study, followed by a 7-min test period)
Retention interval (between-subjects): 5 min, 2 days, or 1 week

Carpenter & Pashler (2007)
Past research has focused exclusively on verbal materials (or at least required verbal responses at test).
Does testing benefit memory for non-verbal materials?

Karpicke & Roediger (2008)
Stimuli: 40 Swahili-English word pairs
Subjects studied and were tested on the Swahili words in alternating blocks

d = 4.03
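For reference, d here is presumably Cohen's d, the standardized difference between two condition means. The slide does not say which variant was computed or which conditions were compared, so the pooled-standard-deviation form below is an assumption, given only as a reminder of what the statistic expresses.

```latex
d = \frac{M_1 - M_2}{s_{\text{pooled}}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}
```

On this scale, d = 4.03 would mean that the two condition means differ by about four standard deviations.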

Testing effect: How does it work?
• Additional (focused) presentation of the material
• Operations/processes engaged by an initial test are also engaged during the final test, resulting in positive transfer to the same type of test (i.e., a practice effect)
• Retrieval itself is a potent memory modifier, with increasing retrieval demand/effort enhancing later retention

Does test format matter?
Initial test type: Short Answer (SA), Multiple Choice (MC), Read Fact
Final, criterial test: SA, MC
Corrective feedback given after each initial test question.
COMPETING PREDICTIONS:
1) Repeated exposure
2) Transfer appropriate processing
3) Retrieval “effort”

Procedure (Kang, McDermott, & Roediger, 2007)
N = 48
ENCODING: Read 4 Current Directions articles (~15 min each)
INTERVENING EXPERIENCE (within-subjects, after each article; 8 items/condition): Multiple choice, Short answer, Read answer, or Control/filler
Feedback provided after each test question
Delay: 3 days
FINAL TEST: Multiple choice (16 items: 4 from each of the 4 prior conditions) and Short answer (16 items: 4 from each of the 4 prior conditions)

Sample Test Question (e.g., after reading the article on literacy acquisition by Rebecca Treiman)
Read Fact: Young Joe is more likely to know the name of the letter ‘j’ than Alice or Tom.
Short Answer: Young Joe is more likely to know the _______ of the letter ‘j’ than Alice or Tom.
Multiple-choice: a. place of articulation b. phoneme c. name d. sound

Testing enhanced later memory, and the enhancement was greater when the initial test format was short answer
[Figure: proportion correct on the final test (FINAL MC, FINAL SA) as a function of INITIAL TEST condition (None, Read statements, MC, SA). Bar values as extracted: .87 .83 .69 .94; .27 .46 .53 .57]

COMPETING PREDICTIONS: Transfer appropriate processing vs. Retrieval “effort”

Does feedback matter? (Kang, McDermott, & Roediger, 2007)
N = 48
ENCODING: Read 4 Current Directions articles (~15 min each)
INTERVENING EXPERIENCE (within-subjects, after each article; 8 items/condition): Multiple choice, Short answer, Read answer, or Control/filler
Feedback provided after each test question
Delay: 3 days
FINAL TEST: Multiple choice (16 items: 4 from each of the 4 prior conditions) and Short answer (16 items: 4 from each of the 4 prior conditions)

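Does feedback matter?
Corrective feedback is important, especially when initial test performance is not high
[Figure: proportion correct on the final test (FINAL MC, FINAL SA) as a function of INITIAL TEST condition (None, Read statements, MC, SA). Bar values as extracted: .74 .88 .87 .80; .33 .51 .62 .48]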

The Testing Effect
• Taking a test can be a potent learning event, often yielding better long-term retention than additional studying.
• Testing benefits learning of a diverse range of materials, both verbal and nonverbal.
• Repeated retrieval practice augments the benefit.
• The size of the testing effect is modulated by test format & feedback.
• Tests requiring effortful retrieval are more effective at enhancing retention, implicating retrieval as a causal mechanism.
• To maximize the benefit of testing, feedback should be provided when initial test performance is low.

The Spacing Effect
Reviews are more effective when distributed or spaced out, rather than massed (with total time equated).
One of the most robust phenomena; observed with a diverse range of materials / types of learning.
Ebbinghaus (1885): When learning to recite a list of 12 nonsense syllables, 68 repetitions in one day left 7 repetitions required the next day to relearn. With 38 repetitions spread across 3 days, however, 6 repetitions were required the following day to relearn.
“…with any considerable number of repetitions a suitable distribution of them over a space of time is decidedly more advantageous than the massing of them at a single time.”
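The Ebbinghaus figures can be tallied directly. The short sketch below only adds up the repetition counts quoted above; the variable and function names are made up for illustration, and expressing the difference as a "saving" is one possible way to summarize it, not a measure taken from the slides.

```python
# Repetition counts quoted from Ebbinghaus (1885) above.
massed = {"initial": 68, "relearn": 7}   # 68 repetitions in one day
spaced = {"initial": 38, "relearn": 6}   # 38 repetitions across 3 days


def total_repetitions(condition):
    """Total repetitions spent: initial learning plus next-day relearning."""
    return condition["initial"] + condition["relearn"]


massed_total = total_repetitions(massed)
spaced_total = total_repetitions(spaced)
# Fraction of repetitions saved by spacing, relative to massing.
saving = 1 - spaced_total / massed_total

print(f"massed: {massed_total}, spaced: {spaced_total}, saving: {saving:.0%}")
# prints "massed: 75, spaced: 44, saving: 41%"
```

The spaced schedule reaches a comparable relearning cost (6 vs. 7 repetitions) with 44 rather than 75 repetitions in total.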

The Spacing Effect
Inter-Study Interval (ISI): the gap between successive study (or retrieval practice) episodes
Spacing effect: Spaced > Massed
Lag effect: Comparison of different levels of spacing

Theoretical accounts
Deficient processing theory
• At short ISI, processing of the 2nd presentation is deficient; less attention is paid to an item that is relatively more familiar
Encoding variability theory
• An item and its context are stored at encoding
• Context is assumed to undergo random drift
• The average distance between any prior context and the current context will increase with the passing of time
• The likelihood of successful retrieval depends on the distance between the context at test and the context at encoding
• As ISI increases, there is an increased probability that the test context will be similar to at least one of the study/encoding contexts
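The encoding variability account can be made concrete with a small Monte Carlo sketch. Everything quantitative here is an assumption introduced for illustration, not something stated in the slides: context is reduced to a single number, drift is a Gaussian random walk with unit variance per day, and "similar" means lying within an arbitrary `threshold` of a study context. The function name `recall_probability` is likewise hypothetical.

```python
import math
import random


def recall_probability(isi, ri, threshold=1.0, trials=20000, seed=0):
    """Estimate P(test context lies within `threshold` of a study context).

    Toy assumptions (not from the slides): context is one number drifting
    as a Gaussian random walk with unit variance per day; the item is
    studied on day 0 and day `isi`, and tested on day `isi + ri`.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        first_study = 0.0
        # Drift accumulated between the two study episodes.
        second_study = first_study + math.sqrt(isi) * rng.gauss(0.0, 1.0)
        # Drift accumulated over the retention interval.
        test = second_study + math.sqrt(ri) * rng.gauss(0.0, 1.0)
        nearest = min(abs(test - first_study), abs(test - second_study))
        if nearest < threshold:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for isi in (0, 1, 5, 20):
        print(isi, round(recall_probability(isi, ri=10), 3))
```

With an ISI of 0 the two study contexts coincide, so the test context gets only one chance to land nearby; with a longer ISI the two stored contexts differ, giving the test context two distinct targets. How the estimate behaves at very long ISIs depends on the toy parameters chosen here.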

The Spacing Effect
Is there an optimal ISI / gap?
Does the answer depend on the retention interval (RI)?

Cepeda et al. (2006)

The Spacing Effect For RI >= 1 day, is a 1-day ISI/gap sufficient to produce most/all of the benefit of spacing? Only a handful of studies provide multi-gap comparisons, with RI >= 1 day.

Cepeda et al. (2009), Experiment 1
N = 182
Stimuli: 40 Swahili-English word pairs
ISI / Gap (between-subjects): 0, 1, 2, 4, 7, and 14 days
RI: 10 days
Procedure
Session 1: All items presented for study once, followed by testing with feedback until all items successfully recalled 2x.
Session 2: After appropriate gap, all items tested 2x with feedback.
Session 3: After 10-day RI, final test.
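The three-session design can be laid out as a timeline per gap condition. The sketch below uses only the gap and RI values listed above; counting Session 1 as day 0 and the helper name `session_days` are conventions assumed here for illustration.

```python
# Gap (ISI) conditions and retention interval (RI) listed above, in days.
GAPS_DAYS = [0, 1, 2, 4, 7, 14]
RI_DAYS = 10


def session_days(gap, ri=RI_DAYS):
    """Return the days of Sessions 1, 2 and 3, counting Session 1 as day 0."""
    session_1 = 0
    session_2 = session_1 + gap   # review after the assigned gap
    session_3 = session_2 + ri    # final test after the retention interval
    return session_1, session_2, session_3


schedule = {gap: session_days(gap) for gap in GAPS_DAYS}
for gap, days in schedule.items():
    # Also show the gap expressed as a fraction of the RI.
    print(f"gap={gap:>2} d  sessions on days {days}  gap/RI={gap / RI_DAYS:.1f}")
```

Because the RI is measured from Session 2, the final test always falls 10 days after the last exposure, whatever the gap.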

Cepeda et al. (2009), Experiment 2
N = 161
Stimuli: 2 sets
• Obscure facts (e.g., Who invented snow golf? Rudyard Kipling)
• Photographs of not-well-known objects paired with facts (e.g., Name this model, in which Amelia Earhart made her ill-fated last flight. Lockheed Electra)
ISI / Gap (between-subjects): 0, 1, 7, 28, 84, and 168 days
RI: 168 days

Cepeda et al. (2009), Conclusions
• Spacing benefits observed with RIs > 1 week
• Gap/ISI had non-monotonic effects on final test performance; accuracy increased then decreased as gap increased.
• For sufficiently long RIs, optimal gap/ISI > 1 day.

Cepeda et al. (2008)
Experiment conducted on the internet
N = 1,354
26 different combinations of gaps and RIs
Stimuli: 32 obscure facts
Procedure
Session 1: Learn 32 facts to criterion of one correct recall of each fact.
Session 2: After appropriate gap, subjects tested 2x with feedback.
Session 3: After appropriate RI, final test.

Cepeda et al. (2008), Conclusions
• For each RI, final performance initially increased with increasing gap, then fell as gap increased further.
• The effect of gap was very large: the optimal gap provided a 64% increase (averaged across RIs) in final recall, relative to the 0-day gap condition.
• As RI increases, the optimal gap also increases, but the ratio of optimal gap to RI declines.
• Using a gap that is longer than the optimal value carries a smaller cost than using a gap that is shorter.

Expanding vs. Equal Interval Spaced Retrieval

Expanding vs. Equal Interval Spaced Retrieval
• Landauer & Bjork (1978) demonstrated the advantage of expanding over equal interval retrieval practice.
• But findings since then have been rather inconsistent, with several instances of failures to replicate, e.g., Karpicke & Roediger (2007).
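The difference between the two schedule types is easiest to see when the gaps are written out. The sketch below is illustrative only: the gap values are chosen for this example rather than taken from the slides, the unit is left open (intervening items, minutes, or days), and `retrieval_positions` is a hypothetical helper name.

```python
from itertools import accumulate


def retrieval_positions(gaps):
    """Positions of successive retrieval attempts, counted from initial study.

    Each gap is the spacing between one event and the next, in whatever
    unit the schedule uses.
    """
    return list(accumulate(gaps))


# Illustrative gap values (an assumption of this sketch); both schedules
# have three retrievals and the same total spacing.
expanding_gaps = [1, 4, 10]
equal_gaps = [5, 5, 5]

expanding = retrieval_positions(expanding_gaps)
equal = retrieval_positions(equal_gaps)
print("expanding:", expanding)   # prints "expanding: [1, 5, 15]"
print("equal:    ", equal)       # prints "equal:     [5, 10, 15]"
```

Matching the total spacing in this way means the two schedules differ only in how the gaps are distributed: the expanding schedule places the first retrieval early, while the equal schedule delays it.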

Applications of Testing & Spacing
• Supermemo: www.supermemo.com
• Spaced Ed: www.spaceded.com
