Simplifying reading: Implications for instruction

Janet Vousden (University of Warwick); Michelle Ellefson, Nick Chater, Jonathan Solity

Overview • English spelling-to-sound inconsistency and reading • rational analysis of English reading • applying the simplicity principle • analysis of some common reading programmes

Spelling-to-sound mappings • spelling-to-sound mappings in English are not transparent at the sub-lexical level • some spellings are consistent: • “ck”: duck - /dʌk/, mock - /mɒk/, etc. • and a simple grapheme-phoneme rule will suffice: • ck - /k/ • others are not: • “ea”: beach - /biːtʃ/, real - /rɪəl/, great - /ɡreɪt/, or head - /hɛd/

inconsistency is most obvious at the grapheme level: the “ou” grapheme is credited with having 10 different pronunciations (Gontijo, Gontijo, & Shillcock, 2003), e.g., round, group, should, four, country, tenuous, soul, journal, cough, pompous • an overall measure of (in)consistency in a language is its orthographic depth (OD): the average number of pronunciations per grapheme • orthographic depth estimates for English: • 2.1 - 2.4 for polysyllabic text (Berndt, Reggia, & Mitchum, 1987; Gontijo, Gontijo, & Shillcock, 2003) • 1.7 for monosyllabic text (Vousden, 2008) • compare, e.g., Serbo-Croat, which has an OD of 1
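As a minimal sketch of the orthographic depth measure (average number of pronunciations per grapheme), the Python below computes it over a toy table. The function name `orthographic_depth` and the toy data are illustrative assumptions; they are not the data or code behind the cited estimates.

```python
from collections import defaultdict

def orthographic_depth(mappings):
    """Average number of distinct pronunciations per grapheme.

    `mappings` is an iterable of (grapheme, phoneme) pairs.
    """
    pronunciations = defaultdict(set)
    for grapheme, phoneme in mappings:
        pronunciations[grapheme].add(phoneme)
    if not pronunciations:
        raise ValueError("no mappings supplied")
    return sum(len(p) for p in pronunciations.values()) / len(pronunciations)

# Toy data (hypothetical): "ck" is consistent, "ea" has four pronunciations.
toy = [("ck", "k"), ("ea", "iː"), ("ea", "ɪə"), ("ea", "eɪ"), ("ea", "ɛ")]
depth = orthographic_depth(toy)
print(depth)  # (1 + 4) / 2 = 2.5
```

A fully consistent orthography gives a value of 1 under this definition, since every grapheme has exactly one pronunciation.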

how do literacy levels in English compare with other languages? • can differences in consistency account for the difficulty in learning to read English? • yes - inconsistency clearly increases the difficulty of learning to read compared with more consistent languages (Frith, Wimmer, & Landerl, 1998) • Data: % correct reading scores (adapted from Seymour, Aro, & Erskine, 2003).

lag in performance persists through the school years • Data: non-word reading accuracy (reproduced from Frith, Wimmer, & Landerl, 1998)

Most often it is the vowel graphemes that are inconsistent, but the immediate context can be used to resolve the ambiguity • within a C V C syllable, the context is either C V (preceding consonant) or V C (following consonant) • ambiguity is better resolved by considering the following consonant (a rime unit) rather than the preceding consonant (Treiman et al., 1995) • e.g., “ea”: • pronounced to rhyme with breath when followed by ‘d’ (~80%) • pronounced to rhyme with meat when followed by ‘p’ (100%) • also, rime units are more consistent than graphemes: • 23% of graphemes inconsistent • 15% of rimes inconsistent

Choosing spelling-to-sound mappings • so many to choose from: • ~2000 rime mappings • ~300 grapheme mappings • and many are inconsistent: • 15% of rimes, 23% of graphemes • influences from the developmental literature (do rimes or GPCs predict reading ability?) • variety of approaches from reading schemes (Rhymeworld, THRASS, etc.)

Rational analysis • attempts to explain behaviour in terms of adaptation to the environment, independent of the details of cognitive architecture • the solution adopted by the cognitive architecture should reflect the structure of the environment • e.g., Anderson & Schooler (1991) showed that the probability that a memory will be needed over time matches the availability of human memories • the same factors that predict memory performance also predict the odds that an item will be needed • i.e., reliable effects of recency and frequency

factors that affect the performance of skilled readers should be reflected in the statistical structure of the language, e.g., frequency and consistency: • effects of word frequency in naming and lexical decision • effects of rime frequency on word-likeness judgements and pronunciation • effects of grapheme frequency in letter search and word priming experiments • by examining the linguistic factors that skilled readers have adapted to, could the input be more optimally structured for learners?

Analyses of spelling-to-sound mappings • rational analysis predicts the most frequent and consistent mappings best predict pronunciation • interested in the frequency & consistency of mappings at level of words, rimes, and graphemes, and their ability to predict correct pronunciation • CELEX database: 7,297 different monosyllabic words, 10,924,491 words in total

Words

Onsets and rimes • exclude the 100 most frequent words: • 7,197 different words, total of 2,263,264 words • create a table of onset and rime mapping frequencies; for inconsistent mappings, keep only the most frequent
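The table-building step can be sketched as follows: count how often each spelling maps to each sound, keep only the most frequent sound per spelling, and see what share of tokens the kept mappings pronounce correctly. The function names and the toy token list are hypothetical and are not drawn from CELEX.

```python
from collections import Counter, defaultdict

def dominant_mappings(tokens):
    """Keep, for each spelling, only its most frequent pronunciation.

    `tokens` is a list of (spelling, sound) pairs, one per occurrence.
    Returns {spelling: (sound, count)}.
    """
    counts = defaultdict(Counter)
    for spelling, sound in tokens:
        counts[spelling][sound] += 1
    return {sp: c.most_common(1)[0] for sp, c in counts.items()}

def coverage(tokens):
    """Share of tokens pronounced correctly by the kept mappings."""
    kept = dominant_mappings(tokens)
    return sum(n for _, n in kept.values()) / len(tokens)

# Toy rime tokens (hypothetical counts): "ead" is inconsistent, "eap" is not.
toy_tokens = [("ead", "ɛd")] * 4 + [("ead", "iːd")] + [("eap", "iːp")] * 3
print(dominant_mappings(toy_tokens))
print(coverage(toy_tokens))  # 7 of 8 tokens = 0.875
```

The same procedure applies unchanged to GPC tokens; only the unit of segmentation differs.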

Onsets Rimes

GPCs • exclude the 100 most frequent words: • 7,197 different words, total of 2,263,264 words • create a table of GPC mapping frequencies; for inconsistent mappings, keep only the most frequent

GPCs

Summary • some words much more frequent than others, therefore sight vocabulary very effective for small number of words, up to ~100 • sub-lexical units also have skewed frequency distribution, and learning the most frequent mappings predicts high potential outcome • high initial gains with GPCs, greater overall gain with rimes in the long run • What is the optimal size unit to learn?

Potential benefits for reading outcome are larger for onset/rimes, but is this outweighed by the cost of remembering many more mappings? • Can we measure the potential benefit from, and cost of, remembering mappings for: • GPCs • onset/rimes • a combination of both?

The Simplicity Principle • reading, like much high-level cognition, involves finding patterns in data, but many patterns are compatible with any finite set of data, so how does the cognitive system choose among the possibilities? • the simplicity principle: choose the simplest explanation of the data; this is intuitive and has a long history (Occam’s razor) • simplicity can be quantified by measuring the (shortest) description from which the data can be reconstructed, trading off brevity against goodness of fit • cognition as compression

implement with minimum description length (MDL) • more regularity = more compression • no regularity = no compression, just reproduce data • measure code length to specify: • hypothesis about data (mappings) • data, given hypothesis (decoding accuracy, given mappings) • can measure compression with Shannon’s (1948) coding theorem - more probable events are assigned shorter code lengths: length/bits = log2(1/p)
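A minimal sketch of the code-length formula, length in bits = log2(1/p). The function name `code_length_bits` is an assumption made for illustration.

```python
import math

def code_length_bits(p):
    """Shannon code length in bits for an event of probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return math.log2(1 / p)

print(code_length_bits(0.5))   # 1.0
print(code_length_bits(0.25))  # 2.0
print(code_length_bits(1.0))   # 0.0
```

More probable events get shorter codes: halving the probability adds one bit, and a certain event costs nothing to encode.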

Method • determine mappings & frequencies from a monosyllabic corpus of children’s reading materials (Stuart et al., 2003), for mapping sizes: • words • CV/C (head/coda) • C/VC (onset/rime) • GPCs • for each mapping size, determine the code length to describe: • the mappings • the decoding accuracy, given the mappings

Table 1. A list of reading schemes/series used by over a third of schools in the survey

Name of scheme               % using scheme   Included in database?
Ginn 360                     74%              Yes
Storychest                   58%              Yes
Magic Circle                 58%              Yes
1 2 3 and Away               50%              Yes
Griffin Pirates              43%              Yes
Breakthrough to Literacy     41%
Bangers and Mash             40%              Yes
Wide range readers           38%              Yes
Dragon Pirates               37%              Yes
Through the rainbow          34%
Ladybird read-it-yourself    33%              Yes
Humming birds                32%
Thunder the dinosaur         29%              Yes
Link Up                      29%
Gay Way                      27%              Yes
Monster                      27%              Yes
Oxford Reading Tree          27%              Yes
Once Upon a Time             26%              Yes
Trog                         26%

Code length for mappings
length = log2(1/p(w)) + log2(1/p(iː)) + log2(1/p(newline))
length = log2(1/p(b)) + log2(1/p(i)) + log2(1/p(space)) + log2(1/p(b)) + log2(1/p(I)) + log2(1/p(newline))
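The sums above add one log2(1/p) term per symbol needed to write a mapping down. The sketch below does this for a list of mappings. It assumes each mapping is serialised as letters, a space, phonemes, and a newline, and that symbol probabilities are relative frequencies pooled over letters and phonemes in the list itself. The slides do not state how p was estimated, so these are illustrative assumptions, as is the name `mapping_code_length`.

```python
import math
from collections import Counter

def mapping_code_length(mappings):
    """Total bits to list every mapping as 'letters<space>phonemes<newline>'.

    `mappings` is a list of (letters, phonemes) pairs, where `letters` is a
    string and `phonemes` is a list of phoneme symbols.
    """
    symbols = []
    for letters, phonemes in mappings:
        symbols.extend(letters)      # one symbol per letter
        symbols.append("<space>")
        symbols.extend(phonemes)     # one symbol per phoneme
        symbols.append("<newline>")
    counts = Counter(symbols)
    total = len(symbols)
    # Assumption: p(symbol) is its relative frequency in this listing.
    return sum(math.log2(total / counts[s]) for s in symbols)

toy_mappings = [("b", ["b"])]  # hypothetical single mapping
print(mapping_code_length(toy_mappings))  # 6.0
```

In the toy case the four symbols are b, space, b, newline, so "b" has p = 1/2 (1 bit each) and the separators have p = 1/4 (2 bits each), giving 6 bits.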

Code length for decoding accuracy • apply letter-to-sound rules to produce a list of pronunciations: bread - breId, bri:d, brɛd • arrange in rank order of most probable (computed from letter-to-sound frequencies) and note the rank of the correct pronunciation: bread - bri:d, brɛd, breId • code length for data, given hypothesis = log2(1/p(rank=2))
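The ranking step can be sketched as below: order the candidate pronunciations by probability, record the rank of the correct one, and charge log2(1/p(rank)) bits per word. The candidate probabilities are made up for illustration, and estimating p(rank) from the observed rank frequencies is an assumption; the slides do not say how it was computed.

```python
import math
from collections import Counter

def rank_of_correct(candidates, correct):
    """1-based rank of `correct` among candidates sorted by probability.

    `candidates` maps each pronunciation to its probability. Raises
    ValueError if `correct` is not among the candidates.
    """
    ranked = sorted(candidates, key=candidates.get, reverse=True)
    return ranked.index(correct) + 1

def rank_code_length(ranks):
    """Total bits to encode a list of ranks, with p(rank) estimated
    from the relative frequency of each rank in the list (assumption)."""
    counts = Counter(ranks)
    total = len(ranks)
    return sum(math.log2(total / counts[r]) for r in ranks)

# Hypothetical probabilities for the "bread" example.
bread = {"breId": 0.1, "bri:d": 0.6, "brɛd": 0.3}
print(rank_of_correct(bread, "brɛd"))   # 2
print(rank_code_length([1, 1, 2, 2]))   # 4.0
```

If the mappings usually rank the correct pronunciation first, rank 1 becomes highly probable and the data cost per word approaches zero bits.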

Simulations • overall comparison between different unit sizes for whole vocabulary • how does code length vary as a function of size of vocabulary for each unit size? • optimize number of mappings by removing those that reduce total code length • compare different reading schemes

Comparing different unit sizes for whole vocabulary

Code length as a function of vocabulary size

Optimizing number of mappings • GPCs: description length is reduced by removing mainly inconsistent, low-frequency mappings
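One way to picture this optimization is a greedy loop that drops any mapping whose removal lowers the total code length (hypothesis bits plus data bits). This is only a sketch under stated assumptions: the greedy strategy, the function names, and the toy cost figures are all hypothetical, and the slides do not specify the search procedure actually used.

```python
def prune_mappings(mappings, total_code_length):
    """Greedily remove mappings while doing so reduces total code length.

    `total_code_length` is a caller-supplied function from a list of
    mappings to a cost in bits.
    """
    kept = list(mappings)
    best = total_code_length(kept)
    improved = True
    while improved and kept:
        improved = False
        for i in range(len(kept)):
            trial = kept[:i] + kept[i + 1:]
            cost = total_code_length(trial)
            if cost < best:
                kept, best, improved = trial, cost, True
                break  # restart the scan from the reduced set
    return kept, best

def toy_cost(kept, baseline_bits=100):
    """Toy cost: each mapping is (name, bits_to_store, data_bits_saved)."""
    hypothesis = sum(h for _, h, _ in kept)
    data = baseline_bits - sum(s for _, _, s in kept)
    return hypothesis + data

# Mapping B costs 8 bits to store but saves only 3 bits of data cost.
toy = [("A", 8, 30), ("B", 8, 3), ("C", 8, 10)]
kept, best = prune_mappings(toy, toy_cost)
print([name for name, _, _ in kept], best)  # ['A', 'C'] 76
```

In the toy data, the mapping that is dropped is the one that costs more to store than it saves in decoding, which mirrors the removal of low-frequency, inconsistent mappings.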

Comparing different reading schemes

Decoding accuracy by scheme

Data: from Shapiro & Solity (2008) • ERR implemented as a reading intervention in 12 Essex schools: increase in reading scores significantly greater for ERR schools

Some conclusions • a small amount of sight vocabulary accounts for a large proportion of text, but only small vocabularies are most simply described by whole words • as a homogeneous set, GPCs provide a simpler explanation of the data • choosing the best set could be important • complements recent work by Treiman and colleagues showing that children learn better when the association between sound and print is non-arbitrary
