1. Stressing what is important: Orthographic cues and Lexical Stress Assignment
Nada Ševa
University of York, UK
Padraic Monaghan
Lancaster University, UK
Joanne Arciuli
Charles Sturt University, Australia
2. Previous models of reading in English
Dual-route cascade (DRC) model
(Coltheart, 2000; Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001)
- rule-based model (grapheme-phoneme correspondence (GPC) rules for novel words)
Connectionist models
(Harm & Seidenberg, 1999, 2004; Plaut, McClelland, Seidenberg, & Patterson, 1996; Seidenberg & McClelland, 1989)
- triangle model (Harm & Seidenberg, 2004): interaction between orthography, phonology and semantics
Connectionist Dual Process (CDP+) model (Perry, Ziegler, & Zorzi, 2007)
3. Problems:
Only monosyllabic words
- There are only approx. 8,500 monosyllabic words in English and over 50,000 polysyllabic words
- Extension to other languages
Increased complexity in grapheme-to-phoneme coding in polysyllabic words
- e.g. “hothouse”
Stress assignment
4. Stress and spoken word processing:
lexical access (Donselaar et al., 2005; Soto-Faraco et al., 2001);
the division of words into sublexical units such as onset-rime (Goswami, 2003; Wood, 2006);
word, phrase and sentence boundaries (Cutler et al., 1997; Sebastian-Galles & Costa, 1997).
5. Stress and written word processing:
Stress sensitivity facilitates learning to read (Wood & Terrell, 1998; Wood, 2006) and stress assignment in second language learning (Wade-Woolley et al., 2004; Goetry et al., 2006);
Stress representation is activated during silent reading (Ashby & Clifton, 2005).
6.
Nature of the stress representation?
Current theories of word production state that lexical stress is part of the metrical representation, which is retrieved or computed in parallel with phonological encoding (Caramazza, 1997; Levelt, Roelofs, & Meyer, 1999; Schiller, 2006).
Reading and stress assignment in languages with non-fixed stress placement (English, Dutch, Italian)?
English:
ZEbra (trochaic): 70% vs. giRAFFE (iambic): 30%
7.
Rastle & Coltheart (2000) proposed a system of sublexical rules that translate an orthographic representation into both the segmental and the suprasegmental parts of the phonological representation.
8. Rastle & Coltheart (2000) model
a) Represents part of the Dual-route Cascade (DRC) model of reading (Coltheart et al., 2001);
b) linguistic analysis of stress patterns in English by Fudge (1984) and Garde (1968):
54 beginnings and 101 endings (most of them were morphemes in English) could influence the placement of stress;
10. Correct stress assignment for 89.7% of English disyllabic words from the CELEX database (Baayen et al., 1993).
Nonword test:
210 nonwords: 115 trochaic and 95 iambic
15 subjects; stress position estimated in a reading-aloud task
84.8% correct stress assignment on the nonword test.
11. Problems?
Is this really a sublexical procedure, given the role of affixes in the stress assignment process?
What is the role of orthography?
12.
Connectionist account?
13.
The statistical regularities in stress assignment could be learned in the same way as the regularities in the orthography-to-phonology mapping (Harm & Seidenberg, 1999, 2004; Plaut et al., 1996; Seidenberg & McClelland, 1989).
14. Distributional cues
- general (trochaic words are more frequent)
- nouns (trochaic) vs. verbs (iambic) (Kelly & Bock, 1988; Sereno, 1986)
Phonological cues
- the rime: reduced vowels are unstressed, and syllables with consonant clusters in the coda tend to be stressed (Chomsky & Halle, 1968)
- the onset: consonant clusters (Kelly, 2004)
Orthographic cues
- length and complexity of beginnings and endings, and the identity of letters (both consonants and vowels) (Arciuli & Cupples, 2006, in press; Kelly, Morris, & Verrekia, 1998)
15. Experimental studies have demonstrated that readers are sensitive to the phonological, orthographic and distributional cues present in the input (Arciuli & Cupples, 2006, in press; Colombo, 1992; Kelly & Bock, 1988; Kelly et al., 1998).
16. Corpus analyses of orthographic cues
17. Corpus analyses of orthographic cues
Disyllabic words from CELEX
Entries with distinct orthography and/or pronunciation and/or grammatical category count as separate words.
All words
18,571 1st syllable stress, 2387 2nd syllable stress
Lemma analyses (no inflectional morphology)
9485 1st syllable stress, 1813 2nd syllable stress
Monomorphemic analyses (no inflectional or derivational morphology)
2420 1st syllable stress, 375 2nd syllable stress
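As a rough point of reference for any classification result, the counts above already imply a majority-class baseline: a classifier that always guesses first-syllable stress. The sketch below only does that arithmetic on the slide's figures; the names `counts`, `trochaic_share` and `baseline` are illustrative and not taken from the study.

```python
# Counts of disyllabic words (first-syllable stress, second-syllable stress),
# copied from the slide above.
counts = {
    "all words": (18571, 2387),
    "lemmas": (9485, 1813),
    "monomorphemic": (2420, 375),
}


def trochaic_share(first: int, second: int) -> float:
    """Proportion of words with first-syllable (trochaic) stress."""
    return first / (first + second)


# Accuracy of a classifier that always answers "first syllable".
baseline = {name: trochaic_share(*pair) for name, pair in counts.items()}

for name, share in baseline.items():
    print(f"{name}: {share:.1%} trochaic")
```

Any cue-based classifier has to beat these proportions to show that the cue carries information beyond the overall trochaic bias.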
18. Analysis
Discriminant analysis: used to determine which variables discriminate between trochaic and iambic words.
Type and token analyses (token analyses weighted by frequency)
19. Beginnings and endings
Beginning cue:
Orthography up to and including first vowel (as in Arciuli & Cupples, 2006)
789 distinct beginnings
Ending cue:
Orthography from final vowel onwards
1411 distinct endings
E.g.:
penguin: pe-, -uin
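The two cue definitions above can be sketched as a pair of string functions. This is a minimal illustration, not the authors' code, and it rests on two assumptions: only a, e, i, o, u count as vowel letters (y is excluded), and "from final vowel onwards" is read as starting at the beginning of the final run of vowel letters, which is what the penguin example (-uin rather than -in) suggests. The function names are hypothetical.

```python
VOWELS = set("aeiou")  # assumption: 'y' is not treated as a vowel letter


def beginning_cue(word: str) -> str:
    """Letters up to and including the first vowel letter."""
    w = word.lower()
    for i, ch in enumerate(w):
        if ch in VOWELS:
            return w[: i + 1]
    return w  # no vowel letter found: fall back to the whole word


def ending_cue(word: str) -> str:
    """Letters from the start of the final run of vowel letters onwards."""
    w = word.lower()
    i = len(w) - 1
    # Skip any trailing consonant letters.
    while i >= 0 and w[i] not in VOWELS:
        i -= 1
    if i < 0:
        return w  # no vowel letter found: fall back to the whole word
    # Extend leftwards over the rest of the final vowel run.
    while i > 0 and w[i - 1] in VOWELS:
        i -= 1
    return w[i:]


print(beginning_cue("penguin"), ending_cue("penguin"))
```

Under these assumptions `penguin` yields `pe` and `uin`, matching the example on the slide; words with a final silent e would get a one-letter ending, which may or may not match the original coding.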
20. Results: All Words Type
21. Results: Lemmata Type
22. Results: Monomorphemes Type
23. The Educator’s Word Frequency Guide (Zeno, 1995).
a quantitative summary of the printed vocabulary encountered by students in American schools.
60,527 samples of text from over 6,000 textbooks, works of literature, and popular works of fiction and nonfiction.
from grade 1 (age 5) to college.
24. Results: Tokens
25. Educator’s WFL vs. CELEX
26. There is a large amount of potential information in orthographic beginnings/endings
That goes well beyond morphemes
Most beginnings/endings were not morphemes
For all analyses, classification was better from endings than from beginnings (more so for children than for adults)
27. Modelling Architecture
28. 25016 English disyllabic words
CELEX lexical database (Baayen et al., 1993);
83% trochaic, 17% iambic
learning rate: 0.005;
alignment: left;
5 million presentations of words, selected according to their log-compressed frequency;
20 simulations;
90% training, 10% testing, randomly selected
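The training regime listed above (a random 90/10 split, with training words presented according to log-compressed frequency) can be sketched as follows. This is an illustration under stated assumptions: the slide does not give the compression formula, so `log(1 + f)` is a guess; the frequencies in `toy_freqs` are made up for the example and are not CELEX values; and the function names are hypothetical.

```python
import math
import random


def split_train_test(words, test_fraction=0.10, seed=0):
    """Randomly hold out a fraction of the words for testing."""
    rng = random.Random(seed)
    shuffled = list(words)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def sample_presentations(freqs, n, seed=0):
    """Draw n words with probability proportional to log(1 + frequency).

    Assumption: the exact form of "log-compressed frequency" is not stated
    on the slide; log1p is used here only as a plausible stand-in.
    """
    rng = random.Random(seed)
    words = list(freqs)
    weights = [math.log1p(freqs[w]) for w in words]
    return rng.choices(words, weights=weights, k=n)


# Hypothetical frequencies, for illustration only.
toy_freqs = {"zebra": 50, "giraffe": 20, "about": 5000, "penguin": 10,
             "contrast": 300, "above": 2000, "abroad": 400, "abject": 3,
             "hothouse": 2, "monkey": 150}

train, test = split_train_test(toy_freqs)
presentations = sample_presentations({w: toy_freqs[w] for w in train}, n=1000)
```

The slide's setup uses 5 million presentations and 20 independent simulations; repeating the sketch with different `seed` values would mimic the latter on a small scale.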
30. Nouns vs. verbs: ‘contrast (noun) vs. con’trast (verb)
overgeneralization errors
ab- : a’bout, a’bove, a’broad (second syllable)
CELEX:
60 ab- (51897) 2nd syllable stress,
21 ab- (7708) 1st syllable stress
error: ‘abject.
evenly distributed errors
con-
CELEX:
101 con- (13008) 1st syllable stress,
169 con- (44292) 2nd syllable stress
errors: 38 1st syllable
44 2nd syllable stress
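To make the ab- and con- figures easier to compare, the sketch below turns them into the share of second-syllable stress, counted by type and by token. It assumes the bracketed numbers on the slide are summed token frequencies, which the slide does not state explicitly; the variable and function names are illustrative.

```python
# (types, tokens) per stress position, copied from the CELEX counts above.
# Assumption: the bracketed numbers on the slide are summed token frequencies.
cues = {
    "ab-": {"first": (21, 7708), "second": (60, 51897)},
    "con-": {"first": (101, 13008), "second": (169, 44292)},
}


def second_syllable_share(cue: dict, index: int) -> float:
    """Share of second-syllable stress; index 0 = types, 1 = tokens."""
    first = cue["first"][index]
    second = cue["second"][index]
    return second / (first + second)


shares = {
    name: {
        "type": second_syllable_share(cue, 0),
        "token": second_syllable_share(cue, 1),
    }
    for name, cue in cues.items()
}

for name, s in shares.items():
    print(f"{name}: {s['type']:.1%} by type, {s['token']:.1%} by token")
```

On these counts second-syllable stress is the majority pattern for both beginnings, and more strongly so by token than by type, which is in line with the overgeneralization error on ‘abject.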
31. Test on Rastle & Coltheart (2000) nonwords?
32. R&C 2000 nonwords
33.
no-/-ate :
nonword nockate (second syllable)
CELEX:
104 no- (22077) 1st syllable stress,
15 no- (285) 2nd syllable stress
108 -ate (6565) 1st syllable stress,
165 -ate (3608) 2nd syllable stress
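One way to see how the two cues in nockate pull against each other is to add their log-odds for first-syllable stress. This is a deliberately naive sketch that treats the cues as independent and uses type counts only; it is not the mechanism of either the network or the Rastle & Coltheart rules, and the function names are hypothetical.

```python
import math

# Type counts (first-syllable stress, second-syllable stress),
# copied from the CELEX figures above.
type_counts = {
    "no-": (104, 15),
    "-ate": (108, 165),
}


def log_odds_first(first: int, second: int) -> float:
    """Log-odds in favour of first-syllable stress for a single cue."""
    return math.log(first / second)


def combine(cue_names):
    """Sum per-cue log-odds (naive independence assumption)."""
    return sum(log_odds_first(*type_counts[c]) for c in cue_names)


score = combine(["no-", "-ate"])
prediction = "first" if score > 0 else "second"
print(f"combined log-odds: {score:.2f} -> {prediction} syllable")
```

On these counts the beginning cue outweighs the ending cue, so the toy combination favours first-syllable stress, whereas the slide lists second-syllable stress for nockate.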
34. Why does the R&C model exhibit better performance than neural networks?
Limited and non-representative training set for NN models
35. Training on all polysyllabic words with the stress on 1st or 2nd syllable
51948 words, 89.6% of the polysyllabic word types in the CELEX database.
68.6% 1st syllable and 31.4% second syllable words
(disyllabic words: 87% trochaic vs. 13% iambic)
37. Why does the R&C model exhibit better performance than neural networks?
Limited training set for NN models
- Explicitly define beginnings and endings
38. Kelly (2004) nonwords:
96 nonwords varying in onset complexity:
½ C onset: pamdeen
½ CC onset: plamdeen
78 trochaic vs. 18 iambic words
20 subjects in a silent reading task
39. Kelly (2004) nonwords
40. Kelly (2004) results
41. R&C (2000) model:
1/3 of errors were from the no-prefix/no-suffix class of words
(bolay, wispay)
co- (colvane, corlax)
Conflicting cues (beginning vs. endings)
plamdeen, gronvoon
pl-, gr- (complex onset) – trochaic words
-een, -oon (suffix) – iambic words
42. Why does the R&C model exhibit better performance than neural networks?
Limited training set for NN models
- Explicitly define beginnings and endings
Phonology and/or parts-of-speech information
44. Phonology and Parts-of-speech
45. Multiple cue accounts have been shown to result in more accurate classification in:
speech segmentation tasks
(Onnis, Monaghan, Chater, & Richmond, 2005);
grammatical categorisation tasks (Monaghan, Christiansen, & Chater, 2007).
47. Orthography, Phonology, Parts-of-speech
48.
What is the role of orthography?
Orthography and other cues?
Rule-based vs. connectionist account?
Sublexical nature of the stress assignment?
49. Conclusions
The present study demonstrated that stress assignment for words and nonwords can be accomplished with good accuracy in a connectionist model that learns to map orthography onto stress position for disyllabic words in English.
Additional simulations indicated that a combination of orthographic, phonological and distributional cues can improve performance in the stress assignment task.
50. Conclusions
Rule-based vs. connectionist accounts:
The connectionist account allowed a more detailed exploration of the different cues relevant to stress assignment;
Stress assignment is clearly part of a sublexical process.
51. Further simulations
Further testing on novel sets of nonwords, including phonological and distributional information;
Cross-linguistic comparison with Italian;
Simulations of the developmental results.
52. This work was supported by the ESRC/ARC Bilateral Research Awards Grant, RES 000-22-1975.
53.
Thank you!