
Stressing what is important: Orthographic cues and Lexical Stress Assignment


Presentation Transcript


    1. Stressing what is important: Orthographic cues and Lexical Stress Assignment. Nada Ševa (University of York, UK), Padraic Monaghan (Lancaster University, UK), Joanne Arciuli (Charles Sturt University, Australia)

    2. Previous models of reading in English
       - Dual-route cascaded (DRC) model (Coltheart, 2000; Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001): a rule-based model that uses grapheme-to-phoneme correspondence (GPC) rules for novel words
       - Connectionist models (Harm & Seidenberg, 1999, 2004; Plaut, McClelland, Seidenberg, & Patterson, 1996; Seidenberg & McClelland, 1989), including the triangle model (Harm & Seidenberg, 2004), in which orthography, phonology and semantics interact
       - Connectionist Dual Process (CDP+) model (Perry, Ziegler, & Zorzi, 2007)

    3. Problems: only monosyllabic words
       - There are only approx. 8,500 monosyllabic words in English and over 50,000 polysyllabic words
       - Extension to other languages
       - Increased complexity of grapheme-to-phoneme coding in polysyllabic words (e.g. "hothouse")
       - Stress assignment

    4. Stress and spoken word processing: lexical access (Donselaar et al., 2005; Soto-Faraco et al., 2001); the division of words into sublexical units such as onset-rime (Goswami, 2003; Wood, 2006); word, phrase and sentence boundaries (Cutler et al., 1997; Sebastián-Gallés & Costa, 1997).

    5. Stress and written word processing: Stress sensitivity facilitates learning to read (Wood & Terrell, 1998; Wood, 2006) and stress assignment in second language learning (Wade-Woolley et al., 2004; Goetry et al., 2006). Stress representation is activated during silent reading (Ashby & Clifton, 2005).

    6. Nature of the stress representation? Current theories of word production state that lexical stress is part of the metrical representation, which is retrieved or computed in parallel with phonological encoding (Caramazza, 1997; Levelt, Roelofs, & Meyer, 1999; Schiller, 2006). What about reading and stress assignment in languages with non-fixed stress placement (English, Dutch, Italian)? English: ZEbra (trochaic, 70%) vs. giRAFFE (iambic, 30%).

    7. The Rastle & Coltheart (2000) model proposed a system of sublexical rules that translate the orthographic representation into both the segmental and suprasegmental parts of the phonological representation.

    8. Rastle & Coltheart (2000) model: a) it is part of the Dual-route Cascaded (DRC) model of reading (Coltheart et al., 2001); b) it draws on linguistic analyses of stress patterns in English by Fudge (1984) and Garde (1968): 54 beginnings and 101 endings (most of them morphemes in English) could influence the placement of stress.

    10. Correct stress assignment for 89.7% of English disyllabic words from the CELEX database (Baayen et al., 1993). Nonword test: 210 nonwords (115 trochaic and 95 iambic); 15 subjects determined stress position in a reading-aloud task; 84.8% correct stress assignment on the nonword test.

    11. Problems? Is this really a sublexical procedure, given the role of affixes in the stress assignment process? What is the role of orthography?

    12. Connectionist account?

    13. The statistical regularities with respect to stress assignment could be learned in the same way as the regularities in the orthography-to-phonology mapping (Harm & Seidenberg, 1999, 2004; Plaut et al., 1996; Seidenberg & McClelland, 1989).

    14. Distributional cues: general (trochaic words are more frequent); nouns (trochaic) vs. verbs (iambic) (Kelly & Bock, 1988; Sereno, 1986).
        Phonological cues: the rime (reduced vowels are unstressed and consonant clusters in codas are stressed; Chomsky & Halle, 1968); the onset (consonant clusters; Kelly, 2004).
        Orthographic cues: length and complexity of beginnings and endings, and the identity of letters (both consonants and vowels) (Arciuli & Cupples, 2006, in press; Kelly, Morris, & Verrekia, 1998).

    15. Experimental studies have demonstrated that readers are sensitive to such phonological, orthographic and distributional cues present in the input (Arciuli & Cupples, 2006, in press; Colombo, 1992; Kelly & Bock, 1988; Kelly et al., 1998).

    16. Corpus analyses of orthographic cues

    17. Corpus analyses of orthographic cues. Disyllabic words from CELEX; words with distinct orthography and/or pronunciation and/or grammatical category count as separate words.
        All words: 18,571 1st syllable stress, 2,387 2nd syllable stress
        Lemma analyses (no inflectional morphology): 9,485 1st syllable stress, 1,813 2nd syllable stress
        Monomorphemic analyses (no inflectional or derivational morphology): 2,420 1st syllable stress, 375 2nd syllable stress
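The counts on this slide imply a majority-class baseline that any classifier has to beat. The following is a minimal sketch of that arithmetic; the dictionary and function names are illustrative and not part of the original analysis.

```python
# Counts copied from the slide: (1st syllable stress, 2nd syllable stress).
COUNTS = {
    "all words": (18571, 2387),
    "lemmas": (9485, 1813),
    "monomorphemes": (2420, 375),
}


def trochaic_baseline(first, second):
    """Accuracy obtained by always guessing first-syllable stress."""
    return first / (first + second)


for name, (first, second) in COUNTS.items():
    share = trochaic_baseline(first, second)
    print(f"{name}: {share:.1%} first-syllable stress")
```

A cue is only informative to the extent that classification from it exceeds these proportions.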

    18. Analysis Discriminant analysis – used to determine which variables discriminate between trochaic vs. iambic words. Type and token analysis (weighted by frequency)

    19. Beginnings and endings
        Beginning cue: orthography up to and including the first vowel (as in Arciuli & Cupples, 2006); 789 distinct beginnings
        Ending cue: orthography from the final vowel onwards; 1,411 distinct endings
        E.g. penguin: pe-, -uin
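The two cue definitions above can be sketched with regular expressions. This is an illustration, not the authors' procedure. It assumes that a run of adjacent vowel letters counts as a single vowel (which is what the penguin example, -uin, suggests) and that "y" is treated as a consonant letter; both function names are hypothetical.

```python
import re


def beginning_cue(word):
    """Letters up to and including the first run of vowel letters."""
    match = re.match(r"[^aeiou]*[aeiou]+", word.lower())
    return match.group(0) if match else ""  # "" when no vowel letter is found


def ending_cue(word):
    """Letters from the final run of vowel letters to the end of the word."""
    match = re.search(r"[aeiou]+[^aeiou]*$", word.lower())
    return match.group(0) if match else ""  # "" when no vowel letter is found


for word in ["penguin", "zebra", "about"]:
    print(word, beginning_cue(word) + "-", "-" + ending_cue(word))
```

Under these assumptions, penguin yields pe- and -uin, matching the example on the slide.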

    20. Results: All Words Type

    21. Results: Lemmata Type

    22. Results: Monomorphemes Type

    23. The Educator’s Word Frequency Guide (Zeno, 1995): a quantitative summary of the printed vocabulary encountered by students in American schools, based on 60,527 samples of text from over 6,000 textbooks, works of literature, and popular works of fiction and nonfiction, from grade 1 (age 5) to college.

    24. Results: Tokens

    25. Educator’s WFL vs. CELEX

    26. There is a large amount of potential information in orthographic beginnings and endings that goes well beyond morphemes: most beginnings and endings were not morphemes. For all analyses, classification was better from endings than from beginnings (more so for children than for adults).

    27. Modelling Architecture

    28. 25,016 English disyllabic words from the CELEX lexical database (Baayen et al., 1993); 83% trochaic, 17% iambic; learning rate: 0.005; alignment: left; 5 million presentations of words, selected according to their log-compressed frequency; 20 simulations; 90% training, 10% testing, randomly selected.
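The training regime described above (a random 90/10 split, with words presented in proportion to log-compressed frequency) might be sketched as follows. The slide does not give the compression formula, so the weight log(frequency + 2) is an assumption, as are the function names and the toy frequencies.

```python
import math
import random


def split_lexicon(words, test_fraction=0.1, seed=0):
    """Randomly hold out a fraction of the words for testing."""
    rng = random.Random(seed)
    shuffled = list(words)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def sample_presentations(freqs, n, seed=0):
    """Draw n training presentations, weighted by log-compressed frequency.

    Assumption: weight = log(frequency + 2), so that zero-frequency words
    can still be selected. The original compression may differ.
    """
    rng = random.Random(seed)
    words = list(freqs)
    weights = [math.log(freqs[w] + 2) for w in words]
    return rng.choices(words, weights=weights, k=n)


# Toy lexicon with made-up frequencies, for illustration only.
toy_freqs = {"zebra": 100, "giraffe": 1, "about": 5000, "contrast": 40}
train, test = split_lexicon(range(100))
print(len(train), len(test))
print(sample_presentations(toy_freqs, 10))
```

Log compression keeps very frequent words from dominating training while still presenting them more often than rare words.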

    30. Nouns vs. verbs: 'contrast as a noun versus con'trast as a verb.
        Overgeneralization errors: ab- as in a'bout, a'bove, a'broad (second syllable). CELEX: 60 ab- (51897) 2nd syllable stress, 21 ab- (7708) 1st syllable stress. Error: 'abject.
        Evenly distributed errors: con-. CELEX: 101 con- (13008) 1st syllable stress, 169 con- (44292) 2nd syllable stress. Errors: 38 1st syllable, 44 2nd syllable stress.

    31. Test on Rastle & Coltheart (2000) nonwords?

    32. R&C 2000 nonwords

    33. no-/-ate: nonword nockate (second syllable). CELEX: 104 no- (22077) 1st syllable stress, 15 no- (285) 2nd syllable stress; 108 -ate (6565) 1st syllable stress, 165 -ate (3608) 2nd syllable stress.
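The CELEX counts for no- and -ate show how a beginning and an ending can pull in different directions, and how the picture changes between type and token counts. The sketch below works through that arithmetic. It assumes the figures in parentheses are summed token frequencies; the names are illustrative.

```python
# (number of word types, summed frequency) per stress pattern, from the slide.
# Assumption: the parenthesised figures are summed token frequencies.
CUES = {
    "no-": {"first": (104, 22077), "second": (15, 285)},
    "-ate": {"first": (108, 6565), "second": (165, 3608)},
}


def first_stress_share(cue, by="types"):
    """Proportion of first-syllable stress for a cue, by types or by tokens."""
    index = {"types": 0, "tokens": 1}[by]
    first = CUES[cue]["first"][index]
    second = CUES[cue]["second"][index]
    return first / (first + second)


for cue in CUES:
    for by in ("types", "tokens"):
        print(f"{cue} by {by}: {first_stress_share(cue, by):.1%} first-syllable stress")
```

By types, no- favours first-syllable stress while -ate favours second-syllable stress, so a nonword such as nockate carries conflicting cues.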

    34. Why does the R&C model exhibit better performance than neural networks? Limited and non-representative training set for NN models.

    35. Training on all polysyllabic words with stress on the 1st or 2nd syllable: 51,948 words, 89.6% of the polysyllabic word types in the CELEX database; 68.6% 1st syllable and 31.4% 2nd syllable stress (disyllabic words: 87% trochaic vs. 13% iambic).

    37. Why does the R&C model exhibit better performance than neural networks? Limited training set for NN models - Explicitly define beginnings and endings

    38. Kelly (2004) nonwords: 96 nonwords varying in onset complexity, ½ with a C onset (pamdeen) and ½ with a CC onset (plamdeen); 78 trochaic vs. 18 iambic; 20 subjects in a silent reading task.
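The onset manipulation in these nonwords (C vs. CC) can be read directly off the spelling. A small sketch, assuming that the onset is simply the run of consonant letters before the first vowel letter and that "y" counts as a consonant; the function name is hypothetical.

```python
VOWEL_LETTERS = set("aeiou")  # assumption: "y" is treated as a consonant letter


def onset_pattern(word):
    """Return one 'C' per consonant letter before the first vowel letter."""
    count = 0
    for letter in word.lower():
        if letter in VOWEL_LETTERS:
            break
        count += 1
    return "C" * count


for nonword in ["pamdeen", "plamdeen"]:
    print(nonword, onset_pattern(nonword))
```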

    39. Kelly (2004) nonwords

    40. Kelly (2004) results

    41. R&C (2000) model: 1/3 of errors were from the no-prefix/no-suffix class of words (bolay, wispay) and from co- (colvane, corlax). Conflicting cues (beginnings vs. endings): plamdeen, gronvoon; pl-, gr- (complex onset) point to trochaic stress, while -een, -oon (suffix) point to iambic stress.

    42. Why does the R&C model exhibit better performance than neural networks? Limited training set for NN models - Explicitly define beginnings and endings - Phonology and/or parts-of-speech information

    44. Phonology and Parts-of-speech

    45. Multiple cue accounts have been shown to result in more accurate classification in: speech segmentation tasks (Onnis, Monaghan, Chater, & Richmond, 2005); grammatical categorisation tasks (Monaghan, Christiansen, & Chater, 2007).

    47. Orthography, Phonology, Parts-of-speech

    48. What is the role of orthography? Orthography and other cues? Rule-based vs. connectionist account? Sublexical nature of the stress assignment?

    49. Conclusions. The present study demonstrated that stress assignment for words and nonwords can be accomplished with good accuracy in a connectionist model that learns to map orthography onto stress position for disyllabic words in English. Additional simulations indicated that a combination of orthographic, phonological and distributional cues can improve performance in the stress assignment task.

    50. Conclusions. Rule-based vs. connectionist accounts: the connectionist account allowed a more detailed exploration of the different cues relevant to stress assignment; stress assignment is clearly part of a sublexical process.

    51. Further simulations Further testing on novel sets of nonwords, including phonological and distributional information; Cross-linguistic comparison with Italian; Simulations of the developmental results.

    52. This work was supported by the ESRC/ARC Bilateral Research Awards Grant, RES 000-22-1975.

    53. Thank you!
