Toward Rich Phonology. Robert Port, Linguistics and Cognitive Science, Indiana University. ESCA Experimental Linguistics, Athens, August 2006.
The standard view of language. 1. "Language is a cognitive symbol system …" Symbols are discrete tokens: static, serially ordered, perfectly recognized and produced, and associated with meanings. The basic units are the phonological segment and the word. 2. "… used for real-time processing of language." Speech production = "encoding"; speech perception = "decoding"; memory = keeping symbols "active"; linguistic processing = "reading" and "writing" symbols.
This assumption underlies everything most linguists do: Chomsky-Halle, Optimality Theory, etc. • It underlies the International Phonetic Alphabet. • It underlies most talk about language by psychologists. But human memory for words is far richer than this view suggests.
Evidence for Rich Sensory Memory. Look first at vision: visual memory is detailed, retaining a massive amount of specific information. Posner & Keele (1968): random dot patterns for categorization. [Figure: Prototype A, Prototype B] Experiment: a random dot pattern serves as a prototype. It is never shown to subjects; only noisy variants are shown. • Subjects are trained to classify dot patterns distorted from 2 prototypes. Accuracy and reaction time (RT) are measured. • Results: after training, the prototype (though unseen) is recognized well. BUT performance is best on the actual training stimuli.
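The stimulus logic can be sketched in a few lines of Python. This is a minimal illustration, not Posner and Keele's actual distortion procedure: it assumes a 9-dot prototype in a 50 × 50 field and independent Gaussian jitter, and the function names are hypothetical. Averaging many distortions lands close to the unseen prototype, which shows why the prototype can be recognized well: it sits at the center of the cloud of training patterns.

```python
import random

def make_prototype(n_dots=9, size=50.0, seed=None):
    """Random prototype: n_dots (x, y) points in a size x size field (assumed parameters)."""
    rng = random.Random(seed)
    return [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(n_dots)]

def distort(prototype, jitter, rng):
    """Noisy variant: every dot is displaced by independent Gaussian noise (sd = jitter)."""
    return [(x + rng.gauss(0, jitter), y + rng.gauss(0, jitter)) for x, y in prototype]

def average_pattern(patterns):
    """Dot-by-dot mean of several distortions of the same prototype."""
    n = len(patterns)
    return [
        (sum(p[i][0] for p in patterns) / n, sum(p[i][1] for p in patterns) / n)
        for i in range(len(patterns[0]))
    ]

# Subjects see only distortions; the prototype itself is never shown.
proto_a = make_prototype(seed=1)
rng = random.Random(2)
training = [distort(proto_a, jitter=3.0, rng=rng) for _ in range(200)]
center = average_pattern(training)
max_error = max(abs(c - p) for cp, pp in zip(center, proto_a) for c, p in zip(cp, pp))
print(f"largest coordinate error of the averaged pattern: {max_error:.2f}")
```

Each individual training pattern is off by about 3 units per coordinate, while the average is much closer to the prototype, so a never-seen prototype resembles the training set as a whole more than most single distortions do.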
"Exemplar Memory" for Visual Images or Other Items. Modelers of categorization, recall and recognition (e.g., Hintzman, Nosofsky, Shiffrin) get the best results by: • storing all exemplars (i.e., all presented tokens) • computing the similarity of a new token to every item in memory • responding with, e.g., the category of the closest matches. In vision we remember detail well after a single exposure! • If I showed you 50 photographs for 1 second each, you would probably recognize a repeated picture, even later in the day. • You can remember details from this morning: what you ate, whom you saw, what the cafeteria looked like, what your plate of food looked like, etc.
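The three steps above can be sketched as a small exemplar classifier. This is a hedged illustration loosely in the spirit of summed-similarity models such as Nosofsky's Generalized Context Model, not any of these authors' exact formulation: Euclidean distance, exponential similarity and the `sensitivity` parameter are assumptions chosen for simplicity, and `ExemplarMemory` is a hypothetical name.

```python
import math
from collections import defaultdict

class ExemplarMemory:
    """Keeps every presented token and classifies by summed similarity to all of them."""

    def __init__(self, sensitivity=1.0):
        self.sensitivity = sensitivity  # hypothetical free parameter: how fast similarity decays
        self.traces = []                # one (features, label) pair per presented token

    def store(self, features, label):
        self.traces.append((tuple(features), label))

    def similarity(self, a, b):
        # Similarity decays exponentially with Euclidean distance.
        return math.exp(-self.sensitivity * math.dist(a, b))

    def category_probs(self, probe):
        # Sum the probe's similarity to every stored token of each category...
        evidence = defaultdict(float)
        for features, label in self.traces:
            evidence[label] += self.similarity(probe, features)
        # ...and turn the summed evidence into choice probabilities.
        total = sum(evidence.values())
        return {label: e / total for label, e in evidence.items()}

    def classify(self, probe):
        probs = self.category_probs(probe)
        return max(probs, key=probs.get)

memory = ExemplarMemory(sensitivity=1.0)
for point, label in [((0, 0), "A"), ((0, 1), "A"), ((5, 5), "B"), ((5, 6), "B")]:
    memory.store(point, label)
print(memory.classify((0.5, 0.5)), memory.category_probs((2, 2)))
```

No summary or prototype is stored anywhere: every decision is computed on the fly from the raw tokens, which is the core of the exemplar view.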
Humans can do "one-trial learning" of many coincidental events. • No generalization is involved. • No "training" is required. This skill is found in all mammals but is best developed in primates. It is called "episodic memory." (See the review by Mark Gluck in Trends in Cognitive Sciences.)
What mechanism could store "random" co-occurrences? The hippocampus and neighboring regions are essential for: • autobiographical memory • picture recognition • maze learning • associating a smell with an event. Linguistic events could first be stored as episodes and then gradually be incorporated into long-term memory.
But linguists (including me) have assumed that words are not stored in rich episodic memory. Words were assumed to be perceptually identified and stored using an abstract code. Such a representation is: • speaker-independent [you and I produce identical transcriptions] • segmented into ordered, context-free parts [segments represent all that is necessary to specify words] • rate-invariant [the same symbols apply at different speaking rates] • low-dimensional [Jakobson: 12 bits/segment; Chomsky-Halle: ~40 bits/segment. At 15-20 segments/sec, this implies < 1000 bits/sec]. Presumably this symbolic code is used in both long-term and short-term memory.
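Spelled out, the bit rate R is bits per segment times segments per second. Pairing the smaller feature count with the slower rate, and the larger count with the faster rate, brackets the range behind the "< 1000 bits/sec" figure:

```latex
R = \frac{\text{bits}}{\text{segment}} \times \frac{\text{segments}}{\text{s}},
\qquad
12 \times 15 = 180~\text{bits/s} \;\le\; R \;\le\; 40 \times 20 = 800~\text{bits/s} \;<\; 1000~\text{bits/s}
```

The mixed combinations (12 × 20 = 240 and 40 × 15 = 600 bits/s) also fall inside this range.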
Linguistics has always considered phonemes to be "psychologically real." • Saussure: "The phoneme is an acoustico-motor image." • Troubetzkoy and Jakobson took the phoneme as the psychological counterpart of a letter. • The IPA assumes that letters capture what is important. • Chomsky-Halle: "Utterances are sequences of discrete segments that are complexes of phonetic features." (The Sound Pattern of English, p. 5)
Nearly all linguists would agree with Morris Halle: "It is unlikely that the information about the phonic shape of words is stored in the memory of speakers in acoustic form resembling a spectrogram" (1985, in Fromkin, ed.). If that is true, there could be no auditory episodic memory for speech.
Evidence for rich memory of words: Palmeri, Goldinger & Pisoni (1993). • Subjects heard a continuous list of words spoken by 2, 6, 12 or 20 voices. • They were asked to recognize repetitions after lags of 2, 4, 8, 16, 32 or 64 words. Of course, performance declined with greater lag. But the number of talkers had no effect on recognition! It doesn't matter how many voices there are, because the voice information is always stored, automatically. Words are stored episodically, just like visual images and everyday events.
What does rich memory store about speech? We apparently store: • full auditory detail • the speaker's voice information • emotional features • word identity • speaker ID • semantic features • other context features • frequency of occurrence • (orthographic spelling). We store as much detail as possible, at least for a while. [Figure: formant tracks of short Australian vowels in CVCs, multiple tokens, male speaker: dead, Dad, Dodd, dud]
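One way to picture a single memory trace is as a record with one slot per kind of stored information. The sketch below is hypothetical: the class and field names are invented for illustration, and the formant-like numbers are placeholders, not measurements. Frequency of occurrence is deliberately not a field, since counting records yields it automatically.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

@dataclass
class SpeechExemplar:
    """Hypothetical record for one heard token of a word (field names are illustrative)."""
    word: str                            # word identity
    frames: List[List[float]]            # full auditory detail, e.g. one spectral/formant vector per frame
    speaker_id: str                      # speaker ID
    voice: Dict[str, float] = field(default_factory=dict)    # speaker's voice information
    emotion: Dict[str, float] = field(default_factory=dict)  # emotional features
    semantics: Set[str] = field(default_factory=set)         # semantic features
    context: Dict[str, str] = field(default_factory=dict)    # other context features
    spelling: Optional[str] = None       # orthographic spelling, only if the listener is literate

# Placeholder tokens; frequency is derived by counting records, not stored.
memory: List[SpeechExemplar] = [
    SpeechExemplar("dead", [[560.0, 1900.0], [600.0, 1800.0]], "spk1", spelling="dead"),
    SpeechExemplar("dead", [[550.0, 1950.0], [590.0, 1850.0]], "spk2"),
    SpeechExemplar("dud", [[700.0, 1300.0], [720.0, 1250.0]], "spk1", emotion={"arousal": 0.8}),
]
freq_dead = sum(1 for ex in memory if ex.word == "dead")
print(freq_dead)
```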
Any further evidence? Yes, lots! Of course, educated speakers also store an abstract linguistic description (using features, segments, orthographic letters, etc.). For such speakers, the symbolic description tends to dominate our conscious experience of language.
Other Evidence for Rich Language Memory. 2. Dialect details, gradual dialect change and speaker idiosyncrasies (W. Labov, Betty Phillips). • How could we learn small differences in pronunciation if they were not recorded in memory? [Figure: vowels vary in systematic ways; NJ speaker (Labov 2005)]
3. Frequent speech patterns behave differently from infrequent ones (e.g., Joan Bybee, B. Phillips). E.g., R. Port will not normally begin a word with a flapped t: "I want a tomáto" (no flap on initial t); "Where is my tobácco?" But it's OK with a high-frequency word (and phrase): "I want to go todáy" (t usually flapped!); "I'll see you tomórrow." This suggests that each hearing of a word leaves a long-lasting record. Exemplar memory automatically records frequency.
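The last point can be made concrete: if every hearing is kept as its own record, frequency needs no separate counter. The sketch below also assumes, as a simplification borrowed from exemplar-production models rather than stated in the slide, that producing a word reuses a randomly chosen stored token of it. `TokenStore` and the token counts are hypothetical.

```python
import random
from collections import Counter

class TokenStore:
    """Hypothetical exemplar lexicon: every heard token is kept as its own record."""

    def __init__(self):
        self.tokens = []  # (word, realization) pairs, one per hearing

    def hear(self, word, realization):
        self.tokens.append((word, realization))

    def frequency(self, word):
        # No separate counter: frequency is simply how many traces the word has.
        return sum(1 for w, _ in self.tokens if w == word)

    def realization_shares(self, word):
        counts = Counter(r for w, r in self.tokens if w == word)
        total = sum(counts.values())
        return {r: n / total for r, n in counts.items()}

    def produce(self, word, rng):
        # Assumed exemplar-style production: reuse one randomly chosen stored token.
        return rng.choice([r for w, r in self.tokens if w == word])

store = TokenStore()
for _ in range(90):
    store.hear("today", "flap")   # illustrative counts, not data
for _ in range(10):
    store.hear("today", "t")
for _ in range(5):
    store.hear("tomato", "t")

rng = random.Random(0)
flap_share = sum(store.produce("today", rng) == "flap" for _ in range(1000)) / 1000
print(store.frequency("today"), store.realization_shares("today"), flap_share)
```

Under this assumption, a word heard mostly with a flap is produced mostly with a flap, while a rare word with no flapped tokens in memory is never flapped.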
4. Speech perception uses rich, context-sensitive cues, not abstract invariant cues. Liberman (1968) was troubled that /di/ and /du/ have no acoustic invariant corresponding to /d/: the "coarticulation problem." Rich memory sweeps this problem away. We remember big chunks, so every di token is stored independently of all the du tokens. (Note that speech recognition systems pay no attention to segments! They always use spectral trajectories.) [Figure: sound spectrograms of di and du, male speaker]
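Matching whole trajectories, rather than hunting for an invariant /d/ cue, can be sketched with dynamic time warping (DTW), a standard template-matching technique from earlier word recognizers. DTW is swapped in here purely for illustration; it is not presented as Port's model. The one-dimensional "F2" values are hypothetical placeholders, and the function names are invented.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two trajectories (sequences of feature vectors)."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest path: stretch a, stretch b, or advance both.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def nearest_exemplar(probe, exemplars):
    """Label of the stored whole trajectory closest to the probe under DTW."""
    return min(exemplars, key=lambda ex: dtw_distance(probe, ex[0]))[1]

# Hypothetical F2 tracks (Hz): each syllable is stored as a whole, with no /d/ unit anywhere.
exemplars = [
    ([(2000.0,), (2200.0,), (2300.0,)], "di"),
    ([(1500.0,), (1100.0,), (900.0,)], "du"),
]
probe = [(2050.0,), (2150.0,), (2250.0,), (2300.0,)]
print(nearest_exemplar(probe, exemplars))
```

Because DTW stretches time freely, a slower or faster token of the same trajectory still matches closely, which fits the exemplar idea without requiring rate-invariant symbols.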
5-A. Letters don't fit speech well, if you look closely. For words like mom and seem, the model is: 3 letters → 3 steady-state gestures → 3 acoustic shapes. But this model does not work for: • glides (w, r, l, y) • stops (d, t, b, p, g, k), which have closing, closure and release phases plus aspiration intervals • diphthongs (may, my, mow, Boyd) • affricates (tsh, dzh). [Figure: segmented spectrograms, including mom and seem]
5-B. Letters don't fit speech well, if you look closely. The V-to-C continuum has arbitrary cuts. [Figure: continuum from aua through aUa to awa] The VOT continuum also has arbitrary cuts. [Figure: VOT continuum da, da, da, ta, tha, tha, divided into voiced, unaspirated and aspirated]
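The arbitrariness of the cuts can be shown directly: a continuous VOT value only becomes a "letter" once someone chooses cut points, and different choices give different letters for the same token. Both cut sets below are hypothetical round numbers, not measured category boundaries, and `categorize_vot` is an invented name.

```python
import bisect

def categorize_vot(vot_ms, cuts, labels):
    """Assign a letter to a continuous VOT value (ms) by where it falls among the cut points.

    cuts must be sorted ascending and len(labels) == len(cuts) + 1.
    A value exactly on a cut goes to the category above it.
    """
    if len(labels) != len(cuts) + 1:
        raise ValueError("need exactly one more label than cut points")
    return labels[bisect.bisect_right(cuts, vot_ms)]

# Hypothetical cut points (ms): one two-way system and one three-way system.
TWO_WAY = ([25.0], ["d", "t"])
THREE_WAY = ([0.0, 30.0], ["d (voiced)", "t (unaspirated)", "th (aspirated)"])

for vot in (-60.0, 10.0, 70.0):
    print(vot, categorize_vot(vot, *TWO_WAY), "|", categorize_vot(vot, *THREE_WAY))
```

The same 10 ms token is "d" under the two-way cut and "t" under the three-way cut; nothing in the continuum itself dictates either answer.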
5-C. Letters don't fit speech well, if you look closely. Overlap of gestures is common but ignored by alphabetic description. [Figure: gestures for cap and camp, including an alternative overlap pattern for camp]
5-D. Letters don't fit speech well, if you look closely. Timing patterns are critical to word specification: fuzzy ≠ fussy; budding =? butting.
5-E. Letters don't fit speech well, if you look closely. "Incomplete neutralization": sounds that are not different enough to permit reliable identification may still differ measurably, but the difference is not discrete (Port & Crawford; Warner et al.). • German: Bunde vs. bunte, but Bund vs. bunt (same?) • American English: mad vs. mat, madder vs. matter (same?); bud vs. butt, budding vs. butting (same?). This situation should be impossible if words had letter-like spellings!
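Why a real difference can still fail to support reliable identification is easy to quantify if we assume that each member of a pair produces a Gaussian distribution on some acoustic measure with equal variance. Under that assumption, an ideal listener with equal priors puts the criterion halfway between the means and is correct with probability Φ(d′/2), where d′ = |μ_a − μ_b| / σ. The durations below are hypothetical, not values from Port & Crawford or Warner et al.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ideal_accuracy(mean_a, mean_b, sd):
    """Best achievable identification rate for two equal-variance Gaussian categories
    with equal priors: the optimal criterion lies halfway between the means."""
    d_prime = abs(mean_a - mean_b) / sd
    return normal_cdf(d_prime / 2.0)

# Hypothetical durations (ms) for the two members of a pair like Bund vs. bunt.
accuracy = ideal_accuracy(mean_a=212.0, mean_b=200.0, sd=20.0)
print(f"best possible identification: {accuracy:.3f}")
```

With these made-up numbers, a 12 ms difference against 20 ms of spread caps identification at roughly 0.62: clearly above chance (0.5), far from perfect, and not a discrete contrast.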
6. People who are not literate in an alphabet should find segments very unintuitive. Look at "phonological awareness" tasks. 6A. Studies of illiterates in Portugal (Morais et al., Cognition 1979, 1986). • Tasks: segment addition, i.e., add /p/ to a syllable (urso → purso); segment deletion, i.e., delete /p/ (purso → urso). • Word-word and nonword-nonword conditions. • 30 illiterate subjects and 30 reading subjects. • 15 training trials with feedback, then 10 W-W test trials and 10 Nw-Nw test trials.
[Figure: distributions of correct responses for illiterates (n=30) and readers (n=30); axes: number of correct responses, number of subjects]
6B. Matching experiments on Chinese adults literate in Chinese orthography, with and without alphabet experience, showed the same results (C. Read et al., 1986).
6C. Reading skill correlates highly with "phonological awareness" (Y. Liberman et al., 1976). But which comes first? Ziegler and Goswami (2005, Psychological Bulletin): "Phoneme awareness only develops once children are taught to read and write, irrespective of age." (p. 14) "Phonological awareness" is mostly a result of literacy training. (I say "mostly" because, e.g., the inventors of alphabetic writing obviously must have had some awareness of phonetic segments.)
Review: Evidence of Rich Memory for Words • Recognition memory experiments show that phonetic detail is stored. • Gradual dialect variation (in space, time or social context) implies memory for rich phonetic detail, with no discrete phonetic jumps. • Frequency influences both perception and production. • Speech perception uses auditory trajectories, not abstract invariants or static patterns. • Letters do not code most of what is important for perception, including timing, gestures and overlapping contrasts. • Only people with alphabet training think speech is made of segments.
Conclusions so far • An alphabetic description of speech is possible only for those with an alphabet-based education. • Linguistic fragments (words, phrases, etc.) form clouds of trajectories in a high-dimensional (high-D) space, with some category labels (and sometimes orthographic spellings). Each unit is a distribution. • Language may resemble a symbol system, but it cannot be one. But: with such rich memory, why is phonology necessary?
Toward Rich Phonology • If memory is rich, why do words resemble each other in systematic, partial ways? Pete, pate, pet, pat / beat, bait, bet, bat / seal, sale, sell, Sal / zeal, --, Szell, -- That is, if language skills do not rely on low-dimensional (low-D) descriptions: • Why is there phonological structure? • Why does a low-D description almost work?
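The word grid on this slide shows what "almost works" means. A factorial description built from 4 onsets, 4 vowels and 2 codas predicts 16 words, and 14 of them exist. The sketch below just counts that; the IPA column labels are my assumption about the vowels, and the function names are hypothetical.

```python
# Words from the slide: rows share an onset, columns share a vowel; None marks a gap.
VOWELS = ["i", "eɪ", "ɛ", "æ"]          # assumed IPA labels for the four columns
GRID = {
    "p": ["Pete", "pate", "pet", "pat"],    # coda /t/
    "b": ["beat", "bait", "bet", "bat"],    # coda /t/
    "s": ["seal", "sale", "sell", "Sal"],   # coda /l/
    "z": ["zeal", None, "Szell", None],     # coda /l/
}

def coverage(grid):
    """(cells that are real words, cells the onset x vowel description predicts)."""
    cells = [word for row in grid.values() for word in row]
    return sum(word is not None for word in cells), len(cells)

def gaps(grid, vowels):
    """Onset/vowel combinations the low-D description predicts but the lexicon lacks."""
    return [
        (onset, vowels[i])
        for onset, row in grid.items()
        for i, word in enumerate(row)
        if word is None
    ]

filled, total = coverage(GRID)
print(f"{filled} of {total} predicted words exist; gaps: {gaps(GRID, VOWELS)}")
```

A handful of parts accounts for most of the lexicon in this corner, yet the description overgenerates: the gaps are exactly where a low-D code stops matching the corpus.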
Phonology is part of our ambient culture: our phonological culture. • The child is exposed to a corpus of utterances. • Linguistics studies this corpus, i.e., the data that serve as the stimulus for language learning. • The linguist observes patterns in this linguistic culture: • lexico-phrasal patterns • phonological patterns
Conclusions • Language learning and language memory do not require low-D descriptions; high-D descriptions work fine. • But low-D phonological structure may lead to improvements in perception, learning, memory, etc. • Languages (as cultural products) evolve to maintain approximate low-D descriptions (i.e., phonology). • Rich linguistics and phonology should study these patterns using all available descriptions: • alphabetic (e.g., narrow or broad transcription) • acoustic (e.g., spectra, formants, durations) • and perhaps neural (e.g., fMRI). • The units of phonology are not units in speakers' heads; speakers' heads hold only masses of exemplars. Phonology can be found only in the corpus of utterances produced by speakers.