Puzzles and Patterns in 50 years of Research on Speech Perception Sarah Hawkins University of Cambridge sh110@cam.ac.uk
Three periods • 1950-1965 Broad-based exploration • 1965-1990s Narrowed to focus on the search for invariance in the relationship between speech signal and its percept: THEORY • 1995…. This focus is broadening again • to include ‘discrepant’ data & new understanding • which requires changes in conceptualization of • task goals • processes involved
The Main Message • Speech perception is at an exciting stage: we are beginning to integrate areas of old research with the mainstream theoretical work of the last 30 years or so • A paradigm shift?
Early work Glorious Discovery
Early work • often looked at effects on the whole signal • but as puzzles arose, and we looked more closely, then attention became focused on small domains in an effort both to simplify and to clarify
Early work: source separation Cocktail party effect / multi-talker perception Cherry (1953) • continuous natural speech, with different types of content, presented in different ways • a huge wealth of observations relevant to • memory • attention • transitional probabilities • speaker vs message Cherry (1953) JASA 25, 975-979
Early work: source separation Cocktail party effect / multi-talker perception Broadbent & Ladefoged (1957) • separate synthetic formants fuse to sound like a single vowel when presented to the same or different ears, only if they have the same f0 • compared ‘natural’ and ‘sustained’ formants • extensions to theories of hearing (e.g. Licklider) Broadbent & Ladefoged (1957) JASA 29, 708-710 Darwin (1981) QJEP 33, 185-207 Bregman (1990) Auditory Scene Analysis ASA special session, 2004 Cooke & Ellis (2001) Sp. Comm. 35, 141–177
Early work: source integration Sumby & Pollack (1954) Especially in high levels of noise: • audiovisual presentation increases intelligibility (visual contribution is relative to the available auditory contribution) • in auditory-only presentations, polysyllables are more intelligible than monosyllables (overall shape... neighborhoods…cohorts…) Sumby & Pollack (1954) JASA 26: 212-215 Massaro (1998) Perceiving Talking Faces Widespread AV groups and applications Richard Warren, Paul Luce, Marslen-Wilson
Early work: brain function Kimura (1961) • speech is processed more efficiently by the ear that is contralateral to the language-dominant hemisphere • independent of handedness and right/left focus of damage due to epilepsy • complexities of auditory pathways, cerebral dominance, and speech processing Kimura (1961) Canadian J. Psychol., 15, 166-171 The new ‘cognitive neuroscience/psychology’…
Early work: memory Miller (1956) • short term memory span for unrelated items • The Magical Number Seven ± Two • can increase this span by: • making relative rather than absolute judgments • increasing the number of dimensions • chunking into larger items • recoding is a crucial process Miller (1956) Psychological Review 63, 81-97 Serial learning and recall (e.g. Underwood) Lashley (1951) Serial order in behavior Pisoni (1973) and later
Early work: intelligibility: Context of Possible Responses Miller, Heise & Lichten (1951) • monosyllables • size of test vocabulary affects identification • 2…256…all monosyllables • though presumably there are limits: • two vs six • five vs nine ! Miller, Heise & Lichten (1951) J. Exp. Psych. 41, 329-335
Early work: intelligibility: Phonetic Context Pickett & Pollack (1963) • excerpts from connected speech must be ≥ 800 ms long to be fully intelligible • regardless of rate: • faster rates need more syllables to be understood (slowing the speech down does not help) • crucial role of coarticulation & style (‘connected speech processes’) Pickett & Pollack (1963) Language & Speech 6, 165-171
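Early work: preceding context affects the interpretation of the current sound Ladefoged and Broadbent (1957) • carrier: "Please say what this word is:" followed by a test word • possible responses: bit, bet, bat, but • F1 range of CARRIER 200-380 Hz: test word heard as "bet" • F1 range of CARRIER 380-660 Hz: test word heard as "bit" Ladefoged and Broadbent (1957) JASA 29, 98-104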
Early work: immediate context determines the interpretation of the current stimulus Synthesizing bursts and transitionless vowels Cooper, Delattre, Liberman, Borst & Gerstman (1952) JASA 24, 597-606
Early work: immediate context determines the interpretation of the current stimulus Identification of bursts and transitionless vowels: the CV is identified as the minimal acoustic unit Cooper, Delattre, Liberman, Borst & Gerstman (1952) JASA 24, 597-606
Early work: immediate context determines the interpretation of the current stimulus Identification of burstless stops with different vowels: transitions are all you need! Delattre, Liberman, & Cooper (1955) JASA 27, 769-773
Categorical Perception of obstruent consonants • Equal acoustic changes, unequal auditory percepts • place of articulation of stops: /b/ vs /d/ vs /g/ Liberman, Harris, Hoffman, and Griffith (1957) Journal of Experimental Psychology 54, 358-368
Categorical Perception of obstruent consonants • together with a theoretical bias in favor of binary oppositions • encouraged a focused search for simple transformations from the encoded signal to an unambiguous, formal linguistic mental representation
This narrower focus • required clear conceptualisation of • identity of the important unit(s) of perception • process of abstraction • On the whole, the units and levels of linguistic description were rather uncritically adopted
…units of linguistic description were rather uncritically adopted “we….had undertaken to find the ‘invariants’ of speech, a term which implies, at least in its simplest interpretation, a one-to-one correspondence between something half-hidden in the spectrogram and the successive phonemes of the message.” Cooper, Delattre, Liberman, Borst & Gerstman, Perception of synthetic speech sounds, JASA (1952) 24, 604-5
…though not without some misgivings “…one should not expect always to be able to find acoustic invariants for the individual phonemes…we are trying to [compile] the code book, one in which there is one column for acoustic entries and another column for message units, whether these be phonemes, syllables, words, or whatever.” Cooper, Delattre, Liberman, Borst & Gerstman, Perception of synthetic speech sounds, JASA (1952) 24, 604-5
Middle period The search for essence: ‘invariance’
Middle period: the search for essence • Impose order on the chaos! • Focus: non-linearity between variation in acoustic signal and perceptual response: Categorical Perception (of consonants) • Context becomes seen as variability, so we control for it ever more stringently
to discover the crucial—invariant—properties requires a view of what is fundamental • The basic syllable! ba • CV • in isolation • stressed • possibly with only one V if we’re looking at Cs, and only one C if we’re looking at Vs
Imposing order on chaos • The basic syllable: ba (context: silence) • What was lost? • polysyllables • unstressed syllables • prosody • accounting for rate changes • connected speech • informativeness of variation esp. in connected speech • meaning • communication • (most things really)
Development of theory and the search for essence • Two main approaches: • The Motor Theory • Quantal Theory, leading to Acoustic/Auditory Invariance
The Motor Theory of Speech Perception Liberman, Cooper, Shankweiler & Studdert-Kennedy (1967) Psychological Review 74, 431–461 Liberman & Mattingly (1985) Cognition 21, 1-36 • Listeners interpret speech sounds in terms of • motoric gestures they would make them with (1967) • intended gestures of the speaker (1985) • Gestural unit: ‘phonetic category’
Quantal Theory of Speech Perception (and production) Stevens (1972, 1989) • Regions of stability in the acoustic signal, or auditory response, provide a basis for forming categories of sounds • Unit: distinctive feature (Chomsky & Halle 1968) Stevens (1989) Journal of Phonetics 17, 3-45 Stevens (1972) In David & Denes Human Communication. 51-66
Quantal Theory becomes Acoustic/Auditory invariance theory Stevens & Blumstein (1978) ……. Stevens (2002) • For each distinctive feature (DF) there is a binary response to an invariant acoustic or auditory property • e.g. particular changes in spectral shape over short time periods at crucial parts of the signal • segment boundaries • vowel steady states [Figure labels: change / little change; +consonantal / -consonantal] Stevens (2002) JASA 111, 1872-1891 Stevens & Blumstein (1978) JASA 64, 1358-1368
Acoustic/Auditory invariance theory Stevens (2002) • landmarks: • islands of reliability • built-in local context • connected speech… [Figure labels: +strident / -strident] Stevens (2002) JASA 111, 1872-1891
Common properties • Motor and Acoustic Invariance theories have much in common • dynamic • early abstraction • discrete units • phonological • This allowed psycholinguistic theories to assume an input that is abstract and discrete: to ignore phonetic information
Psycholinguistic theories • Focus on word segmentation & identification • Top-down knowledge compensates for impoverished (phonemic) input • metrical stress, possible words, phonotactics…. • Statistical, probabilistic • Some names: • McClelland & Elman (TRACE) • Cutler, Norris, McQueen (Race, Shortlist, Merge) • Marslen-Wilson, Gaskell… (Cohort)
extensions, questions: is simplicity the best answer? Kewley-Port (1983) • better identification with overall pattern (more detail?) Klatt (1979) • Lexical Access From Spectra (LAFS) • whole-word patterns? Kewley-Port (1983) JASA 73, 322-335 Klatt (1979) Journal of Phonetics 7, 279-312
extensions, questions: wider influences Ganong (1980) • identification expt • VOT continuum • word at one end, non-word at the other • perception is more forgiving when the sound means something! [Figure: % /d/ responses (0-100) along a VOT continuum from short VOT (d) to long VOT (t), for nonword-word (dask-task) and word-nonword (dash-tash) continua] Ganong (1980) J. Exp. Psych: HPP 6, 110-125
Summary: ‘context’ and ‘signal’ • ‘Units’ functionally inseparable from ‘context’ • The context and the signal together determine whether the signal is coherent • and hence what each unit ‘is’
Recent developments (since early-to-mid 90s) systematic subtle variation as linguistically informative: classify the contexts in a more linguistically-sophisticated way
Combining old and new themes • re-examination and extension of information provided by systematic phonetic variation • new areas, e.g. • cross-linguistic work (Best, Beddor, Bradlow...) • memory & learning (Goldinger, Pisoni...) • functional brain imaging (Sophie Scott)
Listeners use fine phonetic detail Allen & Miller (2004) • speaker identity: listeners generalize talker-specific VOT information to a novel word Smith (2004) • lexical identity: slightly inappropriate allophones in a sentence disrupt word-spotting only when speaker is familiar to listener • familiarization to speakers is fast Allen & Miller (2004) JASA 116, 3171-3183 Smith (2004) PhD Dissertation, Cambridge University
Small, statistically non-significant changes in each of several formants can add up to large perceptual difference; conversely, some statistically significant differences may have no perceptual effect. • Spoken word recognition test, which is used to establish cerebral dominance • large groups of native speakers of Chinese/English/Spanish • coronal MRI slices, data for 3 Ss, >200 ms post-stimulus onset • Lateralisation (% Ss): Spanish 100% left; English 80% left; Chinese 79% bilateral (tone lang.) [Figure panels: Chinese, English, Spanish] Valaki et al. (2004) Neuropsychologia 42, 967–979
What sort of model? • biologically plausible • roles of attention, memory & learning • focus on meaning (‘sound to sense’) • multiple potential ‘units of perception’: no obligatory units? • structure from incomplete information Adaptive Resonance Theory (ART) ? Grossberg 1986… Grossberg (2003) Journal of Phonetics 31, 423-445
A key issue • what is a phonetic category? (Carol Fowler, May 2004: ‘never been sure’) • mental representations of phonetic categories are dynamic, relational, & plastic • Repp, Lindblom, Studdert-Kennedy • Bradlow, Pisoni, Hawkins….. Hawkins (2003) Journal of Phonetics 31, 373-405
bottom-up vs top-down? • phonetic variation that systematically indicates linguistic structure makes many ‘top-down’ processes unnecessary • e.g. allophonic detail vs Possible Word Constraint • and blurs the traditional distinction between signal & knowledge
A Challenge • to define and refine new questions in testable ways – i.e. to refocus, but to do it in ways that: • are rigorous yet focus on meaning and communication • avoid the ‘new understanding’ becoming doctrinaire • build on past contributions
Some topics I haven’t mentioned but should have… • infants’ & animals’ perception (periods 2 & 3) • vowel perception (dynamics; center of gravity) • sine wave speech • more theories (direct perception, auditory enhancement, FLMP) • more on memory (incl. associations) & learning • connections with psychoacoustics • production-perception connections …and could have, if I’d told the same story in a different way
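Categorical Perception • Run an identification experiment • Run a discrimination experiment [Figure: identification function (% /b/) and discrimination function (% different, 0-100) across stimuli 1-7 of a continuum; annotations: "1 versus 3", "Discrimination peak"] Courtesy Chris Darwin’s web site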
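The link between the two experiments can be made concrete with a small sketch. It is not taken from the slides: it assumes the classic label-only prediction, in which a listener who can discriminate two stimuli only by the category labels assigned to them scores 0.5 + 0.5 (pA - pB)^2 in an ABX task, where pA and pB are the proportions of /b/ responses to the two stimuli. The identification values and the function name are hypothetical.

```python
def predicted_abx_accuracy(p_a, p_b):
    """Predicted proportion correct in ABX if only category labels are used.

    p_a, p_b: proportion of /b/ identifications for stimuli A and B.
    Assumes a two-category continuum (label-only prediction, not from the slides).
    """
    return 0.5 + 0.5 * (p_a - p_b) ** 2


# Hypothetical identification data: proportion of /b/ responses
# for stimuli 1..7 of a synthetic continuum (illustrative values only).
ident_b = [1.0, 0.98, 0.9, 0.5, 0.1, 0.02, 0.0]

# Two-step comparisons: stimulus 1 vs 3, 2 vs 4, ... (zero-based index pairs).
pairs = [(i, i + 2) for i in range(len(ident_b) - 2)]
predicted = [predicted_abx_accuracy(ident_b[i], ident_b[j]) for i, j in pairs]

# The pair with the highest predicted accuracy straddles the category boundary.
peak_pair = pairs[predicted.index(max(predicted))]

for (i, j), acc in zip(pairs, predicted):
    print(f"stimulus {i + 1} vs {j + 1}: predicted accuracy {acc:.3f}")
```

With these made-up values, predicted accuracy stays near chance for pairs inside a category and peaks for the pair that spans the steep part of the identification function. That is the pattern the figure labels "Discrimination peak".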
Valaki et al. (2004) Neuropsychologia 42, 967–979 • Monolingual/near monolingual native speakers: • 30 Mandarin-Chinese • 20 Spanish • 42 American English • speakers all right-handed • Whole-head MEG, auditory word recognition test, used clinically to establish hemispheric dominance for receptive language: 63 abstract words/language • 33 target words, each in 3 lists, with 10 novel non-target words in each list • lift finger when you recognize a target word
Patterns of dominance (%) Laterality Index: (LH – RH) / (LH + RH)
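The Laterality Index above is simple enough to sketch directly. The slide does not say which per-hemisphere measure is used or which cut-offs separate "left", "bilateral" and "right", so the sketch only computes the index. The function name and the input values are hypothetical.

```python
def laterality_index(lh, rh):
    """Laterality Index: (LH - RH) / (LH + RH).

    lh, rh: non-negative activity measures for the left and right hemisphere
    (the exact measure is an assumption; the slide does not specify it).
    Returns a value in [-1, 1]: +1 is fully left, -1 fully right, 0 symmetric.
    """
    total = lh + rh
    if total == 0:
        raise ValueError("LH + RH must be non-zero")
    return (lh - rh) / total


# Illustrative values only, not data from the study.
print(laterality_index(30, 10))   # 0.5 (left-leaning)
print(laterality_index(20, 20))   # 0.0 (symmetric)
print(laterality_index(10, 30))   # -0.5 (right-leaning)
```

Dividing by LH + RH normalizes the index, so participants with different overall activity levels can be compared on the same scale.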