
Puzzles and Patterns in 50 years of Research on Speech Perception

Puzzles and Patterns in 50 years of Research on Speech Perception. Sarah Hawkins University of Cambridge sh110@cam.ac.uk. Three periods. 1950-1965 Broad-based exploration


Presentation Transcript


  1. Puzzles and Patterns in 50 years of Research on Speech Perception Sarah Hawkins University of Cambridge sh110@cam.ac.uk

  2. Three periods • 1950-1965 Broad-based exploration • 1965-1990s Narrowed to focus on the search for invariance in the relationship between speech signal and its percept: THEORY • 1995…. This focus is broadening again • to include ‘discrepant’ data & new understanding • which requires changes in conceptualization of • task goals • processes involved

  3. The Main Message • Speech perception is at an exciting stage: we are beginning to integrate areas of old research with the mainstream theoretical work of the last 30 years or so • A paradigm shift?

  4. Early work Glorious Discovery

  5. Early work • often looked at effects on the whole signal • but as puzzles arose, and we looked more closely, then attention became focused on small domains in an effort both to simplify and to clarify

  6. Early work: source separation Cocktail party effect / multi-talker perception Cherry (1953) • continuous natural speech, with different types of content, presented in different ways • a huge wealth of observations relevant to • memory • attention • transitional probabilities • speaker vs message Cherry (1953) JASA 25, 975-979

  7. Early work: source separation Cocktail party effect / multi-talker perception Broadbent & Ladefoged (1957) • separate synthetic formants fuse to sound like a single vowel when presented to the same or different ears, only if they have the same f0 • compared ‘natural’ and ‘sustained’ formants • extensions to theories of hearing (e.g. Licklider) Broadbent & Ladefoged (1957) JASA 29, 708-710 Darwin (1981) QJEP 33, 185-207 Bregman (1990) Auditory Scene Analysis ASA special session, 2004 Cooke & Ellis (2001) Sp. Comm. 35, 141–177

  8. Early work: source integration Sumby & Pollack (1954) Especially in high levels of noise: • audiovisual presentation increases intelligibility (visual contribution is relative to the available auditory contribution) Sumby & Pollack (1954) JASA 26: 212-215 Massaro (1998) Perceiving Talking Faces Widespread AV groups and applications

  9. Early work: source integration Sumby & Pollack (1954) Especially in high levels of noise: • audiovisual presentation increases intelligibility (visual contribution is relative to the available auditory contribution) • in auditory-only presentations, polysyllables are more intelligible than monosyllables (overall shape... neighborhoods…cohorts…) Sumby & Pollack (1954) JASA 26: 212-215 Massaro (1998) Perceiving Talking Faces Widespread AV groups and applications Richard Warren, Paul Luce, Marslen-Wilson

  10. Early work: brain function Kimura (1961) • speech is processed more efficiently by the ear that is contralateral to the language-dominant hemisphere • independent of handedness and right/left focus of damage due to epilepsy → complexities of auditory pathways, cerebral dominance, and speech processing Kimura (1961) Canadian J. Psychol., 15, 166-171 The new ‘cognitive neuroscience/psychology’…

  11. Early work: memory Miller (1956) • short term memory span for unrelated items • The Magical Number Seven ± Two • can increase this span by: • making relative rather than absolute judgments • increasing the number of dimensions • chunking into larger items • recoding is a crucial process Miller (1956) Psychological Review 63, 81-97 Serial learning and recall (e.g. Underwood) Lashley (1951) Serial order in behavior Pisoni (1973) and later

  12. Early work: intelligibility: Context of Possible Responses Miller, Heise & Lichten (1951) • monosyllables • size of test vocabulary affects identification • 2…256…all monosyllables • though presumably there are limits: • two vs six • five vs nine! Miller, Heise & Lichten (1951) J. Exp. Psych. 41, 329-335

  13. Early work: intelligibility: Phonetic Context Pickett & Pollack (1963) • excerpts from connected speech must be ≥ 800 ms long to be fully intelligible • regardless of rate: • faster rates need more syllables to be understood (slowing the speech down does not help) → crucial role of coarticulation & style (‘connected speech processes’) Pickett & Pollack (1963) Language & Speech 6, 165-171

  14. Early work: preceding context affects the interpretation of the current sound Ladefoged and Broadbent (1957) • "Please say what this word is:" bit / bet / bat / but • F1 of CARRIER 200-380 Hz: bet • F1 of CARRIER 380-660 Hz: bit Ladefoged and Broadbent (1957) JASA 29, 98-104

  15. Early work: immediate context determines the interpretation of the current stimulus Synthesizing bursts and transitionless vowels Cooper, Delattre, Liberman, Borst & Gerstman (1952) JASA 24, 597-606

  16. Early work: immediate context determines the interpretation of the current stimulus Identification of bursts and transitionless vowels: the CV is identified as the minimal acoustic unit Cooper, Delattre, Liberman, Borst & Gerstman (1952) JASA 24, 597-606

  17. Early work: immediate context determines the interpretation of the current stimulus Identification of burstless stops with different vowels: transitions are all you need! Delattre, Liberman, & Cooper (1955) JASA 27, 769-773

  18. Categorical Perception of obstruent consonants • Equal acoustic changes → unequal auditory percepts • place of articulation of stops: /b/ vs /d/ vs /g/ Liberman, Harris, Hoffman, and Griffith (1957) Journal of Experimental Psychology 54, 358-368

  19. Categorical Perception of obstruent consonants • together with a theoretical bias in favor of binary oppositions • encouraged a focused search for simple transformations from the encoded signal to an unambiguous, formal linguistic mental representation

  20. This narrower focus • required clear conceptualisation of • identity of the important unit(s) of perception • process of abstraction • On the whole, the units and levels of linguistic description were rather uncritically adopted

  21. …units of linguistic description were rather uncritically adopted “we … had undertaken to find the ‘invariants’ of speech, a term which implies, at least in its simplest interpretation, a one-to-one correspondence between something half-hidden in the spectrogram and the successive phonemes of the message.” Cooper, Delattre, Liberman, Borst & Gerstman, Perception of synthetic speech sounds, JASA (1952) 24, 604-5

  22. …though not without some misgivings “…one should not expect always to be able to find acoustic invariants for the individual phonemes…we are trying to [compile] the code book, one in which there is one column for acoustic entries and another column for message units, whether these be phonemes, syllables, words, or whatever.” Cooper, Delattre, Liberman, Borst & Gerstman, Perception of synthetic speech sounds, JASA (1952) 24, 604-5

  23. Middle period The search for essence: ‘invariance’

  24. Middle period: the search for essence • Impose order on the chaos! • Focus: non-linearity between variation in acoustic signal and perceptual response: Categorical Perception (of consonants) • Context becomes seen as variability, so we control for it ever more stringently

  25. to discover the crucial—invariant—properties requires a view of what is fundamental • The basic syllable! ba • CV • in isolation • stressed • possibly with only one V if we’re looking at Cs, and only one C if we’re looking at Vs

  26. Imposing order on chaos • The basic syllable: ba (context: silence) • What was lost? • polysyllables • unstressed syllables • prosody • accounting for rate changes • connected speech • informativeness of variation esp. in connected speech • meaning • communication • (most things really)

  27. Development of theory and the search for essence • Two main approaches: • The Motor Theory • Quantal Theory, leading to Acoustic/Auditory Invariance

  28. The Motor Theory of Speech Perception Liberman, Cooper, Shankweiler & Studdert-Kennedy (1967) Psychological Review 74, 431–461 Liberman & Mattingly (1985) Cognition 21, 1-36 • Listeners interpret speech sounds in terms of • motoric gestures they would make them with (1967) • intended gestures of the speaker (1985) • Gestural unit: ‘phonetic category’

  29. Quantal Theory of Speech Perception (and production) Stevens (1972, 1989) • Regions of stability in the acoustic signal, or auditory response, provide a basis for forming categories of sounds • Unit: distinctive feature (Chomsky & Halle 1968) Stevens (1989) Journal of Phonetics 17, 3-45 Stevens (1972) In David & Denes Human Communication. 51-66

  30. change little change Quantal Theory becomes Acoustic/Auditory invariance theory +consonantal -consonantal Stevens & Blumstein (1978) ……. Stevens (2002) • For each DF there is a binary response to an invariant acoustic or auditory property • e.g. particular changes in spectral shape over short time periods at crucial parts of the signal • segment boundaries • vowel steady states Stevens (2002) JASA 111, 1872-1891 Stevens & Blumstein (1978) JASA 64, 1358-1368

  31. Acoustic/Auditory invariance theory +strident -strident Stevens (2002) • landmarks: • islands of reliability • built-in local context • connected speech… Stevens (2002) JASA 111, 1872-1891

  32. Common properties • Motor and Acoustic Invariance theories have much in common • dynamic • early abstraction • discrete units • phonological

  33. Common properties • Motor and Invariance theories have much in common • dynamic • early abstraction • discrete units • phonological • This allowed psycholinguistic theories to assume an input that is abstract and discrete: to ignore phonetic information

  34. Psycholinguistic theories • Focus on word segmentation & identification • Top-down knowledge compensates for impoverished (phonemic) input • metrical stress, possible words, phonotactics…. • Statistical, probabilistic • Some names: • McClelland & Elman (TRACE) • Cutler, Norris, McQueen (Race, Shortlist, Merge) • Marslen-Wilson, Gaskell… (Cohort)

  35. extensions, questions: is simplicity the best answer? Kewley-Port (1983) • better identification with overall pattern (more detail?) Klatt (1979) • Lexical Access From Spectra (LAFS) • whole-word patterns? Kewley-Port (1983) JASA 73, 322-335 Klatt (1979) Journal of Phonetics 7, 279-312

  36. extensions, questions: wider influences Ganong (1980) • identification expt • VOT continuum • word at one end, non-word at the other • perception is more forgiving when the sound means something! [Figure: % /d/ responses (0 to 100) along a VOT continuum from short VOT (d) to long VOT (t), for nonword-word (dask-task) and word-nonword (dash-tash) continua] Ganong (1980) J. Exp. Psych: HPP 6, 110-125

  37. Summary: ‘context’ and ‘signal’ • ‘Units’ functionally inseparable from ‘context’ • The context and the signal together determine whether the signal is coherent • and hence what each unit ‘is’

  38. Recent developments (since early-to-mid 90s) systematic subtle variation as linguistically informative: classify the contexts in a more linguistically-sophisticated way

  39. Combining old and new themes • re-examination and extension of information provided by systematic phonetic variation • new areas, e.g. • cross-linguistic work (Best, Beddor, Bradlow...) • memory & learning (Goldinger, Pisoni...) • functional brain imaging (Sophie Scott)

  40. Listeners use fine phonetic detail Allen & Miller (2004) • speaker identity: listeners generalize talker-specific VOT information to a novel word Smith (2004) • lexical identity: slightly inappropriate allophones in a sentence disrupt word-spotting only when speaker is familiar to listener • familiarization to speakers is fast Allen & Miller (2004) JASA 116, 3171-3183 Smith (2004) PhD Dissertation, Cambridge University

  41. Small, statistically non-significant changes in each of several formants can add up to a large perceptual difference; conversely, some statistically significant differences may have no perceptual effect. • Spoken word recognition test, which is used to establish cerebral dominance • large groups of native speakers of Chinese/English/Spanish • coronal MRI slices, data for 3 Ss, >200 ms post-stimulus onset • Lateralisation (% Ss): Spanish 100% left; English 80% left; Chinese 79% bilateral (tone lang.) [Figure panels: Chinese, English, Spanish] Valaki et al. (2004) Neuropsychologia 42, 967–979

  42. What sort of model? • biologically plausible • roles of attention, memory & learning • focus on meaning (‘sound to sense’) • multiple potential ‘units of perception’: no obligatory units? • structure from incomplete information Adaptive Resonance Theory (ART)? Grossberg 1986… Grossberg (2003) Journal of Phonetics 31, 423-445

  43. A key issue • what is a phonetic category? (Carol Fowler, May 2004: ‘never been sure’) • mental representations of phonetic categories are dynamic, relational, & plastic • Repp, Lindblom, Studdert-Kennedy • Bradlow, Pisoni, Hawkins….. Hawkins (2003) Journal of Phonetics 31, 373-405

  44. bottom-up vs top-down? • phonetic variation that systematically indicates linguistic structure makes many ‘top-down’ processes unnecessary • e.g. allophonic detail vs Possible Word Constraint • and blurs the traditional distinction between signal & knowledge

  45. A Challenge • to define and refine new questions in testable ways – i.e. to refocus, but to do it in ways that: • are rigorous yet focus on meaning and communication • avoid the ‘new understanding’ becoming doctrinaire • build on past contributions

  46. Some topics I haven’t mentioned but should have… • infants’ & animals’ perception (periods 2 & 3) • vowel perception (dynamics; center of gravity) • sine wave speech • more theories (direct perception, auditory enhancement, FLMP) • more on memory (incl. associations) & learning • connections with psychoacoustics • production-perception connections … and could have, if I’d told the same story in a different way

  47. Categorical Perception • Run an identification experiment (% /b/) • Run a discrimination experiment (e.g. 1 versus 3; % different, 0 to 100): discrimination peak [Figure: stimulus steps 1 … 3 … 5 … 7] Courtesy Chris Darwin’s web site
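
  The link between the two experiments on this slide can be sketched numerically. The following is only an illustration under stated assumptions, not the analysis behind the slide: the identification function is assumed to be a logistic curve over a 7-step continuum (the `boundary` and `slope` values are made up), and discrimination is predicted with the classic two-category ABX formula, 0.5 + 0.5 (pA - pB)^2, for stimuli two steps apart (as in "1 versus 3").

```python
import math

def identification(step, boundary=4.0, slope=2.0):
    """Assumed proportion of /b/ responses at a continuum step (logistic)."""
    return 1.0 / (1.0 + math.exp(slope * (step - boundary)))

def predicted_abx(step_a, step_b):
    """Predicted proportion correct in ABX discrimination if listeners
    can only use category labels (two-category case)."""
    p_a = identification(step_a)
    p_b = identification(step_b)
    return 0.5 + 0.5 * (p_a - p_b) ** 2

# Two-step comparisons along a hypothetical 7-step continuum: 1v3, 2v4, ...
pairs = [(s, s + 2) for s in range(1, 6)]
predicted = {pair: predicted_abx(*pair) for pair in pairs}

# The discrimination peak falls on the pair that straddles the boundary.
peak_pair = max(predicted, key=predicted.get)

for pair, p in predicted.items():
    print(pair, round(p, 3))
print("peak at", peak_pair)
```

  With these assumed parameters, pairs that lie within one category are predicted to be discriminated near chance, while the pair spanning the category boundary is discriminated best: equal acoustic steps, unequal percepts.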

  48. Valaki et al. (2004) Neuropsychologia 42, 967–979 • Monolingual/near monolingual native speakers: • 30 Mandarin-Chinese • 20 Spanish speakers all right handed • 42 American English • Whole-head MEG, auditory word recognition test, used clinically to establish hemispheric dominance for receptive language: 63 abstract words/language • 33 target words, each in 3 lists, with 10 novel non-target words in each list • lift finger when you recognize a target word

  49. Patterns of dominance (%) Laterality Index: (LH – RH) / (LH + RH)
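
  The Laterality Index above is simple to compute. A minimal sketch follows, assuming LH and RH are non-negative measures of left- and right-hemisphere activity (the slide does not say which measure is used); the function name and the example numbers are hypothetical.

```python
def laterality_index(lh: float, rh: float) -> float:
    """Laterality Index = (LH - RH) / (LH + RH).

    Ranges from -1 (all activity on the right) to +1 (all on the left);
    0 means equal activity in both hemispheres.
    """
    total = lh + rh
    if total == 0:
        raise ValueError("LH + RH must be non-zero")
    return (lh - rh) / total

# Hypothetical activity measures, not data from the study.
print(laterality_index(30, 10))  # left-dominant: 0.5
print(laterality_index(10, 10))  # bilateral: 0.0
print(laterality_index(0, 5))    # right-dominant: -1.0
```

  How far from zero an index must be before a listener counts as left- or right-dominant rather than bilateral is a separate threshold choice that this sketch does not make.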
