Continuous acoustic detail affects spoken word recognition: Implications for cognition, development and language disorders. Bob McMurray, University of Iowa, Dept. of Psychology. Collaborators: Richard Aslin, Michael Tanenhaus, David Gow, J. Bruce Tomblin, Joe Toscano
Continuous acoustic detail affects spoken word recognition: Implications for cognition, development and language disorders. Bob McMurray, University of Iowa, Dept. of Psychology
Collaborators: Richard Aslin, Michael Tanenhaus, David Gow, J. Bruce Tomblin, Joe Toscano, Cheyenne Munson, Dana Subik, Julie Markant
Why Speech and Word Recognition
1) Interface between perception and cognition.
   • Basic Categories - Meaning
   • Continuous Input -> Discrete representations.
2) Meaningful stimuli are almost always temporal.
   • Music
   • Visual Scenes (across saccades)
   • Language
3) We understand the:
   • Cognitive processes (word recognition)
   • Perceptual processes (speech perception)
   • Ecology of the input (phonetics)
4) Speech is important: disordered language.
Divisions, Divisions…
Field: Perception (& Action) | Cognition
• Psychology: Speech Perception | Word Recognition, Sentence Processing
• Linguistics: Phonetics | Phonology, The Lexicon
• Speech / Language Pathology: Speech, Hearing | Language
Divisions, Divisions… Divisions useful for framing research and focusing questions. But: Divisions between domains of study can become… Implicit models of cognitive processing.
Divisions in Spoken Language Understanding
• Speech Perception
   • Categorization of acoustic input into sublexical units.
• Word Recognition
   • Identification of target word from active sublexical units.
[Diagram: Acoustic, Sublexical Units (/la/ /ip/ /a/ /b/ /l/ /p/), Lexicon]
Divisions yield processes
• Speech Perception
   • Pattern Recognition
   • Normalization Processes
   • Stream Segregation
• Word Recognition
   • Competition
   • Activation
   • Constraint Satisfaction
[Diagram: Acoustic, Sublexical Units (/la/ /ip/ /a/ /b/ /l/ /p/), Lexicon]
Processes yield models
• Speech Perception
   • Extract invariant phonemes and features.
   • Discard continuous variation.
• Word Recognition
   • Identify single referent.
   • Ignore competitors.
[Diagram: Acoustic, Sublexical Units (/la/ /ip/ /a/ /b/ /l/ /p/), Lexicon. Labels: Reduce Variance; Reduce Continuous Variance]
The Variance Reduction Model
[Diagram: Words / Remove variance / Phonemes (etc) / Remove variance]
Variance Reduction Model (VRM): Understanding speech is a process of progressively extracting invariant, discrete representations from variable, continuous input.
Continuous speech cues play a minimal role in word recognition (and probably wouldn’t be helpful anyway).
Temporal Integration: Variance Reduction Mechanisms
The VRM might apply if speech were static.
“Goon”
Goal: Identify /u/
Signal: Low F1, F2, High F3
Noise:
   • Initially: F2 decreasing
   • Later: F2 increasing
   • Presence of anti-formant
Temporal Integration
But the dynamic properties make it more difficult.
“Goon”
Goal: Identify /u/
Signal: Low F1, F2, High F3
Noise:
   • Initially: F2 decreasing (Gone. Maybe in STM?)
   • Later: F2 increasing, presence of anti-formant (Hasn’t happened yet.)
Temporal Integration: Variance Utilization Mechanisms
But the dynamic properties make it more difficult.
“Goon”
Goal: Identify /u/
Signal: Low F1, F2, High F3
Signal':
   • Initially: F2 decreasing (Prior /g/. Gone. Maybe in STM?)
   • Later: F2 increasing, presence of anti-formant (Upcoming /n/. Hasn’t happened yet.)
Goals
[Diagram: Words / Remove variance / Phonemes (etc) / Remove variance]
1) Replace the Variance Reduction Model with the Variance Utilization Model.
2) Normal lexical activation processes can serve as variance utilization mechanisms.
3) Speculatively (and not so speculatively) examine the consequences for:
   • Temporal Integration / Short Term Memory.
   • Development
   • Non-normal Development
Outline
1) Review
   • Origins of the VRM.
   • Spoken Word Recognition.
2) Empirical Test
3) The VUM
   • Lexical Locus
   • Temporal Integration
   • SLI proposal
4) Developmental Consequences
   • Empirical Tests
   • Computational Model
   • CI proposal
Word Recognition
[Figure: as the input "ba… kery" unfolds, the candidates basic, barrier, bait, barricade and baby are crossed out (X), leaving bakery.]
• Online Spoken Word Recognition
   • Information arrives sequentially
   • Fundamental Problem: At early points in time, the signal is temporarily ambiguous.
   • Later-arriving information disambiguates the word.
Word Recognition
• Current models of spoken word recognition
   • Immediacy: Hypotheses formed from the earliest moments of input.
   • Activation Based: Lexical candidates (words) receive activation to the degree they match the input.
   • Parallel Processing: Multiple items are active in parallel.
   • Competition: Items compete with each other for recognition.
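The four properties above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: letters stand in for phonemes, the scoring rule is a simple position-by-position match, and competition is modeled as normalization. It is not the model used in the talk.

```python
# Toy illustration only: lexicon, scoring rule and normalization are assumptions.
LEXICON = ["beach", "butter", "bump", "putter", "dog"]

def match_score(word, heard):
    """Fraction of the input heard so far that the word matches, position by position."""
    hits = sum(1 for w, h in zip(word, heard) if w == h)
    return hits / len(heard)

def activations(heard, lexicon=LEXICON):
    """Activate all candidates in parallel, then normalize so that candidates compete."""
    raw = {w: match_score(w, heard) for w in lexicon}
    total = sum(raw.values())
    if total == 0:
        return {w: 0.0 for w in lexicon}
    return {w: s / total for w, s in raw.items()}

# Immediacy: hypotheses are updated after every new segment of input.
for n in range(1, len("butter") + 1):
    heard = "butter"[:n]
    ranked = sorted(activations(heard).items(), key=lambda kv: -kv[1])
    print(heard.ljust(6), " ".join(f"{w}={a:.2f}" for w, a in ranked))
```

Early in the word, beach, butter and bump are equally active; as more input arrives, butter pulls ahead while putter stays partially active because it matches everything after the first segment.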
Word Recognition
[Figure: candidates beach, butter, bump, putter and dog over time as the input unfolds. Input: b... u… tt… e… r]
Word Recognition
These processes have been well defined for a phonemic representation of the input.
• Considerably less ambiguity if we consider subphonemic information.
• Bonus: processing dynamics may solve problems in speech perception.
Example: subphonemic effects of motor processes.
Coarticulation
Any action reflects future actions as it unfolds.
Example: Coarticulation
Articulation (lips, tongue…) reflects current, future and past events.
Subtle subphonemic variation in speech reflects temporal organization.
Sensitivity to these perceptual details might yield earlier disambiguation.
Lexical activation could retain these perceptual details.
Review: These processes have largely been ignored because of a history of evidence that perceptual variability gets discarded. Example: Categorical Perception
Categorical Perception
[Figure: identification (% /pa/) and discrimination, each on a 0-100 axis, plotted along a VOT continuum from B to P.]
• Sharp identification of tokens on a continuum.
• Discrimination poor within a phonetic category.
Subphonemic variation in VOT is discarded in favor of a discrete symbol (phoneme).
Categorical Perception
Evidence against the strong form of Categorical Perception from psychophysical-type tasks:
• Discrimination Tasks
   • Pisoni and Tash (1974)
   • Pisoni & Lazarus (1974)
   • Carney, Widin & Viemeister (1977)
• Training
   • Samuel (1977)
   • Pisoni, Aslin, Perey & Hennessy (1982)
• Goodness Ratings
   • Miller (1997)
   • Massaro & Cohen (1983)
Variance Reduction Model
[Diagram: Words / Remove variance / Phonemes (etc) / Remove variance]
CP enabled a fundamental independence of speech perception & spoken word recognition.
Evidence against CP seen as supporting VRM (auditory vs. phonological processing mode).
Critical Prediction: continuous variation in the signal should not affect word recognition.
Experiment 1
Does within-category acoustic detail systematically affect higher level language?
Is there a gradient effect of subphonemic detail on lexical activation?
McMurray, Aslin & Tanenhaus (2002)
A gradient relationship would yield systematic effects of subphonemic information on lexical activation.
If this gradiency is useful for temporal integration, it must be preserved over time.
Need a design sensitive to both acoustic detail and detailed temporal dynamics of lexical activation.
Acoustic Detail
Use a speech continuum: more steps yield a better picture of the acoustic mapping.
KlattWorks: generate synthetic continua from natural speech.
• 9-step VOT continua (0-40 ms)
• 6 pairs of words.
   • beach/peach, bale/pale, bear/pear
   • bump/pump, bomb/palm, butter/putter
• 6 fillers.
   • lamp, leg, lock, ladder, lip, leaf
   • shark, shell, shoe, ship, sheep, shirt
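As a quick check of the design arithmetic, a 9-step continuum spanning 0 to 40 ms implies 5 ms steps, and crossing it with the 6 word pairs gives 54 distinct target stimuli. The sketch below only enumerates those values; the function and variable names are hypothetical, and it does not synthesize any audio.

```python
def vot_continuum(start_ms=0.0, end_ms=40.0, n_steps=9):
    """Evenly spaced VOT values from start_ms to end_ms inclusive."""
    step = (end_ms - start_ms) / (n_steps - 1)
    return [start_ms + i * step for i in range(n_steps)]

WORD_PAIRS = [
    ("beach", "peach"), ("bale", "pale"), ("bear", "pear"),
    ("bump", "pump"), ("bomb", "palm"), ("butter", "putter"),
]

steps = vot_continuum()
# One stimulus per (word pair, VOT step) combination.
stimuli = [(b, p, vot) for (b, p) in WORD_PAIRS for vot in steps]

print(steps)         # [0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0]
print(len(stimuli))  # 54
```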
Temporal Dynamics
How do we tap on-line recognition? With an on-line task: Eye-movements
Subjects hear spoken language and manipulate objects in a visual world.
Visual world includes a set of objects with interesting linguistic properties: a beach, a peach and some unrelated items.
Eye-movements to each object are monitored throughout the task.
Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy, 1995
Temporal Dynamics
Why use eye-movements and visual world paradigm?
• Relatively natural task.
• Eye-movements generated very fast (within 200 ms of first bit of information).
• Eye movements time-locked to speech.
• Subjects aren’t aware of eye-movements.
• Fixation probability maps onto lexical activation.
Task A moment to view the items
Task Bear Repeat 1080 times
Identification Results
[Figure: proportion /p/ responses (0 to 1) as a function of VOT (0 to 40 ms), from B to P.]
High agreement across subjects and items for category boundary.
By subject: 17.25 +/- 1.33 ms
By item: 17.24 +/- 1.24 ms
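One simple way to locate a category boundary on an identification curve is to find where the proportion of /p/ responses crosses 50%. The sketch below does this by linear interpolation. The response proportions are made-up illustrative values, not data from the study, and the slides do not say how the reported boundaries were estimated, so this is an assumed method.

```python
def category_boundary(vots, prop_p, criterion=0.5):
    """VOT at which the identification curve first crosses `criterion` (linear interpolation)."""
    points = list(zip(vots, prop_p))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < criterion <= y1:
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("curve never crosses the criterion")

vots = [0, 5, 10, 15, 20, 25, 30, 35, 40]
# Hypothetical proportions of /p/ responses at each VOT step (invented for illustration).
prop_p = [0.02, 0.03, 0.05, 0.30, 0.80, 0.95, 0.97, 0.98, 0.99]

print(category_boundary(vots, prop_p))
```

Running the same function once per subject and once per item would give the two sets of boundaries whose means and spreads can then be compared.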
Eye-Movement Analysis
[Figure: fixations across trials 1 to 5 aggregated into % fixations over time (200 ms marker).]
Target = Bear
Competitor = Pear
Unrelated = Lamp, Ship
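The aggregation step, turning trial-by-trial fixations into a proportion-of-fixations curve per object, can be sketched as follows. The data layout (one fixated-object label per time bin per trial) and the five example trials are assumptions made for illustration.

```python
from collections import Counter

def fixation_proportions(trials, objects):
    """For each time bin, the proportion of trials in which each object is fixated.

    trials: list of per-trial sequences holding one fixated-object label per time bin.
    """
    n_bins = min(len(t) for t in trials)
    curves = {obj: [] for obj in objects}
    for b in range(n_bins):
        counts = Counter(t[b] for t in trials)
        for obj in objects:
            curves[obj].append(counts[obj] / len(trials))
    return curves

# Hypothetical data: 5 trials, 4 time bins each.
trials = [
    ["none", "target", "target", "target"],
    ["none", "competitor", "target", "target"],
    ["none", "none", "competitor", "target"],
    ["unrelated", "target", "target", "target"],
    ["none", "competitor", "competitor", "target"],
]
curves = fixation_proportions(trials, ["target", "competitor", "unrelated"])
for obj, curve in curves.items():
    print(obj, curve)
```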
Eye-Movement Results
[Figure: fixation proportion (0 to 0.9) over time (0 to 2000 ms), one panel for VOT=0 and one for VOT=40.]
More looks to competitor than unrelated items.
Eye-Movement Results
Given that
• the subject heard bear
• clicked on “bear”…
How often was the subject looking at the “pear”?
[Figure: two possible outcomes, Categorical Results vs. Gradient Effect, each showing target and competitor fixation proportion over time.]
Eye-Movement Results
[Figure: competitor fixations (0 to 0.16) over time since word onset (0 to 2000 ms), one curve per VOT step (0 to 40 ms in 5 ms steps), in separate panels by response.]
Long-lasting gradient effect: seen throughout the timecourse of processing.
Eye-Movement Results
[Figure: looks to competitor (competitor fixations, 0.02 to 0.08) as a function of VOT (0 to 40 ms), by response, with the category boundary marked.]
Area under the curve:
• Clear effects of VOT: B: p=.017*, P: p<.001***
• Linear Trend: B: p=.023*, P: p=.002***
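The two summary measures named here can be sketched in a few lines: an area under each competitor-fixation curve, and a linear trend of that area across VOT. The trapezoid rule and least-squares slope below are standard techniques chosen for illustration; the slides do not specify the exact computation, and the example numbers are invented.

```python
def area_under_curve(times_ms, proportions):
    """Trapezoidal area under a fixation-proportion curve."""
    return sum(
        (t1 - t0) * (p0 + p1) / 2
        for t0, t1, p0, p1 in zip(times_ms, times_ms[1:], proportions, proportions[1:])
    )

def linear_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical competitor-fixation curves for three VOT steps (invented values).
times = [0, 400, 800, 1200]
curves_by_vot = {
    0: [0.00, 0.04, 0.03, 0.01],
    5: [0.00, 0.05, 0.04, 0.01],
    10: [0.00, 0.07, 0.05, 0.02],
}
vot_values = sorted(curves_by_vot)
areas = [area_under_curve(times, curves_by_vot[v]) for v in vot_values]
print(areas)
print(linear_slope(vot_values, areas))  # positive slope: more competitor looks nearer the boundary
```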
Eye-Movement Results
[Figure: looks to competitor (competitor fixations, 0.02 to 0.08) as a function of VOT (0 to 40 ms), by response, with the category boundary marked.]
Unambiguous Stimuli Only:
• Clear effects of VOT: B: p=.014*, P: p=.001***
• Linear Trend: B: p=.009**, P: p=.007**
Summary
Subphonemic acoustic differences in VOT have a gradient effect on lexical activation.
• Gradient effect of VOT on looks to the competitor.
• Effect holds even for unambiguous stimuli.
• Seems to be long-lasting.
Consistent with a growing body of work using priming (Andruski, Blumstein & Burton, 1994; Utman, Blumstein & Burton, 2000; Gow, 2001, 2002).
Extensions
Basic effect has been extended to other phonetic cues: a general property of word recognition…
• Voicing (b/p) [1]
• Laterality (l/r), Manner (b/w), Place (d/g) [1]
• Vowels (i/I, /) [2]
• Natural Speech (VOT) [3]
X Metalinguistic Tasks [3]
[1] McMurray, Clayards, Tanenhaus & Aslin (2004)
[2] McMurray & Toscano (in prep)
[3] McMurray, Aslin, Tanenhaus, Spivey and Subik (submitted)
Lexical Sensitivity
[Figure: competitor fixations (0 to 0.1) as a function of VOT (0 to 40 ms), plotted separately for Response=B and Response=P, with the category boundary marked.]
The Variance Utilization Model
1) Word recognition is systematically sensitive to subphonemic acoustic detail.
2) Acoustic detail is represented as gradations in activation across the lexicon.
3) Normal word recognition processes do the work of:
   • Maintaining detail
   • Sharpening categories
   • Anticipating upcoming material
   • Resolving prior ambiguity.
The Variance Utilization Model
[Figure: candidates bump, pump, dump, bun, bumper and bomb over time, with the b/p distinction marked. Input: b... u… m… p…]
Gradations in phonetic cues are preserved as relative lexical activation.
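The idea that a graded cue is carried as relative activation, rather than collapsed into a single phoneme, can be sketched for the b/p case. The logistic mapping, the 17 ms default boundary (a placeholder close to the boundary reported in the identification results) and the slope value are all assumptions chosen for illustration.

```python
import math

def p_support(vot_ms, boundary_ms=17.0, slope=0.25):
    """Graded evidence for /p/ in [0, 1]; logistic form and parameters are assumptions."""
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))

def relative_activation(vot_ms):
    """Split activation between the two candidates in proportion to the graded evidence."""
    p = p_support(vot_ms)
    return {"bump": 1.0 - p, "pump": p}

for vot in range(0, 45, 5):
    acts = relative_activation(vot)
    print(f"VOT={vot:2d} ms  bump={acts['bump']:.2f}  pump={acts['pump']:.2f}")
```

Under a categorical account, every VOT below the boundary would give bump the same activation; here a 10 ms token leaves pump more active than a 0 ms token does, which is the gradient pattern seen in the competitor fixations.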
The Variance Utilization Model
[Figure: candidates bump, pump, dump, bun, bumper and bomb over time, with the b/d distinction marked. Input: b... u… m… p…]
Gradations in phonetic cues are preserved as relative lexical activation.
The Variance Utilization Model
[Figure: candidates bump, pump, dump, bun, bumper and bomb over time, with vowel length marked. Input: b... u… m… p…]
Non-phonemic distinctions preserved (e.g. vowel length: Gow & Gordon, 1995; Salverda, Dahan & McQueen, 2003).
The Variance Utilization Model
[Figure: candidates bump, pump, dump, bun, bumper and bomb over time, with the n/m distinction marked and then "n/m info lost". Input: b... u… m… p…]
Material is only retained until it is no longer needed.
Words are a conveniently sized unit.
The Variance Utilization Model
[Figure: candidates bump, pump, dump, bun, bumper and bomb over time. Input: b... u… m… p…]
No need for explicit short-term memory: lexical activation persists over time.
The Variance Utilization Model
[Figure: candidates bump, pump, dump, bun, bumper and bomb over time. Input: b... u… m… p…]
Lexical competition: Perceptual warping (à la CP) results from natural competition processes.
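How competition alone could produce category-like sharpening from graded input can be shown with a toy update rule. Raising activations to a power and renormalizing on each cycle is an assumed stand-in for lexical competition, not the mechanism specified in the talk; the starting activations are invented.

```python
def compete(acts, n_cycles, power=2.0):
    """Repeatedly raise activations to a power and renormalize (toy competition rule)."""
    acts = dict(acts)
    for _ in range(n_cycles):
        powered = {w: a ** power for w, a in acts.items()}
        total = sum(powered.values())
        acts = {w: v / total for w, v in powered.items()}
    return acts

# Hypothetical graded starting points, e.g. from a near-boundary and a clear token.
near_boundary = {"bump": 0.6, "pump": 0.4}
clear_token = {"bump": 0.9, "pump": 0.1}

for cycles in (0, 1, 2, 5):
    a = compete(near_boundary, cycles)["bump"]
    b = compete(clear_token, cycles)["bump"]
    print(f"cycles={cycles}  near-boundary bump={a:.3f}  clear bump={b:.3f}")
```

After a cycle or two the near-boundary token still leaves the competitor more active than the clear token does, so the gradient is available early; with more cycles both settle on the same winner, which looks categorical at the output.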