Two /b/ or not “too bee”: Gradient sensitivity to subphonemic variation, categorical perception and the effect of task. Bob McMurray With thanks to
Outline • Invariance, Covariance and Gradient Sensitivity in speech perception. • Categorical Perception and other previous research. • Experiment 1: Gradient sensitivity in Word Recognition. • Experiments 2-5: The effect of experimental task (targets & competitors, gradient sensitivity and temporal dynamics). • Conclusions
Problem of Invariance • Phonetic features are correlated with many acoustic realizations. • The acoustic realization of a phonetic feature depends on context. • How do we extract invariant linguistic representations from a variable acoustic signal? • What properties of the signal provide an invariant mapping to linguistic representations? • How do we extract discrete units from a graded signal?
Problem of Invariance • Two Solutions • Motor Theory: acoustic invariance does not exist, but specialized mechanisms allow us to unpack speech into invariant motor representations (Liberman & Mattingly, 1985; Fowler, 1986). • Acoustic Invariance: better computational methods and neurologically inspired models may find invariant acoustic properties of the signal (Blumstein, 1998; Sussman et al., 1998)
Rethinking Invariance The fundamental approach asks: how do we pay attention to the right parts of the signal and ignore the variation? However, recent work suggests that this variation is actually highly informative covariation.
Rethinking Invariance • In measurements of productions, effects of • speaking rate on VOT (e.g. Kessinger & Blumstein) • prosodic domain on VOT and articulatory strength (Fougeron & Keating) • place of articulation and vowel quality up to 5 syllables away (Local) • between-consonant coarticulation (Mann & Repp) • suggest that a system sensitive to fine-grained detail could take advantage of all of this information.
Rethinking Invariance Speech perception shows probabilistic effects of many information sources: lexical context, spectral vs. temporal cues, visual information, transition statistics, speech rate, stimulus naturalness, sentential context, compensatory coarticulation, embeddings, syllabic stress, lexical stress, phrasal stress. A system that was sensitive to fine-grained acoustic detail might be much more efficient than one that was not. Tracking covariance may help solve the problem of invariance.
What sort of sensitivity is needed? Gradient Sensitivity: as fundamentally graded acoustic information changes (even changes that still result in the same “category”), activation of lexical or sublexical representations changes monotonically. Activation of linguistic units reflects the probability that that unit is instantiated by the acoustic signal.
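As an illustration (not from the talk), here is a minimal Python sketch of the contrast at issue: a categorical mapping makes /p/ activation a step function of VOT, while a gradient mapping changes monotonically even for stimuli on the same side of the boundary. The boundary and slope values are illustrative assumptions.

```python
import numpy as np

def categorical_activation(vot, boundary=17.5):
    """Step mapping: /p/ activation depends only on the side of the boundary."""
    return (np.asarray(vot) > boundary).astype(float)

def gradient_activation(vot, boundary=17.5, slope=0.25):
    """Monotonic mapping: /p/ activation changes with every ms of VOT,
    even for stimuli on the same side of the boundary."""
    return 1.0 / (1.0 + np.exp(-slope * (np.asarray(vot) - boundary)))

vot_steps = np.arange(0, 45, 5)           # the 0-40 ms continuum in 5 ms steps
print(categorical_activation(vot_steps))  # flat within each category
print(gradient_activation(vot_steps))     # changes within each category
```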
Categorical Perception • Sharp identification of speech sounds on a continuum. • Discrimination poor within a phonetic category. CP suggests listeners do not show gradient sensitivity to subphonemic information. [Figure: identification (% /p/) and discrimination functions across the /b/-/p/ VOT continuum.]
Evidence for Categorical Perception • Supported by: • Work on VOT and place of articulation. • Ubiquity of steep identification functions. • Recent electrophysiological data (e.g. Phillips, Pellathy, Marantz, Yellin, Wexler, Poeppel, McGinnis & Roberts, 2000; Sharma & Dorman, 1999)
Revisiting Categorical Perception? Evidence against CP comes from: • Discrimination tasks: Pisoni & Tash (1974); Pisoni & Lazarus (1974); Carney, Widin & Viemeister (1977) • Training: Samuel (1977); Pisoni, Aslin, Perey & Hennessy (1982) • Goodness ratings: Miller (1997); Massaro & Cohen (1983) Only goodness ratings show any hint of gradiency; there are no gradient effects from identification tasks. But 2AFC metalinguistic tasks may underestimate sensitivity to subphonemic acoustic information.
Lexical sensitivity • Andruski, Blumstein & Burton (1994) • Created stimuli that were either voiceless, 1/3 or 2/3 voiced. • 2/3 voiced stimuli primed semantic associates more weakly than fully voiceless or 1/3 voiced tokens. • First demonstration of lexical sensitivity to natural variation in consonants. • However: • 2/3 voiced stimuli were close to the category boundary. • No evidence for gradiency: a difference between only 2 items. • Temporal dynamics are hard to interpret in priming tasks.
Remaining Questions • Is sensitivity to subphonemic differences gradient? • Is it symmetrical (i.e. gradiency on both sides of the category boundary)? • Are differences preserved long enough to be usefully combined with subsequent input? • Perhaps a more sensitive measure…
Eye-Tracking [Apparatus schematic: IR head-tracker emitters, head-tracker camera, two head-mounted eye cameras, monitor; subject and eyetracker computers connected via Ethernet.] • 250 Hz realtime stream of eye positions. • Parsed into saccades, fixations, blinks, etc. • Head movement compensation. • Output in approximate screen coordinates.
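The talk does not describe the parsing algorithm itself; the sketch below shows one common approach (dispersion-threshold fixation detection) that could be applied to a 250 Hz sample stream. The function names and thresholds are illustrative assumptions, not the lab's actual software.

```python
import numpy as np

def _dispersion(xs, ys):
    """Spread of a gaze window: (max - min) in x plus (max - min) in y."""
    return (xs.max() - xs.min()) + (ys.max() - ys.min())

def detect_fixations(x, y, t, max_dispersion=1.0, min_duration=0.1):
    """Dispersion-threshold (I-DT style) fixation detection, illustrative only.

    x, y: numpy arrays of gaze position in screen coordinates.
    t: sample times in seconds (250 Hz -> samples 4 ms apart).
    Returns (start_time, end_time, mean_x, mean_y) for each fixation."""
    fixations = []
    start, n = 0, len(t)
    while start < n:
        end = start
        # Grow the window while the gaze stays within the dispersion limit.
        while end + 1 < n and _dispersion(x[start:end + 2], y[start:end + 2]) <= max_dispersion:
            end += 1
        if t[end] - t[start] >= min_duration:
            fixations.append((t[start], t[end],
                              float(np.mean(x[start:end + 1])),
                              float(np.mean(y[start:end + 1]))))
            start = end + 1
        else:
            start += 1  # not a fixation; slide the window forward
    return fixations
```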
Eye-Tracking • Fixations to objects in response to spoken instructions: • are time locked to incoming information (Tanenhaus, Spivey-Knowlton, Eberhard and Sedivy, 1995) • can be easily mapped onto lexical activation from models like TRACE (Allopenna, Magnuson and Tanenhaus, 1998) • show effects of non-displayed competitors (Dahan, Magnuson, Tanenhaus & Hogan) • provide a glimpse at how activation for competitors unfolds in parallel over time.
Experiment 1 Lexical Identification “too bee” Can we use eye-tracking methodologies to find evidence for graded perception of VOT?
Experiment 1: Lexical Identification • Six 9-step /b/-/p/ VOT continua (0-40 ms): bear/pear, beach/peach, butter/putter, bale/pale, bump/pump, bomb/palm • 12 l- and sh- filler items: leaf, lamp, ladder, lock, lip, leg, shark, ship, shirt, shoe, shell, sheep • Identification indicated by mouse click on the picture • Eye movements monitored at 250 Hz • 17 subjects
Experiment 1: Lexical Identification A moment to view the items
Experiment 1: Lexical Identification 500 ms later
Experiment 1: Identification Results [Figure: proportion /p/ responses as a function of VOT (0-40 ms).] High agreement across subjects and items for the category boundary. By subject: 17.25 +/- 1.33 ms. By item: 17.24 +/- 1.24 ms.
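A hedged sketch of how a boundary value like 17.25 ms could be estimated: fit a logistic function to the identification proportions and read off its 50% crossover. The proportions below are placeholders, not the reported data.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(vot, boundary, slope):
    """P(/p/ response) as a function of VOT (ms)."""
    return 1.0 / (1.0 + np.exp(-slope * (vot - boundary)))

vot = np.arange(0, 45, 5)  # 9-step continuum, 0-40 ms
prop_p = np.array([0.02, 0.03, 0.05, 0.20, 0.80, 0.95, 0.97, 0.98, 0.99])  # placeholder data

(boundary, slope), _ = curve_fit(logistic, vot, prop_p, p0=[20.0, 0.5])
print(f"category boundary ~ {boundary:.2f} ms, slope = {slope:.2f}")
```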
Analysis of fixations Trials with the low-frequency response were excluded; this yields a “perfect” categorization function. [Figure: ID function after filtering vs. actual data.]
Analysis of fixations [Schematic: fixations across trials, time-locked to word onset plus a ~200 ms oculomotor delay. Target = bug, competitor = bus, unrelated = cat, fish.]
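A rough sketch of the fixation-proportion analysis implied by this schematic: at each time slice after word onset, count the proportion of trials in which the eyes are on the target, the competitor, or an unrelated item. The trial data structure and field names are hypothetical.

```python
import numpy as np

def fixation_proportions(trials, roles=("target", "competitor", "unrelated"),
                         t_max=2000, bin_ms=4):
    """Proportion of trials fixating each object type at each time slice.

    `trials` is a list of dicts; trials[i]["fixations"] is a list of
    (start_ms, end_ms, role) tuples time-locked to word onset, where `role`
    labels the fixated picture (e.g. target = "bug", competitor = "bus",
    unrelated = "cat" or "fish")."""
    times = np.arange(0, t_max, bin_ms)
    props = {role: np.zeros(len(times)) for role in roles}
    for trial in trials:
        for start, end, role in trial["fixations"]:
            if role in props:
                props[role][(times >= start) & (times < end)] += 1
    for role in roles:
        props[role] /= max(len(trials), 1)
    return times, props
```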
Experiment 1: Eye Movement Results [Figure: fixation proportions over time (0-2000 ms) to target, competitor, and unrelated items, for VOT = 0 ms and VOT = 40 ms trials.] More looks to the competitor than to unrelated items.
Analysis of fixations Gradient “competitor” effects: e.g. given that the subject heard bomb and clicked on “bomb”, how often was the subject looking at the “palm”? [Schematic: predicted target and competitor fixation curves over time under categorical vs. gradient accounts.]
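A hedged sketch of the conditional measure described here: keep only trials where the expected item was clicked, then compute the proportion of fixation time spent on the competitor at each VOT step. Under a categorical account this curve should be flat within a category; under gradient sensitivity it should rise toward the boundary. The field names are illustrative.

```python
import numpy as np

def competitor_looks_by_vot(trials):
    """Mean proportion of fixation time on the competitor, per VOT step,
    counting only trials where the participant clicked the expected item
    (e.g. heard "bomb", clicked "bomb"; looks to "palm" are measured)."""
    by_vot = {}
    for trial in trials:
        if trial["response"] != trial["expected_response"]:
            continue  # filter out low-frequency ("incorrect") responses
        total = sum(end - start for start, end, _ in trial["fixations"])
        comp = sum(end - start for start, end, role in trial["fixations"]
                   if role == "competitor")
        if total > 0:
            by_vot.setdefault(trial["vot"], []).append(comp / total)
    return {vot: float(np.mean(vals)) for vot, vals in sorted(by_vot.items())}
```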
Experiment 1: Eye Movement Results Gradient competitor effects of VOT? [Figure: competitor fixation proportions over time for each VOT step (0-40 ms), split by response category.] Smaller effect on the amplitude of activation, more effect on the duration: competitors stay active longer as VOT approaches the category boundary.
Experiment 1: Gradiency? [Schematic: predicted competitor fixation proportions as a function of VOT under “categorical” perception, the Andruski et al. pattern, and gradient sensitivity.]
Experiment 1: Eye Movement Results [Figure: fixation proportions to the competitor as a function of VOT, for /b/ and /p/ responses, with the category boundary marked.] Clear effects of VOT: B: p=.017*, P: p<.0001***. Linear trend: B: p=.023*, P: p=.002**.
Experiment 1: Eye Movement Results (unambiguous stimuli only) [Figure: fixation proportions to the competitor as a function of VOT, excluding stimuli near the category boundary.] Clear effects of VOT: B: p=.017*, P: p<.0001***. Linear trend: B: p=.023*, P: p=.002**.
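The slides report linear-trend statistics without spelling out the computation; one plausible (illustrative) way to test a within-category linear trend is to fit a slope over VOT steps for each subject and test the slopes against zero. This is a sketch under that assumption, not necessarily the analysis actually used; for the unambiguous-stimuli analysis, pass only the steps away from the boundary.

```python
import numpy as np
from scipy import stats

def linear_trend(vot_steps, fixation_by_subject):
    """Linear-trend test across VOT steps within one response category.

    `fixation_by_subject` is a subjects x steps array of mean competitor
    fixation proportions. Each subject contributes one regression slope;
    the slopes are then tested against zero with a one-sample t-test."""
    slopes = [stats.linregress(vot_steps, row).slope
              for row in fixation_by_subject]
    return stats.ttest_1samp(slopes, 0.0)
```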
Experiment 1: Results and Conclusions Subphonemic acoustic differences in VOT affect lexical activation. • Gradient effect of VOT on looks to the competitor. • Effect holds even for unambiguous stimuli. • Effect seems to be long-lasting (we’ll get back to that). Conservative Test • Filter out “incorrect” responses. • Use unambiguous stimuli only.
However… Why was it so hard to find evidence for gradiency in CP tasks? The steep identification function is consistently replicated. What aspects of the task affect our ability to see gradient sensitivity? • Phoneme ID vs. lexical ID? • Number of alternatives? • Type of stimuli? • Sensitivity of the response measure?
Experiment 2 Categorical Perception 2 /b/, not “too bee” What can the eye-tracking paradigm reveal about ordinary phoneme identification experiments?
Experiment 2: Categorical Perception Replicates the “classic” task: • 9-step /ba/-/pa/ VOT continuum (0-40 ms) • 2AFC identification indicated by mouse click • Eye movements monitored at 250 Hz • 17 subjects
Experiment 2: Categorical Perception [Trial display sequence: response buttons labeled “B” and “P”; auditory stimulus (e.g. “ba”) presented.]
Experiment 2: Identification Results [Figure: proportion /p/ responses by VOT for Exp 2 (BP) and Exp 1 (words).] The phoneme ID function is steeper; category boundaries are the same. Boundaries: BP: 17.5 +/- .83 ms; words (by subject): 17.25 +/- 1.33 ms; words (by item): 17.24 +/- 1.24 ms.
Experiment 2: Data Analysis Trials with the low-frequency response were excluded; this effectively yields a “perfect” categorization function. [Figure: effective ID function vs. actual ID function.]
Experiment 2: Eye movement data [Figure: fixation proportions over time for each VOT step (0-40 ms), split by /b/ and /p/ responses, showing looks to the B and P alternatives.] • Some hints of gradiency for /p/; even less for /b/. • Difference between stimuli near the boundary and the endpoints. • Perhaps more for /p/.
Experiment 2: Eye movement data [Figure: competitor fixation proportions as a function of VOT (response = B, looks to P; response = P, looks to B), with the category boundary marked.] /b/: p=.044*, ptrend=.055; /p/: p<.001***, ptrend=.005***. Could be driven by differences near the category boundary.
Experiment 2: Eye movement data 0.08 0.07 0.06 0.05 0.04 0.03 0.02 0.01 0 0 5 10 15 20 25 30 35 40 Response=B Looks to P Response=B Looks to P Fixation proportion Category Boundary VOT (ms) Unambiguous Stimuli Only /b/: p =.884 ptrend=.678 /p/: p =.013* ptrend=.003***
Experiment 2: Results and Conclusions • Very steep slope for mouse response curves. • consistent with traditional results • Identical category boundary to experiment 1 • validates stimuli • Small difference between stimuli near category boundary and others. • similar to Pisoni & Tash, Andruski, et al. • Gradient effect weak for /ba/, moderate for /pa/
Experiment 3 Number of Response Alternatives Not 2 but /b/? Compare to Experiment 2 (BaPa).
Experiment 3: BaPaLaSha Given the strong evidence for gradiency in Experiment 1 and the weaker evidence in Experiment 2, what is the effect of the number of response alternatives? • Same 9-step /ba/-/pa/ VOT continuum (0-40 ms) as Experiment 2. • La and Sha filler items added. • 4AFC identification indicated by mouse click. Button locations randomized between subjects. • Eye movements monitored at 250 Hz. • 17 subjects
Experiment 3: BaPaLaSha [Trial display: response buttons labeled “B”, “P”, “L”, and “Sh”; auditory stimulus (e.g. “la”) presented.]
Experiment 3: Identification Results [Figure: proportion /p/ responses by VOT for Exp. 1 (words), Exp. 2 (BaPa), and Exp. 3 (BaPaLaSha).] The number of response alternatives accounts for some of the difference in slope.
Experiment 3: Data Analysis Trials with the low-frequency response were excluded; this effectively yields a “perfect” categorization function. [Figure: effective ID function vs. actual ID function.]
Experiment 3: Eye movement data [Figure: fixation proportions over time to B, P, and unrelated (UR) items, for VOT = 0 ms (response /b/) and VOT = 40 ms (response /p/).] • More looks to the competitor than to unrelated stimuli (p<.001). • Eye movements in “phoneme ID” tasks are sensitive to acoustic similarity.
Experiment 3: Eye movement data [Figure: competitor fixation proportions over time for each VOT step (0-40 ms), split by /b/ and /p/ responses.] Difference between stimuli near the boundary and the endpoints.
Experiment 3: Eye movement data [Figure: competitor fixation proportions as a function of VOT (response = B, looks to P; response = P, looks to B), with the category boundary marked.] Close but no star: nothing reaches significance. /b/: p=.055, ptrend=.068; /p/: p=.510, ptrend=.199.
Experiment 3: Eye movement data (unambiguous stimuli only: even worse) [Figure: competitor fixation proportions as a function of VOT, with the category boundary marked.] /b/: p=.374, ptrend=.419; /p/: p=.356, ptrend=.151.
Experiment 3: Results Eye movements in phoneme ID tasks are sensitive to acoustic similarity between target and competitor. The number of alternatives explains some of the differences in the ID function. VERY weak subphonemic effects on lexical activation.