Title Slide
Stop-Consonant Perception in 7.5-month-olds: Evidence for gradient categories
Bob McMurray & Richard N. Aslin
Department of Brain and Cognitive Sciences, University of Rochester
With thanks to Julie Markant & Robbie Jacobs
Learning Language
[Figure: levels of language understanding: meaning, lexicon, syntactic structure (S, VP, NP)]
Understanding spoken language requires that children learn a complex mapping…
• What is the form of this mapping?
• How do the demands of learning affect this representation?
Learning Speech
Syntax, semantics, pragmatics… all build on speech recognition.
Speech perception and word recognition require mapping continuous, variable perceptual input to something discrete and categorical.
• What representations mediate between acoustics and lexical or sublexical units?
• How does learning affect this representation?
Overview
• Acoustic mappings: categorical and gradient perception in adults and infants.
• Infant speech categories are graded representations of continuous detail.
• Statistical learning models and sparse representations.
• Conclusions and future directions.
Categorization & Categorical Perception
Representation of Speech Detail
What is the nature of the mapping between continuous perception and discrete categories? How are these representations sensitive (or not) to within-category detail?
Empirical approach:
• Use continuously variable stimuli.
• Explore responses using:
  • Discrimination and identification (adults)
  • Habituation (infants)
Categorical Perception 1
[Figure: identification (% /pa/) and discrimination functions across a B-to-P VOT continuum]
• Sharp labeling of tokens on a continuum.
• Discrimination is poor within a phonetic category.
Subphonemic within-category variation in VOT is discarded in favor of a discrete symbol (phoneme).
Categorical Perception 2
Many tasks have demonstrated within-category sensitivity in adults:
• Discrimination task variations: Pisoni & Tash (1974); Pisoni & Lazarus (1974); Carney, Widin & Viemeister (1977)
• Goodness ratings: Miller (1997); Massaro & Cohen (1983)
• Training: Samuel (1977); Pisoni, Aslin, Perey & Hennessy (1982)
And lexical activation shows systematic sensitivity to subphonemic detail (McMurray, Tanenhaus & Aslin, 2002).
Infant Categorical Perception 1
Categorical Perception in Infants
Infants have shown a different pattern. For 30 years, virtually all attempts to address this question have yielded categorical discrimination.
• Exception: Miller & Eimas (1996).
  • Only at extreme VOTs.
  • Only when habituated to a non-prototypical token.
Infant Categorical Perception 3
Nonetheless, infants possess abilities that would require within-category sensitivity.
• Infants can use allophonic differences at word boundaries for segmentation (Jusczyk, Hohne & Bauman, 1999; Hohne & Jusczyk, 1994).
• Infants can learn phonetic categories from distributional statistics (Maye, Werker & Gerken, 2002).
Distributional Learning 2
Speech production causes clustering along contrastive phonetic dimensions, e.g. voicing / voice onset time (VOT): B: VOT ≈ 0 ms; P: VOT ≈ 40 ms.
Within a category, VOT is distributed roughly as a Gaussian.
Result: a bimodal distribution along the VOT dimension (modes near 0 ms and 40 ms).
Distributional Learning 1
To statistically learn speech categories, infants must:
• Track frequencies of tokens at each value along a stimulus dimension.
• Extract categories from the distribution.
[Figure: frequency of tokens by VOT (0–50 ms), showing +voice and -voice clusters]
This requires the ability to track specific VOTs. A minimal sketch of this tracking appears below.
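The sketch below is illustrative, not from the talk: the token distributions, bin width, and boundary heuristic are assumed values. It tracks the frequency of tokens along the VOT dimension and reads the two clusters off the resulting bimodal distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative token distributions: /b/-like VOTs near 0 ms, /p/-like near 40 ms.
voiced = rng.normal(0.0, 5.0, 500)
voiceless = rng.normal(40.0, 5.0, 500)
tokens = np.concatenate([voiced, voiceless])

# Track frequencies of tokens at each value along the VOT dimension.
counts, edges = np.histogram(tokens, bins=np.arange(-20, 61, 2))
centers = (edges[:-1] + edges[1:]) / 2

# A bimodal distribution has a low-frequency trough between its modes;
# the trough location is one crude estimate of a category boundary.
interior = (centers > 5) & (centers < 35)
boundary = centers[interior][np.argmin(counts[interior])]
print(f"Highest mode near {centers[np.argmax(counts)]:.0f} ms; trough near {boundary:.0f} ms VOT")
```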
Question 1
Prior examinations of speech categories used:
• Habituation
  • Discrimination, not identification.
  • Possible selective adaptation.
  • Possible attenuation of sensitivity.
• Synthetic speech
  • Not ideal for infants.
• A single exemplar/continuum
  • Not necessarily a category representation.
Experiment 1: reassess this issue with improved methods.
HTPP 1
Head-Turn Preference Procedure (Jusczyk & Aslin, 1995)
Infants are exposed to a chunk of language:
• Words in running speech.
• A stream of continuous speech (à la the statistical learning paradigm).
• A word list.
After exposure, memory for exposed items (or abstractions) is assessed by comparing listening times to consistent vs. inconsistent items.
HTPP 2
Test trials start with all lights off. The center light blinks.
HTPP 3
This brings the infant's attention to the center. Then one of the side lights blinks.
HTPP 4
When the infant looks at the side light, he hears a word ("Beach… Beach… Beach…")…
HTPP 5
…as long as he keeps looking.
Experiment 1 Methods
7.5-month-old infants were exposed to either 4 b-words or 4 p-words (80 repetitions total), to form a category of the exposed class of words.
Words: Bomb/Palm, Bear/Pear, Bail/Pail, Beach/Peach.
Measure listening time on…
• Original words: Bear, Pear
• Competitors: Pear, Bear
• VOT closer to the boundary: Bear*, Pear*
Experiment 1 Stimuli
B: M = 3.6 ms VOT    P: M = 40.7 ms VOT
B*: M = 11.9 ms VOT  P*: M = 30.2 ms VOT
B* and P* were judged /b/ or /p/ at least 90% consistently by adult listeners (B*: 97%, P*: 96%).
Stimuli were constructed by cross-splicing naturally produced tokens of each endpoint.
Experiment 1: Familiarity vs. Novelty
Novelty/familiarity preference varies across infants and experiments. We're only interested in the middle stimuli (B*, P*).
Infants were classified as novelty- or familiarity-preferring by performance on the endpoints:
                 Novelty   Familiarity
Exposed to B:      36          16
Exposed to P:      21          12
Within each group, will we see evidence for gradiency?
Experiment 1: Fam. vs. Nov. 2
Gradiency: after being exposed to bear… beach… bail… bomb…, infants who show a novelty effect will look longer for pear than for bear. What about in between?
[Figure: hypothesized categorical vs. gradient listening-time patterns across Bear, Bear*, Pear]
Experiment 1 Results: Novelty infants (B: 36, P: 21)
[Figure: listening time (ms) for Target, Target*, and Competitor, by exposure group (B vs. P)]
Target vs. Target*: p < .001
Competitor vs. Target*: p = .017
Experiment 1 Results: Familiarity infants (B: 16, P: 12)
[Figure: listening time (ms) for Target, Target*, and Competitor, by exposure group (B vs. P)]
Target vs. Target*: p = .003
Competitor vs. Target*: p = .012
Experiment 1 Results: Planned Comparisons, infants exposed to /p/
[Figure: listening time (ms) for P, P*, and B]
Novelty (N = 21): p = .009**, p = .024*
Familiarity (N = 12): p = .028*, p = .018*
Experiment 1 Results: Planned Comparisons, infants exposed to /b/
[Figure: listening time (ms) for B, B*, and P]
Novelty (N = 36): p > .1, p > .2; p < .001**
Familiarity (N = 16): p = .06, p = .15
Experiment 1 Conclusions
Contrary to all previous work:
• 7.5-month-old infants show gradient sensitivity to subphonemic detail.
• Clear effect for /p/.
• Effect attenuated for /b/.
Experiment 1 Conclusions 2
Reduced effect for /b/… But:
[Figure: hypothetical listening-time patterns across Bear, Bear*, Pear: a null effect vs. the expected result]
Experiment 1 Conclusions 3
Actual result:
[Figure: listening time across Bear, Bear*, Pear, with Bear* patterning like Pear]
• Bear* patterns with Pear.
• The category boundary lies between Bear and Bear* (i.e., between 3 ms and 11 ms VOT).
• Will we see evidence for within-category sensitivity with a different range?
Experiment 2
Same design as Experiment 1, with VOTs shifted away from the hypothesized boundary (7 ms):
• Bomb, Bear, Beach, Bale: -9.7 ms
• Bomb*, Bear*, Beach*, Bale*: 3.6 ms
• Palm, Pear, Peach, Pail: 40.7 ms
Experiment 2 Results: Familiarity infants (34 infants)
[Figure: listening time (ms) for B-, B, and P]
p = .01**, p = .05*
Experiment 2 Results: Novelty infants (25 infants)
[Figure: listening time (ms) for B-, B, and P]
p = .002**, p = .02*
Experiment 2 Conclusions
[Figure: adult /b/ and /p/ categories as mapping strength over VOT, with the adult boundary marked]
• Within-category sensitivity in /b/ as well as /p/.
• Shifted category boundary in /b/: not consistent with the adult boundary (or prior infant work). Why?
Experiment 2 Conclusions 2
The /b/ results are consistent with (at least) two mappings.
1) Shifted boundary
[Figure: /b/ and /p/ mapping strength over VOT with the boundary shifted relative to the adult boundary]
• Inconsistent with prior literature.
• Why would infants have this boundary?
Experiment 2 Conclusions 3
2) Sparse categories
[Figure: /b/ and /p/ mapping strength over VOT, with unmapped space between them near the adult boundary]
HTPP is a one-alternative task: it asks "B or not-B", not "B or P".
Sparse categories may in fact be a by-product of efficient statistical learning.
Model Intro
Computational Model: a distributional learning model.
1) Model the distribution of tokens as a mixture of Gaussian distributions over a phonetic dimension (e.g. VOT).
2) After receiving an input, the Gaussian with the highest posterior probability is the "category".
3) Each Gaussian has three parameters: a mean (μ), a standard deviation (σ), and a weight (Φ).
A sketch of the categorization step (2) appears below.
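The following is an illustration, not the authors' implementation; the means, standard deviations, and weights are assumed values. It shows the standard mixture-of-Gaussians rule: the category assigned to an input VOT is the component with the highest posterior probability.

```python
import numpy as np
from scipy.stats import norm

mu    = np.array([0.0, 40.0])   # component means (ms VOT), illustrative
sigma = np.array([5.0, 5.0])    # component standard deviations
phi   = np.array([0.5, 0.5])    # mixing weights (sum to 1)

def categorize(vot):
    """Return the index of the Gaussian with the highest posterior, and the posteriors."""
    likelihood = norm.pdf(vot, loc=mu, scale=sigma)   # p(vot | category)
    posterior = phi * likelihood                      # proportional to p(category | vot)
    posterior /= posterior.sum()
    return int(posterior.argmax()), posterior

category, posterior = categorize(11.9)   # e.g. the B* stimulus from Experiment 1
```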
Model Intro 2
Statistical Category Learning
1) Start with a set of randomly selected Gaussians.
2) After each input, adjust each parameter to find the best description of the input.
• Start with more Gaussians than necessary: the model doesn't innately know how many categories there are.
• Weights (Φ) go to 0 for unneeded categories.
A sketch of this learning loop appears below.
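This is an illustrative sketch with assumed update rules in the spirit of the description above, not the model's actual equations: every input nudges each component's parameters in proportion to its responsibility for that input, and components that rarely win see their weights shrink.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
K = 6                                    # more Gaussians than true categories
mu    = rng.uniform(-10.0, 50.0, K)      # randomly selected starting means
sigma = np.full(K, 3.0)                  # small starting sigma (see later slides)
phi   = np.full(K, 1.0 / K)              # equal starting weights
lr = 0.01                                # learning rate (assumed)

# Illustrative training data: a bimodal VOT distribution (/b/ near 0 ms, /p/ near 40 ms).
data = rng.permutation(np.concatenate([rng.normal(0, 5, 2000), rng.normal(40, 5, 2000)]))

for vot in data:
    resp = phi * norm.pdf(vot, mu, sigma)               # responsibility of each component
    resp /= resp.sum()
    delta = vot - mu
    mu    += lr * resp * delta                           # move means toward the input
    sigma += lr * resp * (delta**2 - sigma**2) / sigma   # adjust spreads
    phi    = (1 - lr) * phi + lr * resp                  # rarely-winning weights shrink toward 0
    phi   /= phi.sum()
```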
Model Overgen
Overgeneralization
• Large σ.
• Costly: lose phonetic distinctions…
Model Undergen
Undergeneralization
• Small σ.
• Not as costly: maintains distinctiveness.
Model: err on the side of caution
To increase the likelihood of successful learning:
• Err on the side of caution.
• Start with small σ.
[Figure: P(Success) as a function of starting σ for the 2-category and 3-category models]
Model Sparseness
Sparseness coefficient: % of space not mapped to any category.
[Figure: small starting σ (.5-1) leaves unmapped space along VOT; average sparseness coefficient over training epochs]
Model Sparseness 2
[Figure: average sparseness coefficient over training epochs for starting σ of .5-1 vs. 20-40]
Model Sparseness 3
[Figure: average sparseness coefficient over training epochs for starting σ ranges of .5-1, 3-11, 12-17, and 20-40]
A rough computation of the sparseness coefficient appears below.
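The sparseness coefficient can be operationalized roughly as follows; this is an assumption, not the talk's definition of "mapped": a VOT value counts as unmapped when no component's weighted density exceeds a small threshold, and the range, grid size, and threshold are illustrative.

```python
import numpy as np
from scipy.stats import norm

def sparseness(mu, sigma, phi, lo=-20.0, hi=60.0, n=1000, threshold=1e-3):
    """Fraction of the sampled VOT range mapped to no category."""
    grid = np.linspace(lo, hi, n)                       # sample the VOT dimension
    density = phi[:, None] * norm.pdf(grid[None, :], mu[:, None], sigma[:, None])
    mapped = density.max(axis=0) > threshold            # strongest category at each VOT
    return 1.0 - mapped.mean()

# Usage with parameters learned by the sketch above:
# print(sparseness(mu, sigma, phi))
```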
Model Conclusions
To avoid overgeneralization, it is better to start with small estimates for σ.
• Small starting σ's lead to sparse category structure during infancy: much of phonetic space is unmapped.
• Occasionally the model leaves sparse regions at the end of learning.
1) Competition/choice framework:
• Additional competition or selection mechanisms during processing allow categorization on the basis of incomplete information.
Model Conclusions 2
• Competitive Hebbian learning (Rumelhart & Zipser, 1986); see the sketch after this slide.
[Figure: categories over VOT]
2) Non-parametric models
• Not constrained by a particular equation: can fill the space better.
• Similar properties in terms of starting σ and the resulting sparseness.
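For the competition/choice option, here is a minimal winner-take-all sketch in the spirit of Rumelhart & Zipser (1986); it is an assumption, not the talk's implementation. The unit closest to the input wins and moves toward it, so a response is produced even when the input falls in sparsely mapped space.

```python
import numpy as np

rng = np.random.default_rng(2)
prefs = rng.uniform(-10.0, 50.0, 6)   # each unit's preferred VOT, random start
lr = 0.05                             # learning rate (assumed)

def present(vot):
    """Competition: the closest unit wins; only the winner learns."""
    winner = int(np.argmin(np.abs(prefs - vot)))
    prefs[winner] += lr * (vot - prefs[winner])
    return winner
```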
Final Conclusions
Infants show a graded response to within-category detail.
• The /b/ results suggest regions of unmapped phonetic space.
• The statistical approach provides support for sparseness.
• Given current learning theories, sparseness results from optimal starting parameters.
• An empirical test will require a two-alternative task.
• AEM: train infants to make eye movements in response to stimulus identity.
Future Work
• Infants make anticipatory eye movements along a predicted trajectory, in response to stimulus identity.
• Two alternatives allow us to distinguish between a category boundary and unmapped space.
The Last Word
[Figure: category structure over VOT from -60 to 80 ms]
Early speech categories emerge from an interplay of:
• Exquisite sensitivity to graded detail in the signal.
• Long-term sensitivity to the statistics of the signal.
• Early biases to optimize the learning problem.