
Modeling infant word segmentation: Another example of discovery fueled by CHILDES

Explore the discovery of infant word segmentation using modeling techniques and analyze segmentability differences in child-directed versus adult-directed speech in various settings. Consider the implications for infant studies.





Presentation Transcript


  1. Modeling infant word segmentation: Another example of discovery fueled by CHILDES Alejandrina Cristia Laboratoire de Sciences Cognitives et Psycholinguistique @Language Emergence: Competition, Usage, and Analyses, 2019-06-06

  2. No overt & unambiguous word/morpheme boundaries in the input… “no silences” Kuhl 2004

  3. “no silences” (Kuhl 2004) …yet by the end of the first year, infants know some words/morphemes: ‘Feet’ ‘mommy’ ‘baby’ ‘alldone’ ‘tobed’ (Tincoff & Jusczyk 2012; Bergelson & Swingley 2012; Ngon et al. 2014)

  4. How to study segmentability? mommy talking …cute …something shiny go by? Let’s just get to the facts.

  5. Today’s menu • A methodology for studying word form segmentation using models • Segmentability differences for child-directed versus adult-directed register (in French) • …bilingual versus monolingual settings (English, Spanish, & Catalan) • Implications for infant studies

  6. Today’s menu • A methodology for studying word form segmentation using models • Segmentability differences for child-directed versus adult-directed register (in French) • …bilingual versus monolingual settings (English, Spanish, & Catalan) • Implications for infant studies

  7. Input representation
     Symbolic (‘Phonological text’):
     + lots of corpora can be used
     + lots of algorithms proposed
     + algorithms represent a wide range of strategies
     − assumes babies represent input abstractly, with zero errors
     Acoustic:
     + realistic…
     • …provided representations match babies’
     • few appropriate corpora (natural discourse & good quality audio)
     • only one (reproducible) algorithm

  8. Example
     *MOT: look at the doggie
     • Phonologize → lUk At D2 dOgi
     • Remove word boundaries & unitize → l U k A t D 2 d O g i
     • Segment with some algorithm → lUkAtD2dOgi
     • Evaluate:
       Precision = 1 of the 5 words found were words in the input = .2
       Recall = 1 of the 4 words in the input was recovered = .25
       Token F-score = 2 * (Precision * Recall) / (Precision + Recall)
     Note: one can also unitize at the syllable level: lUk At D2 dOgi (input), lUk At D2dOgi (output)
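The evaluation step on slide 8 can be sketched in a few lines of Python. This is a minimal illustration, not the WordSeg implementation: the function names are made up here, each character is assumed to be one phone, and the segmented output `lU kAt D2 dO gi` is a hypothetical one chosen so that it yields the slide's counts (5 words found, 1 of them correct).

```python
def word_spans(utterance):
    """Return the set of (start, end) phone offsets of each space-separated word.

    Assumption: one character = one phone, as in the slide's transcription.
    """
    spans, start = set(), 0
    for word in utterance.split():
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def token_scores(gold, segmented):
    """Token precision, recall and F-score of `segmented` against `gold`."""
    gold_spans, found_spans = word_spans(gold), word_spans(segmented)
    if not gold_spans or not found_spans:
        return 0.0, 0.0, 0.0
    # A token counts as correct only if both of its edges match the gold word.
    correct = len(gold_spans & found_spans)
    precision = correct / len(found_spans)
    recall = correct / len(gold_spans)
    if correct == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = "lUk At D2 dOgi"                   # 4 words in the input
hypothetical_output = "lU kAt D2 dO gi"   # 5 words found, only "D2" is right
precision, recall, f_score = token_scores(gold, hypothetical_output)
print(f"precision={precision:.2f} recall={recall:.2f} F={f_score:.3f}")
# prints: precision=0.20 recall=0.25 F=0.222
```

Comparing spans rather than word strings means a word is only credited when it is found in the right place, which is what a token-level score requires.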

  9. Example algorithms
     1. Baseline: simplest strategies (Lignos 2012)
        • Every sentence is a word (SentBase)
        • Every syllable is a word (SyllBase)
     2. Sub-lexical: goal is to “cut” using local cues (Daland + 2009; Saksida + 2016)
        • Transitional Probabilities (TP) x Absolute/Relative threshold (TP_abs, TP_rel)
        • Diphone-Based Segmentation (DiBS)
     3. Lexical: goal is to learn a set of “minimal recombinable units” (Johnson + 2007; Monaghan + 2010)
        • Adaptor Grammar (AG)
        • Phonotactics from Utterances Determine Distributional Lexical Elements (Puddle)
     Package: wordseg.readthedocs.io Preprint: https://osf.io/nx49h/ Bernard et al. 2019 BehResMeth
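To make the baseline and sub-lexical strategies concrete, here is a rough Python sketch of SentBase, SyllBase, and a transitional-probability segmenter with a relative threshold. It is a toy illustration and not the WordSeg code: the function names, the toy syllabified corpus, the use of forward TPs, and the choice to pad utterance edges with a TP of 1.0 are all assumptions made for this example.

```python
from collections import Counter


def sent_base(units):
    """SentBase: the whole utterance is treated as a single word."""
    return [list(units)]


def syll_base(units):
    """SyllBase: every unit (here, every syllable) is treated as a word."""
    return [[u] for u in units]


def train_tp(corpus):
    """Forward transitional probabilities: TP(a -> b) = count(ab) / count(a followed by anything)."""
    firsts, pairs = Counter(), Counter()
    for utt in corpus:
        firsts.update(utt[:-1])
        pairs.update(zip(utt, utt[1:]))
    return {pair: n / firsts[pair[0]] for pair, n in pairs.items()}


def segment_tp_rel(units, tp):
    """Relative threshold: cut where a TP is lower than both neighbouring TPs.

    Assumption: missing neighbours at utterance edges are padded with 1.0.
    """
    probs = [tp.get(pair, 0.0) for pair in zip(units, units[1:])]
    words, current = [], [units[0]]
    for i, p in enumerate(probs):
        left = probs[i - 1] if i > 0 else 1.0
        right = probs[i + 1] if i + 1 < len(probs) else 1.0
        if p < left and p < right:   # local dip in TP: posit a boundary
            words.append(current)
            current = []
        current.append(units[i + 1])
    words.append(current)
    return words


# Toy corpus, already syllabified, with word boundaries removed.
toy_corpus = [
    ["lUk", "At", "D2", "dO", "gi"],
    ["D2", "dO", "gi"],
    ["lUk", "At", "D2", "kI", "ti"],
    ["kI", "ti"],
    ["dO", "gi"],
]
tp = train_tp(toy_corpus)
print(segment_tp_rel(toy_corpus[0], tp))
# prints: [['lUk', 'At', 'D2'], ['dO', 'gi']]
```

In this toy corpus the only dip in TP falls between `D2` and `dO`, so the segmenter keeps `dO gi` together but under-segments the rest, a typical outcome for purely local cues.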

  10. The process in WordSeg

  11. Sample results: precision, recall, & F-score are correlated. Providence corpus (Demuth, Culbertson, & Alter, 2006) on CHILDES

  12. Sample results: Effects of algorithm and input representation. Naima, in Providence corpus (Demuth, Culbertson, & Alter, 2006) on CHILDES

  13. Today’s menu • A methodology for studying word form segmentation using models • Segmentability differences for child-directed versus adult-directed register (in French) • …bilingual versus monolingual settings (English, Spanish, & Catalan) • Implications for infant studies

  14. Why look at register? In child-directed speech, probably… • More utterances consist of a single word (+ all models) • Utterances are overall shorter in length (+ all models) *MOT: Attends! *MOT: Ouaistuvastemettreausoleilpourtesecherlescheveux!

  15. Why look at register? In child-directed speech, probably… • More utterances consist of a single word (+ all models) • Utterances are overall shorter in length (+ all models) • Utterances are more repetitious (+? lexical models) • *MOT: coucoucoucousitufaisaisdespetitssourirestoi. • *MOT: tumefaisdespetitssouriresXXXcoucoumongrand. • *MOT: coucoutumefaisdessouriresoupas.
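The three register properties listed on slides 14 and 15 can each be turned into a simple corpus statistic. The sketch below is only illustrative: the function name and the two toy samples are invented for this example, and the type/token ratio is used as one possible proxy for repetitiveness (lower means more repetitious), not necessarily the measure used in the talk.

```python
def register_stats(utterances):
    """Summarise a list of utterances, each given as a list of word tokens."""
    tokens = [word for utt in utterances for word in utt]
    return {
        # share of utterances that consist of a single word
        "single_word_share": sum(len(utt) == 1 for utt in utterances) / len(utterances),
        # mean utterance length, in words
        "mean_length": len(tokens) / len(utterances),
        # distinct word types divided by word tokens (lower = more repetitious)
        "type_token_ratio": len(set(tokens)) / len(tokens),
    }


# Made-up toy samples, for illustration only.
toy_cds = [["look"], ["look", "at", "the", "doggie"], ["the", "doggie"], ["doggie"]]
toy_ads = [
    ["did", "you", "see", "the", "dog", "outside"],
    ["yes", "it", "was", "barking", "again"],
]

cds_stats = register_stats(toy_cds)
ads_stats = register_stats(toy_ads)
print(cds_stats)
print(ads_stats)
```

Running the same function over matched child-directed and adult-directed samples gives a quick check of whether the expected differences actually hold in a given corpus before any segmentation model is run.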

  16. (Ask me about crosslinguistic extensions if curious!)
      Japanese: • RIKEN corpus • Collected in the lab → adult-directed speech is with experimenter
      English: • Winnipeg corpus • Collected with child-worn device worn whole day → adult-directed speech is among caregivers
      French: • LENA-Lyon corpus (Le Normand et al., HomeBank) • Collected with child-worn device worn whole day → adult-directed speech is among caregivers
      Bogdan Ludusan, Georgia Loukatou

  17. French “wild” ADS on Le Normand, Canault, & Van Thai’s LENA-Lyon corpus (Loukatou + 2019 Proc Cog Sci)

  18. CDS-ADS: Conclusions • Overall trend for better performance for child- than adult-directed speech • But: • reversed for some algorithms • effect of register < 15% • (in the best controlled cases, 2%)

  19. Today’s menu • A methodology for studying word form segmentation using models • Segmentability differences for child-directed versus adult-directed register (in French) • …bilingual versus monolingual settings (English, Spanish, & Catalan) • Implications for infant studies

  20. Why study word segmentation in a bilingual setting? Bilinguals need to learn words, like monolinguals do, but in two languages (‘Feet’ ‘mommy’ ‘baby’ ‘alldone’ ‘tobed’; ‘pié’ ‘mamá’ ‘bebé’ …), with overall less input in each language. Hoff + 2012; Fibla & Cristia (submitted very soon, I hope)

  21. Questions & predictions • Are segmentation strategies equally successful when applied to bilingual and monolingual corpora? → Measure the performance of previously studied segmentation algorithms in a controlled monolingual versus bilingual corpus. • Possible outcomes: • The confusion hypothesis: variable and inconsistent input → Poorer performance for the bilingual than for the monolingual • The resistant hypothesis: (if switching only at utterance edges) local statistical and lexical cues are still reliable → Similar performance for the bilingual and the monolingual Fibla & Cristia (submitted very soon, I hope)

  22. Creating bilingual corpora
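One way to picture a bilingual corpus in which language switches happen only at utterance edges is to interleave utterances from two monolingual corpora. The sketch below is an assumption-laden illustration, not the procedure used by Fibla & Cristia: the function name, the `block_size` parameter, and the toy utterances are all hypothetical.

```python
def mix_at_utterance_edges(corpus_a, corpus_b, block_size=1):
    """Interleave two monolingual corpora in blocks of `block_size` utterances.

    Every utterance is kept intact, so each one stays monolingual and
    language switches can only occur between utterances.
    """
    mixed = []
    longest = max(len(corpus_a), len(corpus_b))
    for start in range(0, longest, block_size):
        mixed.extend(corpus_a[start:start + block_size])
        mixed.extend(corpus_b[start:start + block_size])
    return mixed


# Made-up toy utterances, for illustration only.
toy_english = ["the doggie", "all done"]
toy_spanish = ["el perro", "a dormir"]

mixed = mix_at_utterance_edges(toy_english, toy_spanish)
print(mixed)
# prints: ['the doggie', 'el perro', 'all done', 'a dormir']
```

Varying `block_size` changes how often the language switches while keeping the total amount of input in each language the same, which is one simple way to keep the monolingual and bilingual conditions comparable.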

  23. Three cases of bilingual < monolingual

  24. Three cases of bilingual < monolingual; 11 cases of bilingual ‘in between’ monolingual

  25. Today’s menu • A methodology for studying word form segmentation using models • Segmentability differences for child-directed versus adult-directed register (in French) • …bilingual versus monolingual settings (English, Spanish, & Catalan) • Implications for infant studies

  26. Effects of algorithm and input representation. Size of algorithm x level effect = 40-60%? (Cristia + 2019 Open Mind)

  27. Effect of register. Size of register effect < 10%? On LENA-Lyon corpus (Loukatou + 2019 Proc Cog Sci)

  28. Effect of bilingualism. Size of bilingualism effect ~ 0%? Fibla & Cristia (submitted very soon, I hope)

  29. Today’s menu • A methodology for studying word form segmentation using models • Segmentability differences as a function of language properties • …child-directed versus adult-directed register (in Japanese, English, & French) • …bilingual versus monolingual settings (English, Spanish, & Catalan) • Implications for infant studies

  30. What may babies be doing? Using CDI results & frequency effects Larsen + 2017 Interspeech & in prep

  31. What may babies be doing? Using CDI results & frequency effects. Coefficient of determination R² = .1 (Larsen + 2017 Interspeech & in prep)

  32. phoneme-based models

  33. syllable-based models phoneme-based models

  34. Cut only at utterance edges → frequency of words in isolation
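A learner that cuts only at utterance edges can recover a word only when that word makes up an entire utterance, so the relevant frequency is how often each word occurs in isolation rather than how often it occurs overall. A minimal sketch of the two counts, with a hypothetical function name and a made-up toy corpus:

```python
from collections import Counter


def frequency_counts(utterances):
    """Return (overall word frequency, frequency as a one-word utterance)."""
    total = Counter(word for utt in utterances for word in utt)
    isolated = Counter(utt[0] for utt in utterances if len(utt) == 1)
    return total, isolated


# Made-up toy corpus, for illustration only.
toy_corpus = [["doggie"], ["look", "at", "the", "doggie"], ["look"], ["doggie"]]
total, isolated = frequency_counts(toy_corpus)
print(total["doggie"], isolated["doggie"])   # prints: 3 2
print(total["the"], isolated["the"])         # prints: 1 0
```

Words such as `the` in the toy corpus never occur alone, so an edge-only learner would have no isolated evidence for them however frequent they are overall.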

  35. To be continued…

  36. Thanks to... Families who agree to be recorded & for their data to be shared; researchers who record them and share on TalkBank; TalkBank ~ Brian MacWhinney & you!

  37. Japanese “lab” ADS on Reiko Mazuka’s RIKEN corpus; much of this is in Ludusan et al. 2017 ACL (now working on journal paper with more material)

  38. English “wild” ADS on Melanie Soderstrom’s Winnipeg corpus (Cristia + 2019 Open Mind)
