
Speaking More Like You: Lexical, Acoustic/Prosodic, and Discourse Entrainment in Spoken Dialogue Systems

Julia Hirschberg, Columbia University. SIGdial 2008.


Presentation Transcript


  1. Speaking More Like You: Lexical, Acoustic/Prosodic, and Discourse Entrainment in Spoken Dialogue Systems Julia Hirschberg Columbia University SIGdial 2008

  2. Entrainment/Adaptation/Accommodation/Alignment/Priming • Hypothesis: • People tend to adapt their communicative behavior to that of their conversational partner • Consequences • Key to successful communication (Goleman ’06) • Entrainment leads subjects to like their conversational partners more and to perceive conversations as more successful (Chartrand & Bargh ’99) • Reitter et al. ’07 found entrainment to be a good predictor of task success in the Map Task

  3. Dimensions of Entrainment • Lexical and syntactic: Collaborating on referential choice • A: It’s that thing that looks like a harpsichord. • B: So the harpsichord-looking thing… • B: The harpsichord… • Semantic (Mills & Healey, yesterday) • Phonological • Word pronunciation/accent (e.g. Oprah) • Acoustic/Prosodic • Speaking rate, pitch range, intensity, contour, voice quality • Socio-cultural dimensions

  4. Facial expression and gesture • Who studies? • Linguists, Psycholinguists, Sociolinguists, Computational Linguists (Spoken Dialogue Systems researchers) • Research questions • What are the dimensions of entrainment? • How can we measure it? • Does everyone entrain? Along the same dimensions? • What are the consequences of entrainment? Non-entrainment? Dis-entrainment? • What types of entrainment can and should be modeled in SDS?

  5. Outline • Previous work • Lexical • Acoustic-prosodic • Discourse/social level • SDS • Entrainment in the Columbia Games Corpus • The Corpus • New approaches to lexical entrainment • Pilot studies • Conclusions and Future Directions

  6. Outline • Previous work • Lexical • Acoustic-prosodic • Discourse/social level • SDS • Entrainment in the Columbia Games Corpus • The Corpus • New approaches to lexical entrainment • Pilot studies • Conclusions and Future Directions

  7. Lexical Entrainment • Gricean prediction for choice of referring expression • People use descriptions that minimally but effectively distinguish among items in the discourse – Maxim of Quantity • Garrod & Anderson ’87 Output/Input Principle • Conversational partners formulate their current utterance according to the model used to interpret their partner’s most recent utterance • Clark, Brennan, et al.’s Conceptual Pacts • People make Conceptual Pacts with respect to appropriate referring expressions with particular conversational partners • Reluctant to abandon these even when shorter expressions would be sufficient (e.g. ‘red car’ even when no other cars are visible)

  8. Entrainment in Acoustic/Prosodic Dimensions • How are speech timing and voice quality affected by an (unfamiliar) conversational partner? (Sherblom & La Riviere ’87) • Study: • 65 ugrad pairs asked to discuss a ‘problem situation’ together • Utter a single sentence before and after the conversation • Sentences compared for speaking rate, utterance length and vocal jitter • Results: • Substantial influence of partner on all 3 measures • Gender, interpersonal uncertainty and differences in arousal influenced degree of adaptation

  9. How early do we start to entrain? • Do children entrain to their mother’s speaking rate? (Guitar & Marchinkoski ’01) • Study: • 6 mothers with their own (‘normal’) 3-yr-olds (3M, 3F) • Mothers’ speaking rate significantly reduced (B) or not (A) in A-B-A-B design • Results: • 5/6 children reduced rate when their mothers spoke more slowly

  10. Do humans adapt to the behavior of non-human partners? (Coulston et al. ’02) • Study: • 24 children aged 7-10 interacted with an extroverted, loud animated character and with an introverted, soft character (TTS voices) • Multiple tasks using different amplitude ranges and response latencies • Results: • 79-94% of children adapted their amplitude, bidirectionally • Also adapted response latencies (mean 18.4%), bidirectionally

  11. Social Entrainment • Do speakers adapt to the style of other social classes? (Azuma ’97) • Study: Emperor Hirohito visits the countryside • Corpus-based study of speech style of Japanese Emperor Hirohito during chihoo jyunkoo (‘visits to the countryside’), 1946-54 • Published transcripts of speeches • Findings: • Emperor Hirohito converged his speech style to that of listeners lower in social status • Choice of verb forms and pronouns no longer those of the person with highest authority • Perceived as like those of a (low-status) mother

  12. Do speakers adapt in cultural markers? (Roth ’05): • Context • High school in NE with predominantly African-American student body • Co-teachers: • Cristobal: Cuban-African-American teacher • Chris: new Italian-American teacher • Adaptation of Chris to Cristobal • Catch phrases (e.g. right!, really really hot) and their production: pitch and intensity contours • Pitch ‘matching’ across speakers • Mimesis vs entrainment

  13. Entrainment in SDS • If users entrain to systems, systems can • Predict vocabulary better and improve recognition • Influence other user behavior such as speaking rate or amplitude to improve recognition • If systems entrain to users they might • Improve task performance • Enhance user satisfaction • Is entrainment feasible, given current technology?

  14. KTH’s Waxholm System

  15. Verb Priming: How often do you go abroad on holiday? • Two Swedish versions of the question, differing only in the verb (åker vs. reser, both roughly ‘go/travel’): • Hur ofta åker du utomlands på semestern? • Hur ofta reser du utomlands på semestern? • Responses: jag reser en gång om året utomlands jag reser inte ofta utomlands på semester det blir mera i arbetet jag reserreser utomlands på semestern vartannat år jag reser utomlands en gång per semester jag reser utomlands på semester ungefär en gång per år jag brukar resa utomlands på semestern åtminståne en gång i året en gång per år kanske en gång vart annat år varje år vart tredje år ungefär nu för tiden inte så ofta varje år brukar jag åka utomlands jag åker en gång om året kanske jag åker ganska sällan utomlands på semester jag åker nästan alltid utomlands under min semester jag åker ungefär 2 gånger per år utomlands på semester jag åker utomlands nästan varje år jag åker utomlands på semestern varje år jag åker utomlands ungefär en gång om året jag är nästan aldrig utomlands en eller två gånger om året en gång per semester kanske en gång per år ungefär en gång per år åtminståne en gång om året nästan aldrig

  16. CMU’s Let’s Go Lab

  17. Systems Entraining to Users • Let’s Go adapts confirmation prompts to the speech of non-native users, finding the closest match to user input in its own grammar and lexicon (Raux & Eskenazi 2004)

  18. Outline • Previous work • Lexical • Acoustic-prosodic • Discourse/social level • SDS • Entrainment in the Columbia Games Corpus • The Corpus • New approaches to lexical entrainment • Pilot studies • Conclusions and Future Directions

  19. Entrainment in the Columbia Games Corpus • Joint work with Agus Gravano and Ani Nenkova (ACL 2008) • Corpus-based approach to multiple dimensions of entrainment • Questions: • What types of entrainment occur? • How should we measure entrainment? • What are the consequences of entrainment? Of dis-entrainment? • How much should/can entrainment be modeled in SDS?

  20. The Columbia Games Corpus • 12 spontaneous task-oriented dyadic conversations (9h 8m speech) • 2 subjects play a series of computer games, no eye contact (45m 39s mean session time) • 2 sessions per subject, with different partners • Multiple games and types • Recorded on separate channels in a soundproof booth, digitized and downsampled to 16 kHz • All user and system behaviors logged

  21. Cards Game #1  Player 1 (Describer)  Player 2 (Searcher) • Short monologues • Vary frequency and order of occurrence of objects on the cards.

  22. Cards Game #2  Player 1 (Describer)  Player 2 (Searcher) • Dialogue • Vary frequency and order of occurrence of objects on the cards across speakers.

  23. Objects Game • Follower must place the target object where it appears on the Describer’s screen, solely via the description provided (4h 19m) • [Figure: Describer and Follower screens]

  24. Annotation • Orthographic transcription and alignment (~73k words) • Intonation, using ToBI conventions • Laughs, coughs, breaths, smacks, throat-clearings • Self-repairs • Function (10 categories) of affirmative cue words (alright, mm-hm, okay, right, uh-huh, yeah, yes, …) • Question form and function • Turn-taking behaviors

  25. Entrainment in Referring Expressions • S13: the orange M&M looking kind of scared and then a one on the bottom left and a nine on the bottom right • S12: alright I have the exact same thing I just had it's an M&M looking scared that's orange • S13: yeah the scared M&M guy yeah • S12: framed mirror and the scared M&M on the lower right • S13: and it's to the right of the scared M&M guy • S13: yeah and the iron should be on the same line as the frightened M&M kind of like an L • S12: to the left of the scared M&M to the right of the onion and above the iron

  26. Entrainment and High-Frequency Words • Lexical entrainment: agreeing on a common vocabulary (Niederhoffer & Pennebaker ’02) • How does entrainment to another’s use of HFWs (the N most common words in a corpus) affect task success in dialogue? • Data: Columbia Games Corpus subset • 48 tasks • Recall that: • Game scores available for each game • Labeled for cue phrases, turn-taking, and other behaviors
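To make the notion of HFWs (the N most common words in a corpus) concrete, here is a minimal Python sketch. The function name, the whitespace tokenization, and the toy utterances are assumptions for illustration only; the slides do not say how the corpus was tokenized.

```python
from collections import Counter


def high_frequency_words(utterances, n=25):
    """Return the n most common word types in a list of transcribed utterances.

    Assumption: words are separated by whitespace and compared case-insensitively.
    """
    counts = Counter()
    for utterance in utterances:
        counts.update(utterance.lower().split())
    return [word for word, _ in counts.most_common(n)]


# Toy transcript lines (invented for illustration, not corpus data).
toy_utterances = [
    "okay the lion is on the left",
    "the left okay",
    "yeah the lion",
]
top_word = high_frequency_words(toy_utterances, n=1)
print(top_word)
```

Passing all utterances in the corpus would correspond to the corpus-level list; passing only the utterances of one game would correspond to a game-level list.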

  27. Experiments • Does entrainment in use of the 25 most frequent words in the corpus (HFW-C) or in the game (HFW-G) correlate with task success as defined by game scores? • Does entrainment in use of affirmative cue words (ACW) correlate with task success? • Does entrainment in use of filled pauses (FP) correlate with task success? • Are dialogues smoother and more coordinated when entrainment occurs?

  28. Entrainment Metric I • Where fraction(w, S_i) = fraction of times Speaker i used word w in the conversation • Range: -1 (no entrainment) to 0 (complete entrainment) • Generalize to word classes, where c = word class

  29. Entrainment Metric II • Where count_Si(w) = number of times S_i used word w in the conversation • Range: -1 (no entrainment) to 0 (complete entrainment)
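The formulas themselves did not survive in this transcript. The Python sketch below shows one way the two metrics could be computed from the definitions that remain. It assumes that Metric I is the negated absolute difference between the two speakers' usage fractions, summed over the words in a class, and that Metric II is the negated sum of absolute count differences divided by the summed counts. Both forms are reconstructions consistent with the stated ranges, not formulas confirmed by the slides, and the function names are hypothetical.

```python
from collections import Counter


def entr1(words_s1, words_s2, word_class):
    """Assumed Metric I: -sum over w in c of |fraction(w, S1) - fraction(w, S2)|."""
    if not words_s1 or not words_s2:
        raise ValueError("each speaker must have produced at least one word")
    c1, c2 = Counter(words_s1), Counter(words_s2)
    n1, n2 = len(words_s1), len(words_s2)
    return -sum(abs(c1[w] / n1 - c2[w] / n2) for w in word_class)


def entr2(words_s1, words_s2, word_class):
    """Assumed Metric II: -sum|count_S1(w) - count_S2(w)| / sum(count_S1(w) + count_S2(w))."""
    c1, c2 = Counter(words_s1), Counter(words_s2)
    diff = sum(abs(c1[w] - c2[w]) for w in word_class)
    total = sum(c1[w] + c2[w] for w in word_class)
    if total == 0:
        raise ValueError("neither speaker used any word in the class")
    return -diff / total


# Toy word sequences (invented for illustration, not corpus data).
s1_words = "yeah okay yeah right".split()
s2_words = "okay okay yeah".split()
cue_words = {"yeah", "okay"}

entr1_score = entr1(s1_words, s2_words, cue_words)
entr2_score = entr2(s1_words, s2_words, cue_words)
print(entr1_score, entr2_score)
```

Under these assumptions both scores are 0 when the two speakers use the words identically. For a single word, Metric I stays within the -1 to 0 range given on the slide; when summed over a multi-word class in this sketch it can fall below -1.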

  30. Correlations with Game Score • NHFW-X: # High Frequency Words in Corpus (C), Game (G) • ACW: Affirmative Cue Words

  31. Correlations with Game Score • NHFW-X: # High Frequency Words in Corpus (C), Game (G) • ACW: Affirmative Cue Words

  32. Correlations with Game Score • NHFW-X: # High Frequency Words in Corpus (C), Game (G) • ACW: Affirmative Cue Words • No correlation for filled pauses

  33. Entrainment & Dialogue Coordination • Are dialogues more coordinated when entrainment occurs? • Columbia Games Corpus • Labeled for type of turn exchange (Beattie, 1982), including: • Smooth Switch: S2 starts his turn after S1 has finished hers • Interruption: S2 starts his turn before S1 has finished hers • Overlap: S2 starts his turn just before S1 has finished hers, but S1 finishes her turn
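The three turn-exchange types can be illustrated with a small Python sketch. The corpus labels were assigned by annotators, so this is only a simplified stand-in: the Exchange record, its field names, and the rule that an early start counts as an overlap only if S1 still completes her turn are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Exchange:
    """One turn exchange from S1 to S2 (hypothetical record layout)."""
    s1_end: float        # time at which S1 stops speaking, in seconds
    s2_start: float      # time at which S2 starts speaking, in seconds
    s1_completed: bool   # whether S1 finished her turn


def classify(exchange):
    """Map an exchange to smooth_switch, overlap, or interruption."""
    if exchange.s2_start >= exchange.s1_end:
        return "smooth_switch"
    return "overlap" if exchange.s1_completed else "interruption"


def proportions(exchanges):
    """Proportion of each exchange type in a conversation."""
    if not exchanges:
        raise ValueError("no exchanges given")
    counts = Counter(classify(ex) for ex in exchanges)
    labels = ("smooth_switch", "interruption", "overlap")
    return {label: counts[label] / len(exchanges) for label in labels}


def mean_smooth_switch_latency(exchanges):
    """Mean gap between S1's end and S2's start over smooth switches."""
    gaps = [ex.s2_start - ex.s1_end for ex in exchanges
            if classify(ex) == "smooth_switch"]
    if not gaps:
        raise ValueError("no smooth switches found")
    return sum(gaps) / len(gaps)


# Toy exchanges (invented timings, not corpus data).
toy_exchanges = [
    Exchange(10.0, 10.5, True),    # S2 starts after S1 ends
    Exchange(20.0, 19.75, True),   # S2 starts early, S1 still finishes
    Exchange(30.0, 28.0, False),   # S2 starts early, S1 is cut off
    Exchange(40.0, 40.25, True),   # S2 starts after S1 ends
]
toy_proportions = proportions(toy_exchanges)
toy_latency = mean_smooth_switch_latency(toy_exchanges)
print(toy_proportions, toy_latency)
```

Per-conversation proportions and mean latencies of this kind are the sort of quantities that can then be compared against entrainment scores.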

  34. Significant Correlations (p<.05) • ENTR1(ACW) & Prop. of Overlaps (cor = 0.64) • ENTR2(ACW) & Prop. of Overlaps (cor = 0.61) • ENTR2(ACW) & Mean Latency of Smooth Switches (cor = -0.76) • ENTR2(25MF-G) & Prop. of Overlaps (cor = 0.60) • ENTR1(25MF-C) & Prop. of Interruptions (cor = -0.61) • HFW/ACW entrainment positively correlated with more overlaps, fewer interruptions, and shorter inter-turn latencies
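The slide reports "cor" values without naming the coefficient. As a rough illustration of how such a value could be obtained, the Python sketch below computes Pearson's correlation between per-conversation entrainment scores and a coordination measure. The choice of Pearson's r is an assumption, and the numbers are invented toy values, not results from the corpus.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented per-conversation values, for illustration only.
toy_entrainment = [-0.60, -0.45, -0.30, -0.20, -0.10]
toy_overlap_prop = [0.05, 0.08, 0.07, 0.12, 0.15]

r = pearson(toy_entrainment, toy_overlap_prop)
print(round(r, 2))
```

Because the entrainment scores run from -1 (none) to 0 (complete), a positive coefficient here means that conversations with more entrainment also show more of the second measure.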

  35. Acoustic/Prosodic Pilot Studies: Speaking Rate • Do speakers entrain in (mean) speaking rate to their (different) partners?

  36. Acoustic/Prosodic Pilot Studies: Speaking Rate • Do speakers entrain in (mean) speaking rate to their (different) partners?

  37. Outline • Previous work • Lexical • Acoustic-prosodic • Discourse/social level • SDS • Entrainment in the Columbia Games Corpus • The Corpus • New approaches to lexical entrainment • Pilot studies • Conclusions and Future Directions

  38. Current/Future Work • Examine additional dimensions of entrainment and how they correlate with task success, perceived naturalness • Acoustic/prosodic • Pitch, intensity, rate measures • Voice quality, contour • Discourse: • Do people entrain in styles of topic shift, use of cue phrases, turn-taking behaviors? Laughter, disfluencies • Personality? • Explore additional measures of entrainment

  39. Experiment with system entrainment to users in SDS

  40. Thank you!
