Backchannel-Inviting Cues in Task-Oriented Dialogue Agustín Gravano (1,2), Julia Hirschberg (1) • (1) Columbia University, New York, USA • (2) Universidad de Buenos Aires, Argentina
Introduction Interactive Voice Response (IVR) Systems • Quickly spreading. Mostly simple functionality. • “Uncomfortable”, “awkward”. • ASR+TTS account for most IVR problems. • As ASR and TTS improve, other problems are revealed. • Coordination of system-user exchanges. • Backchannels. Agustín Gravano Interspeech 2009
Introduction Backchannels • Short expressions uttered by listeners to: • Convey that they are paying attention. • Encourage the speaker to continue. • Examples: okay, uh-huh, mm-hm, alright. • Very frequent in task-oriented dialogue. • Thus, modeling human usage of backchannels (BC) should lead to improved system-user coordination.
Introduction Goal • Learn when backchannels are likely to occur. • Find “backchannel-inviting” cues. • Cues displayed by the speaker “inviting” the listener to produce a backchannel response. • This could improve the coordination of IVRs: • Speech understanding: Detect points in the user’s turn where a backchannel would be welcome. • Speech generation: Display cues inviting the user to produce a backchannel.
Talk Outline • Previous work • Material • Method • Results • Conclusions
Backchannel-Inviting Cues Previous Work • Duncan 1972, 1973, 1974, inter alia. • Hypothesized six turn-yielding cues in face-to-face dialogue. • Several studies continued this line of research, but always excluded backchannels. • Ward & Tsukahara 2000. • Region of low pitch lasting 110 ms or more. • Cathcart et al. 2003. • Language model based on pause duration and part-of-speech tags to predict the locations of backchannels.
Material Columbia Games Corpus • 12 task-oriented spontaneous dialogues. • Standard American English. • 13 subjects: 6 female, 7 male. • Series of collaborative computer games. • No eye contact. No speech restrictions. • 9 hours of dialogue. • Manual orthographic transcription, alignment. • Manual prosodic annotations (ToBI).
Material Columbia Games Corpus • Player 1: Describer • Player 2: Follower
Backchannel-Inviting Cues • Cues displayed by the speaker “inviting” the listener to produce a backchannel response.
Backchannel-Inviting Cues Method • [Figure: timeline of IPU1 to IPU4 for Speaker A and Speaker B, with Hold and Backchannel transitions marked] • IPU (Inter-Pausal Unit): Maximal sequence of words from the same speaker surrounded by silence of at least 50 ms. • 3 trained annotators identified Backchannels using a labeling scheme described in [Gravano et al. 2007]. • To find BC-inviting cues, we compare: • IPUs preceding Holds (the same speaker continues after the silence), • IPUs preceding Backchannels.
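The IPU definition on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Word` record and the `split_into_ipus` helper are hypothetical names, the input is assumed to be one speaker's time-aligned words in chronological order, and the timings in the example are invented.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One time-aligned word from a single speaker (hypothetical record)."""
    text: str
    start: float  # seconds
    end: float    # seconds


def split_into_ipus(words, min_silence=0.050):
    """Group one speaker's words into IPUs.

    A new IPU starts whenever the silence between two consecutive
    words is at least `min_silence` seconds (50 ms on the slide).
    """
    ipus = []
    current = []
    for word in words:
        if current and word.start - current[-1].end >= min_silence:
            ipus.append(current)
            current = []
        current.append(word)
    if current:
        ipus.append(current)
    return ipus


# Invented timings: a 20 ms gap (same IPU), then a 400 ms gap (new IPU).
words = [Word("the", 0.00, 0.20), Word("blue", 0.22, 0.50), Word("lion", 0.90, 1.30)]
ipus = split_into_ipus(words)
print([[w.text for w in ipu] for ipu in ipus])
```

Each resulting IPU would then be labeled by what follows it (a Hold or a Backchannel) before the two groups are compared.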
Backchannel-Inviting Cues Individual Cues • Final rising intonation: 81% of IPUs before BC end in H-H% or L-H%. • Higher pitch level. • Higher intensity level. • Lower NHR (noise-to-harmonics ratio; voice quality). • Longer IPU duration (seconds, #words). • Final POS bigram: 72% of IPUs before BC end in DT NN, JJ NN, or NN NN. • Analysis windows: entire IPU, final 1.0 sec, final 0.5 sec.
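The two categorical cues above (final intonation and final POS bigram) lend themselves to a simple membership check. The sketch below is an assumption about how such a check might look, not the authors' implementation; the function names are hypothetical, and the inputs are assumed to be a ToBI phrase-accent/boundary-tone label and a list of Penn Treebank POS tags for the IPU.

```python
# Contours and bigrams reported on the slide as frequent before backchannels.
RISING_CONTOURS = {"H-H%", "L-H%"}
BC_FINAL_BIGRAMS = {("DT", "NN"), ("JJ", "NN"), ("NN", "NN")}


def has_final_rising_intonation(final_contour):
    """True if the IPU's final ToBI contour is one of the rising ones."""
    return final_contour in RISING_CONTOURS


def has_bc_final_pos_bigram(pos_tags):
    """True if the last two POS tags of the IPU form a listed bigram."""
    return tuple(pos_tags[-2:]) in BC_FINAL_BIGRAMS


# Invented example IPU: "the blue lion" tagged DT JJ NN, ending in L-H%.
print(has_final_rising_intonation("L-H%"))          # True
print(has_bc_final_pos_bigram(["DT", "JJ", "NN"]))  # True
print(has_bc_final_pos_bigram(["NN"]))              # False (fewer than two tags)
```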
Backchannel-Inviting Cues Defining Presence of a Cue • 2 representative features for each cue. • A cue counts as present when the feature value is closer to its mean before backchannels (BC) than to its mean before holds (H), and absent otherwise.
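The nearest-mean rule on this slide can be written as a one-line comparison. The sketch below handles a single feature only; how the two representative features per cue are combined is not stated on the slide, so that step is left out. The function name and the example means are hypothetical.

```python
def cue_present(value, mean_before_bc, mean_before_hold):
    """Nearest-mean rule for one feature.

    The cue counts as present when `value` lies closer to the feature's
    mean over IPUs preceding backchannels than to its mean over IPUs
    preceding holds.
    """
    return abs(value - mean_before_bc) < abs(value - mean_before_hold)


# Invented means for a speaker-normalized feature.
print(cue_present(0.9, mean_before_bc=1.0, mean_before_hold=0.0))  # True
print(cue_present(0.1, mean_before_bc=1.0, mean_before_hold=0.0))  # False
```

A value exactly halfway between the two means is treated as absent here; that tie-breaking choice is an assumption.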
Top Frequencies of Complex Cues • Notation: a digit means the cue is present; a dot means the cue is absent. • BC-inviting cues: • 1: Final intonation • 2: Intensity level • 3: Pitch level • 4: IPU duration • 5: Voice quality • 6: Final POS bigram
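The digit/dot notation can be made concrete with a small encoder and decoder. This is a sketch of the notation as described on the slide; the helper names `encode_cues` and `decode_cues` are hypothetical.

```python
CUE_NAMES = {
    1: "Final intonation",
    2: "Intensity level",
    3: "Pitch level",
    4: "IPU duration",
    5: "Voice quality",
    6: "Final POS bigram",
}


def encode_cues(present):
    """Render a set of cue numbers as a six-character pattern.

    Position i shows the digit i if cue i is present, otherwise a dot.
    """
    return "".join(str(i) if i in present else "." for i in sorted(CUE_NAMES))


def decode_cues(pattern):
    """Recover the set of present cue numbers from a pattern string."""
    return {i for i, ch in enumerate(pattern, start=1) if ch != "."}


print(encode_cues({1, 3, 5, 6}))      # 1.3.56
print(sorted(decode_cues("1.3.56")))  # [1, 3, 5, 6]
```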
Backchannel-Inviting Cues Combined Cues • [Figure: percentage of IPUs followed by a BC, plotted against the number of cues conjointly displayed; fit with r² = 0.993]
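The quantity plotted in this figure could be computed from labeled IPUs roughly as follows. This is a sketch under assumptions: each IPU is represented as a hypothetical `(num_cues, followed_by_bc)` pair, and the sample data is invented purely to exercise the function; it does not reproduce the corpus results.

```python
from collections import defaultdict


def bc_rate_by_cue_count(ipus):
    """Percentage of IPUs followed by a backchannel, per number of cues.

    `ipus` is an iterable of (num_cues, followed_by_bc) pairs.
    """
    totals = defaultdict(int)
    followed = defaultdict(int)
    for num_cues, followed_by_bc in ipus:
        totals[num_cues] += 1
        if followed_by_bc:
            followed[num_cues] += 1
    return {n: 100.0 * followed[n] / totals[n] for n in sorted(totals)}


# Invented toy data, not taken from the Columbia Games Corpus.
toy = [(0, False), (0, False), (1, False), (1, True), (2, True), (2, True)]
print(bc_rate_by_cue_count(toy))  # {0: 0.0, 1: 50.0, 2: 100.0}
```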
Backchannel-Inviting Cues IVR Systems • After each IPU from the user: if estimated likelihood > threshold, then produce a backchannel. • To elicit a backchannel from the user, if desired: include as many cues as possible in the system’s final IPU.
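The decision rule for the speech-understanding side can be sketched as a threshold test. The slide does not say how the likelihood is estimated, so the estimator is passed in as a parameter; `should_backchannel` and `toy_estimator` are hypothetical names, and the toy estimator is a placeholder rather than the model fitted in the study.

```python
from typing import Callable, Set


def should_backchannel(
    cues_present: Set[int],
    estimate_likelihood: Callable[[int], float],
    threshold: float,
) -> bool:
    """Decide whether to produce a backchannel after the user's IPU.

    `estimate_likelihood` maps the number of cues detected in the IPU
    to an estimated likelihood that a backchannel is welcome.
    """
    return estimate_likelihood(len(cues_present)) > threshold


def toy_estimator(num_cues: int) -> float:
    """Placeholder estimator (an assumption): fraction of the six cues present."""
    return num_cues / 6


print(should_backchannel({1, 2, 3, 4, 5}, toy_estimator, threshold=0.5))  # True
print(should_backchannel({1}, toy_estimator, threshold=0.5))              # False
```

Keeping the estimator separate from the threshold test lets a system swap in a likelihood model trained on its own domain and tune the threshold independently.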
Summary • Study of backchannel-inviting cues. • Objective, automatically computable. • Combined cues. • Improve turn-taking decisions of IVR systems. • Results drawn from task-oriented dialogues. • Not necessarily generalizable. • Suitable for most IVR domains. • SIGdial 2009: Study of turn-yielding cues.
Special thanks to… • My advisor, Julia Hirschberg • Thesis Committee Members • Maxine Eskenazi, Kathy McKeown, Becky Passonneau, Amanda Stent. • Speech Lab at Columbia University • Stefan Benus, Fadi Biadsy, Sasha Caskey, Bob Coyne, Frank Enos, Martin Jansche, Jackson Liscombe, Sameer Maskey, Andrew Rosenberg. • Collaborators • Gregory Ward and Elisa Sneed German (Northwestern U); Ani Nenkova (UPenn); Héctor Chávez, David Elson, Michel Galley, Enrique Henestroza, Hanae Koiso, Shira Mitchell, Michael Mulley, Kristen Parton, Ilia Vovsha, Lauren Wilcox.