Adapting and Learning Dialogue Models Discourse & Dialogue CMSC 35900-1 November 19, 2006
Roadmap • The Problem: Portability • Task domain: Call-routing • Porting: • Speech recognition • Call-routing • Dialogue management • Conclusions • Learning DM strategies • HMMs and POMDPs
SLS Portability • Spoken language system design • Record or simulate user interactions • Collect vocabulary, sentence style, sequencing • Transcribe/label • Expert creates vocabulary, language model, dialogue model • Problem: Costly, time-consuming, requires an expert
Call-routing • Goal: Given an utterance, identify its type • Dispatch to the right operator • Classification task: • Manual rules or data-driven methods • Feature-based classification (boosting) • Pre-defined types, e.g.: • Hello? -> hello; I have a question -> request(info) • I would like to know my balance. -> request(balance)
Dialogue Management • Flow controller • Pluggable dialogue strategy modules • ATN (augmented transition network): encodes call flow, easy to augment, manages context • Inputs: context, semantic representation of utterance • ASR • Language models • Trigrams, in a probabilistic framework
Adaptation: ASR • ASR: Language models • Usually trained from in-domain transcriptions • Here: out-of-domain transcriptions • Switchboard, spoken dialogue (telecom, insurance) • In-domain web pages • New domain: pharmaceuticals • Style differences: SLS speech favors pronouns; in-domain medical web data best for OOV coverage • Best accuracy: spoken dialogue + web • Switchboard alone: too big/slow
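The porting idea above amounts to mixing language models from different sources. A minimal sketch, assuming simple linear interpolation of maximum-likelihood trigram models; the corpora, tokens, and mixing weights are invented for illustration, not the paper's actual data or weights:

```python
# Sketch: interpolate an out-of-domain dialogue trigram LM with an
# in-domain web-text LM for a new domain. Hypothetical corpora/weights.
from collections import Counter

def trigram_counts(sentences):
    """Count trigrams and their context bigrams over padded sentences."""
    tri, bi = Counter(), Counter()
    for toks in sentences:
        toks = ["<s>", "<s>"] + toks + ["</s>"]
        for i in range(2, len(toks)):
            tri[(toks[i-2], toks[i-1], toks[i])] += 1
            bi[(toks[i-2], toks[i-1])] += 1
    return tri, bi

def prob(tri, bi, w1, w2, w3, floor=1e-6):
    """MLE trigram probability, with a tiny floor for unseen contexts."""
    return tri.get((w1, w2, w3), 0) / bi[(w1, w2)] if bi.get((w1, w2)) else floor

def interpolated_prob(lms, weights, w1, w2, w3):
    """Linear interpolation of several component LMs."""
    return sum(lam * prob(tri, bi, w1, w2, w3)
               for lam, (tri, bi) in zip(weights, lms))

# Hypothetical out-of-domain dialogue transcripts + in-domain web text.
dialogue = [["i", "need", "a", "refill"], ["what", "is", "my", "copay"]]
web_text = [["refill", "your", "prescription", "online"]]
lms = [trigram_counts(dialogue), trigram_counts(web_text)]
print(interpolated_prob(lms, [0.7, 0.3], "i", "need", "a"))
```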
Adaptation: Call-routing • Manual tagging: slow, expensive • Here: existing out-of-domain labeled data • Meta call-types: a library • Generic: shared by all apps • Reusable: in-domain types that already exist elsewhere • Specific: only this app • Grouping done by experts • Bootstrap: start with generic + reusable types
Call-type Classification • Boostexter: word n-gram features; 1,100 iterations • Operates on ASR output • Telecom-based call-type library • Two classifications: reject (yes/no); call-type • In-domain: 78% (true transcripts); 62% (ASR) • Generic model on generic types: 95%; 91% • Bootstrap (generic + reusable + rules): 79%; 68%
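BoosTexter itself is not freely available, but the setup can be illustrated with a rough stand-in: scikit-learn's AdaBoost over word n-gram features. The tiny training set, labels, and round count below are illustrative assumptions, not the paper's configuration:

```python
# Not BoosTexter: a stand-in boosting classifier over word n-grams
# to illustrate the call-type classification setup.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.pipeline import make_pipeline

train_utts = ["hello", "i have a question",
              "i would like to know my balance", "what is my balance"]
train_types = ["hello", "request(info)", "request(balance)", "request(balance)"]

clf = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),   # word unigram/bigram features
    AdaBoostClassifier(n_estimators=100),  # BoosTexter used ~1,100 rounds
)
clf.fit(train_utts, train_types)
print(clf.predict(["my balance please"]))  # expect request(balance)
```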
Dialogue Model • Build dialogue strategy templates • Based on call-type classification • Generic: • E.g., yes, no, hello, repeat, help • Trigger generic, context-dependent replies • Tag as vague/concrete: • Vague: “I have a question” -> clarification • Concrete: clear routing target; attributes -> sub-dialogues
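A minimal sketch of the template dispatch this slide describes: generic types trigger canned context-dependent replies, vague types trigger clarification, and concrete types route to a sub-dialogue. The function and prompt wording are illustrative assumptions, not the paper's actual module API:

```python
# Sketch: dialogue strategy template selection from call-type output.
GENERIC = {"yes", "no", "hello", "repeat", "help"}
VAGUE = {"request(info)"}

def next_move(call_type, context):
    """Pick a dialogue move from the classified call type."""
    if call_type in GENERIC:
        return ("generic_reply", context)   # e.g. re-prompt on "repeat"
    if call_type in VAGUE:
        return ("clarify", "How may I direct your call?")
    # Concrete: clear routing target; attributes handled in a sub-dialogue.
    return ("route", call_type)

print(next_move("request(balance)", {}))  # -> ('route', 'request(balance)')
```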
Dialogue Model Porting • Evaluation: • Compare to original transcribed dialogues • Task 1: DM category: 32 clusters of calls • Bootstrap covers 16 categories – 70% of instances • Using call-type classifiers: get class, confidence, concreteness • If confident, concrete, and correctly classified -> correct; • If incorrectly classified -> error • Also classify vague/generic • 67-70% accuracy on the DM and routing tasks
Conclusions • Portability: • Bootstrapping of ASR, call-type classification, and DM • Generally effective • Call-type success is high • Others show potential
Learning DM Strategies • Prior approaches: • Hand-coded: state-, frame- or agent-based • Adaptation bootstraps from existing structure • Alternative: • Capture prior interaction patterns • Learn dialogue structure and management
Training HMM DM • Construct a training corpus • E.g., record human-human interactions • Identify and label states • Train HMM dialogue management • Use tagged sequences to learn • Correspondences between utterances and states • State transition probabilities • Effective, but still requires initial tagging
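A minimal sketch of what this training amounts to: maximum-likelihood estimates of state transitions and utterance-class emissions from tagged dialogues. The state and utterance labels below are invented examples:

```python
# Sketch: estimate HMM DM parameters from tagged dialogue sequences.
from collections import Counter, defaultdict

# Each dialogue is a tagged sequence of (state, utterance_class) pairs.
dialogues = [
    [("greet", "hello"), ("ask_task", "request"), ("confirm", "yes")],
    [("greet", "hello"), ("ask_task", "request"), ("clarify", "vague")],
]

trans, emit = defaultdict(Counter), defaultdict(Counter)
for d in dialogues:
    for (s1, _), (s2, _) in zip(d, d[1:]):
        trans[s1][s2] += 1          # state -> next-state counts
    for s, utt in d:
        emit[s][utt] += 1           # state -> observed utterance class

def p_trans(s1, s2):
    return trans[s1][s2] / sum(trans[s1].values())

def p_emit(s, utt):
    return emit[s][utt] / sum(emit[s].values())

print(p_trans("greet", "ask_task"), p_emit("ask_task", "request"))
```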
Reinforcement Learning • Model dialogues with (partially observable) Markov decision processes • Users form the stochastic environment, • Actions are system utterances, • State is the dialogue so far • Goal: maximize some utility measure • Task completion/user satisfaction • Learn a policy – a mapping from states to actions – • That optimizes the utility measure
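A toy sketch of policy learning in this framing, using tabular Q-learning over an invented three-state slot-filling MDP; the states, actions, rewards, and user-confirmation probability are all assumptions for illustration, not any of the cited systems:

```python
# Sketch: Q-learning a dialogue policy against a stochastic "user".
import random

STATES = ["no_slot", "slot_unconfirmed", "done"]
ACTIONS = ["ask", "confirm", "submit"]
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
alpha, gamma, eps = 0.1, 0.95, 0.2

def step(state, action):
    """Stochastic user environment: returns (next_state, reward)."""
    if state == "no_slot" and action == "ask":
        return ("slot_unconfirmed", 0.0)
    if state == "slot_unconfirmed" and action == "confirm":
        # Assume the user confirms 80% of the time.
        return ("done", 1.0) if random.random() < 0.8 else ("no_slot", 0.0)
    return (state, -0.1)  # small penalty for useless moves

for _ in range(5000):
    s = "no_slot"
    while s != "done":
        a = random.choice(ACTIONS) if random.random() < eps else \
            max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r = step(s, a)
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

print(max(ACTIONS, key=lambda x: Q[("no_slot", x)]))  # learned: 'ask'
```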
Applications • TOOT – train information • Litman, Kearns, et al. • Learned different initiative/confirmation strategies • Air travel bookings (Young et al. 2006) • Problem: huge number of possible states • More airports -> dramatically more possible utterances • Approach: collapse all alternative slot fillers • Represent them with a single default value
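The state-collapse idea can be made concrete with a small sketch: replace every concrete slot filler with one placeholder, so dialogue states that differ only in the filler map to a single abstract state. The slot names and values are illustrative:

```python
# Sketch: collapse alternative slot fillers into a single default token.
def abstract_state(state):
    """Keep only filled/empty status, discarding the concrete value."""
    return {slot: ("<FILLED>" if value is not None else None)
            for slot, value in state.items()}

s1 = {"origin": "JFK", "destination": "SFO", "date": None}
s2 = {"origin": "ORD", "destination": "LHR", "date": None}
assert abstract_state(s1) == abstract_state(s2)  # one state, many fillers
```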
Turn-taking Discourse and Dialogue CS 35900-1 November 16, 2004
Agenda • Motivation • Silence in human-computer dialogue • Turn-taking in human-human dialogue • Turn-change signals • Back-channel acknowledgments • Maintaining contact • Exploiting these cues to improve human-computer communication • Automatic identification of disfluencies, jump-in points, and jump-ins
Turn-taking in HCI • Human turn end: • Detected by 250 ms of silence • System turn end: • Signaled by end of system speech • Or indicated by any human sound: barge-in • Continued attention: • No signal
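A minimal sketch of the silence-based endpointing on this slide: declare the human's turn over once 250 ms of consecutive non-speech frames accumulate. The frame size and energy threshold are illustrative assumptions:

```python
# Sketch: detect human turn end from 250 ms of consecutive silence.
FRAME_MS = 10
SILENCE_MS = 250
THRESHOLD = 0.01  # energy below this counts as silence (assumed)

def turn_ended(frame_energies):
    """Return True once 250 ms of uninterrupted silence is observed."""
    needed = SILENCE_MS // FRAME_MS
    run = 0
    for e in frame_energies:
        run = run + 1 if e < THRESHOLD else 0
        if run >= needed:
            return True
    return False

print(turn_ended([0.5] * 20 + [0.001] * 25))  # True: 250 ms of silence
```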
Yielding & Taking the Floor • Turn change signal • Offer floor to auditor/hearer • Cues: pitch fall, lengthening, “but uh”, end gesture, amplitude drop+’uh’, end clause • Likelihood of change increases with more cues • Negated by any gesticulation • Speaker-state signal: • Shift in head direction AND/OR Start of gesture
Retaining the Floor • Within-turn signal • Still speaker: look at hearer at end of clause • Continuation signal • Still speaker: look away after within-turn signal/back-channel • Back-channel: • ‘mmhm’/okay/etc.; nods; • sentence completions; clarification requests; restatements • NOT a turn: signals attention, agreement, or confusion
Improving Human-Computer Turn-taking • Identifying cues to turn change and turn start • Meeting conversations: • Recorded, natural research meetings • Multi-party • Overlapping speech • Units = “spurts”: speech bounded by ≥500 ms of silence
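A minimal sketch of spurt segmentation as defined above: split one speaker's timed words into units wherever the pause before a word reaches 500 ms. The word timings are invented for illustration:

```python
# Sketch: segment timed words into spurts at >=500 ms pauses.
def spurts(words, gap=0.5):
    """words: list of (token, start_sec, end_sec), sorted by time."""
    units, cur = [], [words[0]]
    for prev, w in zip(words, words[1:]):
        if w[1] - prev[2] >= gap:   # pause >= 500 ms closes the spurt
            units.append(cur)
            cur = []
        cur.append(w)
    units.append(cur)
    return units

timed = [("so", 0.0, 0.2), ("yeah", 0.3, 0.6), ("right", 1.4, 1.7)]
print([[w for w, _, _ in u] for u in spurts(timed)])  # [['so','yeah'], ['right']]
```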
Tasks • Sentence/disfluency/non-boundary identification • End of sentence, break-off, or continuation • Jump-in points • Times when others “jump in” • Jump-in words • Interruption vs. start from silence • Off- and on-line • Language model and/or prosodic cues
Text + Prosody • Text sequence: • Modeled with an n-gram language model • Hidden event prediction – e.g., boundary as hidden state • Implemented as an HMM • Prosody: • Duration, pitch, pause, energy • Decision trees: classification plus posterior probability • Integrate LM + DT
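One simple way to realize the LM + DT integration is to interpolate the two models' posteriors over boundary events at each inter-word position. The mixing weight and toy posteriors below are illustrative assumptions, not the published system's combination scheme:

```python
# Sketch: combine hidden-event LM and prosodic decision-tree posteriors.
EVENTS = ["sentence_end", "disfluency", "continue"]

def combine(p_lm, p_dt, lam=0.5):
    """Linearly interpolate the two posteriors, then renormalize."""
    mixed = {e: lam * p_lm[e] + (1 - lam) * p_dt[e] for e in EVENTS}
    z = sum(mixed.values())
    return {e: p / z for e, p in mixed.items()}

p_lm = {"sentence_end": 0.6, "disfluency": 0.1, "continue": 0.3}  # n-gram HMM
p_dt = {"sentence_end": 0.2, "disfluency": 0.5, "continue": 0.3}  # prosody tree
post = combine(p_lm, p_dt)
print(max(post, key=post.get))  # 'sentence_end' in this toy case
```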
Interpreting Breaks • For each inter-word position: • Is it a disfluency, sentence end, or continuation? • Key features: • Pause duration, vowel duration • 62% accuracy vs. 50% chance baseline • ~90% overall • Best results combine LM & DT
Jump-in Points • “Used” possible turn changes: points WITHIN a spurt where a new speaker actually starts • Key features: • Pause duration, low energy, pitch fall • No lexical/punctuation features used • Following-context features useless • Look like sentence boundaries but aren’t • Accuracy: 65% vs. 50% baseline • Performance depends only on preceding prosodic features
Jump-in Features • Do people speak differently when jumping in? • Do jump-ins differ from regular turn starts? • Examine only the first words of turns • No language model • Key features: • Raised pitch, raised amplitude • Accuracy: 77% vs. 50% baseline • Prosody only
Summary • Prosodic features signal conversational moves • Pause and vowel duration distinguish sentence end, disfluency, and fluent continuation • Jump-ins occur at locations that sound like sentence ends • Speakers raise their voices when jumping in