Dialog Management for Rapid-Prototyping of Speech-Based Training Agents Victor Hung, Avelino Gonzalez, Ronald DeMara University of Central Florida
Agenda • Introduction • Approach • Evaluation • Results • Conclusions
Introduction • General Problem • Raise speech-based discourse to a new level of naturalness in Embodied Conversation Agents (ECA) carrying on an open-domain dialog • Specific Problem • Overcome Automatic Speech Recognition (ASR) limitations • Domain-independent knowledge management • Training Agent Design • Conversational input with robustness to ASR errors and an adaptable knowledge base
Approach • Build a dialog manager that: • Handles ASR limitations • Manages domain-independent knowledge • Provides open dialog • CONtext-driven Corpus-based Utterance Robustness (CONCUR) • Input Processor • Knowledge Manager • Discourse Model • [Figure: CONCUR Dialog Manager. Blocks: Dialog Manager I/O, Input Processor, Knowledge Manager, Discourse Model; User Input in, Agent Response out.]
CONCUR • Input Processor • Pre-process knowledge corpus via keyphrasing • Break down user utterance • [Figure: Input Processor. Labels: Corpus, User Utterance, Keyphrase Extractor, WordNet, NLP Toolkit.] • Knowledge Manager • 3 databases • Encyclopedia-entry style corpus • Context-driven data
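The keyphrasing idea on this slide can be illustrated with a minimal sketch. It is not CONCUR's implementation: the slides name WordNet and an NLP toolkit but give no algorithm, so the stopword list, the n-gram definition of a keyphrase, and the function names `keyphrases`, `index_corpus`, and `best_entry` are all assumptions made for illustration, and only the Python standard library is used.

```python
import re

# Hypothetical stopword list; the slides do not say which one CONCUR uses.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "on", "for",
             "and", "or", "what", "who", "how", "do", "does", "i", "me",
             "you", "can", "tell", "about", "with", "by"}


def keyphrases(text, max_len=2):
    """Return word n-grams (n <= max_len) over the stopword-filtered text."""
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower())
             if w not in STOPWORDS]
    phrases = set()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            phrases.add(" ".join(words[i:i + n]))
    return phrases


def index_corpus(corpus):
    """Pre-process each corpus entry (title -> text) into a keyphrase set."""
    return {title: keyphrases(text) for title, text in corpus.items()}


def best_entry(utterance, index):
    """Return the entry title sharing the most keyphrases with the utterance,
    or None when nothing overlaps (treated here as an out-of-corpus request)."""
    query = keyphrases(utterance)
    scores = {title: len(query & phrases) for title, phrases in index.items()}
    title = max(scores, key=scores.get, default=None)
    if title is None or scores[title] == 0:
        return None
    return title


# Made-up two-entry corpus, for illustration only.
corpus = {
    "funding": "Grant funding supports graduate research projects.",
    "travel": "Travel reimbursement requires receipts.",
}
index = index_corpus(corpus)
print(best_entry("How do I get grant funding?", index))
```

Matching on shared keyphrases rather than on the full word string is one plausible way to tolerate partially misrecognized ASR input, since a single surviving keyphrase can still select an entry.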
CONCUR • CxBR Discourse Model • [Figure: Discourse Model. Labels: Goal Bookkeeper, Goal Stack (Branting et al., 2004), Inference Engine, Context Topology, Agent Goals, User Goals.]
Evaluation • Plagued by subjectivity • Gathering of both objective and subjective metrics • Qualitative and quantitative metrics: • Efficiency metrics • Total elapsed time • Number of user turns • Number of system turns • Total elapsed time per turn • Word-Error Rate (WER) • Quality metrics • Out-of-corpus misunderstandings • General misunderstandings • Errors • Total number of user goals • Total number of user goals fulfilled • Goal completion accuracy • Conversational accuracy • Survey data • Naturalness • Usefulness
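Two of the metrics listed on this slide can be sketched in code. Word-Error Rate is shown in its standard form, (substitutions + deletions + insertions) divided by the number of reference words, computed with a word-level edit distance. The slides do not define goal completion accuracy, so the ratio of fulfilled user goals to total user goals below is an assumption drawn from the two counts listed just before it; the function names are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Standard WER: word-level edit distance divided by reference length.

    reference  -- human voice transcription
    hypothesis -- ASR transcript of the same utterance
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def goal_completion_accuracy(goals_fulfilled, goals_total):
    """Assumed definition: fulfilled user goals / total user goals."""
    if goals_total <= 0:
        raise ValueError("goals_total must be positive")
    return goals_fulfilled / goals_total


print(word_error_rate("where is the library", "where is a library"))  # 0.25
print(goal_completion_accuracy(3, 4))                                 # 0.75
```

Because insertions are counted, WER can exceed 1.0 when the ASR transcript is much longer than the reference.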
Evaluation Instrument • Nine statements, judged on a 1-to-7 scale based on level of agreement • Naturalness • If I told someone the character in this tool was real they would believe me. • The character on the screen seemed smart. • I felt like I was having a conversation with a real person. • This did not feel like a real interaction with another person. • Usefulness • I would be more productive if I had this system in my place of work. • The tool provided me with the information I was looking for. • I found this to be a useful way to get information. • This tool made it harder to get information than talking to a person or using a website. • This does not seem like a reliable way to retrieve information from a database.
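A minimal sketch of how responses to this instrument could be turned into Naturalness and Usefulness scores. The slides do not describe the scoring procedure, so two things here are assumptions: negatively worded statements (the fourth Naturalness item and the last two Usefulness items) are reverse-coded, and each category score is the mean of its items. The names `ITEMS` and `score_survey` are illustrative.

```python
# (category, negatively_worded) for the nine statements, in slide order.
ITEMS = [
    ("naturalness", False),
    ("naturalness", False),
    ("naturalness", False),
    ("naturalness", True),   # "This did not feel like a real interaction..."
    ("usefulness", False),
    ("usefulness", False),
    ("usefulness", False),
    ("usefulness", True),    # "This tool made it harder to get information..."
    ("usefulness", True),    # "This does not seem like a reliable way..."
]
SCALE_MIN, SCALE_MAX = 1, 7


def score_survey(responses):
    """Return the mean score per category for one respondent.

    Assumption: negatively worded items are reverse-coded so that a higher
    value always means a more favorable judgment.
    """
    if len(responses) != len(ITEMS):
        raise ValueError(f"expected {len(ITEMS)} responses")
    by_category = {}
    for (category, negative), rating in zip(ITEMS, responses):
        if not SCALE_MIN <= rating <= SCALE_MAX:
            raise ValueError(f"rating {rating} is outside the 1-to-7 scale")
        value = SCALE_MIN + SCALE_MAX - rating if negative else rating
        by_category.setdefault(category, []).append(value)
    return {c: sum(v) / len(v) for c, v in by_category.items()}


print(score_survey([5, 6, 4, 3, 6, 5, 7, 2, 2]))
# {'naturalness': 5.0, 'usefulness': 6.0}
```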
Data Acquisition • General data set acquisition procedure: • User asked to interact with agent • Natural, information-seeking • Voice recording • User asked to complete survey • Data analysis process: • Voice transcriptions, ASR transcripts, internal data, and surveys analyzed
Data Acquisition • [Figure: two test setups. LifeLike Avatar (ECA): User Voice, Mic, Speech Recognizer, ASR String, CONCUR Dialog Manager, Response String, Agent Externals, Agent Voice (Speaker), Agent Image (Monitor). CONCUR Chatbot: User Text Input (Keyboard), Jabber-based Agent, CONCUR Dialog Manager, Agent Text Output (Monitor).]
Survey Baseline • Question 1: What are the expectations of naturalness and usefulness for the conversation agents in this study? • Question 2: How differently did users rate the AlexDSS Avatar compared with the CONCUR Avatar? • 1. Both LifeLike Avatars established user assessments that exceeded other ECA efforts • 2. Both avatar-based systems in the speech-based data sets established similar scores in Naturalness and Usefulness
Survey Baseline • Question 3: How differently did users rate the ECA systems compared with the chatbot system? • 3. The ECA-based systems were judged similarly, and both were rated better than the chatbot
ASR Resilience • Question 1: Can a speech-based CONCUR Avatar’s goal completion accuracy measure up to that of the AlexDSS Avatar under a high WER? • 1. A speech-based CONCUR Avatar’s goal completion accuracy measures up to that of the AlexDSS Avatar with a similarly high WER
ASR Resilience • Question 2: How does improving WER affect CONCUR’s goal completion accuracy? • 2. Improved WER does not increase CONCUR’s goal completion accuracy because no new user goals were identified or corrected with the better recognition
ASR Resilience • Question 3: Can CONCUR’s goal completion accuracy measure up to that of other conversation agents despite a high WER? • 3. CONCUR’s goal completion accuracy is similar to that of the Digital Kyoto system, with twice the WER.
ASR Resilience • Question 4: Can a speech-based CONCUR Avatar’s conversational accuracy measure up to that of the AlexDSS Avatar under a high WER? • 4. Speech-based CONCUR’s conversational accuracy does not measure up to that of the AlexDSS Avatar with a similarly high WER. This can be attributed to general misunderstandings and errors caused by misheard user requests or by specific question-answering requests that are uncommon in menu-driven discourse models
ASR Resilience • Question 5: How does improving WER affect CONCUR’s conversational accuracy? • 5. Improved WER increases CONCUR’s conversational accuracy by decreasing general misunderstandings
ASR Resilience • Question 6: Can CONCUR’s conversational accuracy measure up to that of other conversation agents despite a high WER? • 6. CONCUR’s conversational accuracy surpasses that of the TARA system, which is text-based.
Domain-Independence • Question 1: Can CONCUR maintain goal completion accuracy after changing to a less specific domain corpus? • 1. CONCUR’s goal completion accuracy does not remain consistent after a change to a generalized domain corpus. Changing domain expertise may increase out-of-corpus requests, which decreases goal completion
Domain-Independence • Question 2: Can CONCUR maintain conversational accuracy after changing to a less specific domain corpus? • 2. After changing to a general domain corpus, CONCUR is capable of maintaining its conversational accuracy
Domain-Independence • Question 3: Can CONCUR offer a quick method of providing agent knowledge? • 3. CONCUR’s Knowledge Manager enables a shorter knowledge development turnover time than other conversation agent knowledge management systems
Conclusions • Building Training Agents • Agent Design • ECA format preferred over chatbot format • ASR • ASR improvements lead to better conversation-level processing • High WER not necessarily an obstacle for ECA design • Knowledge Management • Tailoring domain expertise for an intended audience is more effective than a generalized corpus • Separation of domain knowledge from agent discourse helps to maintain conversational accuracy and speed up agent development times