This research seminar explores the challenges in developing robust and natural spoken language dialog systems. Topics include automatic speech recognition, natural language processing, and human-computer interaction.
Research Challenges for Spoken Language Dialog Systems Julie Baca, Ph.D. Assistant Research Professor Center for Advanced Vehicular Systems Mississippi State University Computer Science Graduate Seminar March 3, 2004
Overview • Define dialog systems • Describe research issues • Present current work • Give conclusions and discuss future work
What is a Dialog System? • Current commercial voice products require adherence to “command and control” language, e.g., • User: “Plan Route” • Such interfaces are not robust to variations from the fixed words and phrases.
What is a Dialog System? • Dialog systems seek to provide a natural conversational interaction between the user and the computer system, e.g., • User: “Is there a way I can get to Canal Street from here?”
Domains for Dialog Systems • Travel reservation • Weather forecasting • In-vehicle driver assistance • Call routing • On-line learning environments
Dialog Systems: Information Flow • Must model two-way flow of information • User-to-system • System-to-user
Dialog System (diagram) • Speech Recognition • NLP • Dialog Manager • Application Database • Response Generation • TTS
Research Issues Many fundamental problems must be solved for these systems to mature. Three general areas include: • Automatic Speech Recognition (ASR) • Natural Language Processing (NLP) • Human-computer Interaction (HCI)
NLP Issue for Dialog Systems: Semantics • Must assess meaning, not just syntactic correctness. • Therefore, must handle ungrammatical inputs, e.g., • “Is there a … where is … a gas station nearby …?”
NLP Semantics • Employ semantic grammar consisting of case frames with named slots. • FRAME: [find], [drive] • Example grammar rules:
[find]
    (*WHERE [arrive_loc])
WHERE
    (where *[be_verb])
[be_verb]
    (is) (are) (were)
[arrive_loc]
    (*[prep] [placename] *[prep])
[placename]
    (gas station) (hotel) (restaurant)
[prep]
    (near) (nearest) (closest) (nearby)
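To illustrate how a case frame with named slots can absorb an ungrammatical input, here is a minimal Python sketch. It is not the parser used in the work described in these slides: the function `parse_find`, the word lists, and the matching strategy (substring and whole-word search rather than a real grammar parser) are assumptions made for the example, with the words taken from the [placename] and [prep] entries above.

```python
import re
from typing import Optional

# Assumed word lists mirroring the slide's [placename] and [prep] entries.
PLACENAMES = ("gas station", "hotel", "restaurant")
PREPS = ("nearest", "closest", "nearby", "near")


def parse_find(utterance: str) -> Optional[dict]:
    """Fill a [find] frame from whichever fragments of the utterance match."""
    text = utterance.lower()
    place = next((p for p in PLACENAMES if p in text), None)
    if place is None:
        return None  # no [placename] found, so the [find] frame does not apply
    frame = {"frame": "find", "placename": place}
    # Look for a [prep] word as a whole word anywhere in the utterance.
    prep_match = re.search(r"\b(" + "|".join(PREPS) + r")\b", text)
    if prep_match:
        frame["prep"] = prep_match.group(1)
    return frame


result = parse_find("Is there a ... where is ... a gas station nearby ...?")
print(result)
```

Because the sketch only looks for slot fillers and ignores everything else, the false start "Is there a ..." does not prevent the frame from being filled.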
NLP Issue: Semantic Representation • Two Approaches: • Hand-craft the grammar for the application, using robust parsing to understand meaning [1,2]. • Problem: time and expense of hand-crafting • Use statistical approach, generating initial rules and using annotated tree-banked data to discover the full rule set [3,4]. • Problem: requires annotated training data
NLP Issue: Resolving Meaning Using Context • Must maintain knowledge of the conversational context. • After request for nearest gas station, user says, “What is it close to?” • Resolving “it” - anaphora • Another follow-up by the user, “How about …restaurant?” • Resolving “…” with “nearest”- ellipsis
Resolving Meaning: Discourse Analysis • To resolve such requests, system must track context of the conversation. • This is typically handled by a discourse analysis component in the Dialog Manager.
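One simple way to picture context tracking for the ellipsis example is to let each new frame inherit any slots it leaves unfilled from the previous turn. The Python sketch below assumes this carry-over strategy; the class `DiscourseContext` and the slot names are hypothetical and are not taken from the system described in these slides.

```python
from typing import Optional


class DiscourseContext:
    """Keeps the most recent filled frame so later turns can borrow from it."""

    def __init__(self) -> None:
        self.last_frame: Optional[dict] = None

    def resolve(self, frame: dict) -> dict:
        """Fill slots missing from `frame` using the previous turn's frame."""
        resolved = dict(frame)
        if self.last_frame is not None:
            for slot, value in self.last_frame.items():
                resolved.setdefault(slot, value)
        self.last_frame = resolved
        return resolved


context = DiscourseContext()
first = context.resolve({"frame": "find", "prep": "nearest", "placename": "gas station"})
# The follow-up "How about ... restaurant?" supplies only a new placename.
follow_up = context.resolve({"placename": "restaurant"})
print(follow_up)
```

The follow-up turn keeps its own placename and inherits the frame type and the "nearest" qualifier from the earlier request.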
Dialog System: Discourse Analysis (diagram) • Speech Recognition • NLP • Dialog Manager, containing Discourse Analysis • Application Database • Response Generation • TTS
Dialog Manager: Discourse Analysis • Anaphora resolution approach: Use focus mechanism, assuming conversation has focus [5]. • For our example, “gas station” is current focus. • But how about: • “I’m at Food Max. How do I get to a gas station close to it and a video store close to it?” • Problem: Resolving the two “its”.
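The difficulty with the two pronouns can be seen in a deliberately simplified sketch of a single-focus tracker. This is not the focusing algorithm of [5]; the class `FocusTracker` and its behavior (always substituting the most recently mentioned entity) are assumptions chosen only to show the limitation.

```python
class FocusTracker:
    """Tracks one current focus, the simplest possible focus mechanism."""

    def __init__(self):
        self.focus = None

    def mention(self, entity):
        """Record the most recently mentioned entity as the focus."""
        self.focus = entity

    def resolve(self, words):
        """Replace each pronoun 'it' with the current focus, if there is one."""
        return [self.focus if w == "it" and self.focus else w for w in words]


tracker = FocusTracker()
tracker.mention("gas station")
simple = tracker.resolve(["what", "is", "it", "close", "to"])
print(simple)

# The Food Max request: with a single focus, both pronouns get the same referent.
tracker.mention("Food Max")
both = tracker.resolve(
    ["gas", "station", "close", "to", "it", "and", "video", "store", "close", "to", "it"]
)
print(both)
```

A single focus works for the first request, but in the second it cannot assign the two pronouns different referents, so it fails whenever the user intended the second "it" to mean the gas station.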
Dialog Manager: Clarification • Often cannot satisfy request in one iteration. • The previous example may require clarification from the user, • “Do you want to go to the gas station first?”
HCI Issue: System vs. User Initiative • What level of control do you provide the user in the conversation? • Initiative ranges from computer to human: • Computer initiative: C: “Please say departure city.” • Human initiative: U: “Tell me how to get to the Hilton.”
Mixed Initiative • Total system initiative provides low usability. • Total user initiative introduces higher error rate. • Thus, mixed initiative approach, balancing usability and error rate, is taken most often. • Allowing user to adapt the level explicitly has also shown merit [6].
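The trade-off between initiative levels can be sketched as a prompt-selection policy. Everything in the Python example below is hypothetical: the three-level `Initiative` enum, the prompt wording, and the command phrases a user might say to change the level are illustrative and are not drawn from [6] or from the system described in these slides.

```python
from enum import Enum


class Initiative(Enum):
    SYSTEM = "system"
    MIXED = "mixed"
    USER = "user"


def next_prompt(initiative, missing_slots):
    """Choose the next system prompt from the initiative level and unfilled slots."""
    if not missing_slots:
        return "Working on your request."
    if initiative is Initiative.SYSTEM:
        # Directed prompt: ask for exactly one slot at a time.
        return f"Please say {missing_slots[0]}."
    if initiative is Initiative.MIXED:
        # Open prompt that still tells the user what is missing.
        return f"How can I help? I still need: {', '.join(missing_slots)}."
    return "How can I help?"


def adapt(initiative, user_utterance):
    """Let the user change the level explicitly (hypothetical command phrases)."""
    commands = {
        "you lead": Initiative.SYSTEM,
        "share control": Initiative.MIXED,
        "i lead": Initiative.USER,
    }
    return commands.get(user_utterance.lower().strip(), initiative)


level = adapt(Initiative.SYSTEM, "Share control")
print(next_prompt(level, ["departure city", "destination"]))
```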
HCI Issue: Evaluating Dialog Systems • How to compare and evaluate dialog systems? • PARADISE (PARAdigm for DIalogue System Evaluation) has provided a standard framework [7].
PARADISE: Evaluating Dialog Systems • Task success • Was the necessary information exchanged? • Efficiency/Cost • Number of dialog turns, task completion time • Qualitative • ASR rejections, timeouts, helps • Usability • User satisfaction with ASR, task ease, interaction pace, system response
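PARADISE [7] combines these measures into a single performance score: a weighted, normalized task-success term minus a weighted sum of normalized cost terms, with the weights estimated by regressing user satisfaction on the measures. The Python sketch below computes only the combination step. The function names, the use of the population standard deviation for normalization, and all numeric values are assumptions for illustration; the weights here are placeholders, not regression results.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Normalize values to zero mean and unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def performance(task_success, costs, alpha, weights):
    """Per-dialog score: alpha * N(success) - sum_i w_i * N(cost_i).

    task_success: one success value per dialog (for example, kappa)
    costs: dict mapping a cost name to one value per dialog
    """
    norm_success = z_scores(task_success)
    norm_costs = {name: z_scores(vals) for name, vals in costs.items()}
    return [
        alpha * norm_success[d]
        - sum(weights[name] * norm_costs[name][d] for name in costs)
        for d in range(len(task_success))
    ]


# Made-up values for three dialogs; alpha and the weights are placeholders.
scores = performance(
    task_success=[0.9, 0.5, 0.7],
    costs={"turns": [10, 20, 15]},
    alpha=1.0,
    weights={"turns": 1.0},
)
print(scores)
```

Normalizing each measure before weighting keeps measures on different scales, such as a success rate and a turn count, from dominating the score simply because of their units.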
Current Work • Sponsored by CAVS • Examining: • In-vehicle environment • Manufacturing environment • Online learning environment • Multidisciplinary Team: • CS (Baca), ECE (Picone) • ECE graduate students • Hualin Gao, Theban Stanley • CPE undergraduate student • Patrick McNally
Current Work: In-vehicle Dialog System • Approach • Developed prototype in-vehicle system. • Allows querying for information in Starkville/MSU area.
System Architecture DIALOG MANAGER • Example frames and associated queries: Drive_Direction: “How can I get from Lee Boulevard to Kroger?” Drive_Address: “Where is the campus bakery?” Drive_Distance: “How far is China Garden?” Drive_Quality: “Find me the most scenic route to Scott Field.” Drive_Turn: “I am on Nash Street. What’s my next turn?”
Application Development GIS Backend • Geographic Information System (GIS) contains map routing data for MSU and surrounding area. • Dialog manager (DM) first determines the nature of query, then: • obtains route data from the GIS database • handles presentation of the data to the user
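The two-step flow above (classify the query, then fetch and present route data) can be sketched as a dispatch table keyed by frame name. In the Python example below, the frame names `Drive_Direction` and `Drive_Distance` come from the slides, but the in-memory `FAKE_GIS` table, its route data, the slot names, and the handler functions are all hypothetical stand-ins for the real GIS backend.

```python
# Hypothetical in-memory stand-in for the GIS database; the data is made up.
FAKE_GIS = {
    ("Lee Boulevard", "Kroger"): {"distance_miles": 2.5, "turns": ["left", "right"]},
}

NOT_FOUND = "Sorry, I could not find that route."


def lookup_route(slots):
    """Fetch route data for the origin and destination slots, or None."""
    return FAKE_GIS.get((slots["origin"], slots["destination"]))


def handle_direction(slots):
    route = lookup_route(slots)
    if route is None:
        return NOT_FOUND
    return f"Route found with {len(route['turns'])} turns."


def handle_distance(slots):
    route = lookup_route(slots)
    if route is None:
        return NOT_FOUND
    return f"It is {route['distance_miles']} miles away."


# Frame names follow the slides; the handlers are illustrative only.
HANDLERS = {
    "Drive_Direction": handle_direction,
    "Drive_Distance": handle_distance,
}


def dialog_manager(frame_name, slots):
    """Determine the query type, fetch route data, and build the response."""
    handler = HANDLERS.get(frame_name)
    if handler is None:
        return "Sorry, I did not understand the request."
    return handler(slots)


trip = {"origin": "Lee Boulevard", "destination": "Kroger"}
print(dialog_manager("Drive_Distance", trip))
```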
Application Development Pilot System • Obtained domain-specific data by: • Initial data gathering and system testing • Retesting after enhancing the language model (LM) and semantic grammar • Initial efforts focused on reducing out-of-vocabulary (OOV) utterances and parsing errors for the natural language understanding (NLU) module.
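As a rough illustration of how OOV utterances might be identified in collected data, the Python sketch below flags every utterance containing at least one word missing from the recognizer vocabulary. The function name, the whitespace tokenization, the tiny vocabulary, and the sample utterances are assumptions; a real pipeline would also need to handle punctuation and pronunciation variants.

```python
def oov_utterances(utterances, vocabulary):
    """Return the utterances that contain at least one out-of-vocabulary word."""
    return [
        u for u in utterances
        if any(word not in vocabulary for word in u.lower().split())
    ]


# Hypothetical vocabulary and test utterances.
vocab = {"where", "is", "the", "campus", "bakery", "how", "far"}
collected = ["Where is the campus bakery", "How far is Kroger"]

flagged = oov_utterances(collected, vocab)
oov_utterance_rate = len(flagged) / len(collected)
print(flagged, oov_utterance_rate)
```

Comparing this rate before and after adding missing words to the vocabulary gives one simple measure of the effect of retesting.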
In-Vehicle Dialog System • Established a preliminary dialog system for future data collection and research • Demonstrated significant domain-specific improvements for in-vehicle dialog systems. • Created a testbed for future studies of workforce training applications.
Workforce Training • Significant issues in manufacturing environment: • Recognition issues: • Real-time performance • Noisy environments • Understanding issues: • Multimodal interface for reducing error rate, e.g., voice and tactile. • HCI/Human Factors Issues: • Response generation to integrate speech and visual output
Online Learning • Significant issues in online learning environment: • Understanding issues: • Understanding learner preferences and habits. • HCI/Human Factors Issues: • Response generation to accommodate learning style. • Evaluation.
Research Significance • Advance the development of dialog systems technology through addressing fundamental issues as they arise in various domains. • Potential areas: ASR, NLP, HCI
References
[1] S. J. Young and C. E. Proctor, “The design and implementation of dialogue control in voice operated database inquiry systems,” Computer Speech and Language, vol. 3, no. 4, pp. 329-353, 1992.
[2] W. Ward, “Understanding spontaneous speech,” in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Toronto, Canada, 1991, pp. 365-368.
[3] R. Pieraccini and E. Levin, “Stochastic representation of semantic structure for speech understanding,” Speech Communication, vol. 11, no. 2, pp. 283-288, 1992.
[4] Y. Wang and A. Acero, “Evaluation of spoken grammar learning in the ATIS domain,” in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Orlando, Florida, 2002.
[5] C. Sidner, “Focusing in the comprehension of definite anaphora,” in Computational Models of Discourse, M. Brady and R. Berwick, eds., Cambridge, MA: The MIT Press, 1983, pp. 267-330.
[6] D. Litman and S. Pan, “Empirically evaluating an adaptable spoken language dialog system,” in Proceedings of the International Conference on User Modeling (UM ’99), Banff, Canada, 1999.
[7] M. Walker et al., “PARADISE: A framework for evaluating spoken dialogue agents,” in Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL-97), pp. 271-289, 1997.