Stochastic Language Generation for Spoken Dialog Systems
Alice Oh (aliceo@cs.cmu.edu), Kevin Lenzo (lenzo@cs.cmu.edu), Alex I. Rudnicky (air@cs.cmu.edu)
School of Computer Science, Carnegie Mellon University
Communicator Project
A spoken dialog system in which users engage in a telephone conversation with the system, using natural language to solve a complex travel reservation task.
Components:
• Sphinx-II speech recognizer
• Phoenix semantic parser
• Domain agents
• Agenda-based dialog manager
• Stochastic natural language generator
• Festival domain-dependent text-to-speech (being integrated)
Want to know more? Call toll-free at 1-877-CMU-PLAN
Speech Group
Problem Statement
• Problem: build a generation engine for a dialog system that combines the advantages, and overcomes the difficulties, of the two dominant approaches: template-based generation and grammar rule-based NLG.
• Our approach: design a corpus-driven stochastic generation engine that takes advantage of the characteristics of task-oriented conversational systems, among them:
• spoken utterances are much shorter in length
• there are well-defined subtopics within the task, so the language can be selectively modeled
Stochastic NLG: Overview
• Language model: an n-gram language model of a domain expert's language, built from a corpus of travel reservation dialogs
• Generation: given an utterance class, randomly generate a set of candidate utterances from the LM distributions
• Scoring: score the candidates with a set of heuristics and pick the best one
• Slot filling: substitute slots in the utterance with the appropriate values from the input frame
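The first step above, building a per-class language model from a slot-tagged corpus, can be sketched as follows. This is a minimal illustration, not the actual Communicator code: the corpus sample, class name, and function names are assumptions, and a bigram model stands in for the general n-gram case.

```python
from collections import defaultdict

def build_bigram_lm(utterances):
    """Count bigram transitions over slot-tagged utterances and
    normalize them into conditional probabilities P(next | prev)."""
    counts = defaultdict(lambda: defaultdict(int))
    for utt in utterances:
        tokens = ["<s>"] + utt.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return {prev: {w: c / sum(nexts.values()) for w, c in nexts.items()}
            for prev, nexts in counts.items()}

# One model per utterance class; the two-utterance corpus below is
# illustrative only, echoing the examples from the slides.
corpus = {
    "query_depart_time": [
        "What time do you want to depart {depart_city} ?",
        "What time on {depart_date} would you like to depart ?",
    ],
}
lms = {cls: build_bigram_lm(utts) for cls, utts in corpus.items()}
```

Because the slot tags ({depart_city}, {depart_date}) are kept as single tokens, the model learns transitions into and out of slots just like any other word, which is what lets generation later produce tagged candidates.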
Stochastic NLG: Pipeline
Tagged corpora → language models. The dialog manager sends an input frame, e.g.:
{ act query
  content depart_time
  depart_date 20000501 }
Generation proposes candidate utterances ("What time on {depart_date}?", "At what time would you be leaving {depart_city}?"); scoring picks the best utterance ("What time on {depart_date}?"); slot filling produces the complete utterance ("What time on Mon, May 8th?"), which is sent to TTS.
Stochastic NLG: Corpora
Human-human dialogs in travel reservations (CMU-Leah, SRI-ATIS/American Express dialogs)
Example
Utterances in corpus:
• What time do you want to depart {depart_city}?
• What time on {depart_date} would you like to depart?
• What time would you like to leave?
• What time do you want to depart on {depart_date}?
Output (different from corpus):
• What time would you like to depart?
• What time on {depart_date} would you like to depart {depart_city}?
• *What time on {depart_date} would you like to depart on {depart_date}?
Evaluation
Transcription dialogs → batch-mode generation by Stochastic NLG (dialogs with Output-S) and Template NLG (dialogs with Output-T) → comparative evaluation
Preliminary Evaluation
• Batch-mode generation using the two systems; comparative evaluation of output by human subjects
• User preferences (49 utterances total): weak preference for Stochastic NLG (p = 0.18)

subject   stochastic   templates   difference
1         41           8           33
2         34           15          19
3         17           32          -15
4         32           17          15
5         30           17          13
6         27           19          8
7         8            41          -33
average   27           21.29       5.71
Stochastic NLG: Advantages
• corpus-driven
• easy to build (minimal knowledge engineering)
• fast prototyping
• minimal input (speech act, slot values)
• natural output
• leverages data-collection/tagging effort
Open Issues
• How big a corpus do we need?
• How much of it needs manual tagging?
• How does the n in n-gram affect the output?
• What happens to the output when two different human speakers are modeled in one model?
• Can we replace scoring with a search algorithm?
Current Approaches
• Traditional (rule-based) NLG
  • hand-crafted generation grammar rules and other knowledge
  • input: a very richly specified set of semantic and syntactic features
  • Example*:
    (h / |possible<latent|
      :domain (h2 / |obligatory<necessary|
        :domain (e / |eat,take in|
          :agent you
          :patient (c / |poulet|))))
    → "You may have to eat chicken"
• Template-based NLG
  • simple to build
  • input: a dialog act and/or a set of slot-value pairs
* from a Nitrogen demo website, http://www.isi.edu/natural-language/projects/nitrogen/
If n is set to a large enough number, most utterances generated by the LM-based NLG will be exact duplicates of utterances in the corpus. Stochastic NLG can thus also be thought of as a way to automatically build templates from a corpus.
Tagging
• CMU corpus tagged manually
• SRI corpus tagged semi-automatically using trigram language models built from the CMU corpus
Tags
Utterance classes (29):
query_arrive_city, query_arrive_time, query_arrive_time, query_confirm, query_depart_date, query_depart_time, query_pay_by_card, query_preferred_airport, query_return_date, query_return_time, hotel_car_info, hotel_hotel_chain, hotel_hotel_info, hotel_need_car, hotel_need_hotel, hotel_where, inform_airport, inform_confirm_utterance, inform_epilogue, inform_flight, inform_flight_another, inform_flight_earlier, inform_flight_earliest, inform_flight_later, inform_flight_latest, inform_not_avail, inform_num_flights, inform_price, other
Attributes (24):
airline, am, arrive_airport, arrive_city, arrive_date, arrive_time, car_company, car_price, connect_airline, connect_airport, connect_city, depart_airport, depart_city, depart_date, depart_time, depart_tod, flight_num, hotel, hotel_city, hotel_price, name, num_flights, pm, price
Stochastic NLG: Generation
• Given an utterance class, randomly generate a set of candidate utterances from the LM distributions
• Generation stops when an utterance has a penalty score of 0 or the maximum number of iterations (50) has been reached
• Average generation time: 75 msec for Communicator dialogs
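The generate-until-zero-penalty loop above can be sketched as a random walk over the class LM. This is a hedged illustration, not the Communicator implementation: the bigram representation, function names, and the toy model at the bottom are all assumptions; only the stopping rule (zero penalty or 50 iterations) comes from the slide.

```python
import random

def generate(lm, max_len=30):
    """Random walk over a bigram LM (prev -> {next: prob}) from <s>
    until </s> is drawn or max_len tokens are emitted."""
    tokens, prev = [], "<s>"
    while len(tokens) < max_len:
        nexts = lm.get(prev)
        if not nexts:
            break
        words = list(nexts)
        prev = random.choices(words, weights=[nexts[w] for w in words])[0]
        if prev == "</s>":
            break
        tokens.append(prev)
    return " ".join(tokens)

def generate_best(lm, score, max_tries=50):
    """Regenerate until a candidate scores a zero penalty, or return the
    lowest-penalty candidate after max_tries iterations (50 on the slide)."""
    best, best_penalty = None, float("inf")
    for _ in range(max_tries):
        candidate = generate(lm)
        p = score(candidate)
        if p == 0:
            return candidate
        if p < best_penalty:
            best, best_penalty = candidate, p
    return best

# Toy deterministic LM for illustration (not the Communicator model):
toy_lm = {"<s>": {"What": 1.0}, "What": {"time": 1.0},
          "time": {"?": 1.0}, "?": {"</s>": 1.0}}
print(generate(toy_lm))  # → What time ?
```

With a real, branching LM the walk yields varied candidates on each call, which is what makes repeated sampling plus scoring meaningful.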
Stochastic NLG: Scoring
• Assign penalty scores for:
  • unusual utterance length (thresholds for too long and too short)
  • a slot in the generated utterance with an invalid (or no) value in the input frame
  • a "new" and "required" attribute in the input frame that is missing from the generated utterance
  • repeated slots in the generated utterance
• Pick the utterance with the lowest penalty (or stop generating at an utterance with 0 penalty)
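The four heuristics above can be combined into a single penalty function, sketched below. The unit weights, length thresholds, and function name are assumptions; the slide names the heuristics but not their actual values.

```python
import re

def penalty(utterance, frame, required=(), min_len=3, max_len=20):
    """Sum of heuristic penalties mirroring the slide's four checks.
    Unit weights and length thresholds are illustrative assumptions."""
    score = 0
    tokens = utterance.split()
    if not (min_len <= len(tokens) <= max_len):
        score += 1                                # unusual length
    slots = re.findall(r"\{(\w+)\}", utterance)
    for slot in slots:
        if not frame.get(slot):                   # slot with no/invalid value
            score += 1
    for attr in required:                         # required attribute missing
        if attr not in slots:
            score += 1
    score += len(slots) - len(set(slots))         # repeated slots
    return score
```

On the slide's starred example, the repeated {depart_date} slot is exactly what this function penalizes, so the duplicate-slot candidate loses to the clean one.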
Stochastic NLG: Slot Filling
• Substitute slots in the utterance with the appropriate values from the input frame
Example:
What time do you need to arrive in {arrive_city}?
→ What time do you need to arrive in New York?
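Slot filling itself is a simple substitution step; a minimal sketch, using the slide's own example (the function name and regex-based approach are assumptions, not the actual implementation):

```python
import re

def fill_slots(utterance, frame):
    """Replace each {slot} placeholder with its value from the input frame."""
    return re.sub(r"\{(\w+)\}", lambda m: str(frame[m.group(1)]), utterance)

print(fill_slots("What time do you need to arrive in {arrive_city}?",
                 {"arrive_city": "New York"}))
# → What time do you need to arrive in New York?
```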
Stochastic NLG: Shortcomings
• What might sound natural for a human speaker (imperfect grammar, intentional omission of words, etc.) may sound awkward, or wrong, coming from the system.
• It is difficult to define utterance boundaries and utterance classes; some utterances in the corpus may be a conjunction of more than one utterance class.
• Factors other than the utterance class may affect the words (e.g., discourse history).
• Some sophistication built into traditional NLG engines is not available (e.g., aggregation, anaphorization).
Evaluation
• Must be able to evaluate generation independently of the rest of the dialog system
• Comparative evaluation using dialog transcripts
  • need more subjects
  • 8-10 dialogs; system output generated batch-mode by two different engines
• Evaluation of human travel agent utterances
  • Do users rate them well?
  • Is it good enough to model human utterances?