Machine Learning of Discourse
Piroska Lendvai
ILK Research Group, Tilburg University
Overview:
• Spoken Dialogue Systems: goal, architecture, need for error handling
• Machine Learning for SDS: what, where, how
• Machine Learning for discourse in SDS: an empirical study on shallow semantic-pragmatic data
I. Spoken Dialogue Systems
• Since the 1960s: communication with a machine in natural language. Recent advances in speech technology facilitate the development of spoken dialogue systems (SDS)
• Goal: create systems that enable interaction with an application (e.g. software, TV, database) using natural language through a voice interface
• SDS: programs that communicate with a user in order to perform some task (e.g. a database query) -> “information systems”, “conversational agents”
Practical use
• Save money with automated telephone interfaces that provide information (calling, transport, weather, booking, banking)
• Create automated help systems / manuals
• Implement in smart household appliances / industrial robots
• (Provide support for customers in) using e-commerce
• E-mail and voice mail access
• Man-machine communication demands combining techniques of speech analysis and generation + linguistic analysis/generation (syntax, semantics) + task planning
• It is desirable to meet discourse requirements: satisfy social obligations, handle world knowledge and a dynamically changing context
• So: apart from understanding the literal meaning of the user’s utterance, the system must interpret subtle and implicit dialogue acts (does the user request / provide / acknowledge information, correct a misunderstanding, etc.)
• Discourse operates on all linguistic levels, from phonetics to pragmatics -> implementing discourse information in SDS implies formalizing all these levels (VERY hard)
• Current SDS are not very efficient and operate in very restricted domains
• Examples of dialogue systems:
  • Demo SDS: GoDiS (written input, limited slot-fillers, rigid management)
  • Test version SDS: OVIS (more flexible slot filling, bad language understanding)
GoDiS
$S>Welcome to the travel agency!
$U>I want to fly from Amsterdam to Budapest
$S>What city do you want to go to?
$U>Budapest
$S>I didnt understand what you said. Please rephrase.
$U>Gothenburg
$S>What city do you want to go from?
$U>Amsterdam
$S>I didnt understand what you said. Please rephrase.
…
OVIS (translated from Dutch)
Q: Good evening. This is the automatic information system of Openbaar Vervoer Reisinformatie. This system only provides information about train journeys from NS station to NS station. From which station to which station do you want to travel?
A: I have to go from Schiphol to Nijmegen next week Tuesday
Q: From where to where do you want to travel on Tuesday the twelfth of December?
A: From Schiphol to Nijmegen
Q: At what time do you want to travel from Schiphol to Nijmegen?
…
SDS architecture
[Diagram; components: speech recognition, language understanding, dialogue manager, application, language generation, speech synthesis]
• Interface: speech (telephone); GUI (WAP/WWW protocol); animated agent (multimodal, 3D)
• NLU unit: speech-to-text (STT) through automatic speech recognition (ASR, acoustic processing); language model (grammar, lexicon, phrase segmentation); semantic representation
• Dialogue Manager: intermediate agent between user and application, responsible for the interaction. It has a built-in domain model, session model, user model (competence of the user), task planner (carries out the strategy, monitors the state, updates tasks / memory) and problem solver, and it communicates with the application / database
• NLG: domain templates; text-to-speech (TTS) synthesis
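The module chain described above can be sketched as a single dialogue turn in Python. This is a minimal illustration, not the architecture of GoDiS or OVIS: all function names, the slot names `origin` and `destination`, and the regular expressions are assumptions, and the ASR and TTS steps are text stand-ins.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical slot names and prompt templates (the NLG "domain templates").
REQUIRED_SLOTS = ("origin", "destination")
PROMPTS = {
    "origin": "What city do you want to go from?",
    "destination": "What city do you want to go to?",
}


@dataclass
class SessionModel:
    """Dialogue manager memory: the slots filled so far."""
    slots: Dict[str, str] = field(default_factory=dict)


def recognize(audio: str) -> str:
    """ASR stand-in: the 'audio' is already a text string here."""
    return audio.lower().strip()


def understand(text: str) -> Dict[str, str]:
    """Toy NLU: keyword spotting that maps text to a slot/value frame."""
    frame = {}
    origin = re.search(r"\bfrom (\w+)", text)
    if origin:
        frame["origin"] = origin.group(1)
    destination = re.search(r"\bto (\w+)\s*$", text)
    if destination:
        frame["destination"] = destination.group(1)
    return frame


def manage(state: SessionModel, frame: Dict[str, str]) -> Tuple[str, object]:
    """Dialogue manager: update memory, then ask for the first missing slot."""
    state.slots.update(frame)
    for slot in REQUIRED_SLOTS:
        if slot not in state.slots:
            return ("ask", slot)
    return ("query", dict(state.slots))


def generate(action: Tuple[str, object]) -> str:
    """Template-based NLG."""
    kind, payload = action
    if kind == "ask":
        return PROMPTS[payload]
    return f"Looking up connections from {payload['origin']} to {payload['destination']}."


def synthesize(prompt: str) -> str:
    """TTS stand-in: returns the prompt text unchanged."""
    return prompt


def run_turn(audio: str, state: SessionModel) -> str:
    """One pass through ASR -> NLU -> DM -> NLG -> TTS."""
    return synthesize(generate(manage(state, understand(recognize(audio)))))


if __name__ == "__main__":
    session = SessionModel()
    print(run_turn("I want to fly from Amsterdam to Budapest", session))
```

Keeping the session model separate from the per-turn functions makes it possible to swap any single module (for example a real recognizer) without touching the others.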
Pragmatic info in SDS
• Conversation is one type of discourse
• Special notions: turn-taking, information grounding
• Dialogue acts (ask, answer, acknowledge, correct, verify, suggest, reject, …) are expressed by word order, intonation, ellipsis, context, …
• Represent the communicative content of utterances as dialogue-act tags:
  • Q “On which day do you want to travel?”
  • E “So you want to go to Rotterdam.”
  • Q;I “What time do you want to travel to Rotterdam?”
  • S “Tomorrow.”
  • N;S “No, not to Rotterdam but to Roermond.”
Semantic info in SDS
• We want to know what the user (U) says to the system (S) and vice versa
• Simpler: we want to know which slot the system talks about, with which value
  • Q_VA “From where to where do you want to travel?”
  • Q_DTH; I_A “When do you want to arrive in Rotterdam?”
  • Q_Oc “Do you want to know another connection?”
• We want to know which slot the user fills, with which value
  • S_T “Somewhere in the evening”
  • S_DTH “On Monday morning at 10”
  • Y;S_VA “Yes, the return trip”
  • N “You don’t need to repeat the connection”
Error handling in SDS
• Central issue: keeping the interaction running
• Typical problems in SDS differ from those in human-human communication: humans communicate multimodally but, linguistically, not necessarily explicitly; in a system everything needs to be stated explicitly
• Deal with problems due to:
  • NLU technicalities: poor speech recognition (ASR trained in the lab), limited vocabulary and language model, user idiosyncrasies (accent, intonation, reaction time, comprehension)
  • Poor DM engineering, which yields bad dialogue management:
DM problems:
• inefficient strategy with respect to task completion (only 2 prompts to fill a slot)
• wrong default assumptions (user wants to travel today)
• unintegrated social behaviour (user's greeting unrecognized)
• unintegrated world knowledge (e.g. “tweede kerstdag”, the second day of Christmas, 26 December)
If cues that signal problems (_prob) or seamless communication (_ok) are identified, we can perform error detection and recovery.
II. ML for SDS
• Goal: train / optimize / adapt the SDS modules by processing corpus data of human-computer (H-C) / human-human (H-H) interactions
• Issues: which ML algorithms for which tasks? Combining features means processing continuous, symbolic and set-valued data; induce rules vs. store instances; ...
• Data: reduced hypothesis space, often artificial / small data (Wizard-of-Oz (WOZ) experiments, test users)
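To make the "store instances" option and the mixed feature types concrete, here is a small Python sketch of a nearest-neighbour classifier. It is an illustration only, not the learner used in the study: the distance definitions, the feature names and the toy memory are all assumptions, and numeric features are assumed to be pre-scaled to the range 0 to 1.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

Instance = Dict[str, Any]


def feature_distance(a: Any, b: Any) -> float:
    """Per-feature distance for set-valued, continuous and symbolic values."""
    if isinstance(a, frozenset) and isinstance(b, frozenset):
        union = a | b
        # Set-valued: one minus the Jaccard overlap.
        return 1.0 - len(a & b) / len(union) if union else 0.0
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        # Continuous: absolute difference (values assumed scaled to [0, 1]).
        return abs(a - b)
    # Symbolic: overlap metric, 0 if identical and 1 otherwise.
    return 0.0 if a == b else 1.0


def distance(x: Instance, y: Instance) -> float:
    """Sum of per-feature distances over the features of x."""
    return sum(feature_distance(x[key], y[key]) for key in x)


def knn_classify(memory: List[Tuple[Instance, str]], query: Instance, k: int = 1) -> str:
    """Return the majority label among the k stored instances nearest to query."""
    nearest = sorted(memory, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy memory of stored instances (invented values, for illustration only).
MEMORY = [
    ({"last_prompt": "Q_DTH;I_A", "duration": 0.2,
      "words": frozenset({"tomorrow"})}, "ok"),
    ({"last_prompt": "Q_DTH;I_A", "duration": 0.9,
      "words": frozenset({"not", "to", "tilburg"})}, "prob"),
]

if __name__ == "__main__":
    query = {"last_prompt": "Q_DTH;I_A", "duration": 0.8,
             "words": frozenset({"not", "to", "schiphol"})}
    print(knn_classify(MEMORY, query))
```

A rule inducer would instead compress the training data into explicit rules; the instance-based variant keeps every example and postpones generalization to classification time.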
ML for NLU:
• predict speech recognition performance
• predict and adapt to poor speech recognition in SDS
• identify understanding errors, user corrections
• predict user reactions to system error (both prob and sem)
• represent utterances’ semantic content
• topic recognition
ML for NLG:
• optimize the sentence planner for generating system prompts
ML for DM:
• predict problematic situations during the interaction
• optimize the general strategy
• identify dialogue acts in utterances
• enhance system design by modelling interactions (both pragmatic and semantic)
III. Empirical study of ML of discourse in man-machine dialogues
• Noisy real-data Dutch corpus: OVIS
• 3738 turns
• Class: user utterance represented as a tag incorporating shallow pragmatics + semantics + problem awareness
  [Sys: Q_DTH;I_A “When do you want to travel to Tilburg?”]
  • S_D_ok “Tomorrow”
  • N;S_A_prob “Not to Tilburg but to Schiphol!”
  • A;S_DTH_prob “Today at eight in the evening”
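The composite tags shown above can be split into their parts with a few lines of Python. This sketch infers the tag grammar from the examples on the slides and is not a specification: it assumes that `;` separates dialogue acts, that `_` separates an act from its slot codes, that each slot code is one uppercase letter optionally followed by lowercase letters (as in `Oc`), and that a trailing `_ok` or `_prob` marks problem awareness. The class and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TagPart:
    act: str                 # dialogue-act code, e.g. "S" or "N"
    slots: Tuple[str, ...]   # slot codes, e.g. ("D", "T", "H")


@dataclass(frozen=True)
class ParsedTag:
    parts: Tuple[TagPart, ...]
    problem: Optional[bool]  # True for _prob, False for _ok, None if absent


def parse_tag(tag: str) -> ParsedTag:
    """Split a tag such as 'N;S_A_prob' into acts, slots and problem flag."""
    problem = None
    for suffix, value in (("_prob", True), ("_ok", False)):
        if tag.endswith(suffix):
            tag = tag[: -len(suffix)]
            problem = value
            break
    parts = []
    for chunk in tag.split(";"):
        act, _, slot_string = chunk.strip().partition("_")
        # Assumption: one slot code per uppercase letter (plus lowercase tail).
        slots = tuple(re.findall(r"[A-Z][a-z]*", slot_string))
        parts.append(TagPart(act, slots))
    return ParsedTag(tuple(parts), problem)


if __name__ == "__main__":
    for example in ("S_D_ok", "N;S_A_prob", "A;S_DTH_prob", "Q_DTH;I_A"):
        print(example, "->", parse_tag(example))
```

Separating the problem flag from the pragmatic and semantic parts makes it straightforward to train on the full class or on each component on its own.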
Attribute picking: which characteristics of the dialogue are most predictive of the problem / non-problem class?
• Shallow features, automatically extractable from the SDS:
  • symbolic (types of the last 10 system questions, the “history”)
  • numeric (user prosody: pitch, loudness, duration, tempo phenomena)
  • binary (759 bag-of-words features from the ASR output)
  • lexical (best string of the ASR)
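One way to assemble these four feature groups into a single instance is sketched below in Python. The feature names, the padding symbol and the four prosody keys are assumptions chosen for illustration; the slides do not specify the actual encoding, and the three-word vocabulary stands in for the 759 bag-of-words features.

```python
from typing import Dict, Mapping, Sequence, Union

HISTORY_LEN = 10  # the slides mention the last 10 system question types
PROSODY_KEYS = ("pitch", "loudness", "duration", "tempo")  # hypothetical keys
PAD = "-"  # hypothetical filler for turns before the dialogue started

Feature = Union[str, float, int]


def build_features(
    history: Sequence[str],
    prosody: Mapping[str, float],
    asr_best: str,
    vocabulary: Sequence[str],
) -> Dict[str, Feature]:
    """Combine symbolic, numeric, binary and lexical features for one user turn."""
    features: Dict[str, Feature] = {}

    # Symbolic: last 10 system question types; sysq_1 is the latest prompt.
    recent = list(history)[-HISTORY_LEN:]
    padded = [PAD] * (HISTORY_LEN - len(recent)) + recent
    for position, question in enumerate(reversed(padded), start=1):
        features[f"sysq_{position}"] = question

    # Numeric: prosodic measurements of the user turn.
    for key in PROSODY_KEYS:
        features[f"pros_{key}"] = float(prosody[key])

    # Binary: one bag-of-words indicator per vocabulary word.
    words = set(asr_best.lower().split())
    for word in vocabulary:
        features[f"bow_{word}"] = int(word in words)

    # Lexical: the best ASR string itself.
    features["asr_best"] = asr_best
    return features


if __name__ == "__main__":
    instance = build_features(
        history=["Q_VA", "Q_DTH;I_A"],
        prosody={"pitch": 210.0, "loudness": 0.7, "duration": 1.9, "tempo": 4.2},  # toy values
        asr_best="not to tilburg but to schiphol",
        vocabulary=["not", "tomorrow", "schiphol"],
    )
    print(instance)
```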
Tasks
• Check the “baseline”: performance based on only the most predictive feature (latest prompt)
• Classify discourse (prag+sem+prob class)
• Classify separately:
  • Pragmatics and Semantics
  • Problem awareness
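The baseline in the first task can be read as: for each value of the latest system prompt, always predict the class seen most often with that prompt. The Python sketch below follows that reading, which is an assumption about how the baseline was computed; the function names are hypothetical and the four training pairs are invented toy data that reuse the tag notation from the slides.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Example = Tuple[str, str]          # (latest system prompt tag, user class tag)
Model = Tuple[Dict[str, str], str]  # (per-prompt majority class, global default)


def train_baseline(examples: List[Example]) -> Model:
    """Store the most frequent class per prompt, plus the overall majority class."""
    per_prompt = defaultdict(Counter)
    overall = Counter()
    for prompt, label in examples:
        per_prompt[prompt][label] += 1
        overall[label] += 1
    table = {prompt: counts.most_common(1)[0][0] for prompt, counts in per_prompt.items()}
    default = overall.most_common(1)[0][0]
    return table, default


def predict(model: Model, prompt: str) -> str:
    """Predict from the latest prompt alone; fall back to the global majority."""
    table, default = model
    return table.get(prompt, default)


def accuracy(model: Model, examples: List[Example]) -> float:
    """Fraction of examples whose class the baseline predicts correctly."""
    correct = sum(predict(model, prompt) == label for prompt, label in examples)
    return correct / len(examples)


# Invented toy data, for illustration only.
TOY_DATA: List[Example] = [
    ("Q_DTH;I_A", "S_D_ok"),
    ("Q_DTH;I_A", "S_D_ok"),
    ("Q_DTH;I_A", "N;S_A_prob"),
    ("Q_VA", "S_VA_ok"),
]

if __name__ == "__main__":
    model = train_baseline(TOY_DATA)
    print(predict(model, "Q_DTH;I_A"), accuracy(model, TOY_DATA))
```

Any learner that uses the richer feature set should be compared against this single-feature score, ideally on held-out turns rather than on the training data used here.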