Extracting Information from Spoken User Input: A Machine Learning Approach. Piroska Lendvai, Tilburg University, Netherlands
Outline • The community: ML as a tool to detect dialogue acts (DAs), semantic content, and communication problems in dialogue • Tilburg University, Induction of Linguistic Knowledge group: • memory-based learning software package TiMBL • joint work with Antal van den Bosch & Emiel Krahmer, 2001-2004 • TiMBL applied to Dutch human-machine dialogue data • Extract pragmatic and semantic components from spoken user input • Drawing on simple, potentially erroneous information from the spoken dialogue system (SDS) • Are the components better classified individually or combined?
SDS architecture: speech recogniser → language understanding → dialogue manager (consults database) → answer generation → speech synthesis. Example user input: "I would like to travel from Amsterdam to Tilburg next Tuesday"; example system reply: "From where to where would you like to travel on Tuesday twelve December?"
OVIS corpus example • S1: Good evening. From which station to which station do you want to travel? • U1: I need to go from Amsterdam to Tilburg on Tuesday next week. • S2: From where to where do you want to travel on Tuesday twelve December? • U2: From Amsterdam to Tilburg. • S3: At what time do you want to travel from Amsterdam to Tilburg? • U3: Around quarter past eleven in the evening. • (…) • S5: I have found the following connections: (…). Do you want me to repeat the connection? • U5: Please do. • …
The OVIS system • Developed 1995-2000 in a Dutch national project; provides train travel information • Slots to fill: DepartStation, ArriveStation, Dep/ArrivDay, Dep/ArrivTime • Dialogue structure: users may provide unsolicited information • System-initiative • System always verifies received information • Explicitly ("So you want to leave on Thursday.") or • Implicitly ("At what time do you want to leave on Thursday?") • 80 test users; noisy real-data corpus from different system versions • 441 full dialogues: 3738 turn pairs of system prompt and user reply • 43% of user turns inaccurately grounded by the system • Word error rate (WER): 8-26%
Goal: partial interpretation of user input • Drawing on attributes available from the system's modules, extract information from the user's input turn: • task-related dialogue act (TRA): supercategories for information-seeking dialogues (8 application-motivated classes) • query slot types being filled (30) • does the current user turn originate future communication problems? (binary) • is the user, in the current turn, already aware of communication problems? (binary) • Facilitate full understanding: • the ASR can accept or reject a recognition hypothesis with more confidence; recognition of 'yes' and 'no' is highly error-prone (TRAs: Affirm, Negat) • the dialogue manager (DM) can launch error recovery
Partial interpretation components • Two example user turns: "From Amsterdam to Tilburg on Tuesday next week." / "Not this Tuesday but next Tuesday." • Task-related dialogue act: Slot-filling / Negation • (others: Affirmative, AcceptSysError, NonStd) • Slot(s) being filled by user: DepartStat, ArriveStat, ArriveDay / ArriveDay • (slots always co-occur with the Slot-filling TRA) • Problem origin turn? Yes / No • predicting miscommunication • Problem awareness turn? No / Yes • detecting miscommunication
Annotated user turns (TRA, slots, problem origin, problem awareness) • S1: Good evening. From which station to which station do you want to travel? • U1: I need to go from Amsterdam to Tilburg on Tuesday next week. • SlotFill, DepSt_ArrSt_ArrDay, Prob, Ok • S2: From where to where do you want to travel on Tuesday twelve December? • U2: From Amsterdam to Tilburg. • SlotFill, DepSt_ArrSt, Ok, Prob • S3: At what time do you want to travel from Amsterdam to Tilburg? • U3: Around quarter past eleven in the evening. • SlotFill, ArrTimeofDay_ArrHour, Ok, Ok • S5: I have found the following connections: (…). Do you want me to repeat the connection? • U5: Please do. • Affirm, void, Ok, Ok
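The annotation above packs four components into one tag, and the experiments concatenate selected components into a single class label. The sketch below shows one way to parse such a tag and build a combined label. The field names, the `void` handling and the `+` separator are assumptions for illustration, not the project's actual label format.

```python
from typing import NamedTuple, Sequence, Tuple


class PartialInterpretation(NamedTuple):
    tra: str                # task-related dialogue act, e.g. "SlotFill", "Affirm"
    slots: Tuple[str, ...]  # slot types being filled; empty when annotated "void"
    prob_origin: str        # "Prob" if the turn originates a problem, else "Ok"
    prob_aware: str         # "Prob" if the user signals a problem, else "Ok"


def parse_annotation(tag: str) -> PartialInterpretation:
    """Parse a tag such as 'SlotFill, DepSt_ArrSt_ArrDay, Prob, Ok'."""
    tra, slots, origin, aware = (part.strip() for part in tag.split(","))
    slot_tuple = () if slots == "void" else tuple(slots.split("_"))
    return PartialInterpretation(tra, slot_tuple, origin, aware)


def combined_label(pi: PartialInterpretation, components: Sequence[str]) -> str:
    """Concatenate the selected components into one class label (hypothetical '+' separator)."""
    values = {
        "tra": pi.tra,
        "slots": "_".join(pi.slots) or "void",
        "prob_origin": pi.prob_origin,
        "prob_aware": pi.prob_aware,
    }
    return "+".join(values[c] for c in components)


u1 = parse_annotation("SlotFill, DepSt_ArrSt_ArrDay, Prob, Ok")
print(combined_label(u1, ["tra", "prob_aware"]))  # SlotFill+Ok
print(combined_label(u1, ["tra", "slots", "prob_origin", "prob_aware"]))
```

Each choice of components defines a different learning task; the fully combined setting corresponds to the "up to 148 concatenated classes" mentioned in the experiments.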
Speech recognition • utterance → speech recogniser (ASR) → word graph (lattice) → paths • ASR strategy: pick the best path (confidence, language model) • A path can contain distortions and deletions: • 'from amsterdam to tilburg tuesday' • 'from amsterdam to tilburg next tuesday' • 'to amsterdam to tilburg next tuesday' • A wrong pick can cause (possibly severe) information loss • (Figure: word graph with weighted arcs, e.g. from amsterdam /2473, to amsterdam /549, to tilburg /158, next /120.5, tuesday /297 and /3481.5)
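Picking a single path from the word graph amounts to a shortest-path search over a directed acyclic graph. The sketch below assumes arc weights are costs (lower is better) and that nodes are numbered in topological order; the lattice and its costs are hypothetical and do not reproduce the figure or OVIS's actual scoring.

```python
# Hypothetical word graph: node -> list of (next_node, word, cost).
# Costs are illustrative only; lower cost is assumed to mean "more likely".
LATTICE = {
    0: [(1, "from amsterdam", 2.0), (1, "to amsterdam", 3.5)],
    1: [(2, "to tilburg", 1.0)],
    2: [(3, "next", 1.5), (4, "tuesday", 4.0)],
    3: [(4, "tuesday", 1.0)],
    4: [],
}


def best_path(lattice, start, end):
    """Lowest-cost path through a DAG whose nodes are numbered in topological order."""
    best = {start: (0.0, [])}  # node -> (accumulated cost, words so far)
    for node in sorted(lattice):
        if node not in best:
            continue  # unreachable from start
        cost, words = best[node]
        for nxt, word, arc_cost in lattice[node]:
            candidate = (cost + arc_cost, words + [word])
            if nxt not in best or candidate[0] < best[nxt][0]:
                best[nxt] = candidate
    return best[end]


print(best_path(LATTICE, 0, 4))
```

Whatever the path search returns, all competing arcs ('to amsterdam', the direct 'tuesday' arc) are discarded, which is the information loss the slide warns about.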
Informativity of ASR output • ASR hypothesis lattice paths with associated confidences • User input: 'rond vier uur' (around four o'clock) • N-best paths (score, then node-indexed words; English glosses in parentheses):
856.109985 [1] <n> [2] #PAUSE# [4] op [7] drie [12] uur [13] #PAUSE# [14] (on three o'clock)
855.930054 [1] <n> [2] #PAUSE# [4] om [5] tien [12] uur [13] #PAUSE# [14]
855.430054 [1] <n> [2] #PAUSE# [4] om [5] drie [12] uur [13] #PAUSE# [14]
855.330017 [1] <n> [2] #PAUSE# [3] half [7] drie [12] uur [13] #PAUSE# [14]
855.140015 [1] <n> [2] #PAUSE# [3] half [8] #PAUSE# [9] vier [10] uur [13] #PAUSE# (half four o'clock)
855.109985 [1] <n> [2] #PAUSE# [3] half [8] tien [10] uur [13] #PAUSE# [14]
854.890015 [1] <n> [2] #PAUSE# [3] half [8] #PAUSE# [9] vier [11] uur [13] #PAUSE# (half four o'clock)
854.880005 [1] <n> [2] #PAUSE# [4] om [5] dertien [10] uur [13] #PAUSE# [14]
854.470032 [1] <n> [2] #PAUSE# [4] om [6] drie [10] uur [13] #PAUSE# [14]
853.880005 [1] <n> [2] #PAUSE# [4] om [5] drie [10] uur [13] #PAUSE# [14] (at three o'clock)
......
853.869995 [1] <n> [2] #PAUSE# [4] rond [9] vier [11] uur [13] #PAUSE# [14] (around four o'clock)
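The N-best list illustrates why the top path alone is risky: the uttered words 'rond vier' only appear together far down the list. The sketch below parses a few of the lines shown above and contrasts the top-scoring path with the union of all hypothesised words. Treating `<n>` and `#PAUSE#` as non-word markers and higher scores as better are assumptions.

```python
import re

# Four of the N-best lines shown above (first, second, fifth and last).
NBEST = """\
856.109985 [1] <n> [2] #PAUSE# [4] op [7] drie [12] uur [13] #PAUSE# [14]
855.930054 [1] <n> [2] #PAUSE# [4] om [5] tien [12] uur [13] #PAUSE# [14]
855.140015 [1] <n> [2] #PAUSE# [3] half [8] #PAUSE# [9] vier [10] uur [13] #PAUSE#
853.869995 [1] <n> [2] #PAUSE# [4] rond [9] vier [11] uur [13] #PAUSE# [14]
"""

NON_WORDS = {"<n>", "#PAUSE#"}   # assumed noise/silence markers
NODE = re.compile(r"\[\d+\]")    # node indices such as [12]


def parse_hypothesis(line):
    """Return (score, words) for one N-best line."""
    tokens = line.split()
    score = float(tokens[0])
    words = [t for t in tokens[1:] if not NODE.fullmatch(t) and t not in NON_WORDS]
    return score, words


hyps = [parse_hypothesis(line) for line in NBEST.splitlines() if line.strip()]
top_score, top_words = max(hyps, key=lambda h: h[0])   # assumes higher score = better
all_words = set().union(*(set(words) for _, words in hyps))

print(top_words)          # ['op', 'drie', 'uur']
print(sorted(all_words))
```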
Utilise full ASR hypotheses: Bag-of-Words • The BoW representation is successful in information retrieval and may work for spoken language understanding (SLU) • it ignores the order, frequency and probability of recognised words • it robustly characterises the utterance • Isolated paths may contain the uttered words only partially or incorrectly; the full lattice may contain them all (plus incorrect ones) • The entire ASR hypothesis lattice is encoded as a binary vector that indicates, for every word in the lexicon, whether it was hypothesised by the ASR • 759 bits, on average 7.5 active • vector bits: …around, at, four, o'clock, on, ten, three, tonight, travel… • binary BoW: …1, 1, 1, 1, 1, 1, 1, 0, 0…
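The binary BoW encoding described above can be sketched in a few lines. The lexicon here is only the nine-word slice shown on the slide (the real lexicon has 759 entries), and English glosses stand in for the Dutch words for readability; both are assumptions.

```python
def binary_bow(hypothesised_words, lexicon):
    """1 if the lexicon word occurs anywhere in the hypothesis lattice, else 0.
    Order, frequency and confidence of the hypothesised words are ignored."""
    present = set(hypothesised_words)
    return [1 if word in present else 0 for word in lexicon]


# Slice of the lexicon shown on the slide (English glosses).
LEXICON = ["around", "at", "four", "o'clock", "on", "ten", "three", "tonight", "travel"]

# Words from several lattice paths for 'rond vier uur', glossed; repeats are harmless.
lattice_words = ["on", "three", "o'clock", "at", "ten", "o'clock",
                 "four", "o'clock", "around", "four", "o'clock"]

print(binary_bow(lattice_words, LEXICON))  # [1, 1, 1, 1, 1, 1, 1, 0, 0]
```

Because duplicates collapse into a single bit, repeated but wrong words ('three', 'o'clock') carry no more weight than the single occurrence of the correct 'around'.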
Machine learning experiments • Task: given a user turn in its preceding dialogue context, assign one partial interpretation tag to it • Method: are interpretation components best learnt alone or composed? • Experimentally search for optimal subtask combinations • Understanding levels possibly interact with each other • [TaskRelAct + Slots + ProbOrig + ProbAwar]: up to 148 concatenated classes from combined components • Tool: memory-based learner (MBL) • lazy learning: examples are stored in memory • classification: the class is extrapolated from the k nearest neighbours • distance to a neighbour is the sum of (weighted) feature differences • parameter settings optimised with search heuristics (Van den Bosch, 2004)
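The MBL classification rule above (store examples, sum weighted feature mismatches, vote among the nearest neighbours) can be sketched as a tiny k-NN classifier with a weighted overlap distance. This is a minimal illustration, not TiMBL's implementation: the feature weights are supplied by hand rather than computed (e.g. by information gain), and the features and labels are hypothetical.

```python
from collections import Counter


def overlap_distance(x, y, weights):
    """Sum of the weights of the features on which x and y differ."""
    return sum(w for a, b, w in zip(x, y, weights) if a != b)


class MemoryBasedLearner:
    """Minimal k-NN with weighted overlap distance (illustrative sketch)."""

    def __init__(self, k=1, weights=None):
        self.k = k
        self.weights = weights
        self.memory = []  # lazy learning: training only stores examples

    def fit(self, X, y):
        self.memory = list(zip(X, y))
        if self.weights is None:
            self.weights = [1.0] * len(X[0])  # unweighted overlap by default
        return self

    def predict_one(self, x):
        ranked = sorted(self.memory,
                        key=lambda ex: overlap_distance(x, ex[0], self.weights))
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


# Hypothetical features: (prompt type, first hypothesised word, some binary cue).
X = [("Q-DepArr", "from", "1"), ("Q-DepArr", "to", "1"),
     ("ImplVer", "yes", "0"), ("ImplVer", "no", "0")]
y = ["SlotFill", "SlotFill", "Affirm", "Negat"]

learner = MemoryBasedLearner(k=1, weights=[0.5, 2.0, 1.0]).fit(X, y)
print(learner.predict_one(("ImplVer", "no", "1")))    # Negat
print(learner.predict_one(("Q-DepArr", "from", "0")))  # SlotFill
```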
Knowledge sources for MBL • From user: speech signal measurements • Prosody: turn duration, pitch F0 min/max/avg/stdev, energy RMS max/avg/stdev, tempo syll/sec, turn-initial pause • ASR output: confidence scores, best string, word graph branching, Bag-of-Words • From system: dialogue context • Prompt wording as BoW • Prompt type history: the 10 most recent system prompts, represented as structured symbols • e.g. for the prompt "From where to where do you want to travel on Tuesday twelve December?" the history is: • _, _, _, _, _, _, Q-DepArr, RepQ-DepArr, ImplVerDep;Q-Arr, ImplVerArr;Q-Day
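The prompt type history is a fixed-width window, left-padded with `_` early in the dialogue. A minimal sketch of building such a window, assuming the most recent prompt comes last and older prompts beyond the window are dropped:

```python
def prompt_history(prompt_types, size=10, pad="_"):
    """Return the `size` most recent prompt types, oldest first, left-padded with `pad`."""
    recent = list(prompt_types)[-size:]
    return [pad] * (size - len(recent)) + recent


history = prompt_history(["Q-DepArr", "RepQ-DepArr", "ImplVerDep;Q-Arr", "ImplVerArr;Q-Day"])
print(", ".join(history))
# _, _, _, _, _, _, Q-DepArr, RepQ-DepArr, ImplVerDep;Q-Arr, ImplVerArr;Q-Day
```

A fixed width keeps every instance the same length, which a memory-based learner comparing feature positions one by one requires.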
Informed baseline • Always predict the class that most frequently follows the current prompt type • Users are highly cooperative with system prompts • Predicting ProbOrig is hard; it depends on other factors
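The informed baseline is a per-prompt-type majority classifier. A minimal sketch follows; the fallback to the overall majority class for unseen prompt types is an assumption, and the `ExplVer` prompt label and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def train_informed_baseline(prompt_types, labels):
    """Map each prompt type to the label that most often follows it."""
    counts = defaultdict(Counter)
    for prompt, label in zip(prompt_types, labels):
        counts[prompt][label] += 1
    table = {prompt: c.most_common(1)[0][0] for prompt, c in counts.items()}
    overall = Counter(labels).most_common(1)[0][0]  # assumed fallback for unseen prompts
    return lambda prompt: table.get(prompt, overall)


prompts = ["Q-DepArr", "Q-DepArr", "Q-DepArr", "ExplVer", "ExplVer", "ImplVerArr;Q-Day"]
labels = ["SlotFill", "SlotFill", "Negat", "Affirm", "Affirm", "SlotFill"]

baseline = train_informed_baseline(prompts, labels)
print(baseline("Q-DepArr"), baseline("ExplVer"), baseline("UnseenPrompt"))
# SlotFill Affirm SlotFill
```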
Results: F(β=1) scores • Optimised subtask combinations improve over both the baseline and the fully combined subtasks • Class label design has a significant impact on learner performance
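F(β=1) is the harmonic mean of precision and recall. The sketch below computes it for one class from gold and predicted labels; the slides do not say how scores were averaged over classes, so only the per-class computation is shown, on toy data.

```python
def f_score(gold, predicted, label, beta=1.0):
    """F-beta for one class; beta=1 gives the harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


gold = ["Prob", "Ok", "Ok", "Prob", "Ok"]
pred = ["Prob", "Prob", "Ok", "Ok", "Ok"]
print(f_score(gold, pred, "Prob"))  # 0.5 (precision 1/2, recall 1/2)
```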
Implications of optimal task combinations • TRAs, a subclass of DAs, are learnt equally well in isolation or combined with any other subtask • DAs coordinate the distribution of other dialogue components • Correct detection of the TRA facilitates identifying semantic phenomena • Instead of a pipeline architecture, combined pragmatic-semantic processing is beneficial • Best to detect ProbAwareness and TRAs simultaneously • Signalling ProbAwareness has properties similar to DA cues • It is a feedback act; treating it in an isolated problem-detection module might be suboptimal • The ProbOrigin task is an outlier: it shows no correlations with pragmatic-semantic phenomena • Predicting ProbOrigin requires other sources of information; it is rooted in system-internal technical factors
Other aspects investigated: algorithm • Applied a rule learner (RIPPER): eager learning strategy • different outcome for task compositionality: it learns isolated subtasks best • same magnitude of learning performance on all 4 components
Other aspects investigated: features • Investigated the contribution of feature type groups per subtask • dialogue history is informative • prosody provides suboptimal cues • using the full word graph creates robustness: only a minor effect on partial interpretation (PI) performance when • automatically filtering the word graph of disfluencies, infrequent words, and less informative words • simulating perfect ASR by encoding the transcribed user utterances as BoW
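Filtering the word graph before BoW encoding can be sketched as shrinking the lexicon. The sketch below keeps words hypothesised in at least `min_count` turns and drops a filler list; the threshold, the filler set and the toy turns are hypothetical, and the slides do not specify how "less informative" words were selected.

```python
from collections import Counter

DISFLUENCIES = {"uh", "uhm", "ehm"}  # hypothetical filler list


def filtered_lexicon(turns, min_count=2, stopwords=frozenset()):
    """Words hypothesised in at least `min_count` turns, minus fillers and stopwords."""
    counts = Counter(word for turn in turns for word in set(turn))  # document frequency
    return sorted(word for word, c in counts.items()
                  if c >= min_count and word not in DISFLUENCIES and word not in stopwords)


turns = [["uh", "from", "amsterdam"], ["from", "tilburg", "uh"], ["to", "amsterdam"]]
print(filtered_lexicon(turns))  # ['amsterdam', 'from']
```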
Upper-bound performance (F) • Encoding the full hypothesis lattice is cheap and yields classification scores close to those obtained on perfectly recognised words • BoW handles noise well: incomplete, ungrammatical, redundant, or erroneous information
Extensions • The Robust Language Understanding for Question Answering in Dialogues project in Tilburg aims to validate the approach in a Dutch QA system for the medical domain • adapt an extended DA tagset • use word-level prosody instead of turn-level prosody • exploit syntactic features • incorporate attributes of a windowed left context • Advanced methods for efficient context treatment: sequence learning • find begin/end word boundaries of DA tags • find boundaries of slot values • identify and anticipate adjacent turn-pair DA sequences
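Finding begin/end boundaries of slot values with a sequence learner is commonly framed as per-word tagging with B(egin)/I(nside)/O(utside) labels. The sketch below converts span annotations into such tags; BIO is one common encoding chosen here for illustration, not necessarily what the project uses, and the spans are hypothetical.

```python
def to_bio(tokens, spans):
    """spans: list of (start, end_exclusive, label). Returns one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"       # first word of the slot value
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"       # remaining words of the same value
    return tags


tokens = "from amsterdam to tilburg next tuesday".split()
spans = [(1, 2, "DepSt"), (3, 4, "ArrSt"), (4, 6, "ArrDay")]
print(list(zip(tokens, to_bio(tokens, spans))))
```

A sequence learner trained on such tags predicts boundaries directly, so a slot value like 'next tuesday' is recovered as one unit rather than as a bag of words.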