
Presentation Transcript


  1. DQA meeting: 17.07.2007
  Learning more effective dialogue strategies using limited dialogue move features
  Matthew Frampton & Oliver Lemon, Coling/ACL-2006
  Presented by: Mark Hepple

  2. Data-driven methodology for SDS Development
  • Broader context = realising a “data-driven” methodology for creating SDSs, with the following steps:
  • 1. Collect data (using prototype or WOZ)
  • 2. Build probabilistic user simulation from data (covering user behaviour, ASR errors)
  • 3. [Feature selection - using USimulation]
  • 4. Learn dialog strategy, using reinforcement learning over system interactions with simulation
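To make the four-step methodology concrete, here is a minimal, hypothetical Python skeleton of the pipeline; the function names (collect_dialogues, build_user_simulation, select_features, learn_policy) and their return values are illustrative placeholders, not components named in the paper.

```python
# Hypothetical skeleton of the data-driven SDS pipeline sketched on slide 2.
# All names and return values are placeholders, not the paper's code.

def collect_dialogues():
    """Step 1: collect dialogue data with a prototype system or a WOZ setup."""
    return []  # placeholder: a corpus of annotated dialogues

def build_user_simulation(corpus):
    """Step 2: fit a probabilistic user simulation (user behaviour + ASR errors)."""
    return lambda history: ("provide_info", "dest_city")  # placeholder simulator

def select_features(simulator):
    """Step 3 (optional): choose state features by running the simulator."""
    return ["slot_status", "last_user_da"]

def learn_policy(simulator, features):
    """Step 4: reinforcement learning over interactions with the simulation."""
    return {}  # placeholder: a learned state -> action mapping

if __name__ == "__main__":
    corpus = collect_dialogues()
    simulator = build_user_simulation(corpus)
    features = select_features(simulator)
    policy = learn_policy(simulator, features)
```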

  3. Task
  • Information seeking dialog systems
  • Specifically task-oriented, slot-filling dialogs, leading to database query
  • E.g. getting user requirements for a flight booking (c.f. COMMUNICATOR task)
  • Aim is to achieve effective system strategy for such dialog interactions

  4. Reinforcement Learning
  • System modelled as a Markov Decision Process (MDP)
  • models decision making in situations where outcomes are partly random, partly under system control
  • Reinforcement learning used to learn an effective policy
  • determines the best action to take in each situation
  • Aim is to maximize overall reward
  • need a reward function, assigning a reward value to different dialogs
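The slide does not say which RL algorithm was used, so the following is a generic tabular Q-learning sketch for a dialogue MDP, purely as an illustration; the epsilon-greedy exploration and the learning parameters are assumptions, not details from the paper.

```python
import random
from collections import defaultdict

# Generic tabular Q-learning for a dialogue MDP (illustration only; the paper's
# actual RL algorithm and parameters are not given on this slide).
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # assumed learning rate, discount, exploration rate

q_table = defaultdict(float)             # Q(s, a) estimates, keyed by (state, action)

def choose_action(state, actions):
    """Epsilon-greedy selection: mostly exploit the current Q estimates."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])

def q_update(state, action, reward, next_state, actions):
    """One-step backup: move Q(s,a) towards reward + gamma * max_a' Q(s',a')."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    q_table[(state, action)] += ALPHA * (reward + GAMMA * best_next - q_table[(state, action)])
```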

  5. Action set of dialog system
  • 1. Ask open question (“how may I help you?”)
  • 2. Ask value for slot 1..n
  • 3. Explicitly confirm a slot 1..n
  • 4. Ask for slot k, whilst implicitly confirming slot k-1 or k+1
  • 5. Give help
  • 6. Pass to human operator
  • 7. Database query
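One way to write this action set down is as an enumeration; the class and member names below are illustrative, and the per-slot actions (ask/confirm slot 1..n) would in practice be instantiated once per slot.

```python
from enum import Enum, auto

# Slide 5's action set as an enumeration (names are illustrative).
# Per-slot actions (ask / confirm slot 1..n) are shown generically here.
class SystemAction(Enum):
    ASK_OPEN_QUESTION = auto()           # "How may I help you?"
    ASK_SLOT_VALUE = auto()              # ask the value of slot i
    EXPLICIT_CONFIRM_SLOT = auto()       # explicitly confirm slot i
    ASK_WITH_IMPLICIT_CONFIRM = auto()   # ask slot k, implicitly confirming slot k-1 or k+1
    GIVE_HELP = auto()
    PASS_TO_OPERATOR = auto()
    DATABASE_QUERY = auto()
```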

  6. Reward function
  • Reward function is “all-or-nothing”:
  • 1. DB query, all slots confirmed = +100
  • 2. Any other DB query = -75
  • 3. USimulation hangs up = -100
  • 4. System passes to human operator = -50
  • 5. Each system turn = -5
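These numbers transcribe directly into a small function; the event labels (db_query, user_hangs_up, etc.) are made-up identifiers for illustration, but the reward values are the ones on the slide.

```python
# Slide 6's all-or-nothing reward function; event names are illustrative,
# the reward values are taken from the slide.
def reward(event, all_slots_confirmed=False):
    """Return the reward for a single dialogue event."""
    if event == "db_query":
        return 100 if all_slots_confirmed else -75   # fully confirmed query vs. any other query
    if event == "user_hangs_up":
        return -100
    if event == "pass_to_operator":
        return -50
    if event == "system_turn":
        return -5                                    # per-turn penalty favours shorter dialogues
    return 0
```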

  7. N-Gram User Simulation
  • Employs n-gram user simulation of Georgila, Lemon & Henderson:
  • Derived from annotated version of COMMUNICATOR data
  • Treat dialog as sequence of pairs of DAs/tasks
  • Output next user “utterance” as DA/task pair, based on last n-1 pairs
  • Incorporate effects of ASR errors (built from user utterances as recognised by ASR components of original COMMUNICATOR systems)
  • Have separate 4- and 5-gram simulations: used for training/testing
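A minimal sketch of how such an n-gram user simulation could work: the next user DA/task pair is sampled from counts conditioned on the previous n-1 pairs. This is only an approximation in the spirit of the slide, not the Georgila, Lemon & Henderson implementation; the class, its count-based estimation, and the fallback behaviour are assumptions.

```python
import random
from collections import defaultdict

# Sketch of an n-gram user simulation: sample the next user DA/task pair given
# the previous n-1 pairs. An approximation for illustration only.
class NgramUserSimulation:
    def __init__(self, n):
        self.n = n
        # context (tuple of n-1 DA/task pairs) -> {next pair: count}
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, dialogue):
        """Accumulate n-gram counts from one dialogue (a list of DA/task pairs)."""
        for i in range(len(dialogue) - self.n + 1):
            context = tuple(dialogue[i:i + self.n - 1])
            self.counts[context][dialogue[i + self.n - 1]] += 1

    def next_user_turn(self, history):
        """Sample the next user DA/task pair conditioned on the last n-1 pairs."""
        context = tuple(history[-(self.n - 1):])
        options = self.counts.get(context)
        if not options:
            return ("no_answer", "none")   # assumed fallback for unseen contexts
        pairs, weights = zip(*options.items())
        return random.choices(pairs, weights=weights, k=1)[0]
```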

  8. Key Question: what context features to use?
  • Past work has used only limited state information
  • Based on number/fill-status of slots
  • Proposal: include richer context information, specifically:
  • Dialog act (DA) of last system turn
  • DA of last user turn
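The difference between the baseline and the richer state representations can be made explicit with a small state class; the field names below are illustrative, and the paper's actual feature encoding may differ.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Illustrative state representations for the comparison on slides 8-9.
# frozen=True makes states hashable, so they can key a Q-table.
@dataclass(frozen=True)
class DialogueState:
    slot_status: Tuple[str, ...]            # baseline: fill/confirm status of each slot
    last_user_da: Optional[str] = None      # added for Strategy 2 and Strategy 3
    last_system_da: Optional[str] = None    # added for Strategy 3 only
```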

  9. Experiments
  • Compare 3 systems:
  • Baseline: slot features only
  • Strategy 2: slot features + last user DA
  • Strategy 3: slot features + last user + system DAs
  • Train with 4-gram, test with 5-gram USim, and vice versa
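In outline, the cross-simulation train/test protocol looks like the following; train_policy, run_dialogue and the episode/dialogue counts are placeholders, not figures from the paper.

```python
# Hypothetical outline of the protocol on slide 9: learn against one n-gram
# simulation, evaluate against the other, and compare average reward.
# All function bodies and counts below are placeholders.
def train_policy(user_sim, features, episodes=100_000):
    """Learn a dialogue policy by RL against the given user simulation."""
    return lambda state: "ask_slot_1"        # placeholder policy

def run_dialogue(policy, user_sim):
    """Simulate one dialogue and return its total reward."""
    return 0.0                               # placeholder

def average_reward(policy, user_sim, n_dialogues=1000):
    """Average per-dialogue reward of a policy against a test simulation."""
    return sum(run_dialogue(policy, user_sim) for _ in range(n_dialogues)) / n_dialogues
```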

  10. Results
  • Main reported result is improvement in average reward level of dialogs for the augmented strategies, compared to baseline
  • Strategy 2 improves over baseline by 4.9%
  • Strategy 3 improves over baseline by 7.8%
  • All 3 strategies achieve 100% slot filling and confirmation
  • Augmented strategies also improve over baseline w.r.t. average dialogue length

  11. Qualitative Analysis
  • Learns to:
  • Only query DB when all slots filled
  • Not pass to operator
  • Use implicit confirmation where possible
  • Emergent behaviour:
  • When baseline system fails to fill/confirm slot from user input, state remains the same, and system will repeat the same action
  • For augmented systems, ‘state’ changes, so it can learn to do a different action, e.g. ask about a different slot, or use the “give help” action

  12. Questions/Comments
  • Value of performance improvement figures based on reward?
  • Does improvement w.r.t. the reward function translate into improvement for human-machine dialogs?
  • Validity of comparisons to COMMUNICATOR systems?
  • Why does system performance improve?
  • Is avoidance of repetition the key?
