Emergence of Gricean Maxims from Multi-agent Decision Theory



Presentation Transcript


  1. Emergence of Gricean Maxims from Multi-agent Decision Theory Adam Vogel Stanford NLP Group Joint work with Max Bodoia, Chris Potts, and Dan Jurafsky

  2. Decision-Theoretic Pragmatics Gricean cooperative principle: “Make your contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged.”

  3. Decision-Theoretic Pragmatics Gricean Maxims: • Be truthful: speak with evidence • Be relevant: speak in accordance with goals • Be clear: be brief and avoid ambiguity • Be informative: say exactly as much as needed

  4. Emergence of Gricean Maxims • Be truthful • Be relevant • Be clear • Be informative [Diagram: Rationality, ???, Co-operative principle, Joint utility] • Approach: operationalize the co-operative principle • Tool: multi-agent decision theory • Goal: maxims emerge from rational behavior

  5. Related Work • One-shot reference tasks • Generating spatial referring expressions [Golland et al. 2010] • Predicting pragmatic reasoning in language games [Stiller et al. 2011] • Interpreting natural language instructions • Learning to read help guides [Branavan et al. 2009] • Learning to follow navigational directions [Vogel and Jurafsky 2010] [Artzi and Zettlemoyer 2013] [Chen and Mooney 2011] [Tellex et al. 2011]

  6. CARDS Task

  7. Outline • Spatial semantics • ListenerBot: single-agent advice taker • Can accept advice, never gives it • DialogBot: multi-agent decision maker • Gives advice by tracking the other player’s beliefs

  8. Spatial Semantics • “in the top left of the board” → BOARD(top;left) • “on the left side” → BOARD(left) • “right in the middle” → BOARD(middle) • MaxEnt classifier with bag-of-words features, estimated from corpus data
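Slide 8 maps utterances to BOARD(...) predicates with a MaxEnt (multinomial logistic regression) classifier over bag-of-words features. Below is a minimal pure-Python sketch of that kind of classifier, trained by batch gradient ascent with L2 regularization. The six phrases in `TRAIN` and the hyperparameters are hypothetical toy values, not the corpus data or settings used in the talk.

```python
import math
from collections import Counter

# Hypothetical toy corpus of (utterance, predicate) pairs; NOT the talk's corpus data.
TRAIN = [
    ("in the top left of the board", "BOARD(top;left)"),
    ("top left corner", "BOARD(top;left)"),
    ("on the left side", "BOARD(left)"),
    ("left side of the board", "BOARD(left)"),
    ("right in the middle", "BOARD(middle)"),
    ("middle of the board", "BOARD(middle)"),
]

def bag_of_words(text):
    """Bag-of-words features: lower-cased token counts."""
    return Counter(text.lower().split())

def predict_proba(weights, features):
    """Softmax over labels of the dot product between label weights and word counts."""
    scores = {y: sum(wy.get(w, 0.0) * c for w, c in features.items())
              for y, wy in weights.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def train_maxent(data, epochs=300, lr=0.5, l2=0.01):
    """Batch gradient ascent on the L2-regularized average log-likelihood."""
    labels = sorted({y for _, y in data})
    weights = {y: {} for y in labels}  # weights[label][word]
    examples = [(bag_of_words(x), y) for x, y in data]
    for _ in range(epochs):
        grads = {y: Counter() for y in labels}
        for feats, gold in examples:
            probs = predict_proba(weights, feats)
            for y in labels:
                # Gradient term: observed indicator minus model probability, times counts.
                coef = (1.0 if y == gold else 0.0) - probs[y]
                for w, c in feats.items():
                    grads[y][w] += coef * c
        for y in labels:
            for w, g in grads[y].items():
                old = weights[y].get(w, 0.0)
                weights[y][w] = old + lr * (g / len(examples) - l2 * old)
    return weights

def classify(weights, text):
    """Return the most probable predicate for an utterance."""
    probs = predict_proba(weights, bag_of_words(text))
    return max(probs, key=probs.get)
```

Only the word-count features and the softmax over predicates reflect the slide. A real system would train on the CARDS corpus and would probably use a library optimizer instead of this hand-rolled loop.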

  9. Complexity Ahoy • Approximate decision making only feasible for problems with <10k states!

  10. Semantic State Representation • Divide board into 16 regions • Cluster squares based on meanings

  11. Outline • Spatial semantics • ListenerBot: single-agent advice taker • Can accept advice, never gives it • DialogBot: multi-agent decision maker • Gives advice by tracking the other player’s beliefs

  12. Partially Observable Markov Decision Process (POMDP) Or: An HMM you get to drive!

  13. State space S: hidden configuration of the world • Location of card • Location of player
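A quick size check against slide 9's feasibility limit. This assumes that both the card and the player are tracked at the granularity of the 16 regions from slide 10; the talk does not state this explicitly here.

```latex
$$|S| = \underbrace{16}_{\text{card regions}} \times \underbrace{16}_{\text{player regions}} = 256 \ll 10^{4}$$
```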

  14. Action space A: what we can do • Move around the board • Search for the card

  15. Observations: sensor information + messages • Whether we are on top of the card • BOARD(right;top) etc.

  16. Observation Model: sensor model • We see the card if we search for it and are on it • For messages

  17. Reward R(s,a): value of an action in a state • Large reward if in the same square as the card • Every action adds a small negative reward

  18. Transition T(s’|a,s): dynamics of the world • Travel actions change player location • Card never moves

  19. Initial belief state: distribution over S • Uniform distribution over card location • Known initial player location
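Slides 13-19 specify the POMDP's components. Below is a minimal Python sketch of them. It assumes the 16 regions form a 4×4 grid, that the player has four compass moves plus SEARCH, and that the reward magnitudes (`FOUND_REWARD`, `STEP_COST`) are made-up values. None of these names or numbers come from the talk.

```python
import itertools

SIZE = 4  # hypothetical: the 16 regions arranged as a 4x4 grid
REGIONS = list(itertools.product(range(SIZE), range(SIZE)))
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
ACTIONS = list(MOVES) + ["SEARCH"]
# State: (card region, player region).
STATES = [(card, player) for card in REGIONS for player in REGIONS]

FOUND_REWARD = 100.0  # hypothetical "large reward"
STEP_COST = -1.0      # hypothetical "small negative reward" per action

def transition(state, action):
    """T(s'|a,s) as {s': prob}: travel moves the player (clamped at edges), the card never moves."""
    card, (r, c) = state
    if action in MOVES:
        dr, dc = MOVES[action]
        r = min(max(r + dr, 0), SIZE - 1)
        c = min(max(c + dc, 0), SIZE - 1)
    return {(card, (r, c)): 1.0}

def sensor(state, action):
    """Sensor part of the observation: we see the card iff we SEARCH while on it."""
    card, player = state
    return action == "SEARCH" and card == player

def reward(state, action):
    """R(s,a), following slide 17 literally: bonus when sharing the card's square, plus a step cost."""
    card, player = state
    return (FOUND_REWARD if card == player else 0.0) + STEP_COST

def initial_belief(player_start):
    """Uniform over card location, known player location."""
    p = 1.0 / len(REGIONS)
    return {(card, player_start): p for card in REGIONS}
```

The reward follows slide 17 as written. Rewarding only a successful SEARCH would be an equally plausible reading of the game.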

  20. Belief Update: Action: SEARCH Observation: (Card not here, no message)

  21. Belief Update:

  22. Belief Update: Action: SEARCH Observation: (Card not here, “left side”)

  23. Belief Update:
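Slides 20-23 step through two belief updates after a failed SEARCH: first with no message, then with “left side”. Because the player's location is known and SEARCH does not move the player, the update can be sketched as Bayes' rule over card locations only. `message_likelihood` is a hypothetical stand-in for the spatial-semantics model, and its 0.9/0.1 values are made up.

```python
SIZE = 4  # hypothetical 4x4 grid of regions
REGIONS = [(r, c) for r in range(SIZE) for c in range(SIZE)]

def message_likelihood(message, card):
    """Hypothetical P(message | card region); stands in for the learned classifier."""
    r, c = card
    if message is None:  # no message: uninformative
        return 1.0
    if message == "left side":
        return 0.9 if c < SIZE // 2 else 0.1
    raise ValueError(f"unknown message: {message}")

def update_after_search(belief, player, card_seen, message):
    """Bayes update of the belief over card location after SEARCH at `player`."""
    posterior = {}
    for card, prior in belief.items():
        # Sensor: the card is seen exactly when it is in the searched region.
        sensor_lik = 1.0 if (card == player) == card_seen else 0.0
        posterior[card] = prior * sensor_lik * message_likelihood(message, card)
    z = sum(posterior.values())
    return {card: p / z for card, p in posterior.items()}
```

For example, starting from a uniform belief at region (0, 0), a failed search spreads the mass evenly over the other 15 regions. Hearing “left side” then shifts most of that mass onto the left half.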

  24. Decision Making • Choose a policy mapping beliefs to actions • Goal: maximize expected reward • Solution: Perseus, an approximate (point-based) value iteration algorithm [Spaan et al. 2005] • Computational complexity: PSPACE! • [Equation: value = expected immediate reward + future reward]
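Slide 24's equation did not survive extraction; only its labels (expected immediate reward, future reward) remain. The standard Bellman optimality equation over beliefs has exactly that shape and is given below as a hedged reconstruction. The discount factor γ, the observation model O(o | s', a), and the notation b^{a,o} for the belief after taking a and observing o are assumptions, not taken from the slide.

```latex
$$V^*(b) = \max_{a \in A} \Big[ \underbrace{\sum_{s \in S} b(s)\, R(s,a)}_{\text{expected immediate reward}} + \underbrace{\gamma \sum_{o} \Pr(o \mid b, a)\, V^*(b^{a,o})}_{\text{future reward}} \Big]$$

$$\Pr(o \mid b, a) = \sum_{s' \in S} O(o \mid s', a) \sum_{s \in S} T(s' \mid a, s)\, b(s)$$
```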

  25. Outline • Spatial semantics • ListenerBot: single-agent advice taker • Can accept advice, never gives it • DialogBot: multi-agent decision maker • Gives advice by tracking the other player’s beliefs

  26. DialogBot • (Approximately) tracks beliefs of other player • Speech actions change beliefs of other player • Model: Decentralized POMDP (Dec-POMDP) • Problem: NEXP-hard!! Top!

  27. Each agent selects its own action

  28. Each agent receives its own observation

  29. Transition depends on both actions

  30. Formalization of the co-operative principle Reward is shared between agents
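Slides 27-30 list what changes in the decentralized setting. Each agent selects its own action and receives its own observation, the transition depends on both actions, and the reward is shared. Below is a minimal sketch of one joint step. It assumes a hypothetical 1-D board of `WIDTH` cells, moves `L`/`R`/`SEARCH`, and made-up reward magnitudes. Speech actions are left out to keep it short.

```python
from dataclasses import dataclass
from typing import Tuple

WIDTH = 5                              # hypothetical 1-D board
FOUND_REWARD, STEP_COST = 100.0, -1.0  # hypothetical magnitudes

@dataclass(frozen=True)
class JointState:
    card: int
    players: Tuple[int, int]

def move(pos, action):
    """L/R shift the player (clamped at the edges); SEARCH leaves it in place."""
    step = {"L": -1, "R": 1}.get(action, 0)
    return min(max(pos + step, 0), WIDTH - 1)

def joint_step(state, actions):
    """One Dec-POMDP step: each agent picks its own action; the transition depends on both."""
    next_state = JointState(state.card,  # the card never moves
                            tuple(move(p, a) for p, a in zip(state.players, actions)))
    # Each agent receives its own observation: did its own SEARCH find the card?
    observations = tuple(a == "SEARCH" and p == state.card
                         for p, a in zip(state.players, actions))
    # One shared reward R(s, a) for the team: the co-operative principle as joint utility.
    found = any(p == state.card for p in state.players)
    reward = (FOUND_REWARD if found else 0.0) + STEP_COST
    return next_state, observations, reward
```

The shared `reward` is the only place the two agents' interests are tied together. In the talk's framing, that coupling is what the Gricean behavior is meant to emerge from.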

  31. Exact Multi-agent Belief Update [figure; axis: Time]

  32. Approximate Multi-agent Belief Update [figure; axis: Time]

  33. Single-agent POMDP Approximation • Other agent belief transition model • World transition model • Resulting POMDP has states

  34. What to say?

  35. “Top”

  36. “Middle”

  37. “Right”

  38. “Right”
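Slides 34-38 show DialogBot deciding what to say. Below is a hedged, greedy simplification of that choice; it is not the Dec-POMDP policy itself, which plans over whole action sequences. DialogBot knows the card's region. For each candidate message, it simulates how its partner would update a tracked belief on hearing that message, then picks the message that puts the most mass on the true region. The message meanings in `MEANINGS` and the 0.9/0.1 likelihoods are hypothetical.

```python
SIZE = 4  # hypothetical 4x4 grid of regions
REGIONS = [(r, c) for r in range(SIZE) for c in range(SIZE)]

# Hypothetical literal meanings of a few messages (stand-ins for the learned classifier).
MEANINGS = {
    "top": lambda r, c: r == 0,
    "left": lambda r, c: c == 0,
    "right": lambda r, c: c == SIZE - 1,
    "middle": lambda r, c: 1 <= r <= SIZE - 2 and 1 <= c <= SIZE - 2,
}

def likelihood(message, region):
    """Hypothetical compatibility score used as the listener's likelihood."""
    return 0.9 if MEANINGS[message](*region) else 0.1

def listener_update(belief, message):
    """How DialogBot assumes its partner updates on hearing `message` (Bayes' rule)."""
    post = {reg: p * likelihood(message, reg) for reg, p in belief.items()}
    z = sum(post.values())
    return {reg: p / z for reg, p in post.items()}

def choose_message(partner_belief, card):
    """Greedy one-step choice: the message that most raises the partner's belief in `card`."""
    return max(MEANINGS, key=lambda m: listener_update(partner_belief, m)[card])
```

After speaking, DialogBot would replace its tracked partner belief with `listener_update(partner_belief, message)` before choosing again. Because every message costs the same here, nothing favors shorter or less ambiguous messages. Slide 45 makes the same point when it says clarity needs variable message costs.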

  39. Return to Grice • Be truthful • Be relevant • Be clear • Be informative

  40. Cooperating DialogBots “Middle of the board”

  41. Cooperating DialogBots “Middle of the board”

  42. Adolescent DialogBots “Top”

  43. Return to Grice • Be truthful: DialogBot speaks with evidence • Be relevant: DialogBot gives advice to help win the game • Be clear • Be informative

  44. Experimental Results • Evaluate pairs of agents from 197 random initial states • Agents have 50 high-level moves to find the card

  45. Emergent Gricean Behavior: from joint reward, not hard-coded • Be truthful: DialogBot speaks with evidence • Be relevant: DialogBot gives advice to help win • Be clear: needs variable costs on messages • Be informative: requires levels of specificity • ACL 2013: Implicatures and Nested Beliefs in Approximate Decentralized-POMDPs • Future work: intentions, joint plans, deeper belief nesting • Thanks!
