Modeling Two-Player Games in the Sigma Graphical Cognitive Architecture
David V. Pynadath, Paul S. Rosenbloom, Stacy C. Marsella and Lingshan Li
8.1.2013
Overall Desiderata for Sigma (𝚺)
• A new breed of cognitive architecture that is
  • Grand unified
    • Cognitive + key non-cognitive (perception, motor, affective, …)
  • Functionally elegant
    • Broadly capable yet simple and theoretically elegant
    • “Cognitive Newton’s laws”
  • Sufficiently efficient
    • Fast enough for anticipated applications
• For virtual humans & intelligent agents/robots that are
  • Broadly, deeply and robustly cognitive
  • Interactive with their physical and social worlds
  • Adaptive given their interactions and experience
Hybrid: Discrete + Continuous
Mixed: Symbolic + Probabilistic
Sample ICT Virtual Humans
• Gunslinger, Ada & Grace, INOTS, SASO
• For education, training, interfaces, health, entertainment, …
Theory of Mind (ToM) in Sigma
• ToM models the minds of others, to enable, for example:
  • Understanding multiagent situations
  • Participating in social interactions
• ToM approach based on PsychSim (Marsella & Pynadath)
  • Decision-theoretic problem solving based on POMDPs
  • Recursive agent modeling
• Questions to be answered
  • Can Sigma elegantly extend to comparable ToM?
  • What are the benefits for ToM?
  • What new phenomena emerge from this combination?
• Results reported here concern:
  • Multiagent Sigma
  • Implementation of single-shot, two-player games
  • Both simultaneous and sequential moves
The Structure of Sigma (𝚺)
• Constructed in layers, in analogy to computer systems
  • Computer system layers: Programs & Services, Computer Architecture, Microcode Architecture, Hardware
  • Corresponding cognitive system layers: Knowledge & Skills, Cognitive Architecture, Graphical Architecture, Lisp
• Cognitive architecture
  • Predicates (WM) and conditionals (LTM)
  • Perception, memory access, decision, learning, action
  • Conditionals: deep blending of rules and probabilistic networks
• Graphical architecture
  • Graphical models and piecewise linear functions
  • Graph solution and graph modification
  • Graphical models: factor graphs + summary product algorithm
Control Structure: Soar-like Nesting of Three Layers
• A reactive layer
  • One (internally parallel) graph/cognitive cycle
• Which acts as the inner loop for a deliberative layer
  • Serial selection and application of operators
• Which acts as the inner loop for a reflective layer
  • Recursive, impasse-driven, meta-level generation
• The layers differ in
  • Time scales
  • Serial versus parallel
  • Controlled versus uncontrolled
(Figure labels: Tie and No-Change impasses.)
Single-Shot, Simultaneous-Move, Two-Player Games
• Two players move simultaneously
• Played only once (not repeated)
  • So no need to look beyond the current decision
• Symmetric and asymmetric games
• Socially preferred outcome: optimum in some sense
• Nash equilibrium: neither player can unilaterally increase their payoff by altering their own choice
• Key result: Sigma found the best Nash equilibrium in one memory access (i.e., graph solution)
  • Although the linear combination used in the article cannot always guarantee it
(Figure labels: players A and B; 602 messages, 962 messages.)
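The Nash condition above can be checked by brute force on a small payoff matrix. The sketch below is only an illustration of the definition, not Sigma's graph solution: the function names, the stag-hunt payoffs, and the choice of "best" as highest joint payoff are all assumptions made here, since the slides do not define them.

```python
from itertools import product


def pure_nash_equilibria(payoff_a, payoff_b):
    """Return all (row, col) cells where neither player gains by deviating alone."""
    rows = range(len(payoff_a))
    cols = range(len(payoff_a[0]))
    equilibria = []
    for r, c in product(rows, cols):
        # A picks the row: no other row may do better against column c.
        a_cannot_improve = all(payoff_a[r][c] >= payoff_a[r2][c] for r2 in rows)
        # B picks the column: no other column may do better against row r.
        b_cannot_improve = all(payoff_b[r][c] >= payoff_b[r][c2] for c2 in cols)
        if a_cannot_improve and b_cannot_improve:
            equilibria.append((r, c))
    return equilibria


def best_equilibrium(payoff_a, payoff_b):
    """Pick the equilibrium with the highest joint payoff (an assumed notion of 'best')."""
    eqs = pure_nash_equilibria(payoff_a, payoff_b)
    return max(
        eqs,
        key=lambda rc: payoff_a[rc[0]][rc[1]] + payoff_b[rc[0]][rc[1]],
        default=None,
    )


# Hypothetical stag-hunt payoffs; rows are A's moves, columns are B's moves.
STAG_A = [[4, 0], [3, 3]]
STAG_B = [[4, 3], [0, 3]]

if __name__ == "__main__":
    print(pure_nash_equilibria(STAG_A, STAG_B))
    print(best_equilibrium(STAG_A, STAG_B))
```

The stag hunt has two pure equilibria, so it shows why a tie-breaking notion of "best" is needed at all; games such as matching pennies have none in pure strategies, in which case `best_equilibrium` returns `None`.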
Sequential Games
• Players (A, B) alternate moves
  • E.g., ultimatum, centipede and negotiation
• Decision-theoretic approach with softmax combination
  • Use expected value at each level of search
  • Action probabilities assumed exponential in their utilities (à la Boltzmann)
• There may be many Nash equilibria
  • Instead seek the stricter concept of subgame perfection
  • Overall strategy is an equilibrium strategy over any subgame
• Key result: games solvable in two modes:
  • Automatic/reactive/system-1
  • Controlled/deliberate/system-2
• Both modes well documented in humans for general processing
• Combination not found previously in ToM models
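A minimal sketch of the softmax (Boltzmann) assumption: action probabilities proportional to the exponential of their utilities, with an expected value computed from them. The function names and the `temperature` parameter are assumptions added for illustration; the slides state only that probabilities are exponential in utility.

```python
import math


def boltzmann(utilities, temperature=1.0):
    """Return action probabilities proportional to exp(utility / temperature)."""
    scaled = [u / temperature for u in utilities]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def expected_value(probabilities, utilities):
    """Expected utility of a stochastic choice among actions."""
    return sum(p * u for p, u in zip(probabilities, utilities))


if __name__ == "__main__":
    utilities = [0.0, 1.0, 2.0]  # hypothetical utilities of three actions
    probs = boltzmann(utilities)
    print(probs, expected_value(probs, utilities))
```

Unlike a hard maximum, this rule gives every action a nonzero probability, which is what lets an expected value be passed up from each level of search.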
The Ultimatum Game
• A starts with a fixed amount of money (3)
• A decides how much (0-3) to offer B
• B decides whether or not to accept the offer
  • If B accepts, B gets the offer and A keeps the remainder
  • If B rejects, both get 0
• Each has a utility function over money
  • E.g., <.1, .4, .7, 1> for amounts 0 through 3
Automatic/Reactive Approach
• A trellis (factor) graph in LTM with one stage per move
• Focus on backwards messages from reward(s)
CONDITIONAL Transition-A
  Conditions: Money(agent:A quantity:moneya)
              Accept-E(offer:offer acceptance:choice)
  Condacts: Offer(agent:A quantity:offer)
  Function(choice,offer,moneya): 1<T,0,3>, 1<T,1,2>, 1<T,2,1>, 1<T,3,0>, 1<F,*,0>
CONDITIONAL Transition-B
  Conditions: Money(agent:B quantity:moneyb)
  Condacts: Accept(offer:offer acceptance:choice)
  Function(choice,offer,moneyb): 1<T,0,0>, 1<T,1,1>, 1<T,2,2>, 1<T,3,3>, 1<F,*,0>
CONDITIONAL Reward
  Condacts: Money(agent:agent quantity:money)
  Function(agent,money): .1<*,0>, .4<*,1>, .7<*,2>, 1<*,3>
(Figure: trellis graph with nodes reward, offer, TA, accept, TB, money and exp.)
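The backward flow through the two transition functions can be approximated outside Sigma as a two-stage backward induction. This is a sketch under stated assumptions, not Sigma's message passing: the helper names are hypothetical, B's choice and A's offer are both modeled with a plain softmax, and because the slides do not specify how Sigma normalizes its messages, the numbers produced here should not be expected to match the distribution reported in the slides.

```python
import math

UTILITY = [0.1, 0.4, 0.7, 1.0]  # reward for ending with 0..3 units (from the Reward function)
TOTAL = 3                       # amount A starts with


def money_b(accept, offer):
    """Transition-B: B receives the offer if it accepts, otherwise 0."""
    return offer if accept else 0


def money_a(accept, offer):
    """Transition-A: A keeps the remainder if B accepts, otherwise 0."""
    return TOTAL - offer if accept else 0


def softmax(values):
    """Probabilities proportional to exp(value)."""
    peak = max(values)
    weights = [math.exp(v - peak) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def accept_probability(offer):
    """B's softmax choice between accepting and rejecting a given offer."""
    p_accept, _ = softmax(
        [UTILITY[money_b(True, offer)], UTILITY[money_b(False, offer)]]
    )
    return p_accept


def offer_distribution():
    """A's softmax over offers, using expected utility under B's predicted response."""
    expected = []
    for offer in range(TOTAL + 1):
        p = accept_probability(offer)
        expected.append(
            p * UTILITY[money_a(True, offer)]
            + (1 - p) * UTILITY[money_a(False, offer)]
        )
    return softmax(expected)


if __name__ == "__main__":
    print([round(accept_probability(o), 3) for o in range(TOTAL + 1)])
    print([round(p, 3) for p in offer_distribution()])
```

The two lookup functions mirror the nonzero entries of the Transition-A and Transition-B functions, so changing the game only requires changing those tables and the utility vector.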
Controlled/Deliberate (Reflective) Approach
• Decision-theoretic problem-space search across metalevels
• Very Soar-like, but with softmax combination
• Depends on summary product and Sigma’s mixed aspect
• Corresponds to PsychSim’s online reasoning
(Figure: search tree across metalevels for agents A and B, with offers 0-3, accept/reject choices, tie and no-change impasses, and expected-value results such as E(2) and E(accept).)
Comments on the Ultimatum Game
• Automatic version (5 conditionals)
  • A’s normalized distribution over offers: <.315, .399, .229, .057>
  • 1 decision (94 messages) and .02 s (on a MacBook Air)
• Controlled version (19 conditionals)
  • A’s normalized distribution over offers: <.314, .400, .229, .057>
  • 72 decisions (868 messages/decision) and 126.69 s
• Same result, with distinct computational properties
  • Automatic is fast and occurs in parallel with other memory processing, but is not (easily) penetrable by new bits of other knowledge
  • Controlled is slow and sequential, but can (easily) integrate new knowledge
  • Distinction also maps onto expert versus novice behavior in general
• Raises possibility of a generalization of Soar’s chunking mechanism
  • Compile/learn automatic trellises from controlled problem solving
  • Finer grained, mixed(/hybrid) learning mechanism
Distributions comparable; speed ratio >6000
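The comparison can be checked with a few lines of arithmetic on the figures quoted above. The dictionary layout and helper name are assumptions, and the total message count for the controlled run treats 868 messages/decision as an average, which the slides do not state explicitly.

```python
# Figures copied from the slide; the data layout itself is hypothetical.
AUTOMATIC = {
    "decisions": 1,
    "messages_per_decision": 94,
    "seconds": 0.02,
    "offers": [0.315, 0.399, 0.229, 0.057],
}
CONTROLLED = {
    "decisions": 72,
    "messages_per_decision": 868,
    "seconds": 126.69,
    "offers": [0.314, 0.400, 0.229, 0.057],
}


def total_messages(run):
    """Approximate messages sent over the whole run."""
    return run["decisions"] * run["messages_per_decision"]


# How much slower the controlled run was in wall-clock time.
speed_ratio = CONTROLLED["seconds"] / AUTOMATIC["seconds"]

# Largest disagreement between the two offer distributions.
max_difference = max(
    abs(a - c) for a, c in zip(AUTOMATIC["offers"], CONTROLLED["offers"])
)

if __name__ == "__main__":
    print(f"speed ratio: {speed_ratio:.1f}")
    print(f"controlled messages: {total_messages(CONTROLLED)}")
    print(f"max distribution difference: {max_difference:.3f}")
```

The ratio comes out a little above 6300, consistent with the ">6000" callout, and the two distributions differ by about .001 in their first two entries.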
Conclusion
• Simultaneous games are solvable within a single decision
  • Yield Nash equilibria (although linear combination doesn’t guarantee this)
• Sequential games are solvable in either an automatic or a controlled manner
  • Raises possibility of a mixed variant of chunking that automatically learns probabilistic trellises (HMMs, DBNs, …) from problem solving
  • May yield a novel form of general structure learning for graphical models
• Two architectural modifications to Sigma were required
  • Multiagent decision making (and reflection)
  • Optional exponentiation of outgoing WM messages (for softmax)
• Future work includes
  • More complex games
  • Belief updating (learning models of others)
Overall Progress in Sigma
• Memory [ICCM 10]
  • Procedural (rule)
  • Declarative (semantic/episodic)
  • Constraint
• Problem solving
  • Preference based decisions [AGI 11]
  • Impasse-driven reflection [AGI 13]
  • Decision-theoretic (POMDP) [BICA 11b]
  • Theory of Mind [AGI 13]
• Learning [ICCM 13]
  • Episodic
  • Concept (supervised/unsupervised)
  • Reinforcement [AGI 12b]
  • Action modeling [AGI 12b]
  • Map (as part of SLAM)
• Mental imagery [BICA 11a; AGI 12a]
  • 1-3D continuous imagery buffer
  • Object transformation
  • Feature & relationship detection
• Perception [BICA 11b]
  • Object recognition (CRFs)
  • Localization
• Natural language
  • Question answering (selection)
  • Word sense disambiguation [ICCM 13]
  • Part of speech tagging [ICCM 13]
  • Isolated word speech recognition
• Graph integration [BICA 11b]
  • CRF + Localization + POMDP
Some of these are still just beginnings