Modeling transfer of learning in games of strategic interaction
Ion Juvina & Christian Lebiere
Department of Psychology, Carnegie Mellon University
ACT-R Workshop, July 2012
Outline
• Background
• Experiment
• Cognitive model
• Work in progress
• Discussion
Transfer of learning
• Alfred Binet (1899), formal discipline:
  • Exercise of mental faculties -> generalization
• Thorndike (1903), identical-element theory:
  • Transfer of learning occurs only when identical elements of behavior are carried over from one task to another
• Singley & Anderson (1989):
  • Surface vs. deep similarities
  • Common "cognitive units"
Transfer in strategic interaction
• Bipartisan cooperation in Congress
• Golf -> bipartisanship?
• Similarity? What is transferred?
Prisoner's Dilemma (PD)
Chicken game (CG)
PD & CG payoff matrices
Similarities between PD & CG
• Surface (near transfer):
  • 2x2 games
  • 2 symmetric and 2 asymmetric outcomes
  • The [1,1] outcome is identical
• Deep (far transfer):
  • Mixed motive
  • Non-zero sum
  • Mutual cooperation is superior to competition in the long term, though unstable (risky)
Differences between PD & CG
• Different equilibria:
  • Symmetric in PD: [-1,-1]
  • Asymmetric in CG: [-1,10] and [10,-1]
• Different strategies to maximize joint payoff (the Pareto-efficient outcome; see the sketch below):
  • [1,1] in PD
  • Alternation of [-1,10] and [10,-1] in CG
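To make the contrast concrete, here is a minimal Python sketch of the two payoff matrices and the long-run value of each game's Pareto-efficient strategy. The [1,1], [-1,-1], [-1,10], and [10,-1] entries come from the slides; the PD off-diagonal payoffs and the CG mutual-defection payoff are not stated here, so the values used for them below are purely illustrative assumptions.

```python
# Payoff matrices as (row player, column player) payoffs.
# C = cooperate/swerve, D = defect/dare.
PD = {("C", "C"): (1, 1),    ("C", "D"): (-10, 10),   # off-diagonals assumed
      ("D", "C"): (10, -10), ("D", "D"): (-1, -1)}

CG = {("C", "C"): (1, 1),    ("C", "D"): (-1, 10),
      ("D", "C"): (10, -1),  ("D", "D"): (-10, -10)}  # mutual defection assumed

def mean_payoffs(game, row_moves, col_moves):
    """Average per-round payoff for each player over a sequence of moves."""
    outcomes = [game[pair] for pair in zip(row_moves, col_moves)]
    n = len(outcomes)
    return (sum(o[0] for o in outcomes) / n,
            sum(o[1] for o in outcomes) / n)

# Pareto-efficient play in PD: sustained mutual cooperation.
print(mean_payoffs(PD, "CCCC", "CCCC"))  # -> (1.0, 1.0)

# Pareto-efficient play in CG: alternating the two asymmetric
# outcomes beats settling on [1,1].
print(mean_payoffs(CG, "CDCD", "DCDC"))  # -> (4.5, 4.5)
```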
Questions / hypotheses
• Similarities:
  • Identical elements? Common cognitive units?
• Transfer of learning:
  • Is there any transfer?
  • Only in one direction, from low to high entropy? (Bednar, Chen, Xiao Liu, & Page, in press)
  • Identical elements -> transfer in both directions
• Mechanism of transfer:
  • Reciprocal trust mitigates the risk associated with the long-term solution (Hardin, 2002)
Participants and design
• 480 participants (CMU students), i.e., 240 pairs
• 2 within-subjects games: PD & CG
• 4 between-subjects information conditions: No-info, Min-info, Mid-info, Max-info (60 pairs each)
• 2 between-subjects order conditions within each information condition: PD-CG and CG-PD (30 pairs each)
• 200 unnumbered rounds for each game
Typical outcomes
Pareto-optimal equilibria
[1,1] increases with info
Alternation increases with info
PD-CG sequence
CG-PD sequence
PD before and after
CG before and after
Transfer from PD to CG
• Increased [1,1] (surface transfer)
• Increased alternation (deep transfer)
Transfer from CG to PD
• Increased [1,1] (surface + deep transfer)
Divergent effects (PD -> CG)
• Surface similarity maps [1,1] in PD onto [1,1] in CG
• Deep similarity maps [1,1] in PD onto alternation of [10,-1] / [-1,10] in CG
Convergent effects (CG -> PD)
• Surface similarity maps [1,1] in CG onto [1,1] in PD
• Deep similarity maps alternation of [10,-1] / [-1,10] in CG onto [1,1] in PD
Reciprocation by info
Payoff by info in PD and CG
Summary of results
• Mutual cooperation increases with awareness of interdependence (info)
• Transfer of learning:
  • Better performance "after" than "before"
  • Combined effects of surface and deep similarities:
    • CG -> PD: surface similarity facilitates transfer
    • PD -> CG: surface similarity interferes with transfer
  • Transfer occurs in both directions
• Mechanism of generalization: reciprocal trust?
Cognitive model
• Awareness of interdependence: opponent modeling
• Generality: utility learning (reinforcement learning)
• Transfer of learning: surface transfer and deep transfer
Opponent modeling
• Instance-based learning: a dynamic representation of the opponent
• Sequence learning: prediction of the opponent's next move
• Instance (a snapshot of the current situation): previous moves and the opponent's current move
• Contextualized expectations (see the sketch below)
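A minimal sketch of this instance-based prediction, assuming illustrative names and a simple power-law recency weighting in place of full ACT-R activation: each instance pairs a context (both players' previous moves) with the opponent's move that followed it, and prediction retrieves the move with the strongest recency- and frequency-weighted trace in the current context.

```python
from collections import defaultdict

class OpponentModel:
    """Instance-based prediction of the opponent's next move (illustrative)."""

    def __init__(self, decay=0.5):
        self.decay = decay             # recency decay, playing the role of ACT-R's d
        self.uses = defaultdict(list)  # (context, opponent move) -> list of use times

    def observe(self, context, opp_move, t):
        # Store an instance: in this context, the opponent played opp_move at time t.
        self.uses[(context, opp_move)].append(t)

    def predict(self, context, now):
        # Retrieve the candidate move whose stored instances carry the most
        # recency- and frequency-weighted strength in this context.
        def strength(move):
            times = self.uses[(context, move)]
            return sum((now - t + 1) ** -self.decay for t in times)
        return max(("C", "D"), key=strength)

# Usage: after each round, store the instance; before the next round, predict.
model = OpponentModel()
model.observe(context=("C", "C"), opp_move="D", t=1)
print(model.predict(context=("C", "C"), now=2))  # -> "D"
```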
Utility learning
• Reinforcement learning of a strategy: what move to make given
  • the expected move of the opponent
  • the context (previous moves)
• Reward functions (written out below):
  • Own payoff - opponent's payoff
  • Opponent's payoff
  • Joint payoff - opponent's previous payoff
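Written out, the three reward functions are simple arithmetic on the round's payoffs. This is a sketch; the function names are illustrative, not the model's own:

```python
def competitive_reward(own, opp):
    # Maximize relative advantage: own payoff minus the opponent's payoff.
    return own - opp

def investing_reward(own, opp):
    # Reward the opponent's gains (used when investing in trust).
    return opp

def joint_reward(own, opp, opp_prev):
    # Joint payoff, discounted by the opponent's previous payoff.
    return (own + opp) - opp_prev
```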
Surface transfer
• Declarative sub-symbolic learning: retrieval of instances guided by recency and frequency (the base-level learning equation below)
• Strategy learning: a learned strategy continues to be used for a while until it is unlearned
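The recency and frequency weighting behind this retrieval is ACT-R's standard base-level learning equation (the architecture's general mechanism, not something specific to this model):

$$B_i = \ln\left(\sum_{j=1}^{n} t_j^{-d}\right)$$

where $t_j$ is the time since the $j$-th use of instance $i$ and $d$ is the decay parameter (0.5 by default). Instances that were used recently and often dominate retrieval, which is what lets strategies learned in the first game persist into the second.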
Deep transfer
• Trust learning / trust dynamics (a sketch follows this slide)
• Trust accumulator:
  • Increases when the opponent makes cooperative (risky) moves
  • Decreases when the opponent makes competitive moves
• Trust-invest accumulator:
  • Increases after a mutually destructive outcome
  • Decreases after unreciprocated cooperation (risk taking)
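A minimal sketch of the two accumulators, assuming unit-sized updates; the slide specifies only the direction of each change, not its magnitude:

```python
class TrustState:
    def __init__(self):
        self.trust = 0          # trust accumulator
        self.trust_invest = 0   # willingness to invest in (re)building trust

    def update(self, own_move, opp_move, mutually_destructive):
        # Trust tracks the opponent's cooperativeness.
        if opp_move == "C":     # cooperative (risky) move by the opponent
            self.trust += 1
        else:                   # competitive move by the opponent
            self.trust -= 1
        # Trust-invest rises after mutual destruction and falls when
        # one's own cooperation (risk taking) goes unreciprocated.
        if mutually_destructive:
            self.trust_invest += 1
        elif own_move == "C" and opp_move == "D":
            self.trust_invest -= 1
```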
Meta-strategy
• Determines which reward function to use (sketched below):
  • Trust accumulator <= 0: reward = own payoff - opponent's payoff
  • Trust-invest accumulator > 0: reward = opponent's payoff
  • Trust accumulator > 0: reward = joint payoff - opponent's previous payoff
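Put together, the meta-strategy is a three-way branch on the accumulators. The slide's conditions overlap when trust <= 0 but trust-invest > 0; the precedence below (trust checked first, then trust-invest) is an assumption:

```python
def select_reward(trust, trust_invest, own, opp, opp_prev):
    if trust > 0:
        return (own + opp) - opp_prev  # pursue the joint payoff
    if trust_invest > 0:
        return opp                     # invest in rebuilding trust
    return own - opp                   # protect against exploitation
```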
Model diagram
[Architecture diagram (HSCB 2011): the model interacts with the environment, i.e., the opponent's moves. Declarative memory holds instances (previous moves plus the opponent's move) that feed the prediction of the opponent's next move; procedural memory holds rules mapping predicted moves to best responses and receives the reward; an ACT-R extension maintains the trust and trust-invest accumulators.]
PD-CG
CG-PD
PD-CG surface transfer
PD-CG deep transfer
CG-PD surface + deep transfer
Trust simulation
Summary of model results
• Awareness of interdependence: opponent modeling
• Generality: utility learning
• Transfer of learning:
  • Surface-level transfer: cognitive units
  • Deep-level transfer: trust
In progress
• Expand the model to account for all information conditions
• Develop a more ecologically valid paradigm (IPD^3)
• Model "affective" processes in ACT-R
IPD^3
General discussion
• Transfer of learning is possible
• Deep similarities operate at the interpersonal level
• IPD^3:
  • To be used in behavioral experiments
  • A tool for learning strategic-interaction skills
Acknowledgments
• Coty Gonzalez
• Jolie Martin
• Hau-Yu Wong
• Muniba Saleem
• This research is supported by the Defense Threat Reduction Agency (DTRA), grant number HDTRA1-09-1-0053, to Cleotilde Gonzalez and Christian Lebiere.
• Thank you for your attention!
• Questions?