
Advice Taking and Transfer Learning: Naturally-Inspired Extensions to Reinforcement Learning


Presentation Transcript


  1. Advice Taking and Transfer Learning: Naturally-Inspired Extensions to Reinforcement Learning. Lisa Torrey, Trevor Walker, Richard Maclin*, Jude Shavlik. University of Wisconsin - Madison; University of Minnesota - Duluth*

  2. Reinforcement Learning: the agent observes a state of the environment, chooses an action, and receives a reward from the environment; the reward may be delayed.

  3. Q-Learning: the Q-function maps a state and action to a value, and policy(state) = argmax_action Q(state, action) • Update the Q-function incrementally • Follow the current Q-function to choose actions • Converges to an accurate Q-function
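For concreteness, a minimal tabular Q-learning sketch in Python; the learning rate, discount factor, and ε-greedy exploration values are illustrative assumptions, and the system described later in the talk uses function approximation rather than a table:

```python
import random
from collections import defaultdict

# Q-table: maps (state, action) pairs to value estimates; starts empty,
# reflecting that the agent begins without any information.
Q = defaultdict(float)

def choose_action(state, actions, epsilon=0.1):
    """Follow the current Q-function, with occasional random exploration."""
    if random.random() < epsilon:
        return random.choice(actions)                  # explore
    return max(actions, key=lambda a: Q[(state, a)])   # argmax_action Q(state, action)

def q_update(state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Incremental update toward reward + gamma * max_a' Q(next_state, a')."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```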

  4. Limitations • Agents begin without any information • Random exploration required in early stages of learning • Long training times can result

  5. Naturally-Inspired Extensions • Advice Taking: a human teacher passes knowledge to the RL agent • Transfer Learning: a source-task agent passes knowledge to the target-task agent

  6. Potential Benefits: relative to learning without knowledge, learning with knowledge can show a higher start, a higher slope, and a higher asymptote on the curve of performance versus training.

  7. Outline • RL in a complex domain • Extension #1: Advice Taking • Extension #2: Transfer Learning • Skill Transfer • Macro Transfer • MLN Transfer

  8. The RoboCup Domain: KeepAway (+1 per time step), MoveDownfield (+1 per meter), BreakAway (+1 upon goal)

  9. The RoboCup Domain. State features: distBetween(a0, Player), distBetween(a0, GoalPart), distBetween(Attacker, goalCenter), distBetween(Attacker, ClosestDefender), distBetween(Attacker, goalie), angleDefinedBy(topRight, goalCenter, a0), angleDefinedBy(GoalPart, a0, goalie), angleDefinedBy(Attacker, a0, ClosestDefender), angleDefinedBy(Attacker, a0, goalie), timeLeft. Actions: move(ahead), move(away), move(right), move(left), pass(Teammate), shoot(GoalPart)
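A hypothetical sketch of how one such state and action set might be encoded for the learner; the feature and action names follow the slide, while the numeric values and the grounded objects (a1, a2, goalLeft, goalRight, goalCenter) are invented for illustration:

```python
# State: relational features evaluated for the current situation
# (distances in meters, angles in degrees; the values are made up).
state = {
    "distBetween(a0, a1)": 15.0,
    "distBetween(a0, goalCenter)": 12.5,
    "distBetween(a0, goalie)": 20.0,
    "angleDefinedBy(topRight, goalCenter, a0)": 30.0,
    "timeLeft": 45.0,
}

# Actions available to the attacker a0, with the logical variables
# Teammate and GoalPart grounded to concrete objects.
actions = [
    "move(ahead)", "move(away)", "move(left)", "move(right)",
    "pass(a1)", "pass(a2)",
    "shoot(goalLeft)", "shoot(goalRight)", "shoot(goalCenter)",
]
```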

  10. Q-Learning: policy(state) = argmax_action Q(state, action), with the Q-function (state, action → value) represented by function approximation

  11. Approximating the Q-function with linear support-vector regression: Q-value = weightsᵀ · features, e.g., features (distBetween(a0, a1), distBetween(a0, a2), distBetween(a0, goalie), …) with weights (0.2, -0.1, 0.9, …). Set the weights to minimize ModelSize + C × DataMisfit.
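A sketch of the linear Q-value computation, with a ridge-style closed-form fit standing in for the support-vector regression actually used; the weights echo the slide, the feature values are illustrative, and the fitting routine is a simplification rather than the authors' solver:

```python
import numpy as np

# Feature vector: distBetween(a0, a1), distBetween(a0, a2), distBetween(a0, goalie), ...
x = np.array([15.0, 9.0, 20.0])
# Weight vector from the slide
w = np.array([0.2, -0.1, 0.9])

q_value = w @ x     # Q-value = weights^T . features

def fit_weights(X, y, C=1.0):
    """Stand-in for SV regression: minimize ||w||^2 + C * ||X w - y||^2 in closed form."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + np.eye(d) / C, X.T @ y)
```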

  12. RL in 3-on-2 BreakAway

  13. Outline • RL in a complex domain • Extension #1: Advice Taking • Extension #2: Transfer Learning • Skill Transfer • Macro Transfer • MLN Transfer

  14. Extension #1: Advice Taking. Example advice: IF an opponent is near AND a teammate is open THEN pass is the best action

  15. Advice in RL • Advice sets constraints on Q-values under specified conditions, e.g., IF an opponent is near me AND a teammate is open THEN pass has a high Q-value • Apply advice as soft constraints in the optimization: minimize ModelSize + C × DataMisfit + μ × AdviceMisfit
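A hedged sketch of advice as a soft constraint: whenever a state satisfies the advice condition but the advised action's Q-value falls below a desired level, a hinge penalty is added to the objective. The penalty form, threshold, and helper names are illustrative, not the authors' exact formulation:

```python
import numpy as np

def advice_misfit(w, advice_states, q_threshold=1.0):
    """Hinge penalty for states where the advice condition holds (opponent near,
    teammate open) but the advised action's Q-value is below the threshold."""
    penalty = 0.0
    for x in advice_states:        # feature vectors of states matching the condition
        q = w @ x                  # Q-value of the advised action (e.g., pass)
        penalty += max(0.0, q_threshold - q)
    return penalty

def objective(w, X, y, advice_states, C=1.0, mu=0.5):
    """ModelSize + C * DataMisfit + mu * AdviceMisfit."""
    model_size = np.sum(np.abs(w))
    data_misfit = np.sum(np.abs(X @ w - y))
    return model_size + C * data_misfit + mu * advice_misfit(w, advice_states)
```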

  16. Advice Performance

  17. Outline • RL in a complex domain • Extension #1: Advice Taking • Extension #2: Transfer Learning • Skill Transfer • Macro Transfer • MLN Transfer

  18. Extension #2: Transfer between tasks such as 3-on-2 KeepAway, 3-on-2 MoveDownfield, and 3-on-2 BreakAway

  19. Relational Transfer • First-order logic describes relationships between objects, e.g., distBetween(a0, Teammate) > 10 and distBetween(Teammate, goalCenter) < 15 • We want to transfer relational knowledge • Human-level reasoning • General representation

  20. Outline • RL in a complex domain • Extension #1: Advice Taking • Extension #2: Transfer Learning • Skill Transfer • Macro Transfer • MLN Transfer

  21. Skill Transfer • Learn advice about good actions from the source task • Select positive and negative examples of good actions and apply inductive logic programming to learn rules. Example 1: distBetween(a0, a1) = 15, distBetween(a0, a2) = 5, distBetween(a0, goalie) = 20, ...; action = pass(a1); outcome = caught(a1). Learned rule: good_action(pass(Teammate)) :- distBetween(a0, Teammate) > 10, distBetween(Teammate, goalCenter) < 15.
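A small sketch of applying the learned clause as advice in a state: any teammate satisfying both conditions is a good pass target. The state encoding and distance lookups are hypothetical stand-ins for the actual RoboCup feature computations:

```python
def good_pass_targets(state):
    """Teammates satisfying the learned clause:
    good_action(pass(Teammate)) :- distBetween(a0, Teammate) > 10,
                                   distBetween(Teammate, goalCenter) < 15."""
    return [t for t in state["teammates"]
            if state["distBetween"]["a0", t] > 10
            and state["distBetween"][t, "goalCenter"] < 15]

# Example: a1 is far enough from a0 and close to the goal, so pass(a1) is advised.
state = {
    "teammates": ["a1", "a2"],
    "distBetween": {("a0", "a1"): 15, ("a1", "goalCenter"): 12,
                    ("a0", "a2"): 5,  ("a2", "goalCenter"): 30},
}
print(good_pass_targets(state))   # -> ['a1']
```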

  22. User Advice in Skill Transfer • There may be new skills in the target that cannot be learned from the source • E.g., shooting in BreakAway • We allow users to add their own advice about these new skills • User advice simply adds to transfer advice

  23. Skill Transfer to 3-on-2 BreakAway

  24. Outline • RL in a complex domain • Extension #1: Advice Taking • Extension #2: Transfer Learning • Skill Transfer • Macro Transfer • MLN Transfer

  25. Macro Transfer • Learn a strategy from the source task • Find an action sequence that separates good games from bad games, e.g., move(ahead) → pass(Teammate) → shoot(GoalPart) • Learn first-order rules to control transitions along the sequence
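A rough sketch of the "separates good games from bad games" idea: score a candidate action sequence by how much more often it appears, in order, in successful games than in unsuccessful ones. This scoring rule and the toy games are an illustrative simplification, not the authors' actual macro-discovery algorithm:

```python
def contains_subsequence(game, sequence):
    """True if the game's action list contains the sequence in order (not necessarily contiguously)."""
    it = iter(game)
    return all(action in it for action in sequence)

def sequence_score(sequence, good_games, bad_games):
    """Fraction of good games containing the sequence minus the fraction of bad games that do."""
    good = sum(contains_subsequence(g, sequence) for g in good_games) / len(good_games)
    bad = sum(contains_subsequence(g, sequence) for g in bad_games) / len(bad_games)
    return good - bad

# Toy example: scoring a grounded version of the candidate macro from the slide.
good_games = [["move(ahead)", "pass(a1)", "shoot(goalRight)"],
              ["move(ahead)", "move(ahead)", "pass(a2)", "shoot(goalLeft)"]]
bad_games = [["pass(a1)", "move(away)"], ["move(left)", "shoot(goalRight)"]]
print(sequence_score(["move(ahead)", "pass(a1)", "shoot(goalRight)"], good_games, bad_games))
```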

  26. Transfer via Demonstration: during the first 100 games played in the target task, the agent executes the macro strategy and learns an initial Q-function; in the games after that, it performs standard RL and adapts to the target task.

  27. Macro Transfer to 3-on-2 BreakAway

  28. Outline • RL in a complex domain • Extension #1: Advice Taking • Extension #2: Transfer Learning • Skill Transfer • Macro Transfer • MLN Transfer

  29. MLN Transfer • Learn a Markov Logic Network to represent the source-task policy relationally • Apply the policy via demonstration in the target task. The MLN acts as the Q-function, mapping a state and action to a value.

  30. Markov Logic Networks • A Markov network models a joint distribution • A Markov Logic Network combines probability with logic • Template: a set of first-order formulas with weights • Each grounded predicate in a formula becomes a node • Predicates in a grounded formula are connected by arcs • Probability of a world: (1/Z) exp(Σi Wi·Ni)

  31. MLN Q-function. Formula 1 (W1 = 0.75, N1 = 1 grounding, for one teammate): IF distance(me, Teammate) < 15 AND angle(me, goalie, Teammate) > 45 THEN Q ∈ (0.8, 1.0). Formula 2 (W2 = 1.33, N2 = 3 groundings, for three goal parts): IF distance(me, GoalPart) < 10 AND angle(me, goalie, GoalPart) > 45 THEN Q ∈ (0.8, 1.0). Probability that Q ∈ (0.8, 1.0): exp(W1N1 + W2N2) / (1 + exp(W1N1 + W2N2))

  32. Using an MLN Q-function. Bin probabilities: Q ∈ (0.8, 1.0) with P1 = 0.75, Q ∈ (0.5, 0.8) with P2 = 0.15, Q ∈ (0, 0.5) with P3 = 0.10. Then Q = P1 · E[Q | bin1] + P2 · E[Q | bin2] + P3 · E[Q | bin3], where E[Q | bin] is the Q-value of the most similar training example in that bin.
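A sketch combining the two computations above: the bin probability from the grounded-formula counts (slide 31) and the expected Q-value over bins (slide 32). The weights, counts, and bin probabilities come from the slides; the per-bin expected Q-values are invented placeholders:

```python
import math

def bin_probability(weighted_counts):
    """P(Q in bin) = exp(sum_i W_i * N_i) / (1 + exp(sum_i W_i * N_i)), as on slide 31."""
    s = sum(w * n for w, n in weighted_counts)
    return math.exp(s) / (1.0 + math.exp(s))

# Slide 31's groundings: formula 1 (W1 = 0.75) holds for one teammate,
# formula 2 (W2 = 1.33) holds for three goal parts.
p_high_bin = bin_probability([(0.75, 1), (1.33, 3)])

# Slide 32's combination: Q = sum_i P_i * E[Q | bin_i], where E[Q | bin_i] is the
# Q-value of the most similar training example in that bin
# (the 0.9 / 0.6 / 0.3 expected values below are made-up placeholders).
bins = [(0.75, 0.9), (0.15, 0.6), (0.10, 0.3)]
q_estimate = sum(p * eq for p, eq in bins)
print(p_high_bin, q_estimate)
```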

  33. MLN Transfer to 3-on-2 BreakAway

  34. Conclusions • Advice and transfer can provide RL agents with knowledge that improves early performance • Relational knowledge is desirable because it is general and involves human-level reasoning • More detailed knowledge produces larger initial benefits, but is less widely transferable

  35. Acknowledgements • DARPA grant HR0011-04-1-0007 • DARPA grant HR0011-07-C-0060 • DARPA grant FA8650-06-C-7606 • NRL grant N00173-06-1-G002 Thank You
