
Relational Macros for Transfer in Reinforcement Learning



Presentation Transcript


  1. Relational Macros for Transfer in Reinforcement Learning Lisa Torrey, Jude Shavlik, Trevor Walker University of Wisconsin-Madison, USA Richard Maclin University of Minnesota-Duluth, USA

  2. Transfer Learning Scenario • Agent learns Task A • Agent encounters related Task B • Agent recalls relevant knowledge from Task A • Agent uses this knowledge to learn Task B quickly

  3. Goals of Transfer Learning • Learning curves in the target task: performance vs. amount of training, comparing learning with transfer against learning without transfer

  4. Reinforcement Learning • Observe the world state, described by a set of features • Take an action • Policy: choose the action with the highest Q-value in the current state • Receive a reward • Use the rewards to estimate the Q-values of actions in states
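
A minimal tabular Q-learning sketch of the observe/act/reward cycle above, for illustration only: the BreakAway agents in this work learn Q-values over relational state features with a function approximator rather than a lookup table, and the env object and its reset/step interface here are hypothetical.

    import random
    from collections import defaultdict

    def q_learning(env, actions, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
        Q = defaultdict(float)                     # (state, action) -> estimated Q-value
        for _ in range(episodes):
            state = env.reset()                    # observe the world state (a feature tuple)
            done = False
            while not done:
                # policy: usually the action with the highest Q-value, with some exploration
                if random.random() < epsilon:
                    action = random.choice(actions)
                else:
                    action = max(actions, key=lambda a: Q[(state, a)])
                next_state, reward, done = env.step(action)   # take an action, receive a reward
                # use the reward to update the Q-value estimate for (state, action)
                best_next = max(Q[(next_state, a)] for a in actions)
                Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
                state = next_state
        return Q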

  5. The RoboCup Domain • 2-on-1 BreakAway • 3-on-2 BreakAway • 4-on-3 BreakAway

  6. Transfer in Reinforcement Learning • Related work: model reuse (Taylor & Stone 2005), which copies the Q-function; policy reuse (Fernandez & Veloso 2006); option transfer (Perkins & Precup 1999); relational RL (Driessens et al. 2006) • Our previous work: policy transfer (Torrey et al. 2005) and skill transfer (Torrey et al. 2006), which learn rules that describe when to take individual actions • Now we learn a strategy instead of individual skills

  7. Representing a Multi-step Strategy • A relational macro is a finite-state machine • Nodes represent internal states of the agent, in which limited independent policies apply (e.g. hold ← true in one node, pass(Teammate) ← isOpen(Teammate) in another) • Conditions for transitions (e.g. isClose(Opponent), allOpponentsFar) and for actions are expressed in first-order logic • Really these are rule sets, not just single rules • The learning agent jumps between players (the learner is whichever attacker currently has the ball)
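
A sketch of how such a macro could be represented as a finite-state machine with rule sets on nodes and transitions. The node names, feature tests, and thresholds below are illustrative stand-ins, not the first-order rules actually learned by the authors' system.

    class MacroNode:
        def __init__(self, name, action_rules, transitions):
            self.name = name
            self.action_rules = action_rules    # list of (condition, action) pairs
            self.transitions = transitions      # list of (condition, next-node name) pairs

    # illustrative conditions over a state's feature map
    def always(state):          return True
    def teammate_open(state):   return state["isOpen(Teammate)"]
    def opponent_close(state):  return state["distance(Opponent, me)"] < 10
    def opponents_far(state):   return state["minOpponentDistance"] > 20

    macro = {
        "hold": MacroNode("hold", [(always, "hold")], [(opponent_close, "pass")]),
        "pass": MacroNode("pass", [(teammate_open, "pass(Teammate)")], [(opponents_far, "hold")]),
    }

    def step_macro(node_name, state):
        """Take a transition if one of its rules fires, then choose an action whose rule fires."""
        node = macro[node_name]
        for condition, next_name in node.transitions:
            if condition(state):
                node_name, node = next_name, macro[next_name]
                break
        for condition, action in node.action_rules:
            if condition(state):
                return node_name, action
        return node_name, None    # no rule fired; fall back to the ordinary RL policy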

  8. Our Proposed Method • Learn a relational macro that describes a successful strategy in the source task • Execute the macro in the target task to demonstrate the successful strategy • Continue learning the target task with standard RL after the demonstration

  9. Learning a Relational Macro • We use ILP to learn macros • Aleph: top-down search in a bottom clause • Heuristic and randomized search • Maximize F1 score • We learn a macro in two phases • The action sequence (node structure) • The rule sets for actions and transitions
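
For reference, the F1 score that the ILP search maximizes is the standard harmonic mean of precision and recall over the positive and negative example games; the counts in the comment below are made-up numbers for illustration.

    def f1_score(true_pos, false_pos, false_neg):
        precision = true_pos / (true_pos + false_pos)
        recall = true_pos / (true_pos + false_neg)
        return 2 * precision * recall / (precision + recall)

    # e.g. a clause covering 40 of 50 positive games and 5 negative games:
    # precision = 40/45, recall = 40/50, F1 = f1_score(40, 5, 10) ≈ 0.84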

  10. Learning Macro Structure • Objective: find an action pattern that separates good and bad games, e.g. move(ahead) → pass(Teammate) → shoot(GoalPart) • macroSequence(Game) ← actionTaken(Game, StateA, move, ahead, StateB), actionTaken(Game, StateB, pass, _, StateC), actionTaken(Game, StateC, shoot, _, gameEnd).
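
A rough procedural reading of the macroSequence clause above: it succeeds when a game's trace ends with a move ahead, then a pass to any teammate, then a shot that ends the game. The trace format below is a simplified stand-in for the actionTaken facts.

    def matches_macro_sequence(trace):
        # trace: list of (action, argument) pairs in the order they were taken
        tail = trace[-3:]
        return ([action for action, _ in tail] == ["move", "pass", "shoot"]
                and tail[0][1] == "ahead")

    # matches_macro_sequence([("move", "ahead"), ("pass", "a1"), ("shoot", "goalRight")])  -> True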

  11. Learning Macro Conditions • Objective: describe when the transitions and actions of the move(ahead) → pass(Teammate) → shoot(GoalPart) macro should be taken • For the transition from move to pass: transition(State) ← feature(State, distance(Teammate, goal)) < 15. • For the policy in the pass node: action(State, pass(Teammate)) ← feature(State, angle(Teammate, me, Opponent)) > 30.
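
The two clauses above are first-order: Teammate and Opponent are variables, so a rule fires if some binding satisfies the test. A rough propositional sketch of that check, using a hypothetical feature map that lists each feature's values over all bindings:

    def transition_fires(state):
        # transition(State) <- feature(State, distance(Teammate, goal)) < 15.
        return any(d < 15 for d in state["distance(Teammate, goal)"])

    def pass_rule_fires(state):
        # action(State, pass(Teammate)) <- feature(State, angle(Teammate, me, Opponent)) > 30.
        return any(a > 30 for a in state["angle(Teammate, me, Opponent)"])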

  12. Examples for Actions • Macro: move(ahead) → pass(Teammate) → shoot(GoalPart) • Scoring games (source of positive examples): Game 1: move(ahead), pass(a1), shoot(goalRight); Game 2: move(ahead), pass(a2), shoot(goalLeft) • Non-scoring games (source of negative examples): Game 3: move(right), pass(a1); Game 4: move(ahead), pass(a1), shoot(goalRight)

  13. Examples for Transitions • Macro: move(ahead) → pass(Teammate) → shoot(GoalPart) • Scoring game (source of positive examples): Game 1: move(ahead), pass(a1), shoot(goalRight) • Non-scoring games (source of negative examples): Game 2: move(ahead), move(ahead), shoot(goalLeft); Game 3: move(ahead), pass(a1), shoot(goalRight)

  14. Transferring a Macro • Demonstration: execute the macro strategy to get Q-value estimates, infer low Q-values for actions not taken by the macro, and compute an initial Q-function from these examples • Continue learning with standard RL • Advantage: potential for a large immediate jump in performance • Disadvantage: risk that the agent will blindly follow an inappropriate strategy
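
A sketch of that demonstration phase under the description above: run the macro in the target task, label the action it chooses with a high Q-value estimate and the other actions with low estimates, and collect those examples to fit the initial Q-function. HIGH_Q, LOW_Q, and fit_q_function are illustrative placeholders rather than the authors' actual values or learner; step_macro is the macro-execution sketch from slide 7.

    HIGH_Q, LOW_Q = 1.0, 0.0

    def demonstrate(env, step_macro, actions, start_node, games=100):
        examples = []                                   # (state, action, target Q-value) triples
        for _ in range(games):
            state, node, done = env.reset(), start_node, False
            while not done:
                node, chosen = step_macro(node, state)  # follow the macro's strategy
                for a in actions:
                    examples.append((state, a, HIGH_Q if a == chosen else LOW_Q))
                state, reward, done = env.step(chosen)
        return examples

    # after the demonstration period: initial_Q = fit_q_function(examples),
    # then continue improving the Q-function with standard RL in the target task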

  15. Experiments • Source task: 2-on-1 BreakAway • 3000 existing games from the learning curve • Learn macros from 5 separate runs • Target tasks: 3-on-2 and 4-on-3 BreakAway • Demonstration period of 100 games • Continue training up to 3000 games • Perform 5 target runs for each source run

  16. 2-on-1 BreakAway Macro • Learned nodes: pass(Teammate), move(Direction), shoot(goalRight), shoot(goalLeft) • The learning agent jumped players at the pass • In one source run one of these nodes was absent • One of the shoot nodes is apparently a leading pass rather than a genuine shot • The ordering of these nodes varied across source runs

  17. Results: 2-on-1 to 3-on-2

  18. Results: 2-on-1 to 4-on-3

  19. Conclusions • This transfer method can significantly improve initial target-task performance • It can handle new elements being added to the target task, but not new objectives • It is an aggressive approach that is a good choice for tasks with similar strategies

  20. Future Work • Alternative ways to apply relational macros in the target task • Keep the initial benefits • Alleviate risks when tasks differ more • Alternative ways to make decisions about steps within macros • Statistical relational learning techniques

  21. Acknowledgements • DARPA Grant HR0011-04-1-0007 • DARPA IPTO contract FA8650-06-C-7606 Thank You

  22. Rule scores • Each transition and action has a set of rules, one or more of which may fire • If multiple rules fire, we obey the one with the highest score • The score of a rule is the probability that following it leads to a successful game: Score = (# source-task games that followed the rule and scored) / (# source-task games that followed the rule)
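
The score is just that empirical fraction; a small sketch of computing it and of picking among fired rules, with made-up counts for illustration:

    def rule_score(games_followed_and_scored, games_followed):
        # probability that following the rule led to a scoring game in the source task
        return games_followed_and_scored / games_followed

    # e.g. if 45 of the 60 source-task games that followed a rule scored, its score is 0.75
    def choose_rule(fired_rules, scores):
        # when several rules fire in the current state, obey the highest-scoring one
        return max(fired_rules, key=lambda rule: scores[rule])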
