Event-Learning with a Non-Markovian Controller



Presentation Transcript


  1. Event-Learning with a Non-Markovian Controller István Szita, Bálint Takács & András Lőrincz Eötvös Loránd University Hungary

  2. Acknowledgements • thanks to ECCAI for the travel grant • work partially supported by • European Office for Aerospace Research and Development • Air Force Office of Scientific Research • Hungarian National Science Foundation • thanks to Csaba Szepesvári for the helpful comments Szita, Takács & Lőrincz: Event-learning with non-Markovian controller

  3. Introduction: reinforcement learning [Diagram: agent and environment exchanging state, action and reward; goal: max. reward]

  4. Introduction: Markov decision processes • fully observable, Markovian • state and action space: S, A • transition probabilities: P(s,a,s’) • reward function: R(s,a,s’) • policy: π(s,a) • value function: V(s) := E(Σ_t γ^t·r_t | s_0 = s)
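The value function on slide 4 is an expected discounted sum of rewards. A minimal sketch of that quantity follows, assuming finite reward sequences sampled from episodes that all start in the same state; the function names `discounted_return` and `monte_carlo_value` are illustrative and do not come from the slides.

```python
def discounted_return(rewards, gamma):
    """Return sum_t gamma**t * rewards[t] for one finite reward sequence."""
    total = 0.0
    # Walk backwards so each step is a single multiply-add (Horner's scheme).
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def monte_carlo_value(episodes, gamma):
    """Estimate V(s) as the mean discounted return over sampled episodes.

    Assumption: every reward sequence in `episodes` starts in the same state s.
    """
    returns = [discounted_return(ep, gamma) for ep in episodes]
    return sum(returns) / len(returns)


if __name__ == "__main__":
    print(discounted_return([1.0, 1.0, 1.0], 0.5))
```

Averaging sampled returns is only one way to approximate the expectation; the later slides list iteration-based alternatives.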

  5. Introduction: Solving MDPs • optimal value function: V*(s) • action-value function: Q(s,a), Q*(s,a) • policy from Q • solution: • iteration • iterated averaging • sampling • e.g. DP, Q-learning, SARSA • most of them provably converge
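As an illustration of the sampling-based methods named on slide 5, here is a sketch of tabular Q-learning. The three-state chain environment, the step cap, and all parameter names (`alpha`, `gamma`, `explore`) are assumptions made for this example and are not taken from the presentation.

```python
import random


def q_learning(n_episodes=500, alpha=0.5, gamma=0.9, explore=0.2, seed=0):
    """Tabular Q-learning on a toy chain: states 0, 1, 2 with state 2 as goal.

    Action 0 moves left (clamped at state 0), action 1 moves right.
    Entering the goal state yields reward 1; every other step yields 0.
    """
    rng = random.Random(seed)
    goal = 2
    Q = [[0.0, 0.0] for _ in range(goal + 1)]  # Q[s][a]
    for _ in range(n_episodes):
        s = 0
        for _ in range(100):  # step cap so every episode terminates
            # Explore with probability `explore`, and break ties randomly.
            if rng.random() < explore or Q[s][0] == Q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == goal else 0.0
            # The goal is terminal, so nothing is bootstrapped from it.
            target = r if s_next == goal else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
            if s == goal:
                break
    return Q


def greedy_policy(Q):
    """Policy from Q: pick the action with the largest value in each state."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in Q]
```

In this toy chain the greedy policy read off the learned table moves right from both non-terminal states, which is the "policy from Q" step on the slide.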

  6. Event-learning • basic idea: learn values of events: E(s,s’) • event: (s,s’) transition • policy: π^E(s,s’) • expected advantages: • captures the subgoal concept • higher level decision • good performance • needs a controller
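By analogy with reading a policy off Q, a greedy event policy can be read off a table of event values. The sketch below is only an assumption about how that might look: the dictionary layout, the default value of 0 for unseen events, and the name `desired_next_state` are hypothetical, and the slides do not specify the update rule for E, so none is shown.

```python
def desired_next_state(E, s, candidates):
    """Pick the successor s' with the largest event value E[(s, s')].

    E maps (state, next_state) pairs to learned event values; events that
    have never been seen are treated as having value 0 (an assumption).
    """
    return max(candidates, key=lambda s_next: E.get((s, s_next), 0.0))


if __name__ == "__main__":
    E = {("a", "b"): 0.2, ("a", "c"): 0.7}
    # The chosen state is only a plan; a controller must still find an
    # action that realizes the event (s, s').
    print(desired_next_state(E, "a", ["b", "c"]))
```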

  7. Event-learning: the controller • selects an action (sequence) • tries to realize the planned event (s,s’) • what should the controller be? • simplest: (approximate) inverse dynamics • may be too coarse • hierarchical: lower level RL agent • hard to tune • an intermediate solution: SDS controller • simple but robust

  8. The SDS controller • approximate inverse dynamics: gives an action for desired event (s, s_desired) • error: (s, s_experienced) happens • correction (feedback) term: discounted integral of s_desired(t) - s_experienced(t) • the action given by the inverse dynamics is corrected by Λ·(feedback term) • (continuous action space)
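The correction scheme on slide 8 can be illustrated with a discrete-time toy, which is a simplification of the continuous-time setting the slide describes. Everything here is an assumption for illustration: a scalar plant s ← s + b_true·u, an inverse model that believes the gain is b_model, a desired state that advances by `delta` per step, and a discounted sum with factor `decay` standing in for the discounted integral.

```python
def track(lam, steps=200, b_true=1.0, b_model=2.0, decay=0.9, delta=0.1):
    """Follow a ramp in the desired state; return the final absolute error.

    lam plays the role of the feedback gain (Λ on the slide).
    """
    s = 0.0      # experienced state
    s_des = 0.0  # desired state
    w = 0.0      # feedback term: discounted sum of past tracking errors
    err = 0.0
    for _ in range(steps):
        s_des_next = s_des + delta
        # Approximate inverse dynamics: correct only if b_model == b_true.
        u_ff = (s_des_next - s) / b_model
        # The inverse-dynamics action is corrected by lam * (feedback term).
        u = u_ff + lam * w
        s = s + b_true * u  # true plant response
        s_des = s_des_next
        err = s_des - s     # desired minus experienced
        w = decay * w + err
    return abs(err)


if __name__ == "__main__":
    print("no feedback:  ", track(0.0))
    print("with feedback:", track(0.5))
```

With these particular numbers the uncorrected inverse model settles at an error equal to `delta`, while the corrected controller settles at a much smaller one. In this discrete toy a very large `lam` destabilizes the loop, so the slide's "sufficiently large Λ" claim should be read for the continuous-time controller, not for this sketch.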

  9. Properties of the SDS controller • very mild conditions on the approx. inv. dynamics • asymptotically bounded error (< ε, for sufficiently large Λ) • robust (→ experiments) • Event-learning with SDS • non-Markovian • performance guarantee?

  10. ε-stationary MDPs • transition probabilities may change over time • the changes are small, not cumulative: remain in a small neighbourhood of some base MDP [Diagram: P stays within ε of the base MDP]
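To make the idea of slide 10 concrete, the sketch below draws time-varying transition distributions that never leave an ε-ball around a fixed base distribution. The choice of the L1 distance and the mixing construction are assumptions for this example; the slides do not say which norm the authors use. The construction needs 0 ≤ eps ≤ 2.

```python
import random


def l1_distance(p, q):
    """Sum of absolute differences between two probability vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def perturbed_row(base_row, eps, rng):
    """Return a distribution within L1 distance eps of base_row.

    Mixes the base row with a random distribution using weight eps / 2.
    Two distributions differ by at most 2 in L1, so the result differs
    from base_row by at most eps.
    """
    noise = [rng.random() for _ in base_row]
    total = sum(noise)
    noise = [x / total for x in noise]
    mix = eps / 2.0
    return [(1.0 - mix) * p + mix * q for p, q in zip(base_row, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    base = [0.7, 0.2, 0.1]  # one row P(s, a, .) of the base MDP
    for t in range(3):
        row_t = perturbed_row(base, 0.1, rng)  # redrawn at every time step
        print(t, row_t, l1_distance(row_t, base))
```

Each draw is made relative to the base row, not to the previous draw, so the changes do not accumulate over time.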

  11. RL in ε-MDPs • we can also use RL algorithms in ε-MDPs • they do not converge to an optimal policy (none exists) • we showed that they are still near-optimal: for large t, ‖V_t - V*‖ < K·ε, where V* is the optimal value function of the base MDP [Diagram: V_t stays within K·ε of V*]

  12. Back to event-learning and SDS • from the viewpoint of the event-learning agent, the controller is part of the environment! • the error of SDS is less than ε • the environment is an ε-MDP • event-learning with SDS is asymptotically near-optimal

  13. Demonstration problem: the pendulum

  14. Demonstration problem: the pendulum

  15. Experiment 1: Comparison with SARSA

  16. Experiment 2: Robustness

  17. Summary – ε-MDPs • general theorem on near-optimality of RL algorithms • applicable: • event-learning • fast changing or uncertain environments

  18. Summary – event-learning • learns tiny subgoals • Event-learning with an SDS controller is • practically: robust • theoretically: bounded deviation from optimum

  19. Thanks for your attention!
