
Reinforcement Learning for Soaring CDMRG – 24 May 2010


Presentation Transcript


  1. Reinforcement Learning for Soaring CDMRG – 24 May 2010 Nick Lawrance

  2. Reinforcement Learning for Soaring
  • What I want to do
  • Have a good understanding of the dynamics involved in aerodynamic soaring in known conditions, but:
  • Dynamic soaring requires energy-loss actions to achieve net energy-gain cycles, which can be difficult using traditional control or path-generation methods
  • Wind is difficult to predict; guidance and navigation must be done on-line whilst simultaneously maintaining reasonable energy levels and safety requirements
  • Classic exploration-exploitation problem, with the added catch that exploration requires energy gained through exploitation

  3. Reinforcement Learning for Soaring
  • Why reinforcement learning?
  • Previous work focused on understanding soaring and examining alternatives for generating energy-gain paths.
  • There is always the issue of balancing exploration and exploitation; my code ended up being long sequences of heuristic rules.
  • Reinforcement learning could provide the link from known good paths towards optimal paths.

  4. Monte Carlo, TD, Sarsa & Q-learning
  • Monte Carlo – learn an average return for actions taken during a series of episodes
  • Temporal Difference – simultaneously estimate the expected reward and the value function, bootstrapping from the current estimate at each step
  • Sarsa – using TD for on-policy control
  • Q-learning – off-policy TD control
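To make the distinction between these methods concrete, here is a minimal sketch of the corresponding tabular update rules in Python (not from the slides; the tables, step size alpha, and discount gamma are illustrative):

```python
# Illustrative tabular update rules (not from the original slides).
# Q maps (state, action) -> value, e.g. a defaultdict(float).

alpha, gamma = 0.1, 0.95

def monte_carlo_update(returns, Q, episode):
    """Monte Carlo: average the observed return G for each (s, a) visited."""
    G = 0.0
    for s, a, r in reversed(episode):          # episode = [(s, a, r), ...]
        G = r + gamma * G
        returns.setdefault((s, a), []).append(G)
        Q[(s, a)] = sum(returns[(s, a)]) / len(returns[(s, a)])

def sarsa_update(Q, s, a, r, s2, a2):
    """Sarsa (on-policy TD): bootstrap from the action actually taken next."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])

def q_learning_update(Q, s, a, r, s2, actions):
    """Q-learning (off-policy TD): bootstrap from the greedy next action."""
    best = max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
```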

  5. Figure 6.13: The cliff-walking task. Off-policy Q-learning learns the optimal policy, along the edge of the cliff, but then keeps falling off because of the ε-greedy action selection. On-policy Sarsa learns a safer policy that takes the action selection method into account. These data are from a single run, but smoothed.
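The cliff-walking task referenced here (from Sutton & Barto) is straightforward to reproduce. Below is a minimal environment sketch; the per-step reward of -1 and the -100 cliff penalty follow the textbook task, while the class and method names are my own placeholders:

```python
# Minimal cliff-walking environment (rewards follow Sutton & Barto's task;
# the class and method names are illustrative).

class CliffWalk:
    def __init__(self, rows=4, cols=12):
        self.rows, self.cols = rows, cols
        self.start, self.goal = (rows - 1, 0), (rows - 1, cols - 1)
        self.state = self.start

    def reset(self):
        self.state = self.start
        return self.state

    def step(self, action):
        """action: 0=up, 1=right, 2=down, 3=left. Returns (state, reward, done)."""
        dr, dc = [(-1, 0), (0, 1), (1, 0), (0, -1)][action]
        r = min(max(self.state[0] + dr, 0), self.rows - 1)
        c = min(max(self.state[1] + dc, 0), self.cols - 1)
        self.state = (r, c)
        if r == self.rows - 1 and 0 < c < self.cols - 1:   # stepped onto the cliff
            self.state = self.start
            return self.state, -100.0, False
        done = self.state == self.goal
        return self.state, -1.0, done
```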

  6. Eligibility Traces
  • TD(0) is effectively a one-step backup of Vπ (reward only propagates to the previous action)
  • Eligibility traces extend this to reward the whole sequence of actions that led to the current reward.
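A tiny illustration of accumulating traces (the γ and λ values are arbitrary): at each step every existing trace decays by γλ and the pair just visited is bumped by 1, so earlier actions receive exponentially less credit for the current TD error.

```python
# Accumulating eligibility traces over a short state-action sequence (illustrative).
gamma, lam = 0.95, 0.8
e = {}                                   # (state, action) -> trace value
for sa in [("s0", "a0"), ("s1", "a1"), ("s2", "a2")]:
    for key in e:                        # decay all existing traces
        e[key] *= gamma * lam
    e[sa] = e.get(sa, 0.0) + 1.0         # bump the pair just visited
print(e)   # the earliest pair has the smallest trace, i.e. the least credit
```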

  7. Sarsa(λ)
  • Initialize Q(s,a) arbitrarily and e(s,a) = 0, for all s, a
  • Repeat (for each episode):
    • Initialize s, a
    • Repeat (for each step of episode):
      • Take action a, observe r, s'
      • Choose a' from s' using policy derived from Q (ε-greedy)
      • δ ← r + γQ(s',a') − Q(s,a)
      • e(s,a) ← e(s,a) + 1
      • For all s, a:
        • Q(s,a) ← Q(s,a) + αδe(s,a)
        • e(s,a) ← γλe(s,a)
      • s ← s'; a ← a'
    • until s is terminal
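A direct Python translation of this pseudocode might look like the sketch below. The env.reset()/env.step() interface matches the cliff-walking sketch earlier and is an assumption, as is the ε-greedy helper:

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, s, actions, eps=0.1):
    """Pick a random action with probability eps, otherwise the greedy one."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

def sarsa_lambda(env, actions, episodes=500, alpha=0.1, gamma=0.95, lam=0.8):
    Q = defaultdict(float)
    for _ in range(episodes):
        e = defaultdict(float)                 # eligibility traces reset each episode
        s = env.reset()
        a = epsilon_greedy(Q, s, actions)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = epsilon_greedy(Q, s2, actions)
            delta = r + gamma * Q[(s2, a2)] * (not done) - Q[(s, a)]
            e[(s, a)] += 1.0                   # accumulating trace
            for key in list(e):                # update every traced pair
                Q[key] += alpha * delta * e[key]
                e[key] *= gamma * lam
            s, a = s2, a2
    return Q
```

With the earlier CliffWalk sketch, Q = sarsa_lambda(CliffWalk(), actions=[0, 1, 2, 3]) should recover the safer path described in the Figure 6.13 caption.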

  8. Sarsa(λ)

  9. Simplest soaring attempt
  • Square grid, simple motion, energy sinks and sources
  • Movement cost, turn cost, edge cost
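One possible encoding of such a grid world's reward structure (a sketch only; the cost constants and the energy map are made-up values, not the ones used in the actual simulation):

```python
# Illustrative square-grid soaring world: the agent pays movement/turn/edge
# costs and collects energy from source cells (all values are placeholders).

MOVE_COST, TURN_COST, EDGE_COST = 1.0, 0.5, 5.0

energy_map = {(2, 3): +4.0,   # thermal-like energy source
              (5, 1): -3.0}   # energy sink

def grid_reward(pos, new_pos, heading, new_heading, grid_size=8):
    reward = -MOVE_COST
    if new_heading != heading:
        reward -= TURN_COST                    # penalty for changing direction
    r, c = new_pos
    if r in (0, grid_size - 1) or c in (0, grid_size - 1):
        reward -= EDGE_COST                    # discourage flying along the boundary
    reward += energy_map.get(new_pos, 0.0)     # energy gained or lost in the cell
    return reward
```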

  10. Simulation – Static

  11. Hex grid, dynamic soaring
  • Energy-based simulation
  • Drag movement cost, turn cost
  • Constant speed
  • No wind motion (due to limited states)
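A sketch of how the hex grid and its energy book-keeping might be encoded, using axial coordinates; the drag and turn costs are placeholder numbers, not the values from the simulation:

```python
# Hex grid in axial coordinates with a constant-speed energy model (illustrative).

HEX_DIRS = [(+1, 0), (+1, -1), (0, -1), (-1, 0), (-1, +1), (0, +1)]
DRAG_COST, TURN_COST_PER_STEP = 0.8, 0.3

def hex_step(pos, heading, turn, energy, wind_energy):
    """turn in {-1, 0, +1}: rotate one hex direction left/right or go straight."""
    new_heading = (heading + turn) % 6
    dq, dr = HEX_DIRS[new_heading]
    new_pos = (pos[0] + dq, pos[1] + dr)
    energy -= DRAG_COST + abs(turn) * TURN_COST_PER_STEP
    energy += wind_energy.get(new_pos, 0.0)    # energy available in the new cell
    return new_pos, new_heading, energy
```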

  12. Hex grid, dynamic soaring

  13. Next
  • Reinforcement learning has advantages to offer our group, but our contribution should probably be focused on well-defined areas
  • For most of our problems, the state spaces are very large and usually continuous; we need estimation methods
  • We usually have a good understanding of at least some aspects of the problem; how can/should we use this information to give better solutions?
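For the large or continuous state spaces mentioned here, a standard estimation approach is linear value-function approximation, e.g. semi-gradient Sarsa over a feature vector. A sketch follows; the features(s, a) function is an assumed placeholder (e.g. a tile coding of the aircraft state):

```python
import numpy as np

def semi_gradient_sarsa_update(w, features, s, a, r, s2, a2, alpha=0.01, gamma=0.95):
    """One linear semi-gradient Sarsa step, with Q(s, a) = w . features(s, a)."""
    x, x2 = features(s, a), features(s2, a2)
    delta = r + gamma * np.dot(w, x2) - np.dot(w, x)
    return w + alpha * delta * x               # gradient of Q w.r.t. w is just x

# features(s, a) would be e.g. a tile coding or radial-basis encoding of the
# aircraft state (position, heading, airspeed, estimated wind) and the action.
```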
