
Model-Based Deep Reinforcement Learning with Imagination-Augmented Agents

This seminar overview from ETH Zürich discusses Imagination-Augmented Agents (I2A), showcasing their robustness to model inaccuracy and their ability to achieve high performance with little data. Successor Features for Transfer are also covered, emphasizing a value-function representation that decouples dynamics from rewards and integrates transfer learning into the RL framework. The I2A architecture published by DeepMind in Feb 2018 is detailed, highlighting components such as the Imagination Core and the Rollout Encoder. The framework's model-free path is a multilayer CNN without fully connected layers, and the training of both the policy and the value function is summarized.


Presentation Transcript


  1. Model-Based Deep Reinforcement Learning: Imagination-Augmented Agents – Successor Features for Transfer. May 2019, Ioannis Miltiadis Mandralis, Seminar on Deep Reinforcement Learning, ETH Zürich

  2. Overview: I2A • Imagination-Augmented Agents • Robust to model inaccuracy • Achieve high performance with less data

  3. Overview: Successor Features • Value-function representation that decouples dynamics from rewards • Integrates transfer into the RL framework
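The decoupling above can be written out explicitly. In the successor-features formulation (notation of Barreto et al.), rewards are assumed linear in a feature vector φ, so the action-value function factors into a dynamics-dependent part ψ and a reward-dependent weight vector w:

```latex
% Reward assumed linear in features:
r(s, a, s') = \phi(s, a, s')^\top \mathbf{w}

% Then the action-value function factors as
Q^\pi(s, a)
  = \mathbb{E}^\pi\!\left[\sum_{i=t}^{\infty} \gamma^{\,i-t}\, \phi_{i+1}
      \,\middle|\, S_t = s,\; A_t = a \right]^{\!\top} \mathbf{w}
  = \psi^\pi(s, a)^\top \mathbf{w}
```

Here ψ (the successor features) depends only on the dynamics and the policy, while w depends only on the reward; swapping in a new w transfers the representation to a new task without relearning ψ.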

  4. Imagination-Augmented Agents for Deep Reinforcement Learning. Théophane Weber et al., DeepMind, Feb 2018

  5. I2A Architecture

  6–15. 1. Imagination Core • Input observation (actually a frame) • Predict an action using a policy (more later) • Obtain an imagined frame and reward • How?
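The steps above can be sketched in code. This is a minimal stand-in, not the paper's implementation: `rollout_policy` and `environment_model` are hypothetical stubs (the real ones are learned networks), but the control flow of one imagination step is as described.

```python
import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS = 4

def rollout_policy(frame):
    # Rollout policy: maps a frame to an action.
    # Stub: scores each action with a noisy function of the frame.
    logits = frame.mean() + rng.normal(size=N_ACTIONS)
    return int(np.argmax(logits))

def environment_model(frame, action):
    # Learned environment model: predicts the next frame and reward.
    # Stub: shifts pixel intensities by the action index.
    next_frame = np.roll(frame, action)
    reward = float(next_frame.mean())
    return next_frame, reward

def imagination_core(frame):
    """One imagination step: pick an action with the rollout policy,
    then query the environment model for an imagined frame and reward."""
    action = rollout_policy(frame)
    imagined_frame, imagined_reward = environment_model(frame, action)
    return imagined_frame, imagined_reward

frame = rng.normal(size=(8, 8))
imagined_frame, imagined_reward = imagination_core(frame)
print(imagined_frame.shape)  # (8, 8)
```

Chaining `imagination_core` on its own output yields a multi-step imagined rollout.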

  16–18. 1. Imagination Core: Environment Model
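One detail of the environment model worth making concrete is how the action conditions the prediction: the action is broadcast as one-hot feature planes and concatenated to the frame's channels before the convolutional model runs. The shapes below are illustrative, not taken from the paper:

```python
import numpy as np

def one_hot_planes(action, n_actions, h, w):
    # Broadcast the chosen action as one-hot feature planes so a CNN
    # environment model can be conditioned on it channel-wise.
    planes = np.zeros((n_actions, h, w))
    planes[action] = 1.0
    return planes

frame = np.zeros((3, 8, 8))  # C x H x W observation (illustrative shape)
inp = np.concatenate([frame, one_hot_planes(2, 4, 8, 8)], axis=0)
print(inp.shape)  # (7, 8, 8): 3 image channels plus one plane per action
```

The environment model then maps this stacked input to a predicted next frame and a predicted reward.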

  19–22. 2. Rollout Encoder • Black box • Input: the imagined rollout • Output: the rollout encoding

  23–25. 2. Rollout Encoder • Step 1: Imagine the future

  26–34. 2. Rollout Encoder • Step 2: Encode • Pass through CNN • Pass through LSTM • Obtain the rollout encoding
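The encoding step can be sketched as follows. This is a simplified stand-in: `cnn_features` is a stub, and a plain recurrent update replaces the LSTM; what it preserves is the structure of the encoder, which processes the imagined sequence in reverse so the final hidden state summarizes the whole rollout.

```python
import numpy as np

def cnn_features(frame):
    # Stand-in for the CNN: flatten the imagined frame to a feature vector.
    return frame.reshape(-1)

def encode_rollout(imagined_frames, imagined_rewards, dim=16):
    """Sketch of the rollout encoder: extract CNN features per imagined
    step, then run a recurrent cell (an LSTM in the paper) over the
    sequence in reverse; the final hidden state is the rollout encoding."""
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(dim, dim))
    h = np.zeros(dim)
    for frame, reward in reversed(list(zip(imagined_frames, imagined_rewards))):
        feat = cnn_features(frame)[:dim]           # truncate for the sketch
        x = np.concatenate([feat, [reward]])[:dim]
        h = np.tanh(W @ h + np.resize(x, dim))     # simple RNN step (LSTM stand-in)
    return h

frames = [np.ones((4, 4)) * t for t in range(3)]   # 3 imagined frames
rewards = [0.1, 0.2, 0.3]                          # 3 imagined rewards
encoding = encode_rollout(frames, rewards)
print(encoding.shape)  # (16,)
```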

  35–42. 3. Aggregate rollout encodings • One rollout for each action in the action set

  43–45. 3. Aggregate rollout encodings • Model-free architecture: multilayer CNN without FC layers

  46. 3. Aggregate rollout encodings • Concatenate and pass through an FC layer

  47. 3. Aggregate rollout encodings • NN outputs • Logits of the policy • Value estimate
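Putting the aggregation step together: one rollout encoding per action is concatenated with the model-free CNN path's features, and a final FC layer produces the policy logits and a scalar value. The dimensions and random weights below are illustrative, not the paper's:

```python
import numpy as np

def aggregate(rollout_encodings, model_free_features):
    # Aggregator: concatenate one encoding per action with the
    # model-free path's features into a single vector.
    return np.concatenate(rollout_encodings + [model_free_features])

def output_heads(features, n_actions, rng):
    # Final FC layer: policy logits and a scalar value estimate.
    # (Random weights here stand in for trained ones.)
    W_pi = rng.normal(scale=0.1, size=(n_actions, features.size))
    W_v = rng.normal(scale=0.1, size=features.size)
    return W_pi @ features, float(W_v @ features)

rng = np.random.default_rng(0)
n_actions = 4
encodings = [rng.normal(size=16) for _ in range(n_actions)]  # one per action
model_free = rng.normal(size=32)                             # model-free path
feats = aggregate(encodings, model_free)
logits, value = output_heads(feats, n_actions, rng)
print(feats.shape, logits.shape)  # (96,) (4,)
```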

  48. Training • The overall gradient can be summarized

  49. Training • A3C policy gradient • Advantage estimate

  50. Training • Value function updated toward the bootstrapped n-step return
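The training slides compress the standard A3C objective; written out in the usual A3C notation (reconstructed, not taken verbatim from the slides):

```latex
% Bootstrapped n-step return used as the target:
R_t = \sum_{i=0}^{n-1} \gamma^i r_{t+i} \;+\; \gamma^n V_{\theta_v}(s_{t+n})

% Policy gradient with advantage estimate A_t = R_t - V_{\theta_v}(s_t):
\nabla_\theta \log \pi_\theta(a_t \mid s_t)\,\big(R_t - V_{\theta_v}(s_t)\big)

% Value function updated toward the bootstrapped n-step return:
\nabla_{\theta_v}\, \tfrac{1}{2}\big(R_t - V_{\theta_v}(s_t)\big)^2
```

The overall gradient is the sum of the policy term and the value term (A3C also typically adds an entropy bonus on the policy).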
