This seminar overview at ETH Zürich discusses Imagination-Augmented Agents (I2A), showcasing their robustness to model inaccuracy and their ability to achieve high performance with less data. Successor Features for Transfer are also covered: a value function representation that decouples environment dynamics from rewards and integrates transfer learning into the RL framework. The I2A architecture, presented by DeepMind in Feb 2018, is detailed through its main components, the Imagination Core and the Rollout Encoder; its model-free path is a multilayer CNN without fully connected layers. The training procedure for both the policy and the value function is summarized.
Model-Based Deep Reinforcement Learning
Imagination-Augmented Agents – Successor Features for Transfer
Ioannis Miltiadis Mandralis
Seminar on Deep Reinforcement Learning, ETH Zürich, May 2019
Overview: I2A
• Imagination-Augmented Agents
• Robust to model inaccuracy
• Achieve high performance with less data
Overview: Successor Features
• Value function representation that decouples environment dynamics from rewards (see the decomposition sketched below)
• Integrates transfer learning into the RL framework
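As a hedged aside grounded in Barreto et al. (2017): if the one-step reward is linear in features φ, the action-value factorizes into policy-dependent successor features ψ and a task-specific weight vector w (the notation is the paper's, not the slides'):

```latex
% Successor-feature decomposition (Barreto et al., 2017).
% Assumes the reward is linear in features \phi.
r(s, a, s') = \phi(s, a, s')^{\top} \mathbf{w}
% Successor features: expected discounted future features under policy \pi.
\psi^{\pi}(s, a) = \mathbb{E}^{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, \phi_{t+1} \;\middle|\; S_0 = s,\ A_0 = a \right]
% Dynamics (\psi) decoupled from the reward specification (\mathbf{w}).
Q^{\pi}(s, a) = \psi^{\pi}(s, a)^{\top} \mathbf{w}
```

Transferring to a new task then amounts to learning a new w while reusing ψ.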
Imagination-Augmented Agents for Deep Reinforcement Learning
Théophane Weber et al., DeepMind, Feb 2018
1. Imagination Core
• Input: an observation (actually a raw frame)
• Predict an action using the rollout policy (more on this later)
• Obtain an imagined next frame and reward
• How? Through a learned environment model that maps the current frame and the chosen action to a predicted next frame and reward (a minimal sketch follows)
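A minimal PyTorch-style sketch of one imagination step, assuming hypothetical environment-model and rollout-policy modules trained elsewhere (names and interfaces are illustrative, not from the paper):

```python
import torch
import torch.nn as nn

class ImaginationCore(nn.Module):
    """One imagination step: the rollout policy picks an action, then a
    learned environment model predicts the next frame and reward."""

    def __init__(self, env_model: nn.Module, rollout_policy: nn.Module):
        super().__init__()
        self.env_model = env_model            # (frame, action) -> (next_frame, reward)
        self.rollout_policy = rollout_policy  # frame -> action logits

    def forward(self, frame: torch.Tensor):
        logits = self.rollout_policy(frame)
        action = torch.distributions.Categorical(logits=logits).sample()
        next_frame, reward = self.env_model(frame, action)
        return next_frame, reward, action
```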
2. Rollout Encoder
• Treated as a black box
• Input: an imagined rollout (the sequence of predicted frames and rewards)
• Output: a rollout encoding (a fixed-length vector)
2. Rollout Encoder
• Step 1: Imagine the future by applying the Imagination Core repeatedly for a fixed number of steps (sketched below)
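Continuing the sketch above, Step 1 can be a simple loop that unrolls the hypothetical ImaginationCore for a fixed horizon (the rollout length is an assumption):

```python
import torch

def imagine_rollout(core: "ImaginationCore", frame: torch.Tensor, rollout_len: int = 5):
    """Unroll the imagination core for a fixed horizon, collecting the
    imagined frames and rewards that the rollout encoder will consume."""
    frames, rewards = [], []
    for _ in range(rollout_len):
        frame, reward, _ = core(frame)
        frames.append(frame)
        rewards.append(reward)
    return frames, rewards
```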
2. Rollout Encoder
• Step 2: Encode
• Pass each imagined frame through a CNN
• Pass the CNN features through an LSTM
• Obtain the Rollout Encoding (a sketch follows)
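A minimal sketch of Step 2, assuming a CNN that flattens each frame into a feat_dim vector; following the paper, the imagined sequence is fed to the LSTM in reverse order (layer sizes are illustrative assumptions, not the paper's exact ones):

```python
import torch
import torch.nn as nn

class RolloutEncoder(nn.Module):
    """Summarize an imagined rollout into one vector: per-frame CNN
    features (concatenated with the imagined reward) are fed to an LSTM."""

    def __init__(self, cnn: nn.Module, feat_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.cnn = cnn                                 # frame -> (B, feat_dim)
        self.lstm = nn.LSTM(feat_dim + 1, hidden_dim)  # +1 slot for the reward

    def forward(self, frames, rewards):
        # Build a time-major sequence, last imagined step first.
        steps = [torch.cat([self.cnn(f), r.unsqueeze(-1)], dim=-1)
                 for f, r in zip(reversed(frames), reversed(rewards))]
        seq = torch.stack(steps)                       # (T, B, feat_dim + 1)
        _, (h_n, _) = self.lstm(seq)
        return h_n[-1]                                 # final hidden state = rollout encoding
```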
3. Aggregate rollout encodings
• One rollout is performed for each action in the action set
• Model-free path: a multilayer CNN without fully connected (FC) layers
• Concatenate the rollout encodings with the model-free CNN features and pass them through an FC layer
• The network outputs the policy logits and the value (a sketch follows)
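A hedged sketch of the aggregator and output head, assuming one rollout encoding per action; module names and sizes are illustrative:

```python
import torch
import torch.nn as nn

class I2AHead(nn.Module):
    """Aggregate one rollout encoding per action with model-free CNN
    features, then emit policy logits and a value estimate."""

    def __init__(self, model_free_cnn: nn.Module, enc_dim: int,
                 mf_dim: int, n_actions: int, hidden: int = 512):
        super().__init__()
        self.model_free_cnn = model_free_cnn  # observation -> (B, mf_dim)
        self.fc = nn.Linear(n_actions * enc_dim + mf_dim, hidden)
        self.policy_logits = nn.Linear(hidden, n_actions)
        self.value = nn.Linear(hidden, 1)

    def forward(self, obs, rollout_encodings):
        # rollout_encodings: list of (B, enc_dim) tensors, one per action.
        agg = torch.cat(rollout_encodings + [self.model_free_cnn(obs)], dim=-1)
        h = torch.relu(self.fc(agg))
        return self.policy_logits(h), self.value(h).squeeze(-1)
```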
Training
• The overall gradient can be summarized as the A3C policy gradient, $\nabla_\theta \log \pi(a_t \mid s_t; \theta)\, A(s_t, a_t)$
• Advantage estimate: $A(s_t, a_t) = \sum_{i=0}^{k-1} \gamma^{i} r_{t+i} + \gamma^{k} V(s_{t+k}) - V(s_t)$
• The value function is updated toward the bootstrapped $k$-step return $\sum_{i=0}^{k-1} \gamma^{i} r_{t+i} + \gamma^{k} V(s_{t+k})$
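For concreteness, a minimal sketch of the corresponding A3C-style loss; the entropy bonus and the coefficients are common defaults, assumed rather than taken from the slides:

```python
import torch

def a3c_loss(logits, values, actions, returns,
             value_coef: float = 0.5, entropy_coef: float = 0.01):
    """A3C-style objective: policy gradient weighted by the advantage
    (k-step return minus the value baseline), a value regression term,
    and an entropy bonus to encourage exploration."""
    dist = torch.distributions.Categorical(logits=logits)
    advantage = returns - values.detach()            # stop gradient through the baseline
    policy_loss = -(dist.log_prob(actions) * advantage).mean()
    value_loss = (returns - values).pow(2).mean()    # move V toward the k-step return
    entropy = dist.entropy().mean()
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```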