Neural Networks Chapter 7 Joost N. Kok Universiteit Leiden
Recurrent Networks • Learning Time Sequences: • Sequence Recognition • Sequence Reproduction • Temporal Association
Recurrent Networks • Tapped Delay Lines: • Keep several old values in a buffer
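A tapped delay line can be sketched as a fixed-length buffer that presents the current value together with the previous ones as one input vector. This is a minimal illustration; the function name `tapped_delay_windows`, the zero padding at the start, and the newest-first ordering are assumptions, not taken from the slides.

```python
from collections import deque


def tapped_delay_windows(sequence, n_taps):
    """Yield, for each time step, the current value followed by the n_taps - 1 previous values."""
    # Assumption: the buffer is filled with zeros before the sequence starts.
    buffer = deque([0.0] * n_taps, maxlen=n_taps)
    for x in sequence:
        buffer.appendleft(x)  # newest value first; the oldest value drops off the end
        yield list(buffer)


windows = list(tapped_delay_windows([1.0, 2.0, 3.0, 4.0], n_taps=3))
```

Each element of `windows` would be fed to the input units of an ordinary feedforward network, so `n_taps` fixes both the memory length and the number of input units.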
Recurrent Networks • Drawbacks: • The delay-line length must be chosen in advance, which leads to a large number of input units, a large number of training patterns, etc. • Replace fixed time delays by filters:
Recurrent Networks • Partially recurrent networks • [Figure: output nodes, hidden nodes, input nodes, context nodes]
Recurrent Networks • Jordan Network
Recurrent Networks • Elman Network • [Figure: output nodes, hidden nodes, input nodes, context nodes]
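In an Elman network the context nodes hold a copy of the previous hidden activations and feed them back into the hidden layer. The forward pass can be sketched as follows; the weight values, the `tanh` hidden units, the linear output, and the name `elman_step` are illustrative assumptions.

```python
import math


def elman_step(x, context, w_in, w_ctx, w_out):
    """One time step: hidden units see the input and the previous hidden state (context)."""
    hidden = [
        math.tanh(
            sum(w * xi for w, xi in zip(w_in[j], x))
            + sum(w * ci for w, ci in zip(w_ctx[j], context))
        )
        for j in range(len(w_in))
    ]
    output = [sum(w * hj for w, hj in zip(row, hidden)) for row in w_out]
    return output, hidden  # the hidden activations become the next context


# Hypothetical weights: 1 input, 2 hidden units, 1 output.
w_in = [[0.5], [-0.3]]
w_ctx = [[0.1, 0.0], [0.0, 0.2]]
w_out = [[1.0, -1.0]]

context = [0.0, 0.0]
outputs = []
for x_t in [1.0, 0.0, 1.0]:
    y, context = elman_step([x_t], context, w_in, w_ctx, w_out)
    outputs.append(y[0])
```

The second input is zero, yet the second output is not, because the context nodes still carry the hidden state from the first step.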
Recurrent Networks • Expanded Hierarchical Elman Network • [Figure: output units, hidden layers, context layers, input layer]
Recurrent Networks • Back-Propagation Through Time
Reinforcement Learning • Supervised learning with some feedback • Reinforcement Learning Problems: • Class I: reinforcement signal is always the same for given input-output pair • Class II: stochastic environment, fixed probability for each input-output pair • Class III: reinforcement and input patterns depend on past history of network output
Associative Reward-Penalty • Stochastic Output Units • Reinforcement Signal • Target • Error
Associative Reward-Penalty • Learning Rule
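The associative reward-penalty idea can be sketched for a single stochastic ±1 unit. The sketch assumes the common textbook form: the target equals the emitted output when the reinforcement is +1 and its negation when the reinforcement is -1, the error is the target minus the mean output, and the learning rate is smaller for penalties. The names `arp_step`, `reward_fn`, `eta_plus`, `eta_minus`, and `beta`, and the toy task, are hypothetical.

```python
import math
import random


def arp_step(w, x, reward_fn, rng, eta_plus=0.5, eta_minus=0.05, beta=1.0):
    """One associative reward-penalty update for a single stochastic +/-1 unit."""
    h = sum(wi * xi for wi, xi in zip(w, x))          # net input
    p_plus = 1.0 / (1.0 + math.exp(-2.0 * beta * h))  # P(S = +1), so that <S> = tanh(beta * h)
    s = 1 if rng.random() < p_plus else -1            # stochastic output
    r = reward_fn(x, s)                               # reinforcement: +1 reward, -1 penalty
    target = s if r == 1 else -s                      # keep a rewarded output, flip a penalized one
    eta = eta_plus if r == 1 else eta_minus
    error = target - math.tanh(beta * h)              # target minus mean output
    new_w = [wi + eta * error * xi for wi, xi in zip(w, x)]
    return new_w, s, r


def reward_fn(x, s):
    """Hypothetical task: reward the unit when its output matches the first input."""
    return 1 if s == x[0] else -1


rng = random.Random(0)
w = [0.0, 0.0]
for _ in range(200):
    x = [rng.choice([-1, 1]), 1]  # second component acts as a bias input
    w, s, r = arp_step(w, x, reward_fn, rng)
```

In this toy task the target always equals the first input, so the first weight can only grow and the unit comes to copy that input.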
Models and Critics • [Figure: environment]
Reinforcement Comparison • [Figure: critic, environment]
Reinforcement Learning • Reinforcement-Learning Model • The agent receives an input I, which is some indication of the current state s of the environment • The agent then chooses an action a • The action changes the state of the environment, and the value of this state transition is communicated to the agent through a scalar reinforcement signal r
Reinforcement Learning • Environment: You are in state 65. You have four possible actions. • Agent: I’ll take action 2. • Environment: You received a reinforcement of 7 units. You are now in state 15. You have two possible actions. • Agent: I’ll take action 1. • Environment: You received a reinforcement of -4 units. You are now in state 12. You have two possible actions. • Agent: I’ll take action 2. • …
Reinforcement Learning • Environment is non-deterministic: • same action in same state may result in different states and different reinforcements • The environment is stationary: • Probabilities of making state transitions or receiving specific reinforcement signals do not change over time
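The interaction described above can be sketched as a loop over a small stochastic environment. The transition table below is hypothetical; it is non-deterministic (an action can lead to different next states) and stationary (the probabilities never change). The agent here simply picks a random action.

```python
import random

# Hypothetical table: state -> action -> list of (probability, next_state, reinforcement).
TRANSITIONS = {
    0: {0: [(0.8, 1, 1.0), (0.2, 0, 0.0)], 1: [(1.0, 2, -1.0)]},
    1: {0: [(1.0, 2, 2.0)], 1: [(0.5, 0, 0.0), (0.5, 2, 1.0)]},
    2: {0: [(1.0, 0, 0.0)]},
}


def step(state, action, rng):
    """Sample the next state and the reinforcement for a (state, action) pair."""
    outcomes = TRANSITIONS[state][action]
    weights = [p for p, _, _ in outcomes]
    _, next_state, r = rng.choices(outcomes, weights=weights)[0]
    return next_state, r


rng = random.Random(0)
state, history = 0, []
for _ in range(5):
    actions = sorted(TRANSITIONS[state])     # "You have N possible actions."
    action = rng.choice(actions)             # "I'll take action a."
    next_state, r = step(state, action, rng)  # "You received a reinforcement of r units."
    history.append((state, action, r, next_state))
    state = next_state
```

Each tuple in `history` corresponds to one exchange of the dialogue: state, chosen action, reinforcement received, and new state.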
Reinforcement Learning • Two types of learning: • Model-free learning • Model-based learning • Typical application areas: • Robots • Mazes • Games • …
Reinforcement Learning • Paper: A short introduction to Reinforcement Learning (Stephan ten Hagen and Ben Krose)
Reinforcement Learning • Environment is a Markov Decision Process
Reinforcement Learning • Optimize interaction with environment • Optimize action selection mechanism • Temporal Credit Assignment Problem • Policy: action selection mechanism • Value function:
Reinforcement Learning • Optimal Value function based on optimal policy:
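For orientation, the value function of a policy and the optimal value function are commonly written as below. The notation is an assumption and may differ from the slides: discount factor $\gamma$, reinforcement $r$, transition probabilities $P(s' \mid s, a)$, and expected reinforcement $R(s, a, s')$.

```latex
\begin{align}
V^{\pi}(s) &= E_{\pi}\Big[\sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \,\Big|\, s_t = s\Big] \\
V^{*}(s)   &= \max_{a} \sum_{s'} P(s' \mid s, a)\,\big[R(s, a, s') + \gamma\, V^{*}(s')\big]
\end{align}
```

The first line is the expected discounted sum of future reinforcements when following policy $\pi$ from state $s$; the second is the Bellman optimality equation that the optimal value function satisfies.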
Reinforcement Learning • Policy Evaluation: approximate value function for given policy • Policy Iteration: start with arbitrary policy and improve
Reinforcement Learning • Improve Policy:
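Policy iteration alternates policy evaluation and policy improvement until the policy stops changing. A minimal sketch on a hypothetical two-state problem follows; the `MDP` table, the discount factor 0.9, and the fixed number of evaluation sweeps are assumptions made for illustration.

```python
GAMMA = 0.9

# Hypothetical MDP: state -> action -> list of (probability, next_state, reinforcement).
MDP = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "go": [(1.0, "A", 0.0)]},
}


def backup(V, s, a):
    """Expected reinforcement plus discounted value of the successor state."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in MDP[s][a])


def evaluate(policy, sweeps=500):
    """Policy evaluation: approximate the value function of a fixed policy."""
    V = {s: 0.0 for s in MDP}
    for _ in range(sweeps):
        for s in MDP:
            V[s] = backup(V, s, policy[s])
    return V


def improve(V):
    """Policy improvement: act greedily with respect to V."""
    return {s: max(MDP[s], key=lambda a: backup(V, s, a)) for s in MDP}


def policy_iteration():
    policy = {s: next(iter(MDP[s])) for s in MDP}  # arbitrary start: first listed action
    while True:
        V = evaluate(policy)
        new_policy = improve(V)
        if new_policy == policy:
            return policy, V
        policy = new_policy


policy, V = policy_iteration()
```

On this toy problem the loop settles on "go" in state A and "stay" in state B, with values close to 19 and 20.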
Reinforcement Learning • Value Iteration: combine policy evaluation and policy improvement steps:
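Value iteration folds the improvement step into the evaluation sweep by taking the maximum over actions in every backup. The sketch below uses a hypothetical two-state problem; the `MDP` table, the discount factor, and the stopping tolerance are assumptions.

```python
GAMMA = 0.9

# Hypothetical MDP: state -> action -> list of (probability, next_state, reinforcement).
MDP = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "go": [(1.0, "A", 0.0)]},
}


def q_value(V, s, a):
    """Expected reinforcement plus discounted value of the successor state."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in MDP[s][a])


def value_iteration(tol=1e-10):
    V = {s: 0.0 for s in MDP}
    while True:
        delta = 0.0
        for s in MDP:
            best = max(q_value(V, s, a) for a in MDP[s])  # evaluation and improvement in one step
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {s: max(MDP[s], key=lambda a: q_value(V, s, a)) for s in MDP}
    return V, policy


V, policy = value_iteration()
```

The greedy policy is read off only once at the end, after the values have converged.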
Reinforcement Learning • Monte Carlo: use if the transition probabilities and expected reinforcements are not known • Given a policy, several complete iterations are performed • Exploration/Exploitation Dilemma • Extract Information • Optimize Interaction
Reinforcement Learning • Temporal Difference (TD) Learning • During interaction, part of the update can be calculated • Information from previous interactions is used
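A one-step temporal difference update can be sketched as follows: after every single interaction the value of the state just left is moved toward the reinforcement plus the discounted value of the next state. The three-state chain, the step size 0.1, and the discount factor 1.0 are illustrative assumptions.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=1.0):
    """Move V[s] toward r + gamma * V[s_next]; return the TD error."""
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return td_error


# Hypothetical episode: 0 -> 1 -> 2 -> "end", with a reinforcement of 1 on the last step.
V = {0: 0.0, 1: 0.0, 2: 0.0, "end": 0.0}
episode = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, "end")]

for _ in range(2000):
    for s, r, s_next in episode:
        td0_update(V, s, r, s_next)  # update during the interaction, not after the episode
```

With repeated episodes the reinforcement received at the end propagates backwards one state at a time, and all three state values approach 1.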
Reinforcement Learning • TD(λ) learning: discount factor λ: the longer ago a state was visited, the less it is affected by the present update
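The effect of λ can be sketched with eligibility traces: every state keeps a trace that is bumped when the state is visited and decays by γλ afterwards, and each TD error updates all states in proportion to their traces. The accumulating-trace variant, the chain episode, and the parameter values are assumptions chosen for illustration.

```python
def td_lambda_episode(V, episode, alpha=0.1, gamma=1.0, lam=0.8):
    """Run one episode of TD(lambda) with accumulating eligibility traces."""
    traces = {s: 0.0 for s in V}
    for s, r, s_next in episode:
        td_error = r + gamma * V[s_next] - V[s]
        traces[s] += 1.0  # the state just visited becomes fully eligible
        for state in V:
            V[state] += alpha * td_error * traces[state]
            traces[state] *= gamma * lam  # older visits count for less
    return V


# Hypothetical episode: 0 -> 1 -> 2 -> "end", with a reinforcement of 1 on the last step.
V = {0: 0.0, 1: 0.0, 2: 0.0, "end": 0.0}
episode = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, "end")]
td_lambda_episode(V, episode)
```

After this single episode state 2 changes by about 0.1, state 1 by about 0.08, and state 0 by about 0.064: the further back the visit, the smaller the update.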
Reinforcement Learning • Q-learning: combine actor and critic:
Reinforcement Learning • Use temporal difference learning
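Tabular Q-learning with a one-step temporal difference update can be sketched as below. The deterministic two-state problem, the purely random exploration, and the constants `GAMMA` and `ALPHA` are assumptions for illustration.

```python
import random

GAMMA, ALPHA = 0.9, 0.1

# Hypothetical deterministic problem: state -> action -> (next_state, reinforcement).
MDP = {
    "A": {"stay": ("A", 0.0), "go": ("B", 1.0)},
    "B": {"stay": ("B", 2.0), "go": ("A", 0.0)},
}


def q_learning(steps=20000, seed=0):
    rng = random.Random(seed)
    Q = {s: {a: 0.0 for a in MDP[s]} for s in MDP}
    s = "A"
    for _ in range(steps):
        a = rng.choice(sorted(MDP[s]))                   # explore with random actions
        s_next, r = MDP[s][a]
        target = r + GAMMA * max(Q[s_next].values())     # value of the best next action
        Q[s][a] += ALPHA * (target - Q[s][a])            # temporal difference update
        s = s_next
    return Q


Q = q_learning()
greedy = {s: max(Q[s], key=Q[s].get) for s in Q}
```

The table `Q` plays both roles at once: it scores state-action pairs like a critic, and taking the maximum over its entries selects actions like an actor. No model of the environment is needed, only sampled transitions.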
Reinforcement Learning • Q(λ) learning:
Reinforcement Learning • Feedforward Neural Networks are used to estimate V(s) and Q(s,a) when the state/action spaces are large.