Reinforcement Learning and Genetic Algorithms

Reinforcement Learning and Genetic Algorithms Staffan Järn

Reinforcement learning
• An intelligent learning algorithm
• Doesn’t require the presence of a teacher
• The algorithm is given a reward (a reinforcement) for good actions
• The algorithm tries to figure out the best action to take in a given state, without knowing the final optimal solution
• The actions are based on rewards and penalties

Areas
• Robot control
• Elevator scheduling (search for patterns)
• Telecommunications (finding networks)
• Games (Chess, Backgammon)
• Financial trading

Cliffwalker program in Matlab
• Gridworld (4 x 12)
• The walker (agent) is supposed to find the shortest or safest way to the finish without falling into the cliff (blue area)
• Falling into the cliff gives 100 penalty points, and the walker has to start over again
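The gridworld rules above can be sketched in a few lines. This is a Python illustration, not the Matlab Cliffwalker program itself. The slide does not give the exact layout, so the start corner, the goal corner and the cliff cells below are assumptions borrowed from the usual textbook cliff-walking layout, and the names `step` and `valid_actions` are hypothetical.

```python
ROWS, COLS = 4, 12
START = (3, 0)                            # assumed: bottom-left corner
GOAL = (3, 11)                            # assumed: bottom-right corner
CLIFF = {(3, c) for c in range(1, 11)}    # assumed: bottom row between start and goal

# Actions: 0 = up, 1 = right, 2 = down, 3 = left
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def valid_actions(pos):
    """Return the actions that do not walk into the wall."""
    r, c = pos
    return [a for a, (dr, dc) in enumerate(MOVES)
            if 0 <= r + dr < ROWS and 0 <= c + dc < COLS]


def step(pos, action):
    """Take one step and return (next_position, cost, finished)."""
    dr, dc = MOVES[action]
    nxt = (pos[0] + dr, pos[1] + dc)
    if not (0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS):
        raise ValueError("action walks into the wall")
    if nxt in CLIFF:
        return START, 100, False          # 100 penalty points, walker starts over
    return nxt, 1, nxt == GOAL            # an ordinary step costs 1
```

With this layout, stepping right from the start falls into the cliff and returns the walker to the start with a cost of 100, while stepping up costs 1.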

Q-learning algorithm
• Matrix, called the Q-matrix
• 48 x 4 matrix: 48 states (the 4 x 12 gridworld) x 4 actions (four directions)
• The Q-matrix contains a "price" (cost) for taking a certain action in a certain state
• Initialized randomly in the beginning
• The walker has two options:
  • Take the optimal action, i.e. the one with the smallest Q-value
  • Explore the gridworld by taking a random step (cannot walk into the wall)
• The Q-value is updated according to the equation every time the walker takes an action
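A minimal Python sketch of the Q-matrix and the walker's two options follows. It is an illustration rather than the Matlab program. The function names, the action order, the row-major state numbering and the choice to restrict the greedy step to moves that stay inside the grid are all assumptions; `epsilon` is the probability of taking a random step.

```python
import random

ROWS, COLS, N_ACTIONS = 4, 12, 4
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]   # up, right, down, left (assumed order)


def init_q(rng):
    """Build the 48 x 4 Q-matrix with random initial costs."""
    return [[rng.random() for _ in range(N_ACTIONS)] for _ in range(ROWS * COLS)]


def state_index(row, col):
    """Map a grid cell to its row in the Q-matrix (assumed row-major numbering)."""
    return row * COLS + col


def valid_actions(row, col):
    """Actions that do not walk into the wall."""
    return [a for a, (dr, dc) in enumerate(MOVES)
            if 0 <= row + dr < ROWS and 0 <= col + dc < COLS]


def choose_action(q, row, col, epsilon, rng):
    """Explore with probability epsilon, otherwise take the smallest Q-value."""
    actions = valid_actions(row, col)
    if rng.random() < epsilon:
        return rng.choice(actions)                       # random step
    s = state_index(row, col)
    return min(actions, key=lambda a: q[s][a])           # cheapest known action
```

Because the Q-matrix stores costs, the greedy choice uses `min` rather than the `max` seen in reward-based formulations.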

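The new value in the Q-matrix for the previous state and the previously taken action is the old value multiplied by (1 − α), plus the learning factor α multiplied by the sum of the cost of taking the step (usually 1, cliff 100) and the reward factor γ multiplied by the Q-value of the best action the walker can take from the new state (the optimal action):

$$Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha \left[ c + \gamma \min_{a'} Q(s', a') \right]$$

Here s and a are the previous state and action, c is the cost of the step, and s' is the new state. Since the Q-matrix holds costs, the best action is the one with the smallest Q-value.
• α (alpha) = learning factor
• γ (gamma) = reward factor (commonly called the discount factor)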
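The update can be written as a single function. The sketch below is in Python and assumes the Q-matrix is a list of rows, one per state. The function name and the `allowed_next` argument (the actions that stay inside the grid from the new state) are hypothetical, and the numbers in the example are made up to show the arithmetic.

```python
def q_learning_update(q, s, a, cost, s_next, allowed_next, alpha, gamma):
    """Apply one Q-learning update in place and return the new Q-value."""
    # Best action from the new state = smallest cost among the allowed actions.
    best_next = min(q[s_next][b] for b in allowed_next)
    q[s][a] = (1 - alpha) * q[s][a] + alpha * (cost + gamma * best_next)
    return q[s][a]


# Two-state toy Q-matrix, values chosen only for illustration.
q = [[10.0, 3.0, 3.0, 3.0],
     [4.0, 50.0, 6.0, 8.0]]
new_value = q_learning_update(q, s=0, a=0, cost=1, s_next=1,
                              allowed_next=[0, 1, 2, 3], alpha=0.1, gamma=1.0)
```

In this example the old value is 10, the step costs 1 and the cheapest next action costs 4, so the new value is 0.9 · 10 + 0.1 · (1 + 4) = 9.5.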

SARSA algorithm
• Another way of updating the Q-matrix
• Not based on the next optimal move, but on the next actual move
• This means that it takes into account the risk of falling into the cliff, and will eventually arrive at a safer path
• Result: a longer, but safer path
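In code, the difference from Q-learning is one term: the update uses the Q-value of the move the walker actually takes next instead of the cheapest one. The Python sketch below assumes a list-of-rows Q-matrix with costs as entries; the function name and the example numbers are hypothetical.

```python
def sarsa_update(q, s, a, cost, s_next, a_next, alpha, gamma):
    """Apply one SARSA update in place and return the new Q-value."""
    # Uses the action actually chosen in the new state, not the cheapest one.
    q[s][a] = (1 - alpha) * q[s][a] + alpha * (cost + gamma * q[s_next][a_next])
    return q[s][a]


# Toy Q-matrix; action 1 in state 1 is an expensive move (for example toward the cliff).
q = [[10.0, 3.0, 3.0, 3.0],
     [4.0, 50.0, 6.0, 8.0]]
new_value = sarsa_update(q, s=0, a=0, cost=1, s_next=1, a_next=1,
                         alpha=0.1, gamma=1.0)
```

Here the next actual move is the expensive one, so the new value is 0.9 · 10 + 0.1 · (1 + 50) = 14.1. Exploratory steps near the cliff therefore raise the cost of the states beside it, which is what pushes SARSA toward the safer path.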

The program is based on 3 parameters
• α, the learning factor: the higher the value, the faster the walker learns
• γ, the reward factor: the higher the value, the more reward is given for good actions
• ε, the exploration factor: a high value leads to more randomness
• In the following example these values were used: α = 0.1, γ = 1, ε = 0.05
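To show where the three parameters enter, here is a compact Python training loop that uses the values from the slide as defaults. It is a sketch under stated assumptions, not a port of the Matlab program: the grid layout (start and goal in the bottom corners, cliff between them), the function names, the `max_steps` safety cap and the handling of the final step (no future cost once the finish is reached) are all assumptions.

```python
import random

ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)                  # assumed layout
CLIFF = {(3, c) for c in range(1, 11)}         # assumed layout
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]     # up, right, down, left


def index(pos):
    return pos[0] * COLS + pos[1]


def valid_actions(pos):
    return [a for a, (dr, dc) in enumerate(MOVES)
            if 0 <= pos[0] + dr < ROWS and 0 <= pos[1] + dc < COLS]


def choose(q, pos, epsilon, rng):
    acts = valid_actions(pos)
    if rng.random() < epsilon:                 # epsilon: exploration factor
        return rng.choice(acts)
    return min(acts, key=lambda a: q[index(pos)][a])


def step(pos, a):
    nxt = (pos[0] + MOVES[a][0], pos[1] + MOVES[a][1])
    if nxt in CLIFF:
        return START, 100, False               # fall: penalty and start over
    return nxt, 1, nxt == GOAL


def train(method, n_walks=100, alpha=0.1, gamma=1.0, epsilon=0.05,
          seed=0, max_steps=10_000):
    """Run n_walks walks with 'q_learning' or 'sarsa'; return (q, cost per walk)."""
    rng = random.Random(seed)
    q = [[rng.random() for _ in MOVES] for _ in range(ROWS * COLS)]
    costs = []
    for _walk in range(n_walks):
        pos, total = START, 0
        a = choose(q, pos, epsilon, rng)
        for _step in range(max_steps):
            nxt, cost, done = step(pos, a)
            total += cost
            a_next = None
            if method == "sarsa":
                a_next = choose(q, nxt, epsilon, rng)
                future = q[index(nxt)][a_next]             # next actual move
            else:
                future = min(q[index(nxt)][b] for b in valid_actions(nxt))
            if done:
                future = 0.0                               # nothing left to pay
            s = index(pos)
            # alpha: learning factor, gamma: reward factor
            q[s][a] = (1 - alpha) * q[s][a] + alpha * (cost + gamma * future)
            if done:
                break
            if a_next is None:
                a_next = choose(q, nxt, epsilon, rng)
            pos, a = nxt, a_next
        costs.append(total)
    return q, costs
```

Calling `train("q_learning")` and `train("sarsa")` returns the learned Q-matrix and the total cost of each walk, which can be plotted to compare the two methods.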

Results
Fig 1) Q-learning, the 100th walk
Fig 2) Q-learning, optimal solution
Fig 3) SARSA, the 100th walk
Fig 4) SARSA, optimal solution

Results
• Random steps over the cliff

Genetic Algorithms
GA can be applied to the Cliffwalker problem by:
• replacing the reinforcement learning algorithm with a GA to find the best path in the gridworld, or
• finding the best learning parameters for the reinforcement learning algorithm
The conclusion is that GAs will probably not improve noticeably on the results of the reinforcement learning algorithms, since the best parameters are found very quickly.
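As an illustration of the second option, the Python sketch below shows a small genetic algorithm that searches for a parameter vector minimising a cost function. Everything in it is an assumption made for illustration: the function names, the operators (tournament selection, uniform crossover, Gaussian mutation, one elite survivor) and the population settings. The `toy_fitness` function is only a stand-in; in a real experiment it would run the reinforcement learning program with the candidate parameters and return, for example, the average cost per walk.

```python
import random


def evolve(fitness, bounds, pop_size=20, generations=30,
           mutation_rate=0.2, seed=0):
    """Minimise fitness(params) over real-valued vectors within bounds."""
    rng = random.Random(seed)

    def mutate(ind):
        # Gaussian nudge on some genes, clipped back into the allowed range.
        return [min(hi, max(lo, x + rng.gauss(0, 0.1 * (hi - lo))))
                if rng.random() < mutation_rate else x
                for x, (lo, hi) in zip(ind, bounds)]

    def crossover(p1, p2):
        return [rng.choice(pair) for pair in zip(p1, p2)]   # uniform crossover

    def tournament(scored):
        first, second = rng.sample(scored, 2)
        return first[1] if first[0] <= second[0] else second[1]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((fitness(ind), ind) for ind in pop), key=lambda t: t[0])
        elite = scored[0][1]                                # best one survives
        children = [mutate(crossover(tournament(scored), tournament(scored)))
                    for _ in range(pop_size - 1)]
        pop = [elite] + children
    return min(pop, key=fitness)


def toy_fitness(params):
    """Stand-in cost with an arbitrary minimum; not a real learning result."""
    alpha, epsilon = params
    return (alpha - 0.3) ** 2 + (epsilon - 0.2) ** 2


best = evolve(toy_fitness, bounds=[(0.0, 1.0), (0.0, 1.0)])
```

Because each fitness evaluation would mean a full training run, the population size and number of generations directly set the cost of such a search.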

Sources
• Reinforcement Learning (pdf), Jonas Waller [2005]
• Cliffwalker program, Jonas Waller [2005]
• Reinforcement Learning: An Introduction, Sutton and Barto
