Reinforcement Learning and Genetic Algorithms
Staffan Järn
Reinforcement learning
• Intelligent learning algorithm
• Doesn’t require the presence of a teacher
• The algorithm is given a reward (a reinforcement) for good actions
• The algorithm tries to figure out the best action to take in a given state, without knowing the final optimal solution
• The actions are based on rewards and penalties
Areas
• Robot control
• Elevator scheduling (search for patterns)
• Telecommunications (finding networks)
• Games (Chess, Backgammon)
• Financial trading
Cliffwalker program in Matlab
• Gridworld (4 x 12)
• The walker (agent) is supposed to find the shortest or safest way to the finish without falling into the cliff (blue area)
• Falling into the cliff gives 100 penalty points, and the walker has to start over again
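The gridworld rules above can be sketched in a few lines of Python. This is not the Matlab Cliffwalker program. The cell layout (start in the bottom-left corner, finish in the bottom-right corner, cliff along the bottom row between them) is an assumption, as is the reading that "start over" means the walker is moved back to the start cell. All names are hypothetical.

```python
# Assumed layout: 4 rows x 12 columns, start bottom-left, finish bottom-right.
N_ROWS, N_COLS = 4, 12
START = (3, 0)
GOAL = (3, 11)

# Four directions: 0 = up, 1 = right, 2 = down, 3 = left.
MOVES = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}


def is_cliff(pos):
    """Assumed cliff cells: bottom row, strictly between start and finish."""
    row, col = pos
    return row == 3 and 1 <= col <= 10


def valid_actions(pos):
    """Actions that keep the walker inside the grid (no walking into the wall)."""
    row, col = pos
    return [a for a, (dr, dc) in MOVES.items()
            if 0 <= row + dr < N_ROWS and 0 <= col + dc < N_COLS]


def step(pos, action):
    """Take one step and return (next_pos, cost, done)."""
    dr, dc = MOVES[action]
    nxt = (pos[0] + dr, pos[1] + dc)
    if is_cliff(nxt):
        # 100 penalty points, and the walker is sent back to the start.
        return START, 100, False
    # An ordinary step costs 1; the walk ends at the finish.
    return nxt, 1, nxt == GOAL
```

The cost of 1 per ordinary step is what makes the shortest path also the cheapest one.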
Q-learning algorithm
• Matrix, called the Q-matrix
• 48 x 4 matrix: 48 states (12 x 4 gridworld) x 4 actions (four directions)
• The Q-matrix contains a "price" (cost) for taking a certain action in a certain state
• Initialized randomly in the beginning
• The walker has two options:
  • Take the optimal action, i.e. the one with the smallest Q-value
  • Explore the gridworld by taking a random step (cannot walk into the wall)
• The Q-value is updated according to the equation every time the walker takes an action
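The Q-matrix and the two options of the walker can be sketched as follows. This is a minimal Python sketch, not the original Matlab code. The function names, the state numbering and the use of an exploration probability `epsilon` to decide between the two options are assumptions.

```python
import random

N_STATES, N_ACTIONS = 48, 4  # 12 x 4 gridworld, four directions


def init_q(rng):
    """Q-matrix with one row per state and one column per action, random start."""
    return [[rng.random() for _ in range(N_ACTIONS)] for _ in range(N_STATES)]


def choose_action(q, state, allowed, epsilon, rng):
    """Pick an action among `allowed` (the moves that do not hit the wall)."""
    if rng.random() < epsilon:
        # Explore: take a random step.
        return rng.choice(allowed)
    # Exploit: the Q-values are prices, so the smallest one is optimal.
    return min(allowed, key=lambda a: q[state][a])


rng = random.Random(0)
q = init_q(rng)
action = choose_action(q, state=0, allowed=[1, 2], epsilon=0.05, rng=rng)
```

Because the Q-values are costs rather than rewards, the greedy choice uses `min` where most textbook formulations use `max`.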
The new value in the Q-matrix for the previous state s and the previously taken action a is the old value multiplied by (1 - α), plus α multiplied by the sum of the cost c of taking the step (usually 1, cliff 100) and γ multiplied by the value of the best action the walker can take in the new state s' (optimal action):

$$Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha \left[ c + \gamma \min_{a'} Q(s', a') \right]$$

• New value: Q(s, a) on the left-hand side
• Previous step: (1 - α) Q(s, a)
• Sum of the cost: c + γ min Q(s', a')
• Best action: min Q(s', a'), the smallest Q-value in the new state
• Gamma (γ) = reward factor
• Alfa (α) = learning factor
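As a worked example, the update described above can be written as a small Python function. This is a sketch with hypothetical names. Taking the minimum only over the allowed (non-wall) actions in the new state is an assumption.

```python
def q_learning_update(q, state, action, cost, next_state, next_allowed,
                      alpha, gamma):
    """Blend the old Q-value with the cost of the step plus the best next value."""
    # Best action in the new state = smallest price among the allowed moves.
    best_next = min(q[next_state][a] for a in next_allowed)
    q[state][action] = ((1 - alpha) * q[state][action]
                        + alpha * (cost + gamma * best_next))
    return q[state][action]


# Arithmetic with alpha = 0.1, gamma = 1:
# old value 10, step cost 1, best next value 5
# gives 0.9 * 10 + 0.1 * (1 + 5), i.e. approximately 9.6.
q = [[0.0] * 4 for _ in range(48)]
q[0][1] = 10.0
q[1] = [5.0, 7.0, 9.0, 6.0]
new_value = q_learning_update(q, 0, 1, cost=1, next_state=1,
                              next_allowed=[0, 1, 2, 3], alpha=0.1, gamma=1.0)
```

With a small α the old value dominates, so a single expensive step (such as the cliff) shifts the Q-value only gradually.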
SARSA algorithm
• Another way of updating the Q-matrix
• Not based on the next optimal move, but on the next actual move
• This means that it takes into account the risk of falling into the cliff and will eventually arrive at a safer path
• Longer, but safer path
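The difference from Q-learning is a single term, which a short Python sketch makes visible. The function name and arguments are hypothetical. The update has the same form as the Q-learning rule, with the value of the next actual move in place of the best next value.

```python
def sarsa_update(q, state, action, cost, next_state, next_action,
                 alpha, gamma):
    """Update using the move the walker actually takes next, not the best one."""
    actual_next = q[next_state][next_action]
    q[state][action] = ((1 - alpha) * q[state][action]
                        + alpha * (cost + gamma * actual_next))
    return q[state][action]


# Arithmetic with alpha = 0.1, gamma = 1:
# old value 10, step cost 1, and the walker's next actual move has value 8
# (for example an exploratory step), although the best move has value 5.
# SARSA gives 0.9 * 10 + 0.1 * (1 + 8), i.e. approximately 9.9.
q = [[0.0] * 4 for _ in range(48)]
q[0][1] = 10.0
q[1] = [5.0, 7.0, 8.0, 6.0]
new_value = sarsa_update(q, 0, 1, cost=1, next_state=1, next_action=2,
                         alpha=0.1, gamma=1.0)
```

Random steps next to the cliff occasionally cost 100, and because SARSA feeds the value of the actual next move back into the update, cells along the cliff edge end up with higher prices than under Q-learning.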
The program is based on 3 parameters
• Learning factor (α): the higher, the faster the walker learns
• Reward factor (γ): the higher, the more reward is given for good actions
• Exploration factor (ε): a high value leads to more randomness
• In the following example these values were used: α = 0.1, γ = 1, ε = 0.05
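To show where the three parameters enter, here is a compact training loop in Python with α = 0.1, γ = 1 and ε = 0.05 as defaults. It is a sketch under assumptions, not the Matlab program: the cliff layout, sending the walker back to the start after a fall, the zero price at the finish, the step limit per walk and all names are hypothetical.

```python
import random

N_ROWS, N_COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left


def allowed(pos):
    """Moves that stay inside the grid."""
    return [a for a, (dr, dc) in enumerate(MOVES)
            if 0 <= pos[0] + dr < N_ROWS and 0 <= pos[1] + dc < N_COLS]


def step(pos, action):
    nxt = (pos[0] + MOVES[action][0], pos[1] + MOVES[action][1])
    if nxt[0] == 3 and 1 <= nxt[1] <= 10:  # assumed cliff cells
        return START, 100, False
    return nxt, 1, nxt == GOAL


def choose(q, pos, epsilon, rng):
    acts = allowed(pos)
    if rng.random() < epsilon:                   # exploration factor
        return rng.choice(acts)
    return min(acts, key=lambda a: q[pos][a])    # smallest price


def train(method, walks=500, alpha=0.1, gamma=1.0, epsilon=0.05,
          seed=0, max_steps=10_000):
    """Run `walks` walks with method "q" or "sarsa"; return (Q, cost per walk)."""
    rng = random.Random(seed)
    q = {(r, c): [rng.random() for _ in MOVES]
         for r in range(N_ROWS) for c in range(N_COLS)}
    q[GOAL] = [0.0] * len(MOVES)  # assumption: nothing left to pay at the finish
    costs = []
    for _ in range(walks):
        pos, total = START, 0
        action = choose(q, pos, epsilon, rng)
        for _ in range(max_steps):
            nxt, cost, done = step(pos, action)
            if method == "sarsa":
                next_action = choose(q, nxt, epsilon, rng)  # next actual move
                future = q[nxt][next_action]
            else:
                future = min(q[nxt][a] for a in allowed(nxt))  # next optimal move
            q[pos][action] = ((1 - alpha) * q[pos][action]     # learning factor
                              + alpha * (cost + gamma * future))  # reward factor
            total += cost
            if done:
                break
            if method != "sarsa":
                next_action = choose(q, nxt, epsilon, rng)
            pos, action = nxt, next_action
        costs.append(total)
    return q, costs


q_table, q_costs = train("q", walks=100)
sarsa_table, sarsa_costs = train("sarsa", walks=100)
```

The per-walk costs in `q_costs` and `sarsa_costs` can be plotted to compare the two methods. The result depends on the seed and on the number of walks.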
Results
• Fig 1) Q-learning, the 100th walk
• Fig 2) Q-learning, optimal solution
• Fig 3) SARSA, the 100th walk
• Fig 4) SARSA, optimal solution
Results
• Random steps over the cliff
Genetic Algorithms
• GA can be applied to the Cliffwalker problem by:
  • replacing the reinforcement learning algorithm with a GA to find the best path in the gridworld, or
  • finding the best learning parameters for the reinforcement learning algorithm
• The conclusion is that GAs will probably not improve the results remarkably compared to the reinforcement learning algorithms, since the best parameters are found very quickly
Sources
• Reinforcement Learning (pdf), Jonas Waller [2005]
• Cliffwalker program, Jonas Waller [2005]
• Reinforcement Learning: An Introduction, Sutton and Barto