Reinforcement Learning

Developing a self-learning snake game using Reinforcement Learning and pygame.

About me • Student, pursuing my Bachelor's in Software Engineering • Freelance Software Developer • A FOSS enthusiast, currently contributing to coala • Pythonista who loves developing automation and Machine Learning projects and occasionally writes blog posts about Python. GitHub: https://github.com/satwikkansal LinkedIn: https://linkedin.com/in/satwikkansal Website: http://www.satwikkansal.xyz Blog: https://satwikkansal.wordpress.com

Do you remember these?

Contents • Quick intro to game development: common concepts • Designing the gameplay • Events and control, implementing game logic • Some RL concepts: Agent, State, Reward, Policy, MDP and a few more • Q-Learning to the rescue • Other Reinforcement Learning techniques • Self-driving car in action • Current applications and future scope of RL • Available open source frameworks and libraries. The code for the workshop is available at https://github.com/satwikkansal/snakepy

Some Game Development concepts • Coordinates: The screen is a 2D grid with (0,0) at the top-left corner; x grows to the right and y grows downward • Colors: RGB and alpha values • Drawing: Plotting pixels, Surface objects, blitting • Rendering: Animation, frame/refresh rate • The game loop: handle events, update the game state, render, repeat
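
To make the coordinate convention and the loop concrete, here is a minimal pure-Python sketch. It deliberately avoids pygame so it runs anywhere; the names `DIRECTIONS`, `move`, `run` and the 20-pixel cell size are assumptions for illustration, not taken from the workshop code.

```python
# Screen coordinates: (0, 0) is the top-left corner, so moving "UP"
# decreases y and moving "DOWN" increases it.
DIRECTIONS = {
    "UP": (0, -1),
    "DOWN": (0, 1),
    "LEFT": (-1, 0),
    "RIGHT": (1, 0),
}


def move(position, direction, cell_size=20):
    """Return the new (x, y) after one step of cell_size pixels."""
    x, y = position
    dx, dy = DIRECTIONS[direction]
    return (x + dx * cell_size, y + dy * cell_size)


def run(frames, position=(100, 100), direction="RIGHT"):
    """Skeleton of a game loop, limited to a fixed number of frames."""
    for _ in range(frames):
        # 1. handle events (a real game would read the keyboard here)
        # 2. update the game state
        position = move(position, direction)
        # 3. render (a real game would draw at `position` here)
    return position
```

In a real pygame program the loop would run until the player quits, and a clock would cap the frame rate.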

Designing the Gameplay Objects: a snake, apples, walls. The snake eats an apple and grows 1 unit longer. The snake dies when it hits a wall or runs over itself. Objective: Eat as many apples as possible without dying. • What happens when the snake gets killed? • How to start the game?
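
One possible way to model the "eat and grow" rule, assuming the snake is stored as a list of grid cells with the head first. The function name `step_snake` and this representation are hypothetical and may differ from the workshop repository.

```python
def step_snake(snake, direction, apple):
    """Advance the snake by one cell.

    snake: list of (col, row) cells, head first.
    direction: (dx, dy) unit step.
    apple: (col, row) cell of the apple.
    Returns (new_snake, ate_apple).
    """
    dx, dy = direction
    head = (snake[0][0] + dx, snake[0][1] + dy)
    ate_apple = head == apple
    # When an apple is eaten the tail stays, so the snake grows by 1 unit;
    # otherwise the tail cell is dropped and the length is unchanged.
    body = snake if ate_apple else snake[:-1]
    return [head] + body, ate_apple
```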

Code Implementation: Drawing, Displaying and Moving the game objects.

User Interaction & Game Logic • Arrow keys to move the head. • Do we want our snake to keep moving? • Detecting overlaps and collisions of the snake's head with other objects: boundaries, apples and its own body. • Scoring
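
Collision detection on a grid can be sketched as below. This assumes the same head-first list of cells and a playing field of `width` x `height` cells whose edges act as walls; `is_dead` is an illustrative name, not the workshop's.

```python
def is_dead(snake, width, height):
    """Return True if the head left the grid or overlaps the body.

    snake: list of (col, row) cells, head first.
    width, height: grid size in cells; valid cells are 0..width-1, 0..height-1.
    """
    head, body = snake[0], snake[1:]
    hit_wall = not (0 <= head[0] < width and 0 <= head[1] < height)
    hit_self = head in body
    return hit_wall or hit_self
```

Scoring can then be as simple as incrementing a counter each time the head lands on the apple's cell.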

Code Implementation: Adding the controls and the score to make a fully functional snake game.

Okay, let’s make our dumb computer control the snake.

Code Implementation: Wait, let’s add some intelligence to our agent. (Provide vision to the CPU i.e. game rules) Next Section: Or better, let’s make the CPU discover knowledge. (Make our snake learn from experiences)
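
A hand-written rule of the kind described here might look like the sketch below: among the moves that stay inside the grid, pick the one that gets closest to the apple. It is a hypothetical baseline (it ignores the snake's own body and assumes a grid larger than 1x1), not the workshop's implementation.

```python
ACTIONS = {"UP": (0, -1), "DOWN": (0, 1), "LEFT": (-1, 0), "RIGHT": (1, 0)}


def rule_based_action(head, apple, width, height):
    """Pick the in-bounds move that minimizes Manhattan distance to the apple."""

    def next_cell(action):
        dx, dy = ACTIONS[action]
        return (head[0] + dx, head[1] + dy)

    def in_bounds(cell):
        return 0 <= cell[0] < width and 0 <= cell[1] < height

    def distance_to_apple(cell):
        return abs(cell[0] - apple[0]) + abs(cell[1] - apple[1])

    # The "vision" we give the CPU: it knows where the walls and the apple are.
    safe = [a for a in ACTIONS if in_bounds(next_cell(a))]
    return min(safe, key=lambda a: distance_to_apple(next_cell(a)))
```

The contrast with the next section is that here we encode the rules ourselves, whereas a learning agent has to discover them from rewards.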

Time to introduce Reinforcement Learning!

A few things to know • State, History and Episode • Action • Reward • Policy, value function, and model • Environment • Agent • Markov states and MDP Long story short: The environment is everything that surrounds the agent. A state represents the situation of the agent at a particular time in the environment. The agent performs an action to transition from one state to another and may receive a reward in return. The policy is the strategy for choosing an action given a state, and the agent tries to choose a policy that maximizes the expected cumulative reward.
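
"Cumulative reward" is commonly computed with a discount factor, so that rewards further in the future count less. The sketch below assumes that convention; the discount `gamma` is not mentioned on the slide and is introduced here only for illustration.

```python
def discounted_return(rewards, gamma=0.9):
    """Return r0 + gamma*r1 + gamma^2*r2 + ... for one episode's rewards."""
    total = 0.0
    # Work backwards so each step is: reward now + discounted future return.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

With `gamma=1.0` this reduces to the plain sum of rewards over the episode.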

Implementation: Refactoring the game’s code

Q-learning to the rescue! • Popular, simple, model-free RL technique (a model of the environment is not required) • Can find an optimal action-selection policy for any finite MDP, given sufficient exploration • Learns the action-value function
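
The core of tabular Q-learning is a one-line update of the action-value table. The sketch below assumes a dictionary keyed by `(state, action)`; the action names, the learning rate `alpha` and the discount `gamma` are illustrative choices, not values from the workshop code.

```python
from collections import defaultdict

ACTIONS = ["UP", "DOWN", "LEFT", "RIGHT"]


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update to the table q in place.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error


# Unseen (state, action) pairs default to a value of 0.0.
q_table = defaultdict(float)
q_update(q_table, "s0", "UP", reward=1.0, next_state="s1")
```

No model of the environment appears anywhere in the update: only observed transitions `(state, action, reward, next_state)` are used.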

Code Implementation: Using Q-learning to choose actions for the agent.
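
A common way to turn the learned Q-values into actions is an epsilon-greedy rule: explore with a small probability, otherwise exploit the best known action. The slide does not say which rule the workshop uses, so treat this as one reasonable option; `choose_action` and its parameters are hypothetical.

```python
import random


def choose_action(q, state, actions, epsilon=0.1, rng=random):
    """Epsilon-greedy selection over a Q-table stored as a dict.

    q: dict mapping (state, action) -> estimated value (missing keys count as 0).
    epsilon: probability of picking a random action (exploration).
    """
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore
    # exploit: highest estimated value; ties go to the earliest action listed
    return max(actions, key=lambda a: q.get((state, a), 0.0))
```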

Our agent in action Note: Currently our rules don't penalize the snake for running over itself.

Possible Improvements to our agent • Optimizing the state space • Adding time-based rewards • Balancing the exploration vs. exploitation tradeoff • Optimizing the hyperparameters using techniques such as grid search or genetic algorithms • Using state-of-the-art RL techniques.
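
One simple handle on the exploration vs. exploitation tradeoff is to decay the exploration rate over episodes: explore a lot early, exploit more later. The schedule and its constants below are assumptions chosen for illustration and would themselves be hyperparameters to tune.

```python
def epsilon_at(episode, start=1.0, end=0.05, decay=0.99):
    """Exponentially decayed exploration rate with a lower bound.

    episode: 0-based episode index.
    start: exploration rate in the first episode.
    end: floor that keeps a little exploration forever.
    decay: multiplicative factor applied once per episode.
    """
    return max(end, start * decay ** episode)
```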

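Other interesting techniques SARSA: An on-policy relative of Q-learning. It updates the action-value estimate using the action the current policy actually takes in the next state (for example epsilon-greedy, where a random action is chosen with a predefined probability) instead of the greedy maximum. Each update avoids the max over actions, which is cheaper when the number of actions is large. Deep Q-Networks: Combine RL with deep neural networks such as CNNs. The network approximates the non-linear action-value function and is trained on past transitions sampled through experience replay.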
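
The difference between SARSA and Q-learning shows up in the update target. The sketch below is a minimal tabular SARSA update, assuming the Q-table is a plain dict keyed by `(state, action)`; the function name and default parameters are illustrative.

```python
def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """Apply one SARSA update to the dict q in place and return the new value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * Q(s', a') - Q(s, a))

    a' is the action the policy actually chose in s'. Q-learning would
    use max over all actions in s' here instead.
    """
    old = q.get((state, action), 0.0)
    target = reward + gamma * q.get((next_state, next_action), 0.0)
    q[(state, action)] = old + alpha * (target - old)
    return q[(state, action)]
```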

The self-driving car simulation design State: • Car on the left, right, or ahead? • Traffic light green or red? • Next waypoint (from GPS) Actions: • Steer left, steer right • Accelerate, brake Rewards (positive or negative): • Violating the traffic laws • Hitting obstacles • Reaching the destination • Time taken to reach the destination (any thoughts on this?) Code sample available at: https://github.com/satwikkansal/smartcab
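
The design above could be encoded roughly as follows. This is a sketch only: the field names, the `reward` function and every numeric value are assumptions made for illustration and are not taken from the smartcab repository.

```python
from collections import namedtuple

# A hashable state, so it can be used directly as a Q-table key.
State = namedtuple(
    "State", ["car_left", "car_right", "car_ahead", "light", "waypoint"]
)


def reward(violated_law, hit_obstacle, reached_destination, step_cost=-0.1):
    """Combine the reward signals from the slide into one number.

    The small negative step_cost is one way to account for time taken:
    every extra step makes the episode slightly worse.
    """
    total = step_cost
    if violated_law:
        total -= 1.0
    if hit_obstacle:
        total -= 10.0
    if reached_destination:
        total += 10.0
    return total


example_state = State(car_left=False, car_right=True, car_ahead=False,
                      light="green", waypoint="left")
```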

Applications of Reinforcement Learning • Playing games like chess (the reward is not instantaneous; feedback is delayed) • Managing portfolios and finances (the reward here is the money) • Robotics (humanoid robots) • Manufacturing and inventory management • General AI agents: agents that can perform multiple tasks with a single algorithm, for example an agent that plays all the Atari games.

Open source frameworks and libraries for RL • OpenAI Gym - A toolkit for developing and comparing reinforcement learning algorithms. • OpenAI Universe - A software platform for measuring and training an AI's general intelligence across the world's supply of games, websites and other applications. • DeepMind Lab - A customisable 3D platform for agent-based AI research.

Some nice links YouTube lectures and tutorials: • UCL course on RL by D. Silver - http://bit.ly/RL-UCL • Sentdex pygame tutorial - http://bit.ly/sentdex-pygame Python code samples: • Reinforcement Learning: An Introduction - http://bit.ly/RL-intro-Python Online demo: • ConvNetJS - http://bit.ly/convnetjs
