Applying reinforcement learning to Tetris A reduction in state space Underling : Donald Carr Supervisor : Philip Sterne
Reinforcement learning • Branch of AI • Characterised by a lack of direct interaction between programmer and artificial agent. • Agent is given access to simulated environment and develops its own tactics through trial and error.
Reinforcement learning • Characterised by 4 components : • Policy : a mapping from state to action • Value function : a description of long term reward • Reward function : a numerical response to goal realisation/alienation • System model : an internal representation of the system
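The four components can be sketched as fields of a single agent object. This is only an illustration: the names `Agent`, `State` and `Action`, and the choice of integers for states and actions, are assumptions made for the example and do not come from the project.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = int   # hypothetical: a state is just an integer label
Action = int  # hypothetical: an action is just an integer label


@dataclass
class Agent:
    # System model: the agent's internal picture of how an action changes the state.
    model: Callable[[State, Action], State]
    # Reward function: numerical response to moving towards or away from the goal.
    reward: Callable[[State], float]
    # Value function: estimate of long term reward, one entry per visited state.
    value: Dict[State, float] = field(default_factory=dict)

    def policy(self, state: State, actions: List[Action]) -> Action:
        """Policy: map a state to the action whose predicted successor has the highest value."""
        return max(actions, key=lambda a: self.value.get(self.model(state, a), 0.0))


# Toy usage: states on a number line, the goal is state 3.
agent = Agent(model=lambda s, a: s + a, reward=lambda s: 1.0 if s == 3 else 0.0)
agent.value[3] = 1.0
chosen = agent.policy(2, [-1, 1])  # picks the step that leads to state 3
```

Here the policy is derived greedily from the value function and the model, which is one common arrangement but not the only one.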
Intricacies • No initial assumptions on the part of the program • Many established weighting functions are used to develop the value function; they encourage either persistent learning or convergence to an optimal solution • Exploration vs. exploitation
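One common way to trade exploration against exploitation is an epsilon-greedy choice. The slides do not say which scheme the project uses, so the sketch below is an assumption; `epsilon_greedy` and its parameters are names invented for the example.

```python
import random


def epsilon_greedy(values, epsilon, rng=random):
    """Return an index into `values`: a random one with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))  # explore: try any option
    return max(range(len(values)), key=values.__getitem__)  # exploit: best known option


# With epsilon = 0 the agent always exploits; with epsilon = 1 it always explores.
best = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

A large epsilon keeps the agent learning persistently, while shrinking it over time lets the agent settle on what it currently believes is best.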
It's all been half-done before • Yael Bdolah & Dror Livnat http://www.math.tau.ac.il/~mansour/rl-course/student_proj/livnat/tetris.html • S Melax www.melax.com/tetris/
Dimensionality • “The curse of dimensionality” – Richard Bellman • Using a binary description of the blocks, each additional block doubles memory requirements • Exponential complexity
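The doubling is easy to check numerically. In the sketch below `table_entries` is a hypothetical helper, and the cell counts 16 and 260 are taken from the two environments described in the following slides.

```python
def table_entries(cells: int) -> int:
    """Number of distinct states when each cell is either empty or filled."""
    return 2 ** cells


# One extra cell doubles the table; 16 and 260 are the two well sizes discussed later.
for cells in (8, 9, 16, 260):
    print(f"{cells:>3} cells -> {table_entries(cells)} states")
```

A table with one entry per state is practical at 16 cells and far beyond any memory at 260 cells.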
Consequence • Successfully applying reinforcement learning to hobbled Tetris
Redefine your enemy • Resting environment is tiny 2 by 8 blocks = 2^16 possible states • Blocks fall from an infinite height • There is infinite time for decision • Placement options do not decrease as time progresses • Goals remain constant over time • Linear risk vs. reward response
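A 2 by 8 well with binary cells fits in a single 16-bit integer, which can then index a value table directly. The encoding below is a minimal sketch under that assumption; `encode`, `decode` and the row-major bit order are choices made for the example, not details from the cited projects.

```python
ROWS, COLS = 2, 8  # well size from the slide


def encode(grid):
    """Pack a ROWS x COLS grid of 0/1 cells into one integer in [0, 2**16)."""
    state = 0
    for row in grid:
        for cell in row:
            state = (state << 1) | cell
    return state


def decode(state):
    """Inverse of encode: rebuild the grid from the packed integer."""
    bits = [(state >> i) & 1 for i in reversed(range(ROWS * COLS))]
    return [bits[r * COLS:(r + 1) * COLS] for r in range(ROWS)]


grid = [[0, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 0, 0, 0, 0, 0, 1]]
index = encode(grid)  # usable as an index into a table of 2**16 values
```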
The human lot • Environment is massive : 13*20 blocks = 2^260 possible states • There are very real time constraints, with the number of options decreasing as the block descends • Successfully completing 4 rows carries 16 times the reward of completing 1 row, but also carries much higher risk • Logical tactics change as the finite stage fills up, e.g. don’t risk 4 row completion with 2 empty rows remaining
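The contrast between a linear reward and the full game's reward can be made concrete. The slides only give the 16-to-1 ratio; squaring the number of rows is one function consistent with that ratio and is purely an assumption here, as are both function names.

```python
def linear_reward(rows_cleared: int) -> int:
    """Reduced game: reward grows in step with the rows cleared."""
    return rows_cleared


def quadratic_reward(rows_cleared: int) -> int:
    """Hypothetical full-game reward: 4 rows is worth 16 times 1 row."""
    return rows_cleared ** 2


for rows in range(1, 5):
    print(rows, linear_reward(rows), quadratic_reward(rows))
```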
No hand : Just boot or sweetie • No explicit tactics yielded to computer (digital virgin) • Given sensory perception via our description of the system • Given ability to rotate and manoeuvre Tetris piece • Receives external reward or punishment we associate with state transitions • Given long term memory
School of hard knocks • Iterative training • Agent goes from completely ignorant entity to veritable veteran in an iterative process • Rate of learning • Depth of learning • Flexibility of learning • Balance between common parameters
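A single step of iterative training can be sketched as a tabular temporal-difference update. This is a standard textbook rule, swapped in for illustration; the slides do not name the update the project uses. Reading "rate of learning" as the step size `alpha` and "depth of learning" as the discount `gamma` is likewise an assumption.

```python
def td_update(value, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Move value[state] a fraction alpha towards reward + gamma * value[next_state]."""
    old = value.get(state, 0.0)
    target = reward + gamma * value.get(next_state, 0.0)
    value[state] = old + alpha * (target - old)
    return value[state]


# The agent starts completely ignorant (empty table) and improves one transition at a time.
value = {}
first = td_update(value, state=0, reward=1.0, next_state=1, alpha=0.5, gamma=0.9)
```

A larger `alpha` learns faster but forgets faster, and a larger `gamma` makes the agent weigh rewards further in the future.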
Refocus • Focus of project is on minimising state space • Implementing Tetris specific solutions • Mirror symmetry : roughly halves the state space • Focusing on restricted section of formation, e.g. top 4 rows of formation • Considering several substates • Researching and implementing general optimisations • Possibly utilising other numeric methods to find best possibility in state space (standard description involves linear iterative search for alternative with maximum value)
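The saving from mirror symmetry can be counted directly on the reduced 2 by 8 well: a state and its left-right reflection share one table entry, so states merge in pairs and only the self-symmetric ones stay single. The sketch assumes a row-major bit-packed state; `mirror` and `canonical` are names invented for the example.

```python
ROWS, COLS = 2, 8  # the reduced well from the earlier slide


def mirror(state: int) -> int:
    """Reflect a row-major bit-packed grid left to right."""
    out = 0
    for r in range(ROWS):
        row = (state >> (r * COLS)) & ((1 << COLS) - 1)
        flipped = int(format(row, f"0{COLS}b")[::-1], 2)
        out |= flipped << (r * COLS)
    return out


def canonical(state: int) -> int:
    """Use the smaller of a state and its mirror image as the table key."""
    return min(state, mirror(state))


# Count the table entries that remain once mirrored pairs are merged.
distinct = len({canonical(s) for s in range(2 ** (ROWS * COLS))})
print(distinct, "of", 2 ** (ROWS * COLS))
```

This only reflects the resting blocks; a full implementation would also have to mirror the falling piece and the chosen action when it looks up a reflected state.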
Strategic planning • Toying with methods of representation (ongoing) • Code / hijack Tetris • Basic learning • Increasing complexity of system • Increasing complexity of agent • Noting shortcomings and countering flaws • Looking for generality in optimisations • Look for direct application to external problems • Look for similarities in external problems
Fuzzy outline • 4 weeks : Research period • 1 week : Code Tetris and select structures • 3 weeks : Achieve basic learning with agent • 5 weeks : Optimisation of state space • 3 weeks : Testing
Possible outcomes • Optimisations capable of extending reinforcement learning to problems previously considered outside of its sphere of application • Unbiased flexibility of reinforcement learning applied to a problem it is ideal for • A possible contender for the Tetris world record (algorithmic) http://www.colinfahey.com/2003jan_tetris/tetris_world_records.htm