Reinforcement Learning and Tetris Jared Christen
Tetris • Markov decision processes • Large state space • Long-term strategy without long-term knowledge
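For a sense of scale, assuming the standard 10×20 board (dimensions the slides do not state), counting each cell as merely filled or empty already gives

```latex
% Raw configurations of a 10 x 20 board, ignoring the active piece and reachability
2^{10 \times 20} = 2^{200} \approx 1.6 \times 10^{60}
```

far too many states to tabulate, which is why the agent must generalize with a function approximator.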
Background • Hand-coded algorithms can clear > 1,000,000 lines • Genetic algorithm by Roger Llima averages 42,000 lines • Reinforcement learning algorithm by Kurt Driessens averages 30-40 lines
Goals • Develop a Tetris agent that improves on previous reinforcement learning implementations • Secondary goals • Use as few handpicked features as possible • Encourage risk-taking • Include rarely-studied features of Tetris
Approach • TD(λ) with a feedforward neural network
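The slides give no implementation details, so the following is a minimal sketch of TD(λ) with per-weight eligibility traces driving a small feedforward value network; the layer sizes and hyperparameters are illustrative assumptions, not values from the talk.

```python
import numpy as np

class TDLambdaNet:
    """One-hidden-layer value network trained with TD(lambda) eligibility traces.
    Sizes and hyperparameters are illustrative, not taken from the slides."""

    def __init__(self, n_inputs, n_hidden=32, alpha=0.01, gamma=0.99, lam=0.7):
        rng = np.random.default_rng(0)
        self.W1 = rng.normal(0, 0.1, (n_hidden, n_inputs))
        self.W2 = rng.normal(0, 0.1, n_hidden)
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.reset_traces()

    def reset_traces(self):
        # One eligibility trace per weight, cleared at the start of each game.
        self.e1 = np.zeros_like(self.W1)
        self.e2 = np.zeros_like(self.W2)

    def value(self, x):
        h = np.tanh(self.W1 @ x)
        return self.W2 @ h, h

    def update(self, x, reward, x_next, terminal):
        v, h = self.value(x)
        v_next = 0.0 if terminal else self.value(x_next)[0]
        delta = reward + self.gamma * v_next - v  # TD error

        # Gradient of the value output with respect to each weight layer.
        grad2 = h
        grad1 = np.outer(self.W2 * (1 - h**2), x)

        # Decay the traces, then accumulate the current gradient.
        self.e2 = self.gamma * self.lam * self.e2 + grad2
        self.e1 = self.gamma * self.lam * self.e1 + grad1

        # Move every weight along its trace, scaled by the TD error.
        self.W2 += self.alpha * delta * self.e2
        self.W1 += self.alpha * delta * self.e1
```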
Neural Net Control • Inputs • Raw state – filled & empty blocks • Handpicked features • Outputs • Movements • Placements
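A hedged sketch of the two input styles; the specific handpicked features used here (column heights, holes, bumpiness) are common Tetris features assumed for illustration, since the slide does not list them.

```python
import numpy as np

def encode_state(board, handpicked=True):
    """board: 2-D array of 0/1 (empty/filled), rows indexed top to bottom.
    Returns either the raw flattened board or a small handpicked feature vector."""
    if not handpicked:
        return board.astype(float).ravel()  # raw filled & empty blocks

    rows, cols = board.shape
    heights = np.zeros(cols)
    holes = 0
    for c in range(cols):
        filled = np.flatnonzero(board[:, c])
        if filled.size:
            top = filled[0]
            heights[c] = rows - top
            holes += np.count_nonzero(board[top:, c] == 0)  # empty cells under the column's top block
    bumpiness = np.abs(np.diff(heights)).sum()  # surface roughness
    return np.concatenate([heights, [holes, bumpiness]])
```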
Structure • [Diagram: value network whose inputs are the active, next, and held tetromino plus each candidate placement's score and contour match length, and whose outputs are a value for each placement plus a hold value]
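Reading the diagram, the network appears to score every candidate placement, plus the option to hold, and the agent acts greedily on those scores. A minimal sketch of that selection loop, reusing the TDLambdaNet sketch above; enumerate_placements and simulate_drop are hypothetical helpers, not names from the talk.

```python
def choose_action(net, board, active, encode_state):
    """Greedy selection over all legal placements of the active tetromino, plus hold.
    enumerate_placements and simulate_drop are hypothetical helpers."""
    candidates = [("hold", None, board)]  # holding leaves the board unchanged this step
    for placement in enumerate_placements(board, active):  # every rotation/column pair
        candidates.append(("place", placement, simulate_drop(board, active, placement)))

    # Score each resulting board with the value network and act greedily.
    def score(cand):
        return net.value(encode_state(cand[2]))[0]

    kind, placement, _ = max(candidates, key=score)
    return kind, placement
```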
Experiments • 200 learning games • Averaged over 30 runs • Two-piece and six-piece configurations • Compared to a benchmark contour-matching agent
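In code, that protocol might look like the following; make_agent and play_game are hypothetical stand-ins, and lines cleared per game is the assumed metric.

```python
import numpy as np

def run_experiment(make_agent, n_runs=30, n_games=200):
    """Train n_runs independent agents for n_games each and average their
    learning curves, matching the 30-run, 200-game protocol on the slide.
    make_agent and play_game are hypothetical stand-ins."""
    curves = np.zeros((n_runs, n_games))
    for run in range(n_runs):
        agent = make_agent()
        for game in range(n_games):
            curves[run, game] = play_game(agent, learn=True)  # e.g. lines cleared
    return curves.mean(axis=0)  # average learning curve across runs
```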
Results • [Charts: learning curves for the six-piece and two-piece configurations]
Conclusions • Accidentally developed a heuristic that beats previous reinforcement learning techniques • The six-piece configuration outperforming the two-piece one suggests some pseudo-planning is going on • A better way to generalize the board state may be necessary