170 likes | 289 Views
By James Mannion Computer Systems Lab 08-09 Period 3. The Implementation of Artificial Intelligence and Temporal Difference Learning Algorithms in a Computerized Chess Programme. Abstract. Searching through large sets of data Complex, vast domains Heuristic searches Chess
E N D
By James Mannion Computer Systems Lab 08-09 Period 3 The Implementation of Artificial Intelligence and Temporal Difference Learning Algorithms in a Computerized Chess Programme
Abstract • Searching through large sets of data • Complex, vast domains • Heuristic searches • Chess • Evaluation Function • Machine Learning
Introduction • Simple domains, simple heuristics • The domain of chess • Deep Blue – brute force • Looking at 30100 moves before making the first • Supercomputer • Too many calculations • Not efficient
Introduction (cont’d) • Minimax search • Alpha-beta pruning • Only look 2-3 moves into the future • Estimate strength of position • Evaluation function • Can improve heuristic by learning
Introduction (cont’d) • Seems simple, but can become quite complex. • Chess masters spend careers learning how to “evaluate” moves • Purpose: can a computer learn a good evaluation function?
Background • Claude Shannon, 1950 • Brute force would take too long • Discusses evaluation function • 2-ply algorithm, but looks further into the future for moves that could lead to checkmate • Possibility of learning in distant future
Background (cont’d) • D.F. Beal, M.C. Smith, 1999 • Temporal Difference learning • Program spends 20,000 games learning the evaluation function • Beats program that did not learn a function • Shows that programs can make their evaluation functions better
Background (cont’d) • David E. Moriarty, Riso Miikkulainen • Evolutionary Neural Networks • Othello • Complex Strategies developed
Background (cont’d) • Shiu-li Huang, Fu-ren Lin, 2007 • Temporal Difference Learning • Multi-Agent Bargaining • Bargaining, while not necessarily adversarial, is similar to chess and other games.
Development • Python • Stage 1: Text based chess game • Two humans input their moves • Illegal moves not allowed
Development (cont’d) • Stage 2: Introduce a computer player • 2-3 ply • Evaluation function will start out such that choices are random
Development (cont’d) • Stage 3: Learning • Temporal Difference Learning • Adjusts the weights of the evaluation function slightly based on gameplay • Evaluation function updated each time a game is played
Testing • Learning vs No Learning • Two equal, random computer players pitted against each other. • One will have the ability to learn • Thousands of games • Win-loss differential tracked over the length of the test • By the end, the learner should be winning significantly more games.
References • Shannon, Claude. “Programming a Computer for Playing Chess.” 1950 • Beal, D.F., Smith, M.C. “Temporal Difference Learning for Heuristic Search and Game Playing.” 1999 • Moriarty, David E., Miikkulainen, Risto. “Discovering Complex Othello Strategies Through Evolutionary Neural Networks.” • Huang, Shiu-li, Lin, Fu-ren. “Using Temporal-Difference Learning for Multi-Agent Bargaining.” 2007 • Russell, Stuart, Norvig, Peter. Artificial Intelligence: A Modern Approach. Second Edition. 2003.