The Implementation of Artificial Intelligence and Temporal Difference Learning Algorithms in a Computerized Chess Program • By James Mannion, Computer Systems Lab 08-09, Period 3
This project explores the implementation of AI and temporal difference learning in a chess program. It covers heuristic search, evaluation functions, minimax search, alpha-beta pruning, and the learning process. The study examines the complexity of chess evaluation and the potential for improvement through learning algorithms. Development proceeds in stages, from a text-based game to a computer player to the integration of temporal difference learning. Testing pits a learning player against a non-learning player in thousands of simulated games and tracks the win-loss differential. The references include works on temporal difference learning, neural networks, and AI in gaming.
Abstract • Searching through large sets of data • Complex, vast domains • Heuristic searches • Chess • Evaluation Function • Machine Learning
Introduction • Simple domains, simple heuristics • The domain of chess • Deep Blue – brute force • Looked at roughly 30^6 moves before making its first move • Supercomputer • Too many calculations • Not efficient
Introduction (cont’d) • Minimax search • Alpha-beta pruning • Only look 2-3 moves into the future • Estimate strength of position • Evaluation function • Can improve heuristic by learning
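For concreteness, here is a minimal sketch of a depth-limited minimax search with alpha-beta pruning; the `state` interface (legal_moves, apply, evaluate) is a hypothetical stand-in for the program's actual board representation.

```python
# Minimal sketch of depth-limited minimax with alpha-beta pruning.
# The `state` interface (legal_moves, apply, evaluate) is hypothetical.

def alphabeta(state, depth, alpha, beta, maximizing):
    """Return the minimax value of `state`, searching `depth` plies ahead."""
    if depth == 0 or not state.legal_moves():
        return state.evaluate()              # heuristic estimate at the horizon
    if maximizing:
        best = float("-inf")
        for move in state.legal_moves():
            best = max(best, alphabeta(state.apply(move), depth - 1,
                                       alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:                # prune: opponent avoids this branch
                break
        return best
    best = float("inf")
    for move in state.legal_moves():
        best = min(best, alphabeta(state.apply(move), depth - 1,
                                   alpha, beta, True))
        beta = min(beta, best)
        if alpha >= beta:                    # prune symmetrically
            break
    return best
```

A 2-3 ply player of the kind described above would pick the root move whose child scores highest under this search, with alpha and beta initialized to -infinity and +infinity.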
Introduction (cont’d) • Seems simple, but can become quite complex. • Chess masters spend careers learning how to “evaluate” moves • Purpose: can a computer learn a good evaluation function?
Background • Claude Shannon, 1950 • Brute force would take too long • Discusses evaluation function • 2-ply algorithm, but looks further into the future for moves that could lead to checkmate • Possibility of learning in distant future
Development • Python • Stage 1: Text based chess game • Two humans input their moves • Illegal moves not allowed
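A sketch of what such a Stage 1 loop might look like; `parse_move` and the `board` methods (game_over, is_legal, apply) are hypothetical placeholders for the program's actual representation.

```python
# Hypothetical sketch of the Stage 1 text game: two humans alternate
# entering moves, and illegal moves are rejected and re-prompted.

def play_text_game(board):
    side = "white"
    while not board.game_over():
        move = parse_move(input(side + " to move: "))
        if move is None or not board.is_legal(move, side):
            print("Illegal move, try again.")  # same player moves again
            continue
        board.apply(move)
        side = "black" if side == "white" else "white"
```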
Development (cont’d) • Stage 2: Introduce a computer player • 2-3 ply search • The evaluation function will start out with choices based on a simple piece differential in which every piece is weighted equally
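A sketch of that starting evaluation, assuming (hypothetically) a board represented as a dict mapping squares to piece letters, uppercase for White and lowercase for Black:

```python
# Sketch of the Stage 2 starting evaluation: a piece differential in
# which every piece is weighted equally. The board encoding (a dict of
# square -> piece letter, uppercase White / lowercase Black) is an
# assumption made for illustration.

def piece_differential(board):
    """Return (number of White pieces) - (number of Black pieces)."""
    white = sum(1 for piece in board.values() if piece.isupper())
    black = sum(1 for piece in board.values() if piece.islower())
    return white - black
```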
Development (cont’d) • Stage 3: Learning • Temporal Difference Learning • Weight adjustment: w_i ← w_i + a·((n_i^c − n_i^p)/n_i^c) • Heuristic function: h = c_1·p_1 + c_2·p_2 + c_3·p_3 + c_4·p_4 + c_5·p_5 • Piece values: p_i = Σ w_i − Σ b_i, summed over the pieces of type i
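Read literally, the update nudges each weight by the relative change in its feature from the previous position to the current one. The sketch below follows that reading; treating n_i^c and n_i^p as the current and previous values of feature i, and `a` as a small learning rate, are assumptions.

```python
# Sketch of the Stage 3 formulas. Interpreting n_i^c and n_i^p as the
# current and previous values of feature i is an assumption based on
# the slide's subscripts; `a` is the learning rate.

def td_update(weights, n_current, n_previous, a=0.1):
    """Apply w_i <- w_i + a * (n_i^c - n_i^p) / n_i^c to each weight."""
    for i, (n_c, n_p) in enumerate(zip(n_current, n_previous)):
        if n_c:                              # guard the division by n_i^c
            weights[i] += a * (n_c - n_p) / n_c
    return weights

def heuristic(c, p):
    """h = c_1*p_1 + ... + c_5*p_5 over the five piece differentials."""
    return sum(ci * pi for ci, pi in zip(c, p))
```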
Testing • Learning vs No Learning • Two equal, piece-differential players pitted against each other. • One will have the ability to learn • Thousands of games • Win-loss differential tracked over the length of the test • By the end, the learner should be winning significantly more games.
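A sketch of such a test harness; `play_game` and the player objects are hypothetical, and `play_game` is assumed to return the winning player (or None on a draw).

```python
# Hypothetical test harness: the learning player faces a fixed
# piece-differential player over thousands of games while the running
# win-loss differential is recorded.

def run_experiment(learner, baseline, n_games=5000):
    differential, history = 0, []
    for _ in range(n_games):
        winner = play_game(learner, baseline)
        if winner is learner:
            differential += 1
        elif winner is baseline:
            differential -= 1                # draws leave the tally unchanged
        history.append(differential)         # trend over the length of the test
    return history
```

If learning helps, the recorded differential should trend upward over the course of the run.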
References • Shannon, Claude. “Programming a Computer for Playing Chess.” 1950. • Beal, D. F., and M. C. Smith. “Temporal Difference Learning for Heuristic Search and Game Playing.” 1999. • Moriarty, David E., and Risto Miikkulainen. “Discovering Complex Othello Strategies Through Evolutionary Neural Networks.” • Huang, Shiu-li, and Fu-ren Lin. “Using Temporal-Difference Learning for Multi-Agent Bargaining.” 2007. • Russell, Stuart, and Peter Norvig. Artificial Intelligence: A Modern Approach. 2nd ed. 2003. • Asgharbeygi, Nima, David Stracuzzi, and Pat Langley. “Relational Temporal Difference Learning.”