Dive into reinforcement learning techniques such as Passive Learning, Active Learning, and Q-learning for AI game strategies. Discover Naive Updating, Dynamic Programming, and Temporal Difference Learning methods for utility estimation. Explore challenges and strategies to enhance agent learning in game environments.
Class Project
• Due at end of finals week
• Essentially anything you want, so long as it's AI related and I approve
• Any programming language you want
• In pairs or individual
• Email me by Wednesday, November 3
Projects
• Implementing k-NN to Classify Bedform Stability Fields
• Blackjack Using Genetic Algorithms
• Computer game players: Go, Checkers, Connect Four, Chess, Poker
• Computer puzzle solvers: Minesweeper, mazes
• Pac-Man with intelligent monsters
• Genetic algorithms:
  • blackjack strategy
• Automated 20-questions player
• Paper on planning
• Neural network spam filter
• Learning neural networks via GAs
Projects
• Training neural networks via backpropagation
• Code decryptor using GAs
• Box-pushing agent (competing against an opponent)
What didn't work as well
• Games that were too complicated: Risk, Yahtzee, Chess, Scrabble, Battle Simulation
  • Got too focused on making the game work
  • I sometimes had trouble running the game
  • The game was often incomplete
  • Didn't have time to do enough AI
• Problems that were too vague
  • Simulated ant colonies / genetic algorithms
  • Bugs swarming for heat (emergent intelligence never happened)
  • Finding paths through snow
• AdaBoost on protein folding data
  • Couldn't get boosting working right; needed more time on small datasets (spent lots of time parsing protein data)
Reinforcement Learning
• Game playing: so far, we have told the agent the value of a given board position.
• How can the agent learn which positions are important?
• Play a whole bunch of games, and receive a reward at the end (+ or -)
• How do we determine the utility of states that aren't ending states?
The setup: Possible game states
• Terminal states have a reward
• Mission: estimate the utility of all possible game states
What is a state?
• For chess: a state is a combination of the positions of your pieces on the board and the locations of the opponent's pieces
• Half of your transitions are controlled by you (your moves)
• The other half of your transitions are probabilistic (they depend on the opponent)
• For now, we assume all moves are probabilistic (probabilities unknown)
Passive Learning
• Agent learns by "watching"
• Fixed probability of moving from one state to another
Technique #1: Naive Updating
• Also known as the Least Mean Squares (LMS) approach
• Starting at the start state, obtain a sequence of states ending at a terminal state
• Utility of the terminal state = its reward
• Loop back over all other states in the sequence:
  • utility of state i = running average of the rewards-to-go observed from state i
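A minimal Python sketch of naive updating, assuming the only reward arrives at the terminal state; the names observe_episode, utility, and visits are made up for illustration.

```python
from collections import defaultdict

utility = defaultdict(float)   # state -> running-average utility estimate
visits = defaultdict(int)      # state -> number of episodes that passed through it

def observe_episode(states, terminal_reward):
    """Naive updating (LMS): every state in the episode is credited with the
    reward observed at the end, averaged over all episodes seen so far."""
    for s in states:
        visits[s] += 1
        # Incremental running average of the reward-to-go seen from s
        utility[s] += (terminal_reward - utility[s]) / visits[s]

# Example: two short episodes through a toy state space
observe_episode(["A", "B", "C"], terminal_reward=+1)
observe_episode(["A", "D", "C"], terminal_reward=-1)
print(utility["A"])  # average of +1 and -1 -> 0.0
```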
Naive Updating Analysis
• Works, but converges slowly
• Must play lots of games
• Ignores the fact that the utility of a state should depend on its successors
Technique #2: Adaptive Dynamic Programming
• Utility of a state depends entirely on its successor states
• If a state has one successor, its utility should be the same as the successor's
• If a state has multiple successors, its utility should be the expected value of the successors' utilities
Finding the utilities
• To find all utilities, just solve the equations U(i) = R(i) + Σj M(i,j) · U(j), one per state
• This is a set of linear equations, and it is solvable
• The equations change each iteration as you learn the transition probabilities
• Completely intractable for large problems:
  • For a real game, it means finding the actual utilities of all states
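A sketch of the ADP solve step in Python, assuming the transition probabilities M and rewards R have already been estimated from observed transitions; the function name and the tiny example are made up for illustration.

```python
import numpy as np

def adp_utilities(M, R):
    """Solve the linear system U = R + M U, where M[i, j] is the estimated
    probability of moving from state i to state j and R is the reward vector.
    Terminal states get all-zero rows in M, so their utility is just their reward."""
    n = len(R)
    # (I - M) U = R  ->  U = (I - M)^(-1) R
    return np.linalg.solve(np.eye(n) - M, R)

# Tiny 3-state example: state 0 moves to terminal states 1 and 2 with equal
# probability; their rewards are +1 and -1.
M = np.array([[0.0, 0.5, 0.5],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
R = np.array([0.0, 1.0, -1.0])
print(adp_utilities(M, R))  # utility of state 0 comes out to 0.0
```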
Technique #3: Temporal Difference Learning
• Want utility to depend on successors, but want to solve iteratively
• Whenever you observe a transition from i to j, update U(i) ← U(i) + α (R(i) + U(j) − U(i))
• α = learning rate
• The difference between successive states is the "temporal difference"
• Converges faster than naive updating
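A short sketch of the temporal difference update, assuming the reward is associated with the state being left; td_update and the utility table U are placeholder names.

```python
def td_update(utility, i, j, reward_i, alpha=0.1):
    """TD update for an observed transition i -> j: move U(i) a fraction
    alpha of the way toward R(i) + U(j), the 'temporal difference' target."""
    utility[i] += alpha * (reward_i + utility[j] - utility[i])

# Example: nudge state A's utility toward its successor B's utility
U = {"A": 0.0, "B": 0.5}
td_update(U, "A", "B", reward_i=0.0, alpha=0.1)
print(U["A"])  # 0.05
```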
Active Learning
• The probability of going from one state to another now depends on the action taken
• The ADP equations are now: U(i) = R(i) + maxa Σj M(a, i, j) · U(j), where M(a, i, j) is the probability of reaching j from i when taking action a
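A sketch of how these equations can be solved once the per-action transition probabilities are estimated. Because of the max over actions the system is no longer linear, so this sketch uses repeated value-iteration-style sweeps rather than a single linear solve; it assumes episodes always reach a terminal state, and all names are made up for illustration.

```python
import numpy as np

def active_adp_utilities(M, R, iterations=100):
    """Active ADP: U(i) = R(i) + max_a sum_j M[a][i][j] * U(j),
    solved by repeated sweeps because of the max over actions."""
    U = np.zeros(len(R))
    for _ in range(iterations):
        U = R + np.max([Ma @ U for Ma in M], axis=0)
    return U

def greedy_action(M, U, i):
    """Pick the action with the highest expected successor utility from state i."""
    return int(np.argmax([Ma[i] @ U for Ma in M]))

# Tiny example: from state 0, action 0 usually reaches terminal state 1 (+1),
# action 1 usually reaches terminal state 2 (-1).
M = [np.array([[0.0, 0.9, 0.1], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]),
     np.array([[0.0, 0.1, 0.9], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])]
R = np.array([0.0, 1.0, -1.0])
U = active_adp_utilities(M, R)
print(U, greedy_action(M, U, 0))  # action 0 is preferred from state 0
```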
Active Learning
• Active learning with temporal difference learning works the same way (assuming you know where you're going)
• You also need to learn the transition probabilities to eventually decide where to go
Exploration: where should the agent go to learn utilities?
• Suppose you're trying to learn optimal game-playing strategies
• Do you follow the best utility, in order to win?
• Do you move around at random, hoping to learn more (and losing a lot in the process)?
• Following the best utility all the time can get you stuck at an imperfect solution
• Following random moves can lose a lot
Where should the agent go to learn utilities?
• f(u, n) = exploration function
• Depends on the utility of the move (u) and the number of times the agent has tried it (n)
• One possibility: instead of using the utility alone to decide where to go, use f(u, n): be optimistic (assume a large reward) until a move has been tried enough times, then trust the learned utility
• Try a move a bunch of times, then eventually settle
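A sketch of one common exploration function of this form; the constants R_PLUS and N_E are assumed values, not from the slides.

```python
R_PLUS = 2.0   # optimistic guess at the best possible reward (assumed value)
N_E = 5        # number of tries before trusting the learned estimate (assumed value)

def f(u, n):
    """Pretend a move is great (R_PLUS) until it has been tried N_E times,
    then fall back on its learned utility u."""
    return R_PLUS if n < N_E else u

# The agent ranks moves by f(utility, tries) instead of utility alone,
# so rarely-tried moves still get explored.
print(f(0.1, 2))   # 2.0 -> still exploring this move
print(f(0.1, 10))  # 0.1 -> enough data, trust the utility
```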
Q-learning
• An alternative approach to temporal difference learning
• No need to learn transition probabilities: sometimes considered more desirable
• Instead, learn the "quality" Q(a, i) of each (state, action) pair
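The standard Q-learning update, Q(a, i) ← Q(a, i) + α (R(i) + maxa' Q(a', j) − Q(a, i)), sketched in Python; the table Q, the learning rate value, and the example states are placeholders for illustration.

```python
from collections import defaultdict

Q = defaultdict(float)   # (state, action) -> estimated quality
ALPHA = 0.1              # learning rate (assumed value)

def q_update(state, action, reward, next_state, next_actions):
    """Q-learning update for an observed transition: move Q(s, a) toward
    r + max over a' of Q(s', a').  No transition probabilities needed."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + best_next - Q[(state, action)])

# Example: one observed move from state "s1" taking action "left",
# landing in "s2" where actions "left" and "right" are available.
q_update("s1", "left", reward=0.0, next_state="s2", next_actions=["left", "right"])
```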
Generalization in Reinforcement Learning
• Maintaining utilities for all seen states in a real game is intractable
• Instead, treat it as a supervised learning problem
• The training set consists of (state, utility) pairs
• Or, alternatively, (state, action, q-value) triples
• Learn to predict utility from state
• This is a regression problem, not a classification problem:
  • Radial basis function neural networks (hidden nodes are Gaussians instead of sigmoids)
  • Support vector machines for regression
  • Etc.
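A sketch of this supervised-learning view, using scikit-learn's SVR with an RBF kernel as one concrete stand-in for the regression models listed above; the feature vectors and targets below are random placeholders, not real game data.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical training set: each row is a feature vector describing a game
# state (material balance, mobility, ...); each target is the utility that
# reinforcement learning estimated for that state.
rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = rng.random(200)

# Support vector regression with an RBF kernel
model = SVR(kernel="rbf")
model.fit(X, y)

# The fitted model generalizes: it assigns utilities to states
# that were never visited during learning.
print(model.predict(X[:3]))
```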
Other applications
• Applies to any situation where something must be learned from reinforcement
• Possible examples:
  • Toy robot dogs
  • Petz
  • That darn paperclip
  • "The only winning move is not to play"