Learning BlackJack with ANN (Artificial Neural Network) Ip Kei Sam sam@cae.wisc.edu ID: 9012828100
Goal • Use a Reinforcement Learning algorithm to learn Blackjack strategies. • Train an MLP to play Blackjack without explicitly teaching it the rules of the game. • Develop an ANN strategy that beats the dealer's 17-point rule.
Blackjack • Draw cards from a 52-card deck to reach a total as close to 21 as possible without going over. • Blackjack is simplified to allow only "hit" or "stand" on each turn.
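The simplified game above (hit or stand only) can be sketched as follows; `hand_value` and `play_hand` are illustrative names, not from the original slides, and the ace handling (11 or 1) is standard Blackjack rather than anything the slides specify:

```python
import random

def hand_value(cards):
    """Total value of a hand; aces (11) count as 1 when needed to avoid busting."""
    total = sum(cards)
    aces = cards.count(11)
    while total > 21 and aces:
        total -= 10          # demote one ace from 11 to 1
        aces -= 1
    return total

def play_hand(policy, deck=None):
    """Play one simplified hand (hit/stand only); return the final total."""
    if deck is None:
        # Card values in one 52-card deck: 2-10, face cards as 10, ace as 11.
        deck = [v for v in list(range(2, 11)) + [10, 10, 10, 11] for _ in range(4)]
        random.shuffle(deck)
    cards = [deck.pop(), deck.pop()]
    while policy(hand_value(cards)) == "hit":
        cards.append(deck.pop())
    return hand_value(cards)
```

A fixed-threshold policy such as `lambda v: "hit" if v < 17 else "stand"` reproduces the dealer's 17-point rule within this sketch.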
Reinforcement Learning • Map situations to actions so that the reward value is maximized. • Decide which action (hit/stand) to take by finding, through trial and error, the action that yields the highest reward. • Update the winning probabilities of the intermediate states after each game. • The winning probability of each state converges as the learning parameter decreases after each game.
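The per-game update described above can be sketched as a tabular rule: each visited state's winning-probability estimate moves toward the game outcome, with a learning parameter that decays from game to game. The function names, the 0.5 prior, and the 1/(1+g) decay schedule are assumptions for illustration; the slides do not give the exact schedule:

```python
def update_values(values, visited_states, outcome, alpha):
    """Move each visited state's winning-probability estimate toward the
    game outcome (1 = win, 0 = loss) by the learning parameter alpha."""
    for s in visited_states:
        v = values.get(s, 0.5)          # uninformative prior for unseen states
        values[s] = v + alpha * (outcome - v)

def train(n_games, play_game, alpha0=0.5):
    """Repeatedly play games and update state values with a decaying alpha."""
    values = {}
    for g in range(n_games):
        alpha = alpha0 / (1 + g)        # decreasing learning parameter
        visited, outcome = play_game(values)
        update_values(values, visited, outcome, alpha)
    return values
```

Because alpha shrinks over games, later outcomes perturb the estimates less and less, which is what makes the winning probabilities converge.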
Result table from learning • Columns 1–5 = dealer's cards • Columns 6–10 = player's cards • Cards sorted in ascending order, zero-padded • Column 11 = winning probability of the state • Columns 12–13 = action taken by the player • Action [1 0] -> "hit" • [0 1] -> "stand" and [1 1] -> end state

Dealer (cols 1–5)    Player (cols 6–10)    P(win)   Action
2  5  0  0  0        6  6  0  0  0         0.37     1 0
2  5  0  0  0        4  6  6  0  0         0.25     1 0
2  5 10  0  0        4  6  6  7  0         0.00     1 1
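Each table row can be produced by a small encoder; `encode_state` is an illustrative helper, not from the slides, but it follows the column layout they describe (sorted, zero-padded card slots, winning probability, two-element action code):

```python
def encode_state(dealer_cards, player_cards, win_prob, action):
    """Encode a game state as the 13-column row used in the result table:
    5 dealer-card slots, 5 player-card slots (sorted ascending, zero-padded),
    the winning probability, and a two-element action code."""
    def pad(cards):
        cards = sorted(cards)
        return cards + [0] * (5 - len(cards))
    codes = {"hit": [1, 0], "stand": [0, 1], "end": [1, 1]}
    return pad(dealer_cards) + pad(player_cards) + [win_prob] + codes[action]
```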
MLP Configurations • Feature vectors are normalized and scaled to the range −5 to 5. • Max. training epochs: 1000; epoch size = 64 • Activation function (hidden layers) = hyperbolic tangent • Activation function (output layer) = sigmoid • MLP1: α = 0.1, µ = 0, config 4-10-10-10-2, 89.5%. • MLP2: α = 0.1, µ = 0.8, config 5-10-10-10-2, 91.1%. • MLP3: α = 0.8, µ = 0, config 5-10-10-10-2, 92.5%. • MLP4: α = 0.1, µ = 0, config 6-12-12-12-2, 90.2%.
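A forward pass matching the activation choices above (tanh hidden layers, sigmoid output) can be sketched with NumPy; the initialization scale and function names are assumptions, and this omits the backpropagation training loop entirely:

```python
import numpy as np

def forward(x, weights):
    """Forward pass: hyperbolic-tangent hidden layers, sigmoid output layer."""
    a = x
    for W, b in weights[:-1]:
        a = np.tanh(a @ W + b)                       # hidden layers
    W, b = weights[-1]
    return 1.0 / (1.0 + np.exp(-(a @ W + b)))        # sigmoid output

def init_mlp(layer_sizes, rng):
    """Random small weights for a config such as 5-10-10-10-2 (MLP2/MLP3)."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
```

The two sigmoid outputs map naturally onto the table's two-element action code ([1 0] = hit, [0 1] = stand).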
Experiment Results • When the dealer uses the 17-point rule: [results chart in original slides] • When the player uses random moves: [results chart in original slides] • When both dealer and player use MLP: [results chart in original slides]
Conclusion • An MLP network works best for highly random and dynamic games, where the rules and strategies are hard to define and the game outcomes are hard to predict exactly. • Strategy interpreted from Reinforcement Learning: hit if the hand total is less than 15, otherwise stand. • As the number of games increases, the learned strategies change over time.
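The strategy read off the trained network reduces to a one-line threshold policy; `learned_policy` is an illustrative name, but the 15-point threshold is the one reported in the conclusion:

```python
def learned_policy(hand_total, threshold=15):
    """Strategy interpreted from Reinforcement Learning:
    hit below the threshold (15 in the slides), otherwise stand."""
    return "hit" if hand_total < threshold else "stand"
```

Note this is more aggressive than the dealer's 17-point rule, which would keep hitting at totals of 15 and 16.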
Future work • Because cards are dealt without replacement, the current hand depends on previous hands; add card memory (card counting) to the Blackjack inputs. • Train the ANN with a teacher to eliminate duplicate patterns (for example, 4 + 7 = 7 + 4 = 5 + 6 = …) and to identify misclassified patterns. • Train the ANN against different experts so that it can pick up various game strategies. • Include game tricks and strategies in a lookup table for the ANN. • Explore other learning methods.