Learning BlackJack with ANN (Artificial Neural Network) Ip Kei Sam sam@cae.wisc.edu ID: 9012828100
Goal • Use a Reinforcement Learning algorithm to learn Blackjack strategies. • Train an MLP to play Blackjack without explicitly teaching it the rules of the game. • Develop an ANN strategy that beats the dealer's 17-point rule.
Blackjack • Draw cards from a 52-card deck to reach a total as close to 21 as possible without going over. • Blackjack is simplified to allow only "hit" or "stand" on each turn.
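The simplified game above (hit or stand only) can be sketched as follows; `hand_value` and `play_hand` are illustrative names, not from the original slides, and the ace handling (11 or 1) is standard Blackjack rather than anything the slides specify:

```python
import random

def hand_value(cards):
    """Total value of a hand; aces (11) count as 1 when needed to avoid busting."""
    total = sum(cards)
    aces = cards.count(11)
    while total > 21 and aces:
        total -= 10          # demote one ace from 11 to 1
        aces -= 1
    return total

def play_hand(policy, deck=None):
    """Play one simplified hand (hit/stand only); return the final total."""
    if deck is None:
        # Card values in one 52-card deck: 2-10, face cards as 10, ace as 11.
        deck = [v for v in list(range(2, 11)) + [10, 10, 10, 11] for _ in range(4)]
        random.shuffle(deck)
    cards = [deck.pop(), deck.pop()]
    while policy(hand_value(cards)) == "hit":
        cards.append(deck.pop())
    return hand_value(cards)
```

A fixed-threshold policy such as `lambda v: "hit" if v < 17 else "stand"` reproduces the dealer's 17-point rule within this sketch.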
Reinforcement Learning • Map situations to actions so that the reward value is maximized. • Decide which action (hit/stand) to take by finding, through trial and error, the action that yields the highest reward. • Update the winning probabilities of the intermediate states after each game. • The winning probability of each state converges as the learning parameter decreases after each game.
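The per-game update described above can be sketched as a tabular rule: each visited state's winning-probability estimate moves toward the game outcome, with a learning parameter that decays from game to game. The function names, the 0.5 prior, and the 1/(1+g) decay schedule are assumptions for illustration; the slides do not give the exact schedule:

```python
def update_values(values, visited_states, outcome, alpha):
    """Move each visited state's winning-probability estimate toward the
    game outcome (1 = win, 0 = loss) by the learning parameter alpha."""
    for s in visited_states:
        v = values.get(s, 0.5)          # uninformative prior for unseen states
        values[s] = v + alpha * (outcome - v)

def train(n_games, play_game, alpha0=0.5):
    """Repeatedly play games and update state values with a decaying alpha."""
    values = {}
    for g in range(n_games):
        alpha = alpha0 / (1 + g)        # decreasing learning parameter
        visited, outcome = play_game(values)
        update_values(values, visited, outcome, alpha)
    return values
```

Because alpha shrinks over games, later outcomes perturb the estimates less and less, which is what makes the winning probabilities converge.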
Result table from learning • Columns 1–5 = dealer's cards • Columns 6–10 = player's cards • Cards sorted in ascending order, zero-padded • Column 11 = winning probability of the state • Columns 12–13 = action taken by the player • Action [1 0] -> "hit" • [0 1] -> "stand" and [1 1] -> end state

Dealer (cols 1–5)    Player (cols 6–10)    P(win)   Action
2  5  0  0  0        6  6  0  0  0         0.37     1 0
2  5  0  0  0        4  6  6  0  0         0.25     1 0
2  5 10  0  0        4  6  6  7  0         0.00     1 1
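Each table row can be produced by a small encoder; `encode_state` is an illustrative helper, not from the slides, but it follows the column layout they describe (sorted, zero-padded card slots, winning probability, two-element action code):

```python
def encode_state(dealer_cards, player_cards, win_prob, action):
    """Encode a game state as the 13-column row used in the result table:
    5 dealer-card slots, 5 player-card slots (sorted ascending, zero-padded),
    the winning probability, and a two-element action code."""
    def pad(cards):
        cards = sorted(cards)
        return cards + [0] * (5 - len(cards))
    codes = {"hit": [1, 0], "stand": [0, 1], "end": [1, 1]}
    return pad(dealer_cards) + pad(player_cards) + [win_prob] + codes[action]
```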
MLP Configurations • Feature vectors are normalized and scaled to the range −5 to 5. • Max. training epochs: 1000; epoch size = 64 • Activation function (hidden layers) = hyperbolic tangent • Activation function (output layer) = sigmoid • MLP1: α = 0.1, µ = 0, config 4-10-10-10-2, 89.5%. • MLP2: α = 0.1, µ = 0.8, config 5-10-10-10-2, 91.1%. • MLP3: α = 0.8, µ = 0, config 5-10-10-10-2, 92.5%. • MLP4: α = 0.1, µ = 0, config 6-12-12-12-2, 90.2%.
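A forward pass matching the activation choices above (tanh hidden layers, sigmoid output) can be sketched with NumPy; the initialization scale and function names are assumptions, and this omits the backpropagation training loop entirely:

```python
import numpy as np

def forward(x, weights):
    """Forward pass: hyperbolic-tangent hidden layers, sigmoid output layer."""
    a = x
    for W, b in weights[:-1]:
        a = np.tanh(a @ W + b)                       # hidden layers
    W, b = weights[-1]
    return 1.0 / (1.0 + np.exp(-(a @ W + b)))        # sigmoid output

def init_mlp(layer_sizes, rng):
    """Random small weights for a config such as 5-10-10-10-2 (MLP2/MLP3)."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
```

The two sigmoid outputs map naturally onto the table's two-element action code ([1 0] = hit, [0 1] = stand).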
Experiment Results • When the dealer uses the 17-point rule: [results chart in original slides] • When the player uses random moves: [results chart in original slides] • When both dealer and player use MLP: [results chart in original slides]
Conclusion • An MLP network works best for highly random and dynamic games, where the rules and strategies are hard to define and the game outcomes are hard to predict exactly. • Strategy interpreted from Reinforcement Learning: hit if the hand total is less than 15, otherwise stand. • As the number of games increases, the learned strategies change over time.
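The strategy read off the trained network reduces to a one-line threshold policy; `learned_policy` is an illustrative name, but the 15-point threshold is the one reported in the conclusion:

```python
def learned_policy(hand_total, threshold=15):
    """Strategy interpreted from Reinforcement Learning:
    hit below the threshold (15 in the slides), otherwise stand."""
    return "hit" if hand_total < threshold else "stand"
```

Note this is more aggressive than the dealer's 17-point rule, which would keep hitting at totals of 15 and 16.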
Future work • Because cards are dealt without replacement, the current hand depends on previous hands; add card memory (card counting) to the Blackjack inputs. • Train the ANN with a teacher to eliminate duplicate patterns (for example, 4 + 7 = 7 + 4 = 5 + 6 = …) and to identify misclassified patterns. • Train the ANN against different experts so that it can pick up various game strategies. • Include game tricks and strategies in a lookup table for the ANN. • Explore other learning methods.