Using Probabilistic Knowledge And Simulation To Play Poker (Darse Billings++ 1999 …)
Presented by Brett Borghetti, 7 Jan 2007
Contributions of the work:
• New betting strategy using probability:
  • Propagates a “probability triple” knowledge representation, <P(fold), P(call), P(raise)>
  • Each triple is an atomic unit stating the likelihood of each action in a given situation
• Uses real-time simulation to generate a selective sample of the possible outcomes while a hand is in progress
Old Loki (Loki-1)
• Carried only the most likely action, or the probability of playing the hand
• Uses “expert knowledge”
• Initial tables of income rates
• Initial weight probabilities of opponent hands (how likely the opponent is to play these cards)
• Re-weighting rules for opponent model updates
• Hand evaluator: strength and potential
• Rule-based betting module
New Loki (Loki-2)
• All tables store probability triples
• Propagating distributions allows distributed decision-making across components
• Simulator estimates expected value from a selective sample of the ways the hand might play out
• Eliminates some of the required “expert knowledge”
Probability Triples
• Store the probability of 3 actions
• [f, c, r] such that f + c + r = 1.0
• Used in 3 places in Loki-2:
  • Triple Generator
    • Evaluates 2-card hands in the current context
  • Opponent Modeler
    • Updates the weight tables in the opponent model
  • Action Selector (chooses our next action)
    • Can adjust the selection based on desired play style
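The triple described above can be sketched as a small Python type. This is only an illustration: the class name `ProbabilityTriple`, its fields, and its methods are hypothetical and are not taken from Loki-2. It enforces f + c + r = 1.0 and shows how a randomizing action selector might draw an action from it.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbabilityTriple:
    """Hypothetical container for [f, c, r]; not Loki-2's actual data structure."""
    fold: float
    call: float
    raise_: float  # trailing underscore because `raise` is a Python keyword

    def __post_init__(self):
        if min(self.fold, self.call, self.raise_) < 0:
            raise ValueError("probabilities must be non-negative")
        if abs(self.fold + self.call + self.raise_ - 1.0) > 1e-9:
            raise ValueError("f + c + r must equal 1.0")

    def sample(self, rng: random.Random) -> str:
        """Pick one action at random, weighted by the triple."""
        x = rng.random()
        if x < self.fold:
            return "fold"
        if x < self.fold + self.call:
            return "call"
        return "raise"

    def most_likely(self) -> str:
        """Return the single most probable action."""
        options = (("fold", self.fold), ("call", self.call), ("raise", self.raise_))
        return max(options, key=lambda pair: pair[1])[0]
```

For example, `ProbabilityTriple(0.0, 0.2, 0.8).sample(random.Random(0))` returns either "call" or "raise", never "fold".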
Simulation-Based Betting Strategy
• Calculates the approximate expected value (EV), i.e. the expected return on investment, of each possible betting action
• Since folding has EV = 0, only calling and raising from the current position are considered, and the game tree is expanded from there
• Since the entire game tree would be intractable to search, uses selective sampling
• Simulated opponent actions are biased by their weight tables; a random number selects the actual action in each simulated hand
• The authors claim this approach should be better than the static approach
• [brett] That would depend on how accurately the weighting scheme detects the true behavior of the opponent
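A heavily simplified sketch of the idea, not Loki-2's simulator: we face one bet heads-up, the opponent's response to a raise is drawn from a single (f, c, r) triple, and every non-fold response is treated as a call followed by a showdown that we win with probability `p_win`. All function names, parameters, and this one-decision game model are assumptions made for illustration.

```python
import random


def sample_action(triple, rng):
    """Draw fold/call/raise from an (f, c, r) triple using one random number."""
    f, c, _ = triple
    x = rng.random()
    if x < f:
        return "fold"
    if x < f + c:
        return "call"
    return "raise"


def estimate_ev(our_action, opp_triple, p_win, pot, bet, trials, rng):
    """Mean net result of `our_action` over `trials` sampled playouts (toy model)."""
    total = 0.0
    for _ in range(trials):
        if our_action == "call":
            # We match the bet and go straight to a showdown.
            total += pot if rng.random() < p_win else -bet
            continue
        # our_action == "raise": we put in two bets, then the opponent responds.
        if sample_action(opp_triple, rng) == "fold":
            total += pot              # opponent gives up the pot
        elif rng.random() < p_win:
            total += pot + bet        # opponent matched our raise and lost
        else:
            total -= 2 * bet          # we lose the call plus the raise
    return total / trials


def choose_action(opp_triple, p_win, pot, bet, trials=2000, seed=0):
    """Pick the action with the highest estimated EV; folding is fixed at EV = 0."""
    rng = random.Random(seed)
    evs = {"fold": 0.0}
    for action in ("call", "raise"):
        evs[action] = estimate_ev(action, opp_triple, p_win, pot, bet, trials, rng)
    return max(evs, key=evs.get), evs
```

In the real system the sampled opponent hands and actions would come from the weight tables rather than from one fixed triple, which is exactly where [brett]'s concern about model accuracy enters.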
Comparing Performance
• Single measurement: small bets per hand (sb/hand)
• If you play 30 hands in a $10/$20 game, an improvement of +0.10 sb/hand means you win an extra $1.00 per hand, for an extra $30 won overall
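The arithmetic in this example can be checked with a one-line helper. The function name and parameters are hypothetical; the small bet in a $10/$20 game is taken to be $10.

```python
def extra_winnings(sb_per_hand_gain, small_bet, hands):
    """Dollar value of a win-rate improvement measured in small bets per hand."""
    return sb_per_hand_gain * small_bet * hands


# $10/$20 game, +0.10 sb/hand, 30 hands
dollars = extra_winnings(0.10, 10, 30)
print(f"extra winnings: ${dollars:.2f}")
```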
Experiments
• Examines each change from Loki-1 separately:
  • R: changing the reweighting
  • B: changing from rule-based betting to the “action selector” with randomization
  • S: incorporating simulation to compute EV in the action selection decision
Experiments (continued)
• Self-play in a 10-seat game: added components one at a time and compared performance
• B ~ R, B << S, R << S
• B alone and R alone are roughly equivalent and provide the least improvement; S alone provides the most
• B+R+S > S
Experiments (continued)
• Player type comparisons in a 10-seat game
• Number of hands played to the flop:
  • T = Tight
  • L = Loose
• How frequently the player bets and raises after the flop:
  • A = Aggressive
  • C = Conservative
Issues [Brett]
• At the core of Loki-2 is the weighting system that models the opponent
• Is this system flexible and adaptable to rapid changes in opponent strategy, or do the weights have a kind of inertia that prevents the model from incorporating changes as quickly as they happen?
• Do the weight updates (belief updates) make sense?
Background Information
Texas Hold’em Heads-up Limit Poker Basics
• 2 players
• 4 betting rounds per hand
  • Preflop (2 hole cards), Flop (3 community cards), Turn (1 community card), River (1 community card)
• Action set = {fold, call (check), raise (bet)}
• Up to 3 raises allowed per round
• A round is over when either:
  • All players are even in the pot via a final call and each player has had at least one opportunity to act [go on to next round]
  • One player folds [other player wins]
Requirements for a World Class Poker Player
• Able to assess:
  • Hand strength
  • Hand potential
  • Opponent’s betting strategy (opponent model)
• Has a strong:
  • Betting strategy
  • Ability to play deceptively [bluff vs. slow play*]
  • Ability to play unpredictably
Optimal vs Maximal play
• Optimal player makes decisions based on game-theoretic probabilities without regard to specific context (opponent’s plays)
• Maximal player takes into account the opponent’s sub-optimal tendencies and adjusts its play to exploit perceived weaknesses
Hand Assessment (Hand Strength = HS)
• Pre-flop HS is determined from the “income rate” of each of the 169 equivalence classes of hole cards, computed from 1M simulated poker hands
• Flop HS is determined by comparing each of the 1081 possible opponent hands with ours and counting how many each player wins
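The two counts on this slide follow from simple combinatorics, sketched below. The `hand_strength` helper is an assumption: the slide only says wins are counted, so giving ties half credit is one common convention, not something stated here.

```python
from math import comb

# 169 preflop classes: 13 pairs, plus suited and offsuit versions
# of each of the C(13, 2) unpaired rank combinations.
pairs = 13
suited = comb(13, 2)
offsuit = comb(13, 2)
preflop_classes = pairs + suited + offsuit

# On the flop we can see 5 cards (2 hole cards + 3 community cards),
# so the opponent holds 2 of the remaining 47.
opponent_hands_on_flop = comb(52 - 2 - 3, 2)


def hand_strength(ahead, tied, behind):
    """Fraction of opponent hands we beat; ties count as half a win (assumption)."""
    return (ahead + tied / 2) / (ahead + tied + behind)


print(preflop_classes, opponent_hands_on_flop)
```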
Hand Potential (HP) at the Flop
• PPot1 = likelihood that our hand will improve with one card (the turn card)
• PPot2 = likelihood that our hand will improve with two cards (turn and river)
• NPot1 and NPot2 = the equivalent likelihoods that our opponent’s hand will become better than ours on the turn and/or river
Effective Hand Strength & Pot Odds
• EHS = HS_n + (1 - HS_n) × PPot_n
  • The chance that we either are ahead now or will pull ahead within the next n = 1 or n = 2 cards
• Pot odds = (cost to call) / (pot + cost to call); calling is profitable when P(win) exceeds the pot odds
• Example: if your chance of winning is 25%, you would call a $4 bet to win a $16 pot, because the pot odds are 4/(16+4) = 20% < 25%. Your expected return is 0.25 × $20 = $5 each time you pay $4, an expected net gain of $1.00 per play.
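The EHS formula and the $4 / $16 example can be written out as a short sketch. The function names are hypothetical; pot odds are taken here as the call cost divided by the pot after calling, which is the reading consistent with the slide's arithmetic.

```python
def effective_hand_strength(hs, ppot):
    """EHS = HS + (1 - HS) * PPot: ahead now, or behind now but improving."""
    return hs + (1.0 - hs) * ppot


def pot_odds(call_cost, pot):
    """Break-even win probability for a call (assumed definition)."""
    return call_cost / (pot + call_cost)


def call_ev(p_win, call_cost, pot):
    """Expected net gain of calling: expected return minus the price paid."""
    return p_win * (pot + call_cost) - call_cost


# The slide's example: 25% to win, $4 to call, $16 in the pot.
print(pot_odds(4, 16), call_ev(0.25, 4, 16))
```

A call is worthwhile in this model whenever `p_win > pot_odds(call_cost, pot)`, which is the same condition as `call_ev(...) > 0`.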
Opponent Modeling
• Initial weights are based on the original income rates of the 169 preflop card equivalence classes
• Weights are updated generically on each hand, based on the betting during that hand
• Weights are updated specifically, based on the total betting history over all hands with this opponent
• Weight updates are based on the mean and variance of call vs. raise vs. fold actions
Using the opponent model
• Calculate a new weight for each of the opponent’s possible starting card combinations (1081), based on initial weights, HS, EHS, and opponent actions (generic and specific)
• The weights for each possible hole-card pair provide an ordering over the possible hands
• Usually greatly reduces the uncertainty about which hands the opponent is playing… assuming they are not playing deceptively
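One way such a reweighting step could look is sketched below. The slides do not give the actual rule, so everything here is an assumption for illustration: the function names, the linear ramp between `mu - sigma` and `mu + sigma`, and the 0.01 floor. Here `mu` and `sigma` stand for the mean and spread of the hand strength at which this opponent has been seen to take the observed action.

```python
def reweight_factor(ehs, mu, sigma, low=0.01, high=1.0):
    """Scale factor for one hand: near `low` if the hand is too weak for the
    observed action, `high` if it is clearly strong enough, linear in between."""
    if sigma <= 0:
        return high if ehs >= mu else low
    if ehs <= mu - sigma:
        return low
    if ehs >= mu + sigma:
        return high
    frac = (ehs - (mu - sigma)) / (2 * sigma)
    return low + frac * (high - low)


def update_weights(weights, ehs_by_hand, mu, sigma):
    """Return a new weight table after one observed opponent action."""
    return {
        hand: w * reweight_factor(ehs_by_hand[hand], mu, sigma)
        for hand, w in weights.items()
    }


# Toy table with two candidate hands and made-up EHS values.
weights = {"AA": 1.0, "72o": 1.0}
ehs_by_hand = {"AA": 0.9, "72o": 0.2}
print(update_weights(weights, ehs_by_hand, mu=0.6, sigma=0.2))
```

After the update the strong hand keeps its weight while the weak one is almost ruled out, which is the ordering effect described above. A deceptive opponent breaks the assumption that actions track hand strength.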