
Performance and Prediction: Bayesian Modelling of Fallible Choice in Chess Guy Haworth


Presentation Transcript


  1. Performance and Prediction: Bayesian Modelling of Fallible Choice in Chess Guy Haworth guy.haworth@bnc.oxon.org ACG12 Performance and Prediction, 2009-05-11

  2. Topics … • Motivation • Reference Fallible Players E(c) • In the Endgame Table Zone (ETZ) • Prior to the Endgame Table Zone • A set of hypotheses {Hk} about engines {E(ck)} • Bayesian Inference, given a choice of hypotheses, and evidence: • Prior belief, posterior belief, Prob[Hk] • Translating the 'Reference Player' idea to the pre-ETZ • Results … differentiation, value of small samples, …

  3. Motivation • Assess decision makers when they are under pressure • Need a Utopian Decision Maker, a Reference Agent (RA) • Finite set of choices, each with some Utility Value • A 'model world' is used to define the Utility Value • RA always makes the choice with the best Utility Value • RA is then deskilled to make Reference Fallible Agents (RFAs) • An RFA does not always make the best choice • {RFA} = the Space of Reference Fallible Agents (SRFA) • Now we take a human decision maker H … and associate them with some profile in SRFA … by hypothesising that they are one of the RFAs and … weighing the evidence to decide how likely each RFA actually is

  4. 1. Kc2, Kc1 or Ka1? • Mate in d = 23 with 1. Kc2 • Mate in d = 24 with 1. Kc1 • Mate in d = 29 with 1. Ka1 • A chess engine E chooses 1. Kc2 • A stochastic version of E may not • Let E(c) be a stochastic engine: Likelihood[E(c) moves to p, depth d] = (1 + d)^(-c) ∝ Prob[E(c), p] • c = 0: all moves equally likely • c = ∞: only best moves played
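
The stochastic engine E(c) on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the likelihood (1 + d)^(-c) is simply normalised over the three candidate moves; the function name `move_probabilities` is hypothetical and not from the presentation. With these depths, Kc2 comes out at roughly 0.47 for c = 5 and roughly 0.69 for c = 20, close to the 0.47 and 0.70 quoted on the Bayes' Rule slide; the small gap presumably reflects rounding or the depth convention used.

```python
def move_probabilities(depths, c):
    """Return Prob[E(c) plays each move], with likelihood (1 + d) ** -c.

    `depths` are depths to mate after each candidate move; `c` is the
    competence parameter (c = 0 gives a uniform choice).
    """
    weights = [(1 + d) ** -c for d in depths]
    total = sum(weights)
    return [w / total for w in weights]


# Depths taken from the slide: Kc2 (23), Kc1 (24), Ka1 (29).
depths = {"Kc2": 23, "Kc1": 24, "Ka1": 29}
for c in (0, 5, 20):
    probs = move_probabilities(list(depths.values()), c)
    print(c, {move: round(p, 2) for move, p in zip(depths, probs)})
```

As c grows, the probability mass concentrates on the move with the smallest depth to mate, which is the behaviour the slide describes for c = ∞.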

  5. [Figure: move-choice probabilities for Kc2 (d = 23), Kc1 (d = 24) and Ka1 (d = 29) under c = 0, c = 5 and c = 20]

  6. Which Engine … is playing the moves? • Suppose you see a sequence of player P's moves in KQKR • You are told that they are being played by some engine E(c) • You are told that it is one of E(0), E(5) or E(20) • Which agent, A, is it: E(0), E(5) or E(20)? • What would be fair odds? • If you 'know nothing' (as you do) at the beginning … Prob[A = E(i)] = 1/3 • Let's suppose you see a sequence of optimal moves • You should start to lean away from E(0) and towards E(20) • But what are the probabilities now? No need to guess … Bayes' Rule tells you exactly what the new probabilities are

  7. Bayes' Rule • Prob[Hypothesis | Evidence] ∝ Prob[Hypothesis] × Prob[Evidence | Hypothesis] • We have a choice of three hypotheses: H0 "E = E(0)", H5 "E = E(5)", H20 "E = E(20)" • Prob[Hi] = 1/3 = 0.33 = the prior probability, i.e. before Kc2 is seen • Prob[E(0), Kc2] = 1/3 = 0.33 • Prob[E(5), Kc2] = 0.47; Prob[E(20), Kc2] = 0.70 • Prob[H0 | Kc2] ∝ 0.33 × 0.33 = 0.11 … etc. (0.16, 0.23 … sum 0.50) • Scaling … Prob[H0 | Kc2] = 0.22 = the posterior probability; Prob[H5 | Kc2] = 0.31 and Prob[H20 | Kc2] = 0.47
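
The arithmetic on this slide can be reproduced with a short sketch. It assumes nothing beyond the slide's own numbers; the helper name `bayes_update` and the dictionary layout are illustrative choices, not part of the presentation.

```python
def bayes_update(priors, likelihoods):
    """Posterior is proportional to prior x likelihood, rescaled to sum to 1."""
    unnormalised = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalised.values())
    return {h: u / total for h, u in unnormalised.items()}


# Uniform priors and Prob[E(c), Kc2] as quoted on the slide.
priors = {"H0": 1 / 3, "H5": 1 / 3, "H20": 1 / 3}
likelihood_kc2 = {"H0": 0.33, "H5": 0.47, "H20": 0.70}

posterior = bayes_update(priors, likelihood_kc2)
print({h: round(p, 2) for h, p in posterior.items()})
```

Rounded to two places, the result agrees with the slide's 0.22, 0.31 and 0.47.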

  8. The effect of Prior Probabilities • In the example above, the posterior probability of H is proportional to Prob[Ev | H] • This is because the prior probability of H was 1/3 for all H • So the application of Bayes' Rule has been somewhat obscured • Suppose the priors were H0: 0.2, H5: 0.3, H20: 0.5 • Then the posterior probabilities are proportional to: • H0: 0.2 × 0.33 = 0.066 • H5: 0.3 × 0.47 = 0.141 • H20: 0.5 × 0.70 = 0.350 … totalling 0.557 so we scale up to … • Prob[H0] = 0.066/0.557 = 0.12 • Prob[H5] = 0.141/0.557 = 0.25; Prob[H20] = 0.350/0.557 = 0.63 • So the posteriors, which were 0.22/0.31/0.47 with uniform priors, are now 0.12/0.25/0.63
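
The same update can be applied repeatedly, with each posterior serving as the prior for the next observed move. The sketch below starts from the non-uniform priors on this slide; the run of three best moves, each with the same likelihoods as Kc2, is a hypothetical simplification (in real play the likelihoods change from position to position), and the names `update` and `history` are illustrative.

```python
def update(beliefs, likelihoods):
    """One Bayes step: multiply by the likelihoods, then rescale to sum to 1."""
    scaled = [b * l for b, l in zip(beliefs, likelihoods)]
    total = sum(scaled)
    return [s / total for s in scaled]


beliefs = [0.2, 0.3, 0.5]              # priors for H0, H5, H20 (from the slide)
likelihood_best = [0.33, 0.47, 0.70]   # Prob[E(c), Kc2] (from the slide)

history = [beliefs]
for _ in range(3):  # hypothetical: three best moves in a row
    beliefs = update(beliefs, likelihood_best)
    history.append(beliefs)

for step, b in enumerate(history):
    print(step, [round(x, 2) for x in b])
```

The first step reproduces the slide's 0.12/0.25/0.63, and each further best move shifts more belief towards H20.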

  9. Rev. Bayes, Transform, Aeolian Harp • [Diagram: Bayesian Inference mapping a player P to an agent A in (c1, c2) space; refine model parameters; 'Model Error' ║EPP – EPA║] • "Let the Wind of Evidence blow through the Aeolian Harp of your Hypotheses"

  10. Chess Engines as Benchmarks? • Engines are improving all the time: hardware, algorithms, knowledge • There is actually a danger that they may become too good • They are not infallible: 'best moves' are not necessarily best • q.v. changes of mind from one search depth to the next • However, greater depth of search ⇒ better engine [Beal] • Benchmark fallibility contributes statistical uncertainty to findings • Independence is also required: engine E cannot vote itself 'best'!

  11. Using the idea on pre-EGT Chess • Idea is to use chess-engine evaluations {vi} rather than depths • Announced in 'Chess Endgame News', ICGA J. 28-4, 243 (2005) • However, this brings some complications: • Some evaluations, unlike Depths to Mate, are negative • The evaluations vi are evaluated using heuristics • Chess-engines' preferences are not infallible • Engines' preferences may vary engine-to-engine, depth-to-depth • Some intuitive observations: • A panel of engines is better than one engine as a benchmark • The better the engine and the greater the depth [Beal], the better • Uncertainty is halved by using four times the data
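
The last observation on this slide matches the usual square-root law for sampling error. Assuming the n data points are independent with a common standard deviation σ (an assumption not stated on the slide), the standard error of a mean-like estimate behaves as:

```latex
\[
\mathrm{SE}(n) = \frac{\sigma}{\sqrt{n}}
\quad\Longrightarrow\quad
\mathrm{SE}(4n) = \frac{\sigma}{\sqrt{4n}} = \tfrac{1}{2}\,\mathrm{SE}(n).
\]
```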

  12. Performance v Skill Rating • [Diagram: game 1, game 2, …, game k; skill rating vs. performance rating (Elo)]

  13. Stochastic choice, given position evaluations • At position p, some moves mi to positions pi have evals vi • Can we say Likelihood[E(c), mi] = (1 + vi)^c? • No, because some vi may be negative • Need a mapping v → w, s.t. ∀i, wi ≥ 0 and v1 > v2 ⇒ w1 < w2 • Some functions w = C(v) are better than others! • The intuitively obvious wi = 1 + |v1| + (v1 – vi) is not ideal • The wi are analogous to the di taken from an Endgame Table • Currently using wi = κ + (v1 – vi) with κ > 0 … in fact κ = 0.1 • Model choices, yet to be tested as to effect • Choice of specific engine and search-depth • Choice of κ • Mapping to, e.g., r1.E(∞) + r2.E(c) rather than to one engine E(c)
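
A rough sketch of the evaluation-based choice model follows. It assumes the mapped values wi take the place of the depths di in a power-law likelihood, here wi^(-c); the slide says only that the wi are analogous to the di, so the exact form is an assumption. The small positive offset in wi is called `kappa` here with the slide's value 0.1, v1 is taken to be the best evaluation, and the function name and the sample evaluations are hypothetical.

```python
def eval_move_probabilities(evals, c, kappa=0.1):
    """Map evaluations v_i to w_i = kappa + (v_best - v_i), then weight by w_i ** -c.

    Assumption: the likelihood is w_i ** -c, by analogy with the depth-based
    model; `kappa` keeps every w_i strictly positive.
    """
    best = max(evals)
    w = [kappa + (best - v) for v in evals]
    weights = [wi ** -c for wi in w]
    total = sum(weights)
    return [x / total for x in weights]


# Hypothetical engine evaluations (in pawns) for three candidate moves,
# including a negative one, which the mapping handles without trouble.
evals = [0.35, 0.20, -1.10]
for c in (0, 1, 3):
    print(c, [round(p, 3) for p in eval_move_probabilities(evals, c)])
```

Because wi shrinks as vi approaches the best evaluation, better moves receive larger weights for any c > 0, while c = 0 again gives a uniform choice.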

  14. Results … based on TOGA II (depth 10) • Measured: Performance against Kaissa rather than opponent • An absolute rather than relative measure, given the benchmark • A measure not affected by the opponent's performance • m data points per game, rather than one (the game result) • Spectroscope! Virtual players at different ELO can be differentiated • Higher ELO ⇒ higher apparent competence c • Winners visibly relax and play for the result when closing out • Best performance-indicators are drawn games between like players • Performance, pre- and post-ELO, assessed in same terms • Epochs of performance compared, pre- and post- cheating accusations • Games tracked in 2D not 1D (net advantage); games compared • Absolute performance of ELO 2400 players tracked across time

  15. Virtual ELO Players: Data

  16. Profile of Virtual ELO Players in c-space • [Figure: profiles for virtual players rated 2100, 2300, 2400, 2600 and 2700]

  17. Keres -v- The Rest (1948); D.P.Singh (2006) • [Figures: WCC (1948): Keres 0 Botvinnik 4; D.P.Singh v Opponents]

  18. D.P.Singh – 'before' and 'after' • Two 6-month periods • Not as conclusive as it appears • 'c'-tracking across the whole period

  19. Allwermann-Kalinitschew (1998) • Variation in c for both players • Track locus of game in 2D • Standard 1-dimensional charting of the game

  20. Summary • 'Contextual Analysis' (CA) of the individual player's decisions • 'Decision Matching' (DM) uses less information and is cruder • 'Average Differencing' (AD) uses less information: ditto • CA successfully differentiates players of different ELOs • the standard deviation on c gives an idea of differentiator-power • expect CA to be a better differentiator than AD … • and expect AD to be better than DM • CA, using Bayesian Analysis, applied to: • Career and epoch analysis, tournament and game analysis • Future directions: • Evolving the method, including deeper statistical treatment • Applying it to other chess and non-chess scenarios

