Performance and Prediction: Bayesian Modelling of Fallible Choice in Chess Guy Haworth guy.haworth@bnc.oxon.org ACG12 Performance and Prediction, 2009-05-11
Topics …
• Motivation
• Reference Fallible Players E(c)
• In the Endgame Table Zone (ETZ)
• Prior to the Endgame Table Zone
• A set of hypotheses {Hk} about engines {E(ck)}
• Bayesian Inference, given a choice of hypotheses, and evidence:
  • Prior belief, posterior belief, Prob[Hk]
• Translating the 'Reference Player' idea to the pre-ETZ
• Results … differentiation, value of small samples, …
Motivation
• Assess decision makers when they are under pressure
• Need a Utopian Decision Maker, a Reference Agent (RA)
• Finite set of choices, each with some Utility Value
• A 'model world' is used to define the Utility Value
• RA always makes the choice with the best Utility Value
• RA is then deskilled to make Reference Fallible Agents (RFAs)
• RFA does not always make the best choice
• {RFA}: the Space of Reference Fallible Agents (SRFA)
• Now we take a human decision maker H … and associate them with some profile in SRFA … by hypothesising that they are one of the RFAs and … weighing the evidence to decide how likely each RFA actually is
1.Kc2, Kc1 or Ka1?
• Mate in d = 23 with 1. Kc2
• Mate in d = 24 with 1. Kc1
• Mate in d = 29 with 1. Ka1
• A Chess Engine E chooses 1. Kc2
• A stochastic version of E may not
• Let E(c) be a stochastic engine: Likelihood[E(c) moves to p, depth d] = (1 + d)^(-c) ∝ Prob[E(c), p]
• c = 0: all moves equally likely
• c = ∞: only best moves played
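The stochastic engine E(c) can be sketched in a few lines. This is a minimal illustration, assuming that the three king moves shown are the only options and that probabilities are obtained by normalising the weights (1 + d)^(-c); the function name `move_probabilities` is hypothetical and not taken from the talk.

```python
def move_probabilities(depths, c):
    """Return Prob[E(c), move] for each move, given its depth to mate.

    Assumption: the weight of a move with depth d is (1 + d) ** -c and
    probabilities are the weights normalised to sum to 1.
    """
    weights = {move: (1 + d) ** -c for move, d in depths.items()}
    total = sum(weights.values())
    return {move: w / total for move, w in weights.items()}


# Depths to mate from the example position.
DEPTHS = {"Kc2": 23, "Kc1": 24, "Ka1": 29}

for c in (0, 5, 20):
    probs = move_probabilities(DEPTHS, c)
    print(c, {move: round(p, 2) for move, p in probs.items()})
```

With c = 0 each move gets probability 1/3. As c grows, the probability mass shifts toward the optimal Kc2: roughly 0.47 at c = 5 and 0.69 at c = 20 under these assumptions.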
(d = #23) Kc2, (#24) Kc1, (#29) Ka1
[Figure: three panels, c = 0, c = 5 and c = 20, each showing the moves Kc2, Kc1 and Ka1]
Which Engine … is playing the moves?
• Suppose you see a sequence of player P's moves in KQKR
• You are told that they are being played by some engine E(c)
• You are told that it is one of E(0), E(5) or E(20)
• Which agent, A, is it: E(0), E(5) or E(20)?
• What would be fair odds?
• If you 'know nothing' (as you do) at the beginning … Prob[A = E(i)] = 1/3
• Let's suppose you see a sequence of optimal moves
• You should start to lean away from E(0) and towards E(20)
• But what are the probabilities now? No need to guess … Bayes' Rule tells you exactly what the new probabilities are
Bayes' Rule
• Probability[Hypothesis | Evidence] ∝ Prob[Hypothesis] × Prob[Evidence | Hypothesis]
• We have a choice of three hypotheses: H0 "E = E(0)", H5 "E = E(5)", H20 "E = E(20)"
• Prob[Hi] = 1/3 = 0.33 = the prior probability, i.e. before Kc2 is seen
• Prob[E(0), Kc2] = 1/3 = 0.33
• Prob[E(5), Kc2] = 0.47; Prob[E(20), Kc2] = 0.70
• Prob[H0 | Kc2] ∝ 0.33 × 0.33 = 0.11 … etc. (0.16, 0.23 … sum 0.50)
• Scaling … Prob[H0 | Kc2] = 0.22 = the posterior probability; Prob[H5 | Kc2] = 0.31 and Prob[H20 | Kc2] = 0.47
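The single-move update extends naturally to a sequence of moves: each posterior serves as the prior for the next observation. The sketch below is illustrative only. It assumes, hypothetically, that every observed position offers the same three options as the example (depths 23, 24 and 29) and that the player always picks the optimal one; the helper names are not from the talk.

```python
def move_probability(depths, chosen, c):
    """Prob[E(c) plays `chosen`], with weights (1 + d) ** -c normalised."""
    weights = [(1 + d) ** -c for d in depths.values()]
    return (1 + depths[chosen]) ** -c / sum(weights)


def bayes_update(beliefs, likelihoods):
    """Posterior over hypotheses: prior times likelihood, rescaled to sum to 1."""
    unnormalised = {h: beliefs[h] * likelihoods[h] for h in beliefs}
    total = sum(unnormalised.values())
    return {h: u / total for h, u in unnormalised.items()}


DEPTHS = {"Kc2": 23, "Kc1": 24, "Ka1": 29}  # example position
beliefs = {c: 1 / 3 for c in (0, 5, 20)}     # 'know nothing' prior over E(c)

history = []
for move_number in range(1, 11):
    # Hypothetical evidence: the optimal move Kc2 is seen every time.
    likelihoods = {c: move_probability(DEPTHS, "Kc2", c) for c in beliefs}
    beliefs = bayes_update(beliefs, likelihoods)
    history.append(dict(beliefs))
    print(move_number, {c: round(p, 2) for c, p in beliefs.items()})
```

After the first move the beliefs are close to the 0.22/0.31/0.47 quoted above (small differences come from rounding the likelihoods). After ten such optimal moves, the belief in E(20) exceeds 0.95 under these assumptions.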
The effect of Prior Probabilities
• In the example above, the posterior probability of H ∝ Prob[Ev | H]
• This is because the prior probability of H was 1/3 for all H
• So the application of Bayes' Rule has been somewhat obscured
• Suppose the priors were H0: 0.2, H5: 0.3, H20: 0.5
• Then the posterior probabilities are proportional to:
  • H0: 0.2 × 0.33 = 0.066
  • H5: 0.3 × 0.47 = 0.141
  • H20: 0.5 × 0.70 = 0.350 … totalling 0.557, so we scale up to …
  • Prob[H0] = 0.066/0.557 = 0.12
  • Prob[H5] = 0.141/0.557 = 0.25; Prob[H20] = 0.350/0.557 = 0.63
• So the posteriors were 0.22/0.31/0.47 … and are now 0.12/0.25/0.63
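The effect of the priors can be checked numerically. The following is a minimal sketch that uses the rounded likelihoods quoted on the slides (0.33, 0.47, 0.70); the function name `posterior` is an illustrative choice.

```python
def posterior(priors, likelihoods):
    """Bayes' Rule: posterior is prior times likelihood, rescaled to sum to 1."""
    unnormalised = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Prob[Kc2 | E(0)], Prob[Kc2 | E(5)], Prob[Kc2 | E(20)], rounded as on the slide.
LIKELIHOODS = [0.33, 0.47, 0.70]

uniform = posterior([1 / 3, 1 / 3, 1 / 3], LIKELIHOODS)
skewed = posterior([0.2, 0.3, 0.5], LIKELIHOODS)

print([round(p, 2) for p in uniform])  # [0.22, 0.31, 0.47]
print([round(p, 2) for p in skewed])   # [0.12, 0.25, 0.63]
```

The same evidence yields different posteriors because the second set of priors already favours H20.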
Rev. Bayes, Transform, Aeolian Harp
[Diagram: Bayesian Inference; axes c1 and c2; 'Refine model parameters'; 'Model Error' ║EPP – EPA║]
"Let the Wind of Evidence blow through the Aeolian Harp of your Hypotheses"
Chess Engines as Benchmarks?
• Engines are improving all the time: hardware, algorithms, knowledge
• There is actually a danger that they may become too good
• They are not infallible: 'best moves' are not necessarily best
  • q.v. changes of mind from one search depth to the next
• However, greater depth of search ⇒ better engine [Beal]
• Benchmark fallibility contributes statistical uncertainty to findings
• Independence is also required: engine E cannot vote itself 'best'!
Using the idea on pre-EGT Chess
• Idea is to use chess-engine evaluations {vi} rather than depths
• Announced in 'Chess Endgame News', ICGA J. 28-4, 243 (2005)
• However, this brings some complications:
  • Some evaluations, unlike Depths to Mate, are negative
  • The evaluations vi are evaluated using heuristics
  • Chess-engines' preferences are not infallible
  • Engines' preferences may vary engine-to-engine, depth-to-depth
• Some intuitive observations:
  • A panel of engines is better than one engine as a benchmark
  • The better the engine and the greater the depth [Beal], the better
  • Uncertainty is halved by using four times the data
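The last observation follows from the usual square-root law. As a sketch, assuming n independent data points with a common standard deviation σ (symbols introduced here for illustration only), the standard error of their mean behaves as:

```latex
\[
\mathrm{SE}(n) = \frac{\sigma}{\sqrt{n}}
\qquad\Longrightarrow\qquad
\mathrm{SE}(4n) = \frac{\sigma}{\sqrt{4n}} = \tfrac{1}{2}\,\mathrm{SE}(n).
\]
```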
Performance v Skill Rating
[Figure: skill rating vs. performance rating (Elo), over game 1, game 2, …, game k]
Stochastic choice, given position evaluations
• At position p, moves mi to positions pi have evaluations vi
• Can we say Likelihood[E(c), mi] = (1 + vi)^c?
• No, because some vi may be negative
• Need a mapping v → w, s.t. ∀i, wi ≥ 0 and v1 > v2 ⇒ w1 < w2
• Some functions w = C(v) are better than others!
  • The intuitively obvious wi = 1 + |v1| + (v1 – vi) is not ideal
• The wi are analogous to the di taken from an Endgame Table
• Currently using wi = κ + (v1 – vi) with κ > 0 … in fact κ = 0.1
• Model choices, yet to be tested as to effect:
  • Choice of specific engine and search-depth
  • Choice of κ
  • Mapping to, e.g., r1.E() + r2.E(c) rather than to one engine E(c)
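The mapping currently in use can be sketched as follows. This is an illustration under assumptions, not the talk's implementation: the additive constant is called `kappa` here, the likelihood is taken to be proportional to w^(-c) by analogy with the depth-based (1 + d)^(-c), and the evaluations are made-up numbers.

```python
def eval_weights(evals, kappa=0.1):
    """Map evaluations v_i to weights w_i = kappa + (v_1 - v_i), v_1 the best.

    Every weight is at least kappa > 0, and a better evaluation gives a
    smaller weight, mirroring a smaller depth to mate.
    """
    best = max(evals.values())
    return {move: kappa + (best - v) for move, v in evals.items()}


def move_probabilities(evals, c, kappa=0.1):
    """Assumed model: Likelihood[E(c), m_i] proportional to w_i ** -c."""
    weights = eval_weights(evals, kappa)
    likelihoods = {move: w ** -c for move, w in weights.items()}
    total = sum(likelihoods.values())
    return {move: l / total for move, l in likelihoods.items()}


# Hypothetical engine evaluations (in pawns) for three candidate moves.
EVALS = {"Nf3": 0.30, "d4": 0.25, "h4": -0.50}

for c in (0, 1, 3):
    probs = move_probabilities(EVALS, c)
    print(c, {move: round(p, 3) for move, p in probs.items()})
```

Negative evaluations cause no difficulty because only the differences from the best evaluation enter the weights. A smaller `kappa` makes the model more sharply selective near the best move.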
Results … based on TOGA II (depth 10)
• Measured: performance against Kaissa rather than against the opponent
  • An absolute rather than relative measure, given the benchmark
  • A measure not affected by the opponent's performance
  • m data points per game, rather than one (the game result)
• Spectroscope! Virtual players at different ELO can be differentiated
  • Higher ELO ⇒ higher apparent competence c
• Winners visibly relax and play for the result when closing out
• Best performance-indicators are drawn games between like players
• Performance, pre- and post-ELO, assessed in the same terms
• Epochs of performance compared, before and after cheating accusations
• Games tracked in 2D not 1D (net advantage); games compared
• Absolute performance of ELO 2400 players tracked across time
Virtual ELO Players: Data
Profile of Virtual ELO Players in c-space
[Figure: profiles labelled 2100, 2300, 2400, 2600 and 2700]
Keres -v- The Rest (1948); D.P.Singh (2006)
[Figures: WCC (1948): Keres 0 Botvinnik 4; D.P.Singh v Opponents]
D.P.Singh – 'before' and 'after'
• Two 6-month periods
• Not as conclusive as it appears
• 'c'-tracking across the whole period
Allwermann-Kalinitschew (1998)
• Variation in c for both players; track locus of game in 2D
• Standard 1-dimensional charting of the game
Summary
• 'Contextual Analysis' (CA) of the individual player's decisions
  • 'Decision Matching' (DM) uses less information and is cruder
  • 'Average Differencing' (AD) uses less information: ditto
• CA successfully differentiates players of different ELOs
  • the standard deviation on c gives an idea of differentiator-power
  • expect CA to be a better differentiator than AD …
  • and expect AD to be better than DM
• CA, using Bayesian Analysis, applied to:
  • Career and epoch analysis, tournament and game analysis
• Future directions:
  • Evolving the method, including deeper statistical treatment
  • Applying it to other chess and non-chess scenarios