E.G. Tennis Match Prediction
Problem • Wish to predict outcome of championship tennis matches.
Q1 What info do we have • Full history of previous matches on different courts at different times by different players. Court info. Player info. Weather etc. • New players: less info. • Surrogate information: Association of Tennis Professionals (ATP) rankings (ordinal), ATP points (numeric). • Some player pairs will never have played each other before.
Q2 What factors might affect things • Known • ATP Rank, Score, Age, Court type. • Particular pairing issues (A seems to always lose to B). • More recent history. • Latent • Recent injuries • Latent rank • Current form
Q3 How to model • Particular pairings matter • But too many pairings (O(n^2)) • Rank (O(n)) easier to work with. • Use only known info for now. • One approach • Use rank and other O(n) features to predict the probability of the outcome of any given pairing using a discriminative approach, e.g. a neural network. • Remember symmetry in design! Remember things change with time: check results for different eras. • Then use the prediction for a particular pair as the prior for a Bernoulli-Beta model of that individual pairing. • Use historic data for this particular pairing to refine the posterior Beta distribution.
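The last two bullets above can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not say how strongly the network prediction should weigh against the pairing history, so the `strength` parameter (a number of "virtual matches") and all function names here are hypothetical.

```python
def beta_prior_from_prediction(p, strength=10.0):
    """Turn a predicted win probability p into Beta(alpha, beta) pseudo-counts.

    `strength` is an assumed tuning knob: how many virtual matches the
    network's prediction is worth relative to real head-to-head results.
    """
    return p * strength, (1.0 - p) * strength


def update_with_history(alpha, beta, wins, losses):
    """Conjugate Beta-Bernoulli update: add observed wins and losses."""
    return alpha + wins, beta + losses


def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Made-up example: the network gives A a 0.7 chance against B,
# but A has won only 1 of their 5 previous meetings.
alpha0, beta0 = beta_prior_from_prediction(0.7, strength=10.0)
alpha1, beta1 = update_with_history(alpha0, beta0, wins=1, losses=4)
print(posterior_mean(alpha1, beta1))  # pulled down from 0.7 towards the 1-in-5 record
```

A larger `strength` trusts the network more; a smaller one lets a short head-to-head record dominate.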
Q3 How to model (continued) • Representation matters. Represent things in the most informative way you can, without compromising too much on flexibility. • E.g. rank. Difference in rank (maybe: simple)? Both players' ranks (yes, probably). Represent numerically (probably not just this). Represent using temperature (thermometer) encoding? • Maybe. But do we want separate labels for rank 200 and rank 201? Probably not. • Maybe use numeric ranks AND temperature encoding on a log scale. • Maybe work on refining rank representation before doing anything else.
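One possible reading of the rank representation discussed above, as a sketch. It assumes that "temperature encoding" means what is commonly called thermometer (unary) encoding, and it picks power-of-two thresholds as the log scale. The thresholds and function names are illustrative choices, not something the slides specify.

```python
import math

# Assumed log-scale thresholds: each unit covers a doubling of rank.
THRESHOLDS = (1, 2, 4, 8, 16, 32, 64, 128, 256, 512)


def thermometer(rank, thresholds=THRESHOLDS):
    """Unit i is 1.0 when rank >= thresholds[i], else 0.0.

    Nearby ranks such as 200 and 201 share a code, while ranks 2 and 3
    near the top of the list are still told apart from rank 1.
    """
    return [1.0 if rank >= t else 0.0 for t in thresholds]


def rank_features(rank):
    """Numeric log-rank AND thermometer code, concatenated."""
    return [math.log(rank)] + thermometer(rank)


def pair_features(rank_a, rank_b):
    """Both players' rank features, side by side."""
    return rank_features(rank_a) + rank_features(rank_b)


print(thermometer(200) == thermometer(201))  # True: no separate label per rank
print(thermometer(1))
```

Keeping the numeric log-rank alongside the thermometer units lets the model still distinguish 200 from 201 if that ever turns out to matter.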
Q4 What problems • A single Beta distribution per pairing is no good, as the pairing's matches were played at different times. • Players change. • Need the MLP output at each of those times. But this gives a different Beta prior for each match. • So instead use the output of the neural network before it goes through the sigmoid. • Try to estimate a bias for the individual pairing. • Little data for individual pairings: need to be Bayesian. • Put a Gaussian prior distribution on the bias. Use approximate Bayesian methods to update the bias distribution (next lectures).
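A minimal sketch of the pairing-bias idea above. The slides defer the actual approximate Bayesian method to later lectures, so this swaps in a plain grid approximation as a stand-in. The prior width, grid range and function names are all assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bias_posterior(logits, outcomes, prior_sd=1.0, lo=-5.0, hi=5.0, n=1001):
    """Grid approximation to the posterior over a pairing bias b.

    logits:   pre-sigmoid network outputs, one per past match, taken at the
              time of that match and from player A's point of view
    outcomes: 1 if A won that match, else 0
    Model:    P(A wins) = sigmoid(logit + b), with b ~ Normal(0, prior_sd^2)
    """
    step = (hi - lo) / (n - 1)
    grid = [lo + i * step for i in range(n)]
    log_post = []
    for b in grid:
        lp = -0.5 * (b / prior_sd) ** 2  # Gaussian log-prior (up to a constant)
        for z, y in zip(logits, outcomes):
            p = sigmoid(z + b)
            lp += math.log(p) if y == 1 else math.log(1.0 - p)
        log_post.append(lp)
    top = max(log_post)  # subtract the max before exponentiating, for stability
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]


def predict(z_new, grid, weights):
    """Posterior-averaged win probability for a new match with logit z_new."""
    return sum(w * sigmoid(z_new + b) for b, w in zip(grid, weights))


# Made-up example: the network favoured A every time (logit 1.0), yet A lost 4 of 5.
grid, w = bias_posterior([1.0] * 5, [1, 0, 0, 0, 0])
mean_bias = sum(b * wi for b, wi in zip(grid, w))
print(mean_bias, predict(1.0, grid, w), sigmoid(1.0))
```

Because the bias is added before the sigmoid, the same posterior over b can be reused with whatever logit the network produces at a later date.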
Further still • Players have styles: use who beats who to provide player groupings. • See e.g. collaborative filtering. • Experts may have access to info that is hard to encode. Incorporate expert predictions into data. • At all stages: check it answers the questions you want it to.
Q5 What next • Get data. Check data. Check outliers. Check consistency: • Do things change over time. Courts change. Rules change. Etc. • Get data into the right format. How to represent ordinal data. • Build network. Add constraints, train, validate. Check assumptions. Is this actually going to work well enough – should see at this stage if it is definitely not. • Refine predictions using individual pairings. • Revalidate. Recheck assumptions. • Anything not quite right? Look carefully. Explain all observations. Know what is going on.
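The "do things change over time" check above can be made concrete by scoring the model separately per era. The sketch below assumes a hypothetical record layout of `(year, features, a_won)` and a `predict` function returning the probability that player A wins. Neither comes from the slides, and the toy data is invented purely to exercise the code.

```python
from collections import defaultdict


def accuracy_by_era(matches, predict, era_length=5):
    """Fraction of matches predicted correctly, grouped into eras of era_length years."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, features, a_won in matches:
        era = (year // era_length) * era_length  # e.g. 2003 -> 2000
        predicted_a = predict(features) >= 0.5
        hits[era] += int(predicted_a == bool(a_won))
        totals[era] += 1
    return {era: hits[era] / totals[era] for era in sorted(totals)}


# Toy stand-in model: the better-ranked (lower number) player always wins.
def toy_predict(features):
    rank_a, rank_b = features
    return 1.0 if rank_a < rank_b else 0.0


# Invented records: (year, (rank_a, rank_b), a_won)
toy_matches = [
    (2001, (3, 10), 1),
    (2003, (20, 5), 0),
    (2006, (1, 2), 0),
    (2008, (7, 9), 1),
]
print(accuracy_by_era(toy_matches, toy_predict))
```

A marked drop in one era is a prompt to look for a cause (rule changes, court changes, data problems) rather than a result in itself.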
Q6 How to deploy • Test in the field. • Test in the field. • Refine. • Test in the field. • Test in the field. • Refine • Test in the field. • Test in the field. • Refine. • Test in the field. • Test in the field. • Refine. • Test in the field. • Test in the field. • Freeze • Test in the field x10. • Deploy (or ditch at earlier stage).