Data Analytics & Decision Making (DADM) in Gaming • Yingcai Xiao
Data Analytics • Analyze data to draw conclusions for decision making. • http://searchdatamanagement.techtarget.com/definition/data-analytics • https://www.google.com/?gws_rd=ssl#q=data+analytics
Decision Making • Select a course of action among several alternative options. https://en.wikipedia.org/wiki/Decision-making • Decision making process: 1. Figure out your options (possible moves). 2. Evaluate the pros and cons of each option/move. 3. Design a strategy based on the evaluation. 4. Take the actions that follow the strategy to achieve the optimal or suboptimal result. • A prisoner-hat example from ed.ted.edu: https://m.youtube.com/watch?feature=youtu.be&v=N5vJSNXPEwA • https://en.m.wikipedia.org/wiki/Prisoners_and_hats_puzzle
DADM Applications • Gaming: AlphaGo • Health: Epidemiology • Business: https://en.wikipedia.org/wiki/Business_analytics • Accounting: https://www.coursera.org/learn/accounting-analytics • Game Theory: http://www.gametheory.net/ • Research: IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES
AlphaGo • AlphaGo: a game engine that plays the game of Go. • Go: a Chinese board game of strategy. 19x19 board, 361 points; 361! ≈ 10^768 (total number of fundamental particles in the observable universe: 10^85). • The first computer program to beat a professional Go player; it won 4:1 against Lee Sedol (March 9-15, 2016).
AlphaGo Algorithms • Based on tree search and neural networks that can learn. • Policy Network: considers only a few promising moves in each position (limits the breadth of the search tree). • Value Network: evaluates a position after looking only a few steps deep (limits the depth of the search tree).
Statistical Searching Algorithms https://en.wikipedia.org/wiki/Monte_Carlo_tree_search “The focus of Monte Carlo tree search (MCTS) is on the analysis of the most promising moves, expanding the search tree based on random sampling of the search space. The application of Monte Carlo tree search in games is based on many playouts. In each playout, the game is played-out to the very end by selecting moves at random. The final game result of each playout is then used to weight the nodes in the game tree so that better nodes are more likely to be chosen in future playouts” https://en.wikipedia.org/wiki/Monte_Carlo_method
MCTS • “Four Steps: • Selection: start from root R and select successive child nodes down to a leaf node L. The section below says more about a way of choosing child nodes that lets the game tree expand towards most promising moves, which is the essence of Monte Carlo tree search. • Expansion: unless L ends the game with a win/loss for either player, either create one or more child nodes or choose from them node C. • Simulation: play a random playout from node C. • Backpropagation: use the result of the playout to update information in the nodes on the path from C to R.”
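The four steps quoted above can be sketched on a toy game. This is a minimal illustration, not AlphaGo's implementation: the game (a single Nim pile where each player takes 1 to 3 stones and whoever takes the last stone wins), the `Node` class, the `mcts_best_move` function, and the UCB1 selection rule with constant `c=1.4` are all assumptions chosen for brevity.

```python
import math
import random


class Node:
    """One position in a toy Nim game: take 1-3 stones, taking the last stone wins."""

    def __init__(self, stones, player, parent=None, move=None):
        self.stones = stones          # stones left in the pile
        self.player = player          # player (0 or 1) to move in this position
        self.parent = parent
        self.move = move              # stones taken to get here from the parent
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.visits = 0
        self.wins = 0.0               # playouts won by the player who moved into this node


def ucb1(child, parent_visits, c=1.4):
    # Assumed selection rule (UCB1): average win rate plus an exploration bonus.
    exploit = child.wins / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def mcts_best_move(stones, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(stones, player=0)
    for _ in range(iterations):
        node = root
        # Selection: follow the best-scoring child while the node is fully expanded.
        while not node.untried and node.children:
            parent = node
            node = max(parent.children, key=lambda ch: ucb1(ch, parent.visits))
        # Expansion: add one child for a move not tried yet (skipped at game end).
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, 1 - node.player, node, move)
            node.children.append(child)
            node = child
        # Simulation: random playout from the current node to the end of the game.
        stones_left, to_move = node.stones, node.player
        winner = 1 - to_move          # correct if the game is already over here
        while stones_left > 0:
            stones_left -= rng.randint(1, min(3, stones_left))
            if stones_left == 0:
                winner = to_move      # this player took the last stone
            to_move = 1 - to_move
        # Backpropagation: update statistics on the path back to the root.
        while node is not None:
            node.visits += 1
            if node.parent is not None and node.parent.player == winner:
                node.wins += 1
            node = node.parent
    # Recommend the most-visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move
```

For a pile of 5 stones, the known winning strategy is to leave the opponent a multiple of 4, so a reasonable search should settle on taking 1 stone; calling `mcts_best_move(5)` is one way to check that the visit counts concentrate there.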
MCTS https://commons.wikimedia.org/wiki/File:MCTS_(English).svg#/media/File:MCTS_(English).svg
MCTS Applications • Games: Go, … • DeepMind: the engine behind AlphaGo is being applied to smartphone assistants, healthcare, and robotics. • Polymer Science: a polymer is composed of hundreds of atoms, and the number of ways those atoms can be organized (connected) in 3D is huge. MCTS can help prune the search tree when looking for polymer structures with better properties (e.g., better strength, elasticity, …). • Procedural content generation (PCG): creative applications in game design, linguistics, art and music. • Optimization, scheduling, production management, … • http://www.cameronius.com/cv/mcts-survey-master.pdf
Quasi Globally Optimal Solutions “Simulated annealing (SA) is a probabilistic technique for approximating the global optimum of a given function. Specifically, it is a metaheuristic to approximate global optimization in a large search space” “for problems where finding the precise global optimum is less important than finding an acceptable local optimum in a fixed amount of time, simulated annealing may be preferable to alternatives such as brute-force search or gradient descent.” https://en.wikipedia.org/wiki/Simulated_annealing
Globally Optimal Solutions • Let s = s_0 • For k = 0 through k_max (exclusive): • T ← temperature(k / k_max) • Pick a random neighbor, s_new ← neighbor(s) • If P(E(s), E(s_new), T) ≥ random(0, 1), move to the new state: s ← s_new • Output: the final state s • https://en.wikipedia.org/wiki/Simulated_annealing
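The pseudocode above leaves `temperature`, `neighbor`, `E`, and `P` open. The sketch below fills them in with common but assumed choices: a linear cooling schedule, the Metropolis acceptance rule, and a hypothetical toy problem (minimize (x - 3)^2 over the integers by stepping left or right). It is meant only to show the loop running end to end.

```python
import math
import random


def temperature(fraction_done):
    # Assumed cooling schedule: linear from 1 down to (almost) 0.
    return max(1e-9, 1.0 - fraction_done)


def acceptance_probability(e, e_new, t):
    # Assumed acceptance rule (Metropolis): always accept improvements,
    # accept a worse state with probability exp(-(e_new - e) / t).
    if e_new < e:
        return 1.0
    return math.exp(-(e_new - e) / t)


def simulated_annealing(s0, energy, neighbor, k_max=5000, seed=0):
    rng = random.Random(seed)
    s = s0
    for k in range(k_max):                       # k = 0 .. k_max - 1
        t = temperature(k / k_max)
        s_new = neighbor(s, rng)                 # pick a random neighbor
        if acceptance_probability(energy(s), energy(s_new), t) >= rng.random():
            s = s_new                            # move to the new state
    return s


def toy_energy(x):
    return (x - 3) ** 2                          # hypothetical energy, minimum at x = 3


def toy_neighbor(x, rng):
    return x + rng.choice((-1, 1))               # step one integer left or right


result = simulated_annealing(50, toy_energy, toy_neighbor)
```

Early on, the high temperature lets the search accept some uphill moves; as the temperature falls, uphill moves become very unlikely and the state settles near the minimum.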
Exploratory Data Analysis (EDA) • Explore the main characteristics of data. • Usually visual. • Examples: • Minard’s map • https://en.wikipedia.org/wiki/Exploratory_data_analysis#/media/File:Minard%27s_Map_%28vectorized%29.svg • Scatter plots • https://en.wikipedia.org/wiki/Scatter_plot • Parallel Coordinates • https://syntagmatic.github.io/parallel-coordinates/ • http://www.xdat.org/
Data Analysis via Statistical Inference (deducing properties of data / population by statistical means)
Statistical Courses at UA • Applied Statistics (3470:461) • http://www.uakron.edu/dotAsset/23d5548a-e31d-411d-b840-a0c9bf78bf97.pdf • Probability & Statistics for Engineers (3470:401) • http://www.uakron.edu/dotAsset/97c92137-7597-464e-a560-40b70c33632f.pdf • Practical Statistics by Buglear • http://proquest.safaribooksonline.com/book/statistics/9780749468460
Population Mean, Standard Deviation • Mean: • arithmetic mean / expected value: E(X); • https://en.wikipedia.org/wiki/Mean • Standard Deviation (SD) • sqrt(E((X – E(X))^2)) • the amount of variation of data values • https://en.wikipedia.org/wiki/Standard_deviation • (figure) • Degrees of Freedom (number of values that are free to vary) • https://en.wikipedia.org/wiki/Degrees_of_freedom_%28statistics%29
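As a small worked example of the two definitions above, the sketch below computes the population mean and the population SD of a made-up data set, once directly from the formula sqrt(E((X – E(X))^2)) and once with Python's `statistics.pstdev`. The data values are illustrative only.

```python
import math
import statistics

# Hypothetical population of eight values.
population = [2, 4, 4, 4, 5, 5, 7, 9]

# Mean: the arithmetic mean E(X).
mean = statistics.mean(population)

# Standard deviation from the definition: sqrt(E((X - E(X))^2)).
deviations_squared = [(x - mean) ** 2 for x in population]
sd_by_formula = math.sqrt(statistics.mean(deviations_squared))

# The same quantity from the standard library (population SD, divides by N).
sd_library = statistics.pstdev(population)
```

Both routes should agree, since `pstdev` divides by the population size N, exactly as the expected-value formula does.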
Sample Mean, Standard Error (SE) • Sample data is a subset of the population, used to represent the population in statistical inference. • The sample mean is the mean of the sample (not of the complete population). • The standard error (SE) is the standard deviation of the sampling distribution of a statistic. • https://en.wikipedia.org/wiki/Standard_error
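For the sample mean, the standard error is commonly estimated as s / sqrt(n), where s is the sample standard deviation (n - 1 in the denominator). A minimal sketch with made-up sample values; the helper name `standard_error_of_mean` is an assumption for illustration.

```python
import math
import statistics


def standard_error_of_mean(sample):
    # Sample SD uses the n - 1 denominator (one degree of freedom is spent on the mean).
    s = statistics.stdev(sample)
    return s / math.sqrt(len(sample))


# Hypothetical sample drawn from some larger population.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
sample_mean = statistics.mean(sample)
se = standard_error_of_mean(sample)
```

A larger sample shrinks the SE, which is why estimates of the mean get more precise as n grows even when the spread of the data stays the same.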
Statistical Hypothesis (H) • An assumption made about the statistical property of the data. • Statistical hypothesis test is a method of statistical inference. • Null Hypothesis Significance Testing (NHST) • https://en.wikipedia.org/wiki/Null_hypothesis
Null Hypothesis (H0) • A hypothesis that the researcher wants to disprove / nullify (H0). • Usually “opposite” to what the researcher believes. • The strategy is usually to find evidence to disprove (nullify) H0. • Disproving is easier than proving. • But disproving H0 does not by itself prove that the “opposite” is true. • Example: H0: the shadows in the evenings are wolves. • https://en.wikipedia.org/wiki/Null_hypothesis
Confidence Interval • https://en.wikipedia.org/wiki/Confidence_interval • “Confidence intervals consist of a range of values (interval) that act as good estimates of the unknown population parameter”. • “we are 95% confident (we have a 95% confidence level) that the true value of the parameter is in our confidence interval”. • A 95% confidence interval comes from a procedure that captures the true parameter in about 95% of repeated samples; it does not say that 95% of the data fall in the interval. • A 95% confidence level corresponds to a 5% significance level.
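A 95% confidence interval for a mean can be sketched as mean ± z × SE. The version below assumes a normal approximation (z ≈ 1.96 from `statistics.NormalDist`), which is reasonable for larger samples; for small samples a t quantile would be the usual choice. The function name and the sample are illustrative.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)      # standard error of the mean
    alpha = 1.0 - confidence                          # confidence level + alpha = 1
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)       # about 1.96 for 95%
    return mean - z * se, mean + z * se


# Hypothetical sample: the integers 1..100.
sample = list(range(1, 101))
low, high = mean_confidence_interval(sample)
```

Raising the confidence level (say to 99%) widens the interval, because a larger z is needed to cover the parameter more often.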
Significance Level • Significance level / alpha level (α): the probability of rejecting the null hypothesis when the null hypothesis is true. Usually 5% (0.05) or 1% (0.01). • p-value: the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one actually observed. • The null hypothesis is rejected if the p-value is less than the significance level α.
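A coin-flip example makes the decision rule concrete. Under H0 "the coin is fair", the sketch below computes an exact two-sided p-value by summing the binomial probabilities of every outcome at least as far from the expected count as the observed one. The function name and the two scenarios (9 and 7 heads in 10 flips) are assumptions for illustration.

```python
from math import comb


def two_sided_coin_p_value(heads, flips):
    # H0: the coin is fair, so each of the 2**flips sequences is equally likely.
    expected = flips / 2
    observed_gap = abs(heads - expected)
    # Count sequences whose head count is at least as extreme as the observed one.
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_gap
    )
    return extreme / 2 ** flips


alpha = 0.05
p_9 = two_sided_coin_p_value(9, 10)   # 9 heads in 10 flips
p_7 = two_sided_coin_p_value(7, 10)   # 7 heads in 10 flips
reject_9 = p_9 < alpha                # reject H0 only when p < alpha
reject_7 = p_7 < alpha
```

With α = 0.05, nine heads in ten flips is extreme enough to reject fairness, while seven heads is not.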
The p-value • https://en.wikipedia.org/wiki/P-value • the definition • the dice and coin examples • http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-are-t-values-and-p-values-in-statistics • http://www.dummies.com/how-to/content/what-a-pvalue-tells-you-about-statistical-data.html
Errors • A Type I error (error of the first kind, false positive) occurs when the null hypothesis (H0) is true but is rejected (p < α). • A Type II error (error of the second kind, false negative) occurs when the null hypothesis (H0) is false but erroneously fails to be rejected (p >= α).
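One way to see what a Type I error rate means is to simulate many experiments in which H0 really is true and count how often it gets rejected anyway. The sketch below assumes a two-sided z-test of "mean = 0" on normal data with known standard deviation 1; the function name, sample size, and trial count are illustrative choices.

```python
import math
import random
from statistics import NormalDist


def type_i_error_rate(alpha=0.05, n=30, trials=2000, seed=0):
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # two-sided critical value
    rejections = 0
    for _ in range(trials):
        # H0 is true here: the data really come from a mean-0, SD-1 normal.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * math.sqrt(n)         # z = mean / (sigma / sqrt(n)), sigma = 1
        if abs(z) > z_crit:
            rejections += 1                          # a false positive (Type I error)
    return rejections / trials


rate = type_i_error_rate()
```

The fraction of false rejections should land close to α, which is exactly what choosing a 5% significance level commits to.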
A Tail or Two • The significance level / alpha level (α) must be split between both tails for two-tailed tests (α/2 in each tail); a one-tailed test puts all of α in one tail. • http://www.ats.ucla.edu/stat/mult_pkg/faq/general/tail_tests.htm
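The effect of the number of tails shows up when the same test statistic is converted to a p-value both ways. The sketch below assumes a z-statistic (standard normal under H0) and a hypothetical observed value of 1.8; the function name is illustrative.

```python
from statistics import NormalDist


def z_test_p_values(z):
    upper_tail = 1.0 - NormalDist().cdf(z)            # area to the right of z
    one_tailed = upper_tail                           # H1: the effect is positive
    two_tailed = 2.0 * min(upper_tail, 1.0 - upper_tail)  # H1: the effect is nonzero
    return one_tailed, two_tailed


alpha = 0.05
one_tailed, two_tailed = z_test_p_values(1.8)
```

Here the one-tailed p-value falls below 0.05 while the two-tailed one does not, so the choice of tails, which should be made before looking at the data, can change the conclusion.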
Rules of Interpreting Hypothesis Test Statistics • http://blog.minitab.com/blog/michelle-paret/alphas-p-values-confidence-intervals-oh-my • Confidence level + alpha = 1 • If the p-value is low, the null must go. • The confidence interval and p-value will always lead you to the same conclusion.
The T-test • http://www.socialresearchmethods.net/kb/stat_t.php • The t-test assesses whether the means of two groups are statistically different from each other. • The formula for the t-test is a ratio: • the difference between the two means over • a measure of dispersion • (e.g., standard error of the difference).
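The ratio described above can be written out directly. This sketch takes the standard error of the difference to be sqrt(var_a / n_a + var_b / n_b), using sample variances; the two groups are made-up numbers, and turning the resulting t into a p-value would additionally need the t distribution, which is not covered here.

```python
import math
import statistics


def t_statistic(group_a, group_b):
    # Numerator: difference between the two group means.
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    # Denominator: standard error of the difference (sample variances, n - 1 denominator).
    se_diff = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return mean_diff / se_diff


# Hypothetical measurements for two groups.
treatment = [5, 6, 7, 8, 9]
control = [1, 2, 3, 4, 5]
t = t_statistic(treatment, control)
```

The same difference in means gives a larger t when the groups are less spread out or when there are more observations, because both shrink the denominator.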
The t-statistic • https://en.wikipedia.org/wiki/T-statistic • A t-statistic is the ratio of the departure of an estimated value from its hypothesized value to its standard error (the spread of that departure). • http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-are-t-values-and-p-values-in-statistics
Student’s t-test • a method of testing hypotheses about the mean of a small sample drawn from a normally distributed population when the population standard deviation is unknown. • http://www.britannica.com/science/Students-t-test • Student: the pen name of William Sealy Gosset
Student’s t-test • “The t distribution is a family of curves in which the number of degrees of freedom (the number of independent observations in the sample minus one) specifies a particular curve. As the sample size (and thus the degrees of freedom) increases, the t distribution approaches the bell shape of the standard normal distribution. In practice, for tests involving the mean of a sample of size greater than 30, the normal distribution is usually applied.” • http://www.britannica.com/science/Students-t-test
Cluster Analysis • Cluster analysis / clustering: group data points of similar properties • https://en.wikipedia.org/wiki/Cluster_analysis • By distance in a property space. • Widely used in machine learning, pattern recognition, image analysis, information retrieval, bioinformatics.
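Grouping by distance in a property space can be illustrated with k-means, one common clustering technique among many (the slide does not name a specific algorithm, so this choice is an assumption). The sketch alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points; the points and starting centroids are made up.

```python
import math


def kmeans(points, centroids, iterations=10):
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)     # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break                             # converged: nothing moved
        centroids = new_centroids
    return centroids, labels


# Hypothetical 2D data with two visually obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

The result depends on the starting centroids and on the distance measure (Euclidean here), which is why practical use often involves several restarts and scaled features.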
EDA & VDA • Exploratory Data Analysis (EDA) • Visual Data Analytics (VDA) • Hand Drawing: Minard’s map • Computer Plots: Scatter plots • Interactive Computer Plots: Xdat • UA course: 3460:658 Visualization • National Center for Visualization and Visual Analytics https://www.dhs.gov/sites/default/files/publications/Center%20for%20Visualization%20and%20Data%20Analytics-CVADA.pdf
SA & SI • Statistical Analysis: • Statistical Inference (SI) • Bayesian inference uses Bayes' theorem to update the probability for a hypothesis dynamically. • Regression analysis: statistically estimating the relationships among variables.
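The dynamic updating mentioned for Bayesian inference can be sketched with two competing hypotheses about a coin. The hypotheses ("fair" with P(heads) = 0.5, "biased" with P(heads) = 0.8), the equal priors, and the flip sequence are all made up for illustration; each flip turns the current belief into the prior for the next update.

```python
def bayes_update(prior, likelihood):
    # Bayes' theorem: posterior is proportional to prior times likelihood.
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())          # P(data), the normalizing constant
    return {h: v / evidence for h, v in unnormalized.items()}


# Hypothetical hypotheses: probability of heads under each one.
p_heads = {"fair": 0.5, "biased": 0.8}
belief = {"fair": 0.5, "biased": 0.5}              # equal prior belief
history = []

for flip in "HHTH":                                # observed flips, one at a time
    likelihood = {h: (p if flip == "H" else 1.0 - p) for h, p in p_heads.items()}
    belief = bayes_update(belief, likelihood)      # posterior becomes the next prior
    history.append(belief)
```

Heads shift belief toward the biased coin and tails shift it back; the final belief is the same whatever order the flips arrive in.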
DM, ML, AI • Data Mining (DM) • DM discovers patterns in data using computers • UA course: 3460:676 Data Mining • Machine learning: learning by a machine, pattern recognition, AI, 3460:560 • Deep learning: uses artificial neural networks with multiple hidden layers
Data Structures for Data Analytics (DA) • Array/LL/SQ/Tree • Database/Tables/SQL • Web: XML/SOAP/WSDL/RESTful • Semantic Web: RDF (Resource Description Framework) • Comparisons with ER model.
John von Neumann (The Martians) • Computer Architecture: https://en.wikipedia.org/wiki/Von_Neumann_architecture • Game Theory: https://en.wikipedia.org/wiki/Von_Neumann_architecture • https://en.wikipedia.org/wiki/Ergodic_theory
John von Neumann (The Martians) • The concept of creating a propositional calculus for quantum logic was first outlined in a short section in von Neumann's 1932 work, but in 1936, the need for the new propositional calculus was demonstrated through several proofs. For example, photons cannot pass through two successive filters that are polarized perpendicularly (e.g., one horizontally and the other vertically), and therefore, a fortiori, it cannot pass if a third filter polarized diagonally is added to the other two, either before or after them in the succession, but if the third filter is added in between the other two, the photons will, indeed, pass through. This experimental fact is translatable into logic as the non-commutativity of conjunction.
Data Structures • DA for the Web: http://www.theregister.co.uk/2007/06/02/data_analysis_2-0/
Data Structures • https://www.codechef.com/wiki/tutorial-dynamic-programming • http://www.thelearningpoint.net/computer-science/dynamic-programming • https://en.wikipedia.org/wiki/Dynamic_programming?wprov=sfla1 • https://www.topcoder.com/community/data-science/data-science-tutorials/dynamic-programming-from-novice-to-advanced/ • http://www.geeksforgeeks.org/bitmasking-and-dynamic-programming-set-1-count-ways-to-assign-unique-cap-to-every-person/ • CoderCareer: Discussing Coding Interview Questions from Google, Amazon, Facebook, Microsoft, etc.: http://codercareer.blogspot.com/p/dynamic-interview-questions.html?m=1 • https://en.wikipedia.org/wiki/Cooperative_game • https://en.m.wikipedia.org/wiki/Prisoners_and_hats_puzzle
Resources • https://www.google.com/?gws_rd=ssl#q=data+analytics • https://en.wikipedia.org/wiki/Exploratory_data_analysis • http://www.socialresearchmethods.net/kb/statinf.php • https://en.wikipedia.org/wiki/Statistical_hypothesis_testing • https://en.wikipedia.org/wiki/Null_hypothesis • https://en.wikipedia.org/wiki/Type_I_and_type_II_errors#Type_I_error • http://www.dummies.com/how-to/content/what-a-pvalue-tells-you-about-statistical-data.html • https://en.wikipedia.org/wiki/Cohort_study • http://www.socialresearchmethods.net/kb/stat_t.php • https://en.wikipedia.org/wiki/Sample_mean_and_covariance • http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-are-t-values-and-p-values-in-statistics • https://en.wikipedia.org/wiki/Student's_t-distribution • https://en.wikipedia.org/wiki/John_von_Neumann#Early_life_and_education • What are the differences between one-tailed and two-tailed tests? http://www.ats.ucla.edu/stat/mult_pkg/faq/general/tail_tests.htm • https://cloud.google.com/ • https://cloud.google.com/products/machine-learning/ • https://cloud.google.com/vision/ • https://cloudwebinars.withgoogle.com/live/next-live/?utm_source=cloud.google.com&utm_medium=google&utm_content=homepage&utm_campaign=2016-cloud-na-event-next-userconf-web-hpp-cgc&utm_term=outbound