How to predict the winner of a match before it begins • How to pay for college through E-sport gambling • Starcraft Oracle: An SVM learner for Starcraft II matches • Andy Wang, Devin Ekins, Josh Belcher
Quick Starcraft 2 Overview • Real Time Strategy • Has an unbelievably rich strategic system. • Ability to micromanage soldiers rapidly is also critical. • The Basics • Gather resources with workers to build up a base and army. • Scout enemy players’ positions, harass enemy workers, engage in epic battles. • Three distinct races with vastly different playstyles: Terrans (Humans), Zerg, and Protoss • Win by destroying all enemy bases and soldiers.
The Problem • What is the problem? • Predict the outcome of a Starcraft II match given previous player history. • Identify players through strategies and other metrics. • Previous Work • sc2reader: replay parser • ggtracker: gigantic database of Starcraft replay data from multiple competitions • Gross estimations of player skill based solely on Actions per Minute (APM).
So What? • Turns out it is actually a big deal • This October, over $400,000 was awarded in prizes during SC2 tournaments. • Several million viewers watch the larger tournaments. • People bet on who will win matches...
Data Gathering (Further Domain Reduction) • Player selection (~200 players) • Most Played Race: Terran • Matches Played Count > 400 • Match selection (~2000 matches) • Terran vs Terran • Ranked (Ladder) matches • 77 matches were played between our 200 selected players
Methodology -- Overview • Supervised learning using an SVM approach. • Feature vectors derived from game data extracted using the sc2reader tool. • Players already implicitly classified by Blizzard’s ranking (“Ladder” matches).
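To make the "SVM approach" concrete, here is a minimal sketch of a linear SVM trained with Pegasos-style stochastic subgradient descent. It is a stand-in, not the library or solver the project actually used (the slides do not name one). The function names, the hyperparameters `lam` and `n_steps`, and the toy data are all assumptions for illustration.

```python
import random


def train_linear_svm(features, labels, lam=0.01, n_steps=2000, seed=0):
    """Pegasos-style subgradient descent on the regularised hinge loss.

    labels must be +1 or -1. This sketch learns no bias term, so it
    assumes the data are roughly symmetric about the origin.
    """
    rng = random.Random(seed)
    weights = [0.0] * len(features[0])
    for step in range(1, n_steps + 1):
        i = rng.randrange(len(features))
        x, y = features[i], labels[i]
        eta = 1.0 / (lam * step)  # decaying step size
        margin = y * sum(w * xi for w, xi in zip(weights, x))
        # Regularisation part of the subgradient: shrink the weights.
        weights = [(1.0 - eta * lam) * w for w in weights]
        # Hinge loss is active only when the example violates the margin.
        if margin < 1.0:
            weights = [w + eta * y * xi for w, xi in zip(weights, x)]
    return weights


def predict(weights, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= 0 else -1


# Hypothetical toy data: two features per match, +1 = win, -1 = loss.
winners = [[2.0 + 0.1 * i, 2.0 - 0.05 * i] for i in range(10)]
losers = [[-a, -b] for a, b in winners]
X = winners + losers
y = [1] * 10 + [-1] * 10

w = train_linear_svm(X, y)
accuracy = sum(predict(w, x) == t for x, t in zip(X, y)) / len(X)
```

The toy data are linearly separable on purpose; real match data would not be, which is where the regularisation strength `lam` starts to matter.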
Methodology -- Workflow • The domain of the Learner is somewhat different from the domain of the Predictor. • Thus, we expect to lose some accuracy.
Results of match-based learner • Each run: • Used a linear kernel (very fast; limited trials of other kernels have not been fruitful). • Used 95% of the data for training, with the remaining 5% for testing the classification. • Was repeated 65 times to get a 95% confidence interval of ±1% around the measured accuracy.
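The evaluation protocol described here (random 95%/5% splits, repeated 65 times, with a 95% confidence interval on the mean accuracy) can be sketched as follows. This is an illustration under assumptions: the interval uses a normal approximation, and `train_midpoint_threshold` is a deliberately trivial stand-in classifier, not the SVM from the slides. All names are hypothetical.

```python
import math
import random
import statistics


def repeated_holdout(features, labels, train_fn, n_runs=65,
                     test_fraction=0.05, seed=0):
    """Return (mean accuracy, 95% CI half-width) over random splits."""
    rng = random.Random(seed)
    n = len(features)
    n_test = max(1, round(n * test_fraction))
    accuracies = []
    for _ in range(n_runs):
        idx = list(range(n))
        rng.shuffle(idx)
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        predict = train_fn([features[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        correct = sum(predict(features[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / n_test)
    mean = statistics.mean(accuracies)
    # Normal-approximation interval on the mean of the per-run accuracies.
    half_width = 1.96 * statistics.stdev(accuracies) / math.sqrt(n_runs)
    return mean, half_width


def train_midpoint_threshold(train_x, train_y):
    """Stand-in classifier: threshold the first feature at the midpoint
    between the two class means (labels are 1 and 0)."""
    pos = [x[0] for x, t in zip(train_x, train_y) if t == 1]
    neg = [x[0] for x, t in zip(train_x, train_y) if t == 0]
    threshold = (statistics.mean(pos) + statistics.mean(neg)) / 2
    return lambda x: 1 if x[0] > threshold else 0


# Hypothetical, well-separated toy data: 100 losses and 100 wins.
X = [[-50.0 - i] for i in range(100)] + [[50.0 + i] for i in range(100)]
y = [0] * 100 + [1] * 100
mean_acc, half_width = repeated_holdout(X, y, train_midpoint_threshold)
```

With only 5% of the data held out per run, each individual accuracy is noisy; repeating the split is what shrinks the interval.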
Results -- Industry Baseline • Feature Vector 1 • Actions Per Minute (APM) • Classification Accuracy = 57.2%
Results -- Educated Guess • Feature Vector 2 • APM, Average Workers, Time “Food Capped” • Classification Accuracy = 69.3%
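As an illustration of what such a feature vector could look like in code, the sketch below builds one from the three quantities named on this slide. The slides do not say how the two players' statistics are combined, so taking the difference between them is an assumption, and the class and field names are hypothetical (they are not sc2reader attribute names).

```python
from dataclasses import dataclass


@dataclass
class PlayerMatchStats:
    """Per-player statistics for one match (hypothetical field names)."""
    apm: float                  # actions per minute
    avg_workers: float          # average worker count over the match
    seconds_food_capped: float  # time spent at the supply ("food") cap


def feature_vector(player_a, player_b):
    """Encode a match as player A's stats minus player B's stats.

    Assumption: a difference encoding. A positive entry means player A
    had the higher value for that statistic.
    """
    return [
        player_a.apm - player_b.apm,
        player_a.avg_workers - player_b.avg_workers,
        player_a.seconds_food_capped - player_b.seconds_food_capped,
    ]


a = PlayerMatchStats(apm=180.0, avg_workers=55.0, seconds_food_capped=30.0)
b = PlayerMatchStats(apm=160.0, avg_workers=50.0, seconds_food_capped=60.0)
vector = feature_vector(a, b)
```

A difference encoding has the convenient property that swapping the two players negates the vector, which matches flipping the win/loss label.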
Results -- Encoding Strategies • Feature Vector 3 • Everything before + “Strategies” • Strategies based mostly on unit composition • Strategy detection worthy of a classification project of its own! • Classification Accuracy = 70.8%
The Interesting Part • Predicting victors without game data • This is what would be actually used by the Starcraft 2 E-sporting industry. • Requires either a massive amount of data to build a robust history-based learner... • ...or the ability to use a learner built from individual matches on players. • No one (excluding Blizzard) has enough data to create the history-based learner.
The Hard Part • Booleans vs. Floats • Strategies are encoded as booleans for matches. • But they are proportions for players. • Not sure how to reconcile this. • Size of feature vector close to size of data • We encode 35 distinct strategies + 5 other features. • We only have 77 matches to test on. • Accuracy loss • If our match-based learner has only 70% accuracy, we cannot afford to lose much more and stay meaningful in our player-based evaluator.
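The boolean-versus-float mismatch can be seen in a few lines. The sketch below aggregates per-match strategy flags into per-player proportions; the function name and the example flags are hypothetical, and the three columns stand in for the 35 strategies mentioned above.

```python
def strategy_proportions(match_flags):
    """Fraction of a player's matches in which each strategy was used.

    match_flags: one list of booleans per match, one entry per strategy.
    """
    n_matches = len(match_flags)
    n_strategies = len(match_flags[0])
    return [
        sum(flags[s] for flags in match_flags) / n_matches
        for s in range(n_strategies)
    ]


# Four matches by one player, three strategies (toy example).
history = [
    [True, False, False],
    [True, True, False],
    [False, False, False],
    [True, False, False],
]
player_profile = strategy_proportions(history)
```

A learner trained on match rows only ever sees 0 or 1 in these columns, while a player profile such as `player_profile` contains values in between, which is the reconciliation problem the slide describes.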
The Predictor • Accuracy • Roughly 56% Accuracy -- Ouch! • Sample size too small to be confident that this is better than guessing. • However, we have only recently reached this stage. • Ideas for improvement • Different kernels • Slightly more data • Additional features • Combinatorial approach to solve our boolean to float problem
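The sample-size concern can be checked with an exact one-sided binomial test against coin-flip guessing. The sketch assumes the roughly 56% figure was measured on the 77 matches mentioned earlier, which corresponds to about 43 correct predictions; the slides do not state the exact count.

```python
from math import comb


def binomial_tail(n_trials, n_successes, p=0.5):
    """P(X >= n_successes) for X ~ Binomial(n_trials, p)."""
    return sum(
        comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        for k in range(n_successes, n_trials + 1)
    )


# Assumed counts: 43 correct out of 77 is about 56%.
p_value = binomial_tail(77, 43)
better_than_guessing = p_value < 0.05
```

Under these assumed counts the p-value is well above the conventional 0.05 threshold, which agrees with the slide's caution that the result cannot yet be distinguished from guessing.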