Moneyball: Data Mining in Basketball • Presenter: Yuguan Li • Professor: Carolina Ruiz
References: • [1] Data Mining the NBA: Players Most Similar to Jordan, June 2011. http://www.kelvinjiang.com/2011/06/data-mining-nba-players-most-similar-to.html • [2] D. Miljković, L. Gajić, A. Kovacevic, Z. Konjović, The use of data mining for basketball matches outcomes prediction, 2010. http://www.researchgate.net/publication/261501109_The_use_of_data_mining_for_basketball_matches_outcomes_prediction • [3] Advanced Scout: Data Mining and Knowledge Discovery in NBA Data (brief application description), IBM T.J. Watson Research Center, 1997. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.26.6422 • [4] Video from YouTube, 2012. http://video.mit.edu/watch/a-step-by-step-introduction-to-data-mining-for-sports-analysis-mikhail-golovnya-salford-systems-7207/
Moneyball (image from Google Images) • A baseball movie and a data mining movie • Leading questions from the movie: • What data do we have? • What result do we want to predict? (Figure: salary comparison)
Data in basketball • A large number of game statistics are available • www.nba.com • www.Hoopdata.com • Player level: regular season, playoffs and entire career • Team level: wins and losses • Game level: most detailed
Data in basketball • All three figures are analyses of Jeremy Lin from Hoopdata
Data mining in basketball drafting (screenshot from ncaa.com) • Background knowledge: Every year, each NBA team can draft a young player from university; lower-ranked teams have a higher chance of picking first. • In the era of exploding data, we can access all the stats of a player in the NCAA league. • Leading question: How should a team pick?
Method one: Euclidean distance (screenshot from Reference 1) • Scenario one: The only gap between us and a championship is Michael Jordan (MJ), so we need to trade for a player like him. • Solution: We have the career stats of all players and of MJ at our disposal, so we can calculate the Euclidean distance between two players' stat vectors. • We can compare all categories: points, rebounds, assists, steals, blocks, etc. The smaller the Euclidean distance, the more similar the player is to MJ.
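A minimal sketch of the ranking step described on this slide. The stat vectors below are made-up placeholders (not real career numbers), and the names `target`, `candidates`, and `euclidean_distance` are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical per-game averages: (points, rebounds, assists, steals, blocks).
# These values are illustrative placeholders, not real statistics.
target = (30.0, 6.0, 5.0, 2.0, 1.0)  # stands in for the player we want to match
candidates = {
    "player_a": (27.0, 5.0, 6.0, 2.0, 1.0),
    "player_b": (12.0, 11.0, 2.0, 1.0, 2.0),
    "player_c": (29.0, 6.0, 5.0, 2.0, 1.0),
}


def euclidean_distance(u, v):
    """Return the straight-line distance between two equal-length stat vectors."""
    if len(u) != len(v):
        raise ValueError("stat vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Sort candidates from most similar (smallest distance) to least similar.
ranked = sorted(
    candidates,
    key=lambda name: euclidean_distance(candidates[name], target),
)

for name in ranked:
    print(name, round(euclidean_distance(candidates[name], target), 3))
```

Because the categories are on different scales, points per game will tend to dominate the distance; standardizing each category first is one possible refinement, though the slide does not say whether the original analysis did so.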
Method two: cosine similarity (screenshot from Reference 1) • Scenario two: I want to pick a substitute for one of my aging players. • Cosine similarity: measures how much the ratios among one player's stats differ from another player's. • Solution: This resembles clustering players. Take points and rebounds as an example: player A gets 20 pts and 10 reb per game, and player B gets 10 pts and 5 reb per game. They should be considered the same because the ratio is exactly the same, so the two vectors point in the same direction and the angular difference is zero.
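The points-and-rebounds example can be checked with a short sketch. Players A and B use the numbers from the slide; `player_c` and the function name `cosine_similarity` are hypothetical additions for contrast.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length stat vectors."""
    if len(u) != len(v):
        raise ValueError("stat vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


player_a = (20.0, 10.0)  # 20 pts, 10 reb per game (from the slide)
player_b = (10.0, 5.0)   # 10 pts, 5 reb per game (from the slide)
player_c = (10.0, 10.0)  # hypothetical player with a different pts/reb ratio

sim_ab = cosine_similarity(player_a, player_b)
sim_ac = cosine_similarity(player_a, player_c)

print("A vs B:", sim_ab)  # same ratio, so the similarity is (numerically) 1
print("A vs C:", sim_ac)  # different ratio, so the similarity is below 1
```

Unlike Euclidean distance, this measure ignores overall volume: a bench player with the same statistical profile as a star scores as highly similar.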
Method three: Pearson correlation (screenshot from Reference 1) • Two players with identical statistics would have a best-fit line on which all data points lie perfectly. • As players differ more and more, their statistical data points drift farther away from the best-fit regression line.
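A small sketch of the Pearson correlation between two players' stat vectors, written out from the standard formula rather than taken from Reference 1. The stat values and the names `pearson_correlation`, `player_a`, `player_b`, and `player_c` are hypothetical.

```python
import math


def pearson_correlation(x, y):
    """Return the Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length with at least 2 entries")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-game averages: (points, rebounds, assists, steals, blocks).
player_a = (20.0, 10.0, 5.0, 2.0, 1.0)
player_b = (20.0, 10.0, 5.0, 2.0, 1.0)  # identical to player_a
player_c = (8.0, 12.0, 2.0, 1.0, 3.0)   # a different statistical profile

r_ab = pearson_correlation(player_a, player_b)
r_ac = pearson_correlation(player_a, player_c)

print("A vs B:", r_ab)  # identical stats: points lie on the best-fit line
print("A vs C:", r_ac)  # differing stats: points drift off the line
```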
NBA data mining application: Advanced Scout • Advanced Scout (AS) seeks out and discovers interesting patterns in game data. With this information, a coach can assess the effectiveness of certain coaching decisions and formulate game strategies for subsequent games. • Early in the 1995-96 season, 16 teams had already started to use AS and provided very positive feedback: "It's like having another coach in the team" (Bob Salmi).
Data pre-processing • 1. Consistency check: detect errors (missing actions, impossible events) made during data collection. • 2. Transformation: convert the data into a play-sheet, a format that is very familiar to coaches. • 3. Enrichment: use additional information to add value to the analysis.
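To make the consistency check concrete, here is a rough sketch of one kind of "impossible event" test: flagging actions by a player who is not on the court. The event log, its field names, and the event types are all assumptions made for illustration; the slides do not describe Advanced Scout's actual data format or rules.

```python
# Hypothetical event log; field names and event types are illustrative only.
events = [
    {"time": 0, "type": "sub_in", "player": "A"},
    {"time": 30, "type": "shot", "player": "A"},
    {"time": 45, "type": "shot", "player": "B"},     # B never entered the game
    {"time": 60, "type": "sub_out", "player": "A"},
    {"time": 75, "type": "rebound", "player": "A"},  # A is on the bench
]


def find_impossible_events(events):
    """Return events performed by players who are not currently on the court."""
    on_court = set()
    flagged = []
    for event in sorted(events, key=lambda e: e["time"]):
        player = event["player"]
        if event["type"] == "sub_in":
            on_court.add(player)
        elif event["type"] == "sub_out":
            # Leaving the game without having entered is itself inconsistent.
            if player not in on_court:
                flagged.append(event)
            on_court.discard(player)
        elif player not in on_court:
            flagged.append(event)
    return flagged


flagged = find_impossible_events(events)
for event in flagged:
    print("inconsistent:", event)
```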
Data mining • AS uses attribute focusing, which is much like association rules: an event E has a series of attribute values {X1, X2, X3, X4, X5}, and E is interesting to the extent that the occurrence of Xi depends on Xj. • It produces interesting rules like this: When Steve Nash was point guard, Shawn Marion missed 0% (0) of his jump field-goal attempts and made 100% (4) of his jump field-goal attempts.
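One simplified way to read "Xi depends on Xj" is to compare a conditional rate with the overall rate. The sketch below does only that; it is not the attribute focusing algorithm used by Advanced Scout, and the shot log, the labels `guard_x` and `guard_z`, and the function names are hypothetical.

```python
# Hypothetical jump-shot log for a single shooter; values are illustrative only.
shots = (
    [{"point_guard": "guard_x", "made": True}] * 4
    + [{"point_guard": "guard_z", "made": True}] * 2
    + [{"point_guard": "guard_z", "made": False}] * 4
)


def made_rate(records):
    """Return the fraction of shots made in the given records."""
    if not records:
        raise ValueError("no shots to evaluate")
    return sum(1 for r in records if r["made"]) / len(records)


def interestingness(records, guard):
    """Difference between the made rate with `guard` on court and the overall rate."""
    with_guard = [r for r in records if r["point_guard"] == guard]
    return made_rate(with_guard) - made_rate(records)


overall = made_rate(shots)
with_x = made_rate([r for r in shots if r["point_guard"] == "guard_x"])
score = interestingness(shots, "guard_x")

print(f"overall made rate: {overall:.0%}")
print(f"made rate with guard_x: {with_x:.0%}")
print(f"deviation from overall: {score:+.0%}")
```

A large deviation suggests the shooting outcome depends on who is playing point guard, which is the kind of pattern the rule on the slide reports. With only four attempts in the conditional group, such a pattern would still need a coach's judgment before it is acted on.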