Football for KMS: NFL '01
April 30th, 2008
Abhijit Kumar, Kaijia Bao, Vishal Rupani
Course Instructor: Prof. Hsinchun Chen
Agenda
• Team: Vishal, Kai, Abhi
• Responsibilities: Data Cleaning, Statistical Analysis, Final Paper; Data Collection, Client Relations, Final Presentation; Data Import, Data Transformation, Data Mining
• Topics: Objectives • Literature Overview • Knowledge Discovery • Statistical Analysis • Data Mining Techniques • Key Findings • KMS Demonstration • Conclusion
Research Objectives • Pattern identification • Descriptive Statistics • Data Mining Techniques • Prediction • Developing a strategy • Fantasy League
Literature Overview • Moneyball: The Art of Winning an Unfair Game Michael Lewis • Las Vegas Odds www.VegasInsider.com • NFL Fantasy League www.Nfl.com/fantasy
Knowledge Discovery Process
• DATA
  • Pro-Football: 3 tables, 40 columns, 82,346 rows
  • Lisa Ordonez: 1 table, 90 columns, 50,417 rows
• TRANSFORMATION
  • Dependent Variables: Play Decision, Intended Player, Play Direction, Yards
  • Calculated Variables: GameNum, IsPlayChal, PlayZone, TotalOffTO, PlayDecision, QtrTimeLeft, HalfTimeLeft, GameTimeLeft
  • Independent Variables: Defense, Down, GAP, Halftime Left, Off Ydl, Offense, Play Zone, QTR, ToGo, Total Off TO
• Tools: SQL 2005 AS, SQL 2005 IS
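The calculated time variables (QtrTimeLeft, HalfTimeLeft, GameTimeLeft) can be derived from the quarter number and the quarter clock. The slides do not show how the team computed them, so the following is only a minimal sketch. It assumes regulation 15-minute quarters, times in seconds, and no overtime; the function name `time_left` is hypothetical.

```python
QUARTER_SECONDS = 15 * 60  # assumption: regulation NFL quarter length


def time_left(qtr: int, qtr_time_left: int) -> dict:
    """Derive half and game time remaining (in seconds) from the quarter clock.

    qtr is 1-4 (overtime is not handled); qtr_time_left is the number of
    seconds left in that quarter.
    """
    if not 1 <= qtr <= 4:
        raise ValueError("only regulation quarters 1-4 are supported")
    if not 0 <= qtr_time_left <= QUARTER_SECONDS:
        raise ValueError("qtr_time_left must be between 0 and 900 seconds")
    # Quarters 1 and 3 are followed by one more full quarter in the same half.
    quarters_left_in_half = qtr % 2
    quarters_left_in_game = 4 - qtr
    return {
        "QtrTimeLeft": qtr_time_left,
        "HalfTimeLeft": qtr_time_left + quarters_left_in_half * QUARTER_SECONDS,
        "GameTimeLeft": qtr_time_left + quarters_left_in_game * QUARTER_SECONDS,
    }
```

For example, with 5:00 left in the third quarter, `time_left(3, 300)` gives 1,200 seconds for both HalfTimeLeft and GameTimeLeft, since only the fourth quarter remains.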
Knowledge Discovery Process
• DATA
  • Pro-Football: 3 tables, 40 columns, 82,346 rows
  • Lisa Ordonez: 1 table, 90 columns, 53,000 rows
• TRANSFORMATION
  • Dependent Variables
  • Calculated Variables
  • Independent Variables
• PROCESSING
  • Simple Statistics: Play Decision, Intended Player, Play Direction, Yards
• MINING
  • Models: ID3, Neural Networks
  • Accuracy: Lift Charts, Classification Matrix
• Tools: SQL 2005 AS, MS Excel 2007, SQL 2005 AS, SQL 2005 IS
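ID3 builds a decision tree by repeatedly splitting on the attribute with the highest information gain. The models in this project were built in SQL 2005 AS, so the sketch below is not the team's implementation. It is a small, self-contained illustration of the information gain calculation, and the toy plays are made up for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction in `target` after splitting `rows` on `attribute`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Made-up toy plays, for illustration only (not from the 2001 data set).
plays = [
    {"Down": 1, "QTR": 1, "PlayDecision": "Rush"},
    {"Down": 1, "QTR": 2, "PlayDecision": "Rush"},
    {"Down": 3, "QTR": 1, "PlayDecision": "Pass"},
    {"Down": 3, "QTR": 2, "PlayDecision": "Pass"},
]
gains = {a: information_gain(plays, a, "PlayDecision") for a in ("Down", "QTR")}
best = max(gains, key=gains.get)  # ID3 would split on this attribute first
```

In this toy set, Down separates the two decisions perfectly while QTR tells us nothing, so ID3 would split on Down first.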
Intended Player: Statistics
Top 3 Intended Players for Passes for the 4 teams that played in the semi-finals
• H.Ward (142), P.Burress (121), B.Shaw (44)
• T.Brown (143), D.Patten (93), M.Edwards (39)
• T.Holt (133), M.Faulk (104), I.Bruce (103)
• J.Thrash (107), D.Staley (89), T.Pinkston (83)
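A tally like the one above can be produced by counting intended players per offense. The following is a minimal sketch, assuming each pass play is available as an (offense, intended player) pair. The function name and the sample rows are hypothetical and are not taken from the data set.

```python
from collections import Counter, defaultdict


def top_intended_players(passes, n=3):
    """Return the n most frequently targeted players for each offense.

    `passes` is an iterable of (offense, intended_player) pairs,
    one pair per pass play.
    """
    counts = defaultdict(Counter)
    for offense, player in passes:
        counts[offense][player] += 1
    return {team: c.most_common(n) for team, c in counts.items()}


# Made-up rows for illustration; team codes and names are placeholders.
sample = [("AAA", "X.One"), ("AAA", "X.One"), ("AAA", "Y.Two"), ("BBB", "Z.Three")]
result = top_intended_players(sample, n=1)
```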
Play Direction: Statistics • Direction of Rushes for all plays in 2001 season [Diagram of rush directions: Left End, Left Tackle, Left Guard, Middle, Right Guard, Right Tackle, Right End]
Play Direction: Statistics • Direction of Rushes for all plays in 2001 season [Chart: Number of Rushes by Direction]
Yardage: Statistics • Yardage during each down for Pass and Rush [Chart: Average Yards Covered vs. Yards To Go; series: Passes, Rushes]
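The averages behind a chart like this can be computed by grouping plays by play type and yards to go. The slides list MS Excel 2007 among the tools but do not show the calculation, so this is only an illustrative sketch. The tuple layout and function name are assumptions.

```python
from collections import defaultdict


def average_yards(plays):
    """Mean yards gained per (play type, yards-to-go) pair.

    `plays` is an iterable of (play_type, to_go, yards) tuples.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of yards, play count]
    for play_type, to_go, yards in plays:
        bucket = totals[(play_type, to_go)]
        bucket[0] += yards
        bucket[1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


# Made-up plays for illustration only.
sample_plays = [("Pass", 10, 12), ("Pass", 10, 0), ("Rush", 10, 4), ("Rush", 1, 2)]
averages = average_yards(sample_plays)
```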
Play Decision: Statistics • Play Decisions for the 4 teams that played in the semi-finals [Chart: Number of Decisions by Play Decision Type]
Play Decision: Analysis Overview • Discovery of what environmental and/or game factors affect play decision • Discovery of football expert knowledge through data mining • Prediction of play decisions based on game factors
Play Decision: Key Findings • Football strategy can be discovered from data rather than from knowledge experts • Top 3 factors affecting the decision: Down, Off Ydl, Time • Model accuracy differs depending on which decision is being predicted • Team-specific strategies may be discovered with more data
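The finding that accuracy depends on the decision being predicted is the kind of result a classification matrix exposes. The slides do not show the actual matrices, so the sketch below only illustrates how per-decision accuracy can be read off one. The labels are made up.

```python
from collections import Counter


def classification_matrix(actual, predicted):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))


def per_class_accuracy(actual, predicted):
    """Share of each actual class that the model predicted correctly (recall)."""
    matrix = classification_matrix(actual, predicted)
    totals = Counter(actual)
    return {label: matrix[(label, label)] / totals[label] for label in totals}


# Made-up labels for illustration only.
actual = ["Pass", "Pass", "Pass", "Rush", "Rush", "Punt"]
predicted = ["Pass", "Pass", "Rush", "Rush", "Rush", "Pass"]
scores = per_class_accuracy(actual, predicted)
```

In this toy example the overall accuracy is 4 of 6, but it hides very different per-decision results: every Rush is predicted correctly, two of three Passes are, and the single Punt is missed.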
Play Direction: Analysis Overview • Discover each team's strengths and weaknesses on defense and/or offense • Prediction of play directions based on game factors [Diagram of rush directions: Left End, Left Tackle, Left Guard, Middle, Right Guard, Right Tackle, Right End]
Intended Player: Analysis Overview • Discover each team’s favored recipient of a pass • Prediction of intended player based on game factors
Intended Player: Key Findings • There are 400+ intended players • Not enough data to accurately predict intended players • Not enough data for the mining models to add knowledge beyond the simple statistics
Future Direction • Increase sample set • More instances of different scenarios • Incorporate additional information • Pro-football-Reference.com • VegasInsider.com (Odds for favorites) • Extend Analysis • Nested case (Historical performance)
References • Prof. Lisa Ordóñez • Professor in Statistics • Steve Aldrich • Author of Moneyball in Football • About Football • Glossary of terms
Research Objectives • Literature Overview • Knowledge Discovery • Statistics: Intended Player • Statistics: Play Direction • Statistics: Yardage • Statistics: Play Decision • Accuracy: Lift Chart • Charts • Analysis: Play Decision • Analysis: Play Direction • Analysis: Intended Player • Conclusions • Future Directions • System Design
Data Collection
• 55,000 rows, 90 columns
• 47,033 rows, 30 columns
• Variables: Dependent: 4, Independent: 10, Calculated: 9
System Design • NFL KMS • FOOTBALL DATA: NFL Season 2001 • Model Building • DB • Testing/Accuracy • Pattern Analysis • FIELD STRATEGY • DEFENSE STRATEGY • METRICS • Formations • Accuracy • Substitutions • Performance • Play Decisions
Yards Analysis • Yards gained on the play is used as a metric to measure effort • Discover how environmental and/or game factors affect player’s efforts • Key Findings: Top 4 environmental factors • Off Ydl • Time • Down • Gap