By Adam Carney

1st Annual USD Mathematics Project Day, May 7, 2004, University of San Diego, San Diego, CA. Determining the Offensive Value of a Major League Baseball Player Using Linear Regression Analysis and Bootstrap. By Adam Carney. Outline. Problem Formulation Previous Work Bootstrap Method


Presentation Transcript


  1. 1st Annual USD Mathematics Project Day May 7, 2004, University of San Diego, San Diego, CA Determining the Offensive Value of a Major League Baseball Player Using Linear Regression Analysis and Bootstrap By Adam Carney

  2. Outline • Problem Formulation • Previous Work • Bootstrap Method • Regression Results • Evaluation Tools • Player Rankings & Comparisons • Conclusions & Future Plans

  9. Problem Formulation • Given: MLB Season Statistics • Objective: Find offensive value of each player • Limitations • One semester • Use season data rather than game data • Lack data for certain statistics such as GIDP, ROE, SF and IBB.

  10. Problem Formulation (cont.) • How to Model Mathematically? • Use regression analysis on team data for runs scored per game (RPG) • Apply this model to individual players • Hidden Assumptions/Constraints • Zero constant term in regression model • Seven assumptions of ordinary least squares regression, including • No Serial Correlation • No Perfect Multicollinearity • Normally distributed error term

  11. Problem Formulation (cont.) • Based on this we propose, • Expected Magnitudes and Signs
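
The no-constant regression described above can be sketched in a few lines. This is only an illustration, not the SPSS model from the talk: the three predictors, the team rows, and the "true" coefficients are all made up, and `ols_no_intercept` is a hypothetical helper that solves the normal equations directly.

```python
def ols_no_intercept(X, y):
    """Least-squares coefficients for y = X b with no constant term.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    """
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Made-up per-game team rates (singles, walks, home runs), NOT real MLB data.
X = [[6, 3, 1], [5, 4, 1], [7, 2, 2], [6, 4, 0], [5, 3, 2]]
true_b = [0.5, 0.3, 1.4]  # hypothetical run values used to generate y
y = [sum(b * x for b, x in zip(true_b, row)) for row in X]  # runs per game

coef = ols_no_intercept(X, y)
print(coef)
```

Because `y` is generated without noise, the fit should recover the made-up coefficients; with real team data the residuals would be non-zero and the OLS assumptions listed above would need checking.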

  12. Previous Work • Pete Palmer’s Linear Weights • Bill James’ Runs Created

  13. Previous Work: Pete Palmer’s Linear Weights • Computer Simulation • Probability Theory • Runs Above Average

  14. Previous Work: Bill James’ Runs Created • Simplified Version • Complex Version • Over 60 lines of calculations • Takes almost everything into account

  15. Bootstrap Method • Resampling Technique • Used it to estimate bias and variance of a random sample of RPG data • Repeated with 99 and 999 resamples

  16. Bootstrap Method • Estimate Bias and Variance R=99 R=999
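
As a rough sketch of the resampling idea, the snippet below estimates the bootstrap bias and variance of a statistic with 99 and 999 resamples. The slides do not say which statistic was used, so the sample mean is an assumption here, and the RPG values are invented for illustration.

```python
import random
import statistics


def bootstrap_bias_variance(sample, statistic, resamples, seed=0):
    """Bootstrap estimates of the bias and variance of `statistic`."""
    rng = random.Random(seed)
    n = len(sample)
    observed = statistic(sample)
    # Each replicate recomputes the statistic on a resample drawn with replacement.
    replicates = [
        statistic([rng.choice(sample) for _ in range(n)])
        for _ in range(resamples)
    ]
    bias = statistics.fmean(replicates) - observed
    variance = statistics.variance(replicates)
    return bias, variance


# Invented runs-per-game values, NOT data from the presentation.
rpg = [4.2, 4.8, 5.1, 4.5, 4.9, 5.3, 4.0, 4.7, 5.0, 4.6]

results = {r: bootstrap_bias_variance(rpg, statistics.fmean, r) for r in (99, 999)}
for r, (bias, variance) in results.items():
    print(f"R={r}: bias={bias:.4f}, variance={variance:.4f}")
```

With more resamples the two estimates usually settle down, which is one reason to compare R=99 against R=999.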

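  17. Bootstrapping Residuals • Run OLS regression to find the coefficient estimates and residuals • Resample residuals for each y • Find y* = fitted value + resampled residual • Run OLS regression on (x, y*) • Repeat many times • Create prediction interval using the bootstrap coefficient estimates a) Resample out of sample errors b) Find the bootstrap prediction errors c) Build prediction interval with the lower and upper quantiles of those errors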
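
The residual-resampling loop can be sketched as follows. To keep it short, this assumes a single predictor and a no-constant model (so the slope is Σxy / Σx²); the data are invented, and the residuals are resampled as they are, without the centering that is sometimes applied when a model has no constant term.

```python
import random


def fit_through_origin(x, y):
    """OLS slope for y = b * x with no constant term."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def residual_bootstrap(x, y, resamples=999, seed=0):
    """Return the original slope and a list of bootstrap slopes."""
    rng = random.Random(seed)
    beta_hat = fit_through_origin(x, y)
    fitted = [beta_hat * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    betas = []
    for _ in range(resamples):
        # Keep the design (x) fixed; rebuild y from fitted values plus resampled residuals.
        y_star = [fi + rng.choice(residuals) for fi in fitted]
        betas.append(fit_through_origin(x, y_star))
    return beta_hat, betas


# Invented team data: hits per game (x) and runs per game (y).
x = [8.5, 9.0, 9.4, 8.8, 9.9, 10.2, 8.1, 9.6]
y = [4.3, 4.4, 4.8, 4.5, 4.9, 5.2, 4.0, 4.7]

beta_hat, betas = residual_bootstrap(x, y)
print(beta_hat, min(betas), max(betas))
```

The spread of `betas` approximates the sampling distribution of the slope, which is the raw material for the interval-building steps.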

  18. Advantages of Bootstrap • No assumption on error terms other than independence • Estimates actual distribution of error terms allowing more accurate prediction intervals • Could also bootstrap cases rather than residuals • Does not assume constant variance • Simulated samples have different designs
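
For comparison, bootstrapping cases resamples whole (x, y) pairs, so every simulated sample has a different design. The sketch below again assumes a one-predictor, no-constant model and uses invented data; it is not the procedure from the talk.

```python
import random


def fit_through_origin(x, y):
    """OLS slope for y = b * x with no constant term."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def case_bootstrap(x, y, resamples=999, seed=0):
    """Bootstrap slopes obtained by resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(
            fit_through_origin([x[i] for i in idx], [y[i] for i in idx])
        )
    return slopes


# Invented team data: hits per game (x) and runs per game (y).
x = [8.5, 9.0, 9.4, 8.8, 9.9, 10.2, 8.1, 9.6]
y = [4.3, 4.4, 4.8, 4.5, 4.9, 5.2, 4.0, 4.7]

slopes = case_bootstrap(x, y)
print(min(slopes), max(slopes))
```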

  19. Regression Results • Used SPSS • Ran separate regression analysis for each era without a constant term • Looked at several possible models for each era • Used best model to apply to individual players

  20. Regression Results

  21. Evaluation Tools • Ran Chow Test to see if two sets of regression coefficients are equivalent • df = Degrees of Freedom • Prob. Value = p-value of the test, i.e. the probability of an F statistic at least this large if the two (or three) data sets had the same coefficients
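
A minimal sketch of the Chow F statistic for two data sets is given below, using the usual form F = [(SSR_pooled − SSR_1 − SSR_2) / k] / [(SSR_1 + SSR_2) / (n_1 + n_2 − 2k)]. It assumes a one-predictor, no-constant model (k = 1) and invented era data; turning F into a probability value needs an F distribution table or a statistics package, which is left out here.

```python
def ssr_through_origin(x, y):
    """Sum of squared residuals for the OLS fit y = b * x (no constant)."""
    beta = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    return sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y))


def chow_f(x1, y1, x2, y2, k=1):
    """Chow F statistic and its (numerator, denominator) degrees of freedom."""
    ssr_pooled = ssr_through_origin(x1 + x2, y1 + y2)
    ssr_split = ssr_through_origin(x1, y1) + ssr_through_origin(x2, y2)
    df_den = len(x1) + len(x2) - 2 * k
    f_stat = ((ssr_pooled - ssr_split) / k) / (ssr_split / df_den)
    return f_stat, (k, df_den)


# Invented data for two hypothetical eras with different run values per hit.
era_a_x, era_a_y = [8.5, 9.0, 9.4, 8.8], [4.3, 4.4, 4.8, 4.5]
era_b_x, era_b_y = [8.6, 9.1, 9.3, 8.9], [3.5, 3.6, 3.8, 3.6]

f_diff, df = chow_f(era_a_x, era_a_y, era_b_x, era_b_y)
f_same, _ = chow_f(era_a_x, era_a_y, era_a_x, era_a_y)
print(f_diff, f_same, df)
```

A large F (as for the two different eras) argues against pooling the data sets; an F near zero (as when an era is compared with itself) does not.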

  22. Evaluation Tools (cont.) • Error Analysis (used last 3 years of each era) • Calculated Mean Absolute Deviation and Mean Absolute Percentage Error
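
The two error measures can be written out directly. The actual and predicted RPG values below are invented; only the formulas (mean of |actual − predicted|, and mean of |actual − predicted| / |actual| as a percentage) are standard.

```python
def mean_absolute_deviation(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """Average absolute gap as a percentage of the actual value."""
    return 100 * sum(
        abs(a - p) / abs(a) for a, p in zip(actual, predicted)
    ) / len(actual)


# Invented out-of-sample runs per game and model predictions.
actual = [4.0, 5.0, 4.5, 5.0]
predicted = [4.2, 4.8, 4.5, 5.5]

mad = mean_absolute_deviation(actual, predicted)
mape = mean_absolute_percentage_error(actual, predicted)
print(mad, mape)
```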

  23. Evaluation Tools (cont.) • 90% Prediction Intervals from Bootstrap • Calculated for each out of sample data point using matrices • This gave an N by 1000 matrix, with each row containing the bootstrap values for a different data point

  24. Evaluation Tools (cont.) • Found the 90% interval for each data point • Counted how many actual values fell inside their interval
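
One way to turn an N by 1000 matrix of bootstrap values into 90% intervals and a coverage count is sketched here. It assumes each row already holds simulated predictions for one out-of-sample point and takes the 5th and 95th percentiles of that row; the draws are generated from a normal distribution purely as stand-in data.

```python
import random
import statistics


def interval_90(draws):
    """5th and 95th percentiles of one row of bootstrap values."""
    cuts = statistics.quantiles(draws, n=20)  # 19 cut points at 5%, 10%, ..., 95%
    return cuts[0], cuts[-1]


def coverage(actual, draw_matrix):
    """Count how many actual values fall inside their row's 90% interval."""
    hits = 0
    for value, draws in zip(actual, draw_matrix):
        lower, upper = interval_90(draws)
        if lower <= value <= upper:
            hits += 1
    return hits


# Stand-in data: 3 out-of-sample points, 1000 simulated predictions each.
rng = random.Random(0)
centers = [4.2, 4.8, 5.1]
draw_matrix = [[rng.gauss(c, 0.3) for _ in range(1000)] for c in centers]

covered = coverage(centers, draw_matrix)
missed = coverage([c + 10 for c in centers], draw_matrix)
print(covered, missed)
```

If the intervals are well calibrated, roughly 90% of real out-of-sample points should be counted as hits.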

  25. Linear Run Values (LRV) • Applied regression results to individuals’ season statistics • Used the model that corresponded to the era of that season • Computed adjusted LRV by dividing by the Batter Park Factor • Adjusted each stint separately if player played for more than one team in a season
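
The per-stint adjustment can be illustrated with a short sketch. The weights, the stat lines, and the park factors below are all hypothetical (the actual regression coefficients are not preserved in this transcript), and the park factor is assumed to be expressed as a ratio around 1.0.

```python
# Hypothetical run values per event, NOT the coefficients from the talk.
WEIGHTS = {"1B": 0.5, "HR": 1.5, "BB": 0.25}


def linear_run_value(stats, weights):
    """Weighted sum of a stat line using the regression coefficients."""
    return sum(w * stats.get(event, 0) for event, w in weights.items())


def adjusted_lrv(stints, weights):
    """Sum of per-stint LRVs, each divided by that stint's park factor."""
    return sum(
        linear_run_value(s["stats"], weights) / s["park_factor"]
        for s in stints
    )


# A made-up player who changed teams mid-season (two stints).
season = [
    {"stats": {"1B": 100, "HR": 20, "BB": 40}, "park_factor": 1.0},
    {"stats": {"1B": 50, "HR": 10, "BB": 20}, "park_factor": 1.25},
]

total = adjusted_lrv(season, WEIGHTS)
print(total)
```

Adjusting stint by stint means a hitter-friendly park only discounts the production that actually happened there.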

  26. Top LRV Rankings - Season

  27. Top LRV Rankings - Career

  28. Conclusions • Split data into five different eras • Ran regression analysis with and without bootstrap • Applied model to individual players • Compared results to those of Pete Palmer’s Linear Weights and Bill James’ Runs Created

  32. Future Plans • Use game-by-game data • Find a more complete data set • Separate American & National Leagues • Forecast future LRV • Evaluate LRV vs. Salary • Find Pitchers’ and Fielders’ LRV

  38. Leftover Statistics • Estimated time spent on project = 400 hours • # of weeks spent on project = 12 • # of hours per week = 33.33 • Predicted # of hours per week = 20 • Prediction error = (33.33-20)/20 = 66.7% • # of data points after bootstrapping = 1,700,000 • Size of data files after completion = 2,199,141,179 bytes (2.04 GB)
