Baseball Statistics: Just for Fun!

sonja

Presentation Transcript


  1. Baseball Statistics: Just for Fun!

  2. Issues, Theory, and Data. Hypothesis: do home run hitters record more strikeouts and four balls, and fewer steals? Data collection: Korea Baseball Organization and US Major League home pages. Model: y1 = #strikeouts, y2 = #steals, y3 = #4Bs, x = #HRs; regress y on a constant and x. Hypothesis testing: test the statistical significance of the regression slopes using t-tests.

  3. 2. Data Collection: KBO (http://www.koreabaseball.or.kr) and US Major League Baseball (http://www.majorleaguebaseball.com)

  4. 3. Model I: $(\#\text{strikeouts}) = \alpha_1 + \beta_1(\#\text{HRs}) + \varepsilon$

  5. 3. Model II: $(\#\text{steals made}) = \alpha_2 + \beta_2(\#\text{HRs}) + \varepsilon$ and $(\#\text{steals attempted}) = \alpha_3 + \beta_3(\#\text{HRs}) + \varepsilon$

  6. 3. Model III: $(\#\text{four balls}) = \alpha_4 + \beta_4(\#\text{HRs}) + \varepsilon$
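Each of these models is a simple regression of one count on a constant and #HRs. As a rough sketch of how the slope and its t-value could be computed, the snippet below uses the closed-form least-squares formulas. The function name `simple_ols` and the eight data points are made up for illustration; they are not the KBO or MLB figures.

```python
import numpy as np

def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (a, b, t-value of b)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sxx = np.sum((x - x.mean()) ** 2)
    b = np.sum((x - x.mean()) * (y - y.mean())) / sxx  # slope
    a = y.mean() - b * x.mean()                        # intercept
    resid = y - (a + b * x)
    s2 = np.sum(resid ** 2) / (n - 2)                  # residual variance
    se_b = np.sqrt(s2 / sxx)                           # standard error of the slope
    return a, b, b / se_b

# Hypothetical numbers for illustration only (not the slide's data).
hrs = [5, 12, 20, 31, 8, 25, 15, 40]
strikeouts = [48, 60, 85, 110, 52, 95, 70, 120]

a, b, t = simple_ols(hrs, strikeouts)
print(f"intercept={a:.2f}  slope={b:.2f}  t-value={t:.2f}")
```

The same function would apply to steals or four balls by swapping the dependent variable.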

  7. 4. Hypothesis Testing • t-test on $\beta$. Significant: $\beta_1 = 0.84$ (t-value = 2.89), $\beta_4 = 0.51$ (t-value = 2.50). Insignificant: $\beta_2 = -0.12$ (t-value = -0.94), $\beta_3 = -0.18$ (t-value = -1.14).
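To see why the first two slopes count as significant and the last two do not, the reported t-values can be turned into approximate two-sided p-values. The slides do not state the sample size, so this sketch assumes it is large enough to use the standard normal in place of the t distribution; the 5% cutoff and the helper name `two_sided_p` are likewise assumptions.

```python
from statistics import NormalDist

def two_sided_p(t_value):
    """Two-sided p-value, using the standard normal as a large-sample
    stand-in for the t distribution (sample size is not given)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_value)))

# t-values as reported on the slide, keyed by dependent variable.
t_values = {
    "strikeouts": 2.89,
    "four balls": 2.50,
    "steals made": -0.94,
    "steals attempted": -1.14,
}

for name, t in t_values.items():
    p = two_sided_p(t)
    verdict = "significant" if p < 0.05 else "insignificant"
    print(f"{name}: t={t:+.2f}, approx. p={p:.3f}, {verdict} at the 5% level")
```

With a small sample the exact t distribution would give somewhat larger p-values, so borderline cases deserve a second look.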

  8. 4. Hypothesis Testing (1) HR hitters strike out more! (2) HR hitters do not steal bases well because of their big bodies: insignificant. (3) HR hitters draw more four balls (walks)!

  9. Wait a minute! To rule out “spurious correlation” between #HRs and #strikeouts, #steals, and #four balls, we need to control for the number of appearances in the batter's box. Right!

  10. Multiple Regression: control for “#at bats” • Without controlling for #at bats, a hitter with more appearances would record a higher number in every category than others, generating “spurious correlation” between any pair of variables among #HRs, #strikeouts, #steals, and #four balls. • Two ways to control for the number of appearances in the batter's box: (a) use a sub-sample of hitters who appeared more than 100 times; (b) use “#at bats” as a control variable in a multiple regression.
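The spurious-correlation argument can be illustrated with simulated players. In the sketch below, home run and strikeout rates per at bat are drawn independently, so any link between the two counts comes only from the number of at bats. All distributions, ranges, and names here are assumptions chosen for illustration, not estimates from the KBO or MLB data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_players = 2000

# Hypothetical players: at bats vary widely; per-at-bat rates are independent.
at_bats = rng.integers(10, 600, size=n_players)
hr_rate = rng.uniform(0.01, 0.06, size=n_players)
so_rate = rng.uniform(0.10, 0.30, size=n_players)
hrs = rng.binomial(at_bats, hr_rate).astype(float)
strikeouts = rng.binomial(at_bats, so_rate).astype(float)

def hr_slope(y, *regressors):
    """OLS of y on a constant plus regressors; return the coefficient
    on the first regressor."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

naive = hr_slope(strikeouts, hrs)                              # no control
controlled = hr_slope(strikeouts, hrs, at_bats.astype(float))  # control for at bats

print(f"slope on #HRs without control: {naive:.2f}")
print(f"slope on #HRs with #at bats as control: {controlled:.2f}")
```

In this simulation the uncontrolled slope is large even though the rates are unrelated, while the controlled slope stays near zero.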

  11. Model I (extended): $(\#\text{strikeouts}) = \alpha_1 + \beta_1(\#\text{HRs}) + \beta_2(\#\text{at bats}) + \varepsilon$
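The extended model adds a second regressor, so the estimates come from the matrix form of least squares. The following is a minimal sketch of that computation with t-values; `ols_with_t` is a hypothetical helper and the eight rows of data are invented for illustration.

```python
import numpy as np

def ols_with_t(y, *regressors):
    """Regress y on a constant plus the given regressors.
    Returns (coefficients, t-values), constant first."""
    y = np.asarray(y, dtype=float)
    cols = [np.ones_like(y)] + [np.asarray(r, dtype=float) for r in regressors]
    X = np.column_stack(cols)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    coef = xtx_inv @ X.T @ y
    resid = y - X @ coef
    s2 = resid @ resid / (n - k)             # residual variance
    se = np.sqrt(s2 * np.diag(xtx_inv))      # coefficient standard errors
    return coef, coef / se

# Hypothetical numbers for illustration only (not the slide's data).
hrs = [5, 12, 20, 31, 8, 25, 15, 40]
at_bats = [210, 380, 450, 520, 300, 410, 495, 560]
strikeouts = [48, 60, 85, 110, 52, 95, 70, 120]

coef, t_vals = ols_with_t(strikeouts, hrs, at_bats)
for name, c, t in zip(["constant", "#HRs", "#at bats"], coef, t_vals):
    print(f"{name}: estimate={c:.3f}, t-value={t:.2f}")
```

Explicit inversion of X'X is fine for two or three regressors; a routine such as `np.linalg.lstsq` is the more robust choice when regressors are nearly collinear.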

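  12. Results (t-values in parentheses). Using sub-sample: without control, $\beta_1 = 0.84$ (2.89); with control, $\beta_1 = 0.89$ (2.88) and $\beta_2 = -0.03$ (-0.49). Using entire sample: without control, $\beta_1 = 2.40$ (11.64); with control, $\beta_1 = 0.63$ (3.11) and $\beta_2 = 0.14$ (12.53).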

  13. Interpretation. Sub-sample: $\beta_1 = 0.84$ (2.89) without control; $\beta_1 = 0.89$ (2.88) and $\beta_2 = -0.03$ (-0.49) with control. Entire sample: $\beta_1 = 2.40$ (11.64) without control; $\beta_1 = 0.63$ (3.11) and $\beta_2 = 0.14$ (12.53) with control. When using a sub-sample that is already rather homogeneous in the number of at bats, it makes little difference whether you control for #at bats or not. However, when using the entire sample, which comprises hitters differing vastly in the number of at bats, controlling for #at bats does matter: without it, you would get distorted results.
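The size of the distortion can be read through the standard omitted-variable identity for least squares. Write $\tilde\beta_1$ for the slope on #HRs without the control, $\hat\beta_1$ and $\hat\beta_2$ for the slopes with the control, and $\hat\delta_1$ for the slope from regressing #at bats on a constant and #HRs (a symbol introduced here, not on the slides). Assuming both entire-sample regressions were run on the same hitters, the reported (rounded) estimates imply roughly:

```latex
\tilde\beta_1 = \hat\beta_1 + \hat\beta_2\,\hat\delta_1
\quad\Longrightarrow\quad
2.40 \approx 0.63 + 0.14\,\hat\delta_1
\quad\Longrightarrow\quad
\hat\delta_1 \approx \frac{2.40 - 0.63}{0.14} \approx 12.6
```

Under that assumption, each extra home run goes with about 12.6 more at bats in the entire sample, and that is the channel through which the uncontrolled slope is inflated. In the sub-sample $\hat\beta_2$ is close to zero, so the two slopes on #HRs nearly coincide.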

  14. Model II (extended): $(\#\text{four balls}) = \alpha_1 + \beta_1(\#\text{HRs}) + \beta_2(\#\text{at bats}) + \varepsilon$

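  15. Results (t-values in parentheses). Sub-sample: without control, $\beta_1 = 0.51$ (2.50); with control, $\beta_1 = 0.34$ (1.71) and $\beta_2 = 0.12$ (2.77). Entire sample: without control, $\beta_1 = 1.32$ (11.01); with control, $\beta_1 = 0.33$ (2.73) and $\beta_2 = 0.07$ (11.51).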

  16. The End. Was it fun?
