Statistical Confirmations of Steroid Use? Andy Dolphin, Raytheon Company, August 6, 2008
Outline • Background • Adjusting Statistics • Quantifying Players’ Abilities • The Mitchell Report Sample • Culling the Statistical Records
Background • Astronomy • Stellar populations in nearby galaxies • Data analysis techniques • Sports • Analysis and prediction of team performances • Baseball player projections and analysis • Coauthor of The Book: Playing the Percentages in Baseball • Consultant for Cleveland Indians
Adjusting Player Stats • We need a way to determine whether a player’s performance has improved or degraded. • Critical aspect: • We don’t care whether a player’s performance is better than in other years. • We do care whether his performance is better than would be expected.
Adjusting Player Stats • Factors affecting a player’s performance: • Age • Home ballpark • Strength of league • Usage (relief vs. starting pitchers) • Teammates (players do not face them) • For consistency, player statistics are adjusted to age 25 and to the NL strength of their rookie seasons.
Year-to-Year Correlations • By comparing adjusted metrics over many seasons, one can determine how much players deviate from the average career trajectory. • For both hitters and pitchers, multiplying the number of PAs by 0.9^∆year (0.9 raised to the number of years between seasons) gives fairly constant prediction accuracy.
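One way to read the 0.9^∆year rule is as a recency weight: a season's plate appearances count for less the further that season lies from the one being predicted. The sketch below assumes that reading and uses it to form a weighted average of per-PA rates. The function names and the sample player are illustrative, not taken from the talk.

```python
def effective_pa(pa, season, target_season, decay=0.9):
    """Discount a season's plate appearances by decay ** |years apart|."""
    return pa * decay ** abs(target_season - season)


def weighted_rate(seasons, target_season, decay=0.9):
    """Average per-PA rates over seasons, weighting each by its effective PAs.

    `seasons` is a list of (year, plate_appearances, rate) tuples.
    """
    weights = [effective_pa(pa, year, target_season, decay) for year, pa, _ in seasons]
    total = sum(weights)
    return sum(w * rate for w, (_, _, rate) in zip(weights, seasons)) / total


# Hypothetical player: three prior seasons used to estimate 2004.
history = [(2001, 600, 0.010), (2002, 550, 0.014), (2003, 620, 0.012)]
estimate = weighted_rate(history, 2004)
```

Under this weighting, 600 PAs from three years earlier carry the same weight as about 437 PAs from the target season.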
Characterizing Player Ability • Need a metric that captures a player’s entire effect on the game’s outcome. • For example, OBP treats a walk and a home run as equals. • Solution: each outcome is scored by its average effect on the team’s winning probability, relative to an out. • A single is worth about 0.07 wins. • This metric tends to be about 1/10 of batting average.
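The scoring idea above can be sketched as a small table of win values per outcome. Only the single's value (about 0.07) comes from the slide. Every other number below is a hypothetical placeholder chosen for illustration, and the OBP function is deliberately simplified.

```python
# Win value of each plate-appearance outcome relative to an out.
# Only the single (~0.07) comes from the slide; every other value is a
# hypothetical placeholder chosen for illustration.
WIN_VALUES = {
    "out": 0.0,
    "walk": 0.05,      # hypothetical
    "single": 0.07,    # from the slide
    "double": 0.10,    # hypothetical
    "triple": 0.13,    # hypothetical
    "home_run": 0.17,  # hypothetical
}


def wins_per_pa(outcomes):
    """Average win value per plate appearance for a dict of outcome counts."""
    total_pa = sum(outcomes.values())
    total_wins = sum(WIN_VALUES[name] * count for name, count in outcomes.items())
    return total_wins / total_pa


def obp(outcomes):
    """On-base percentage, simplified: every non-out counts the same."""
    total_pa = sum(outcomes.values())
    return (total_pa - outcomes.get("out", 0)) / total_pa


# Two hypothetical hitters who reach base equally often.
walker = {"out": 60, "walk": 40}
slugger = {"out": 60, "home_run": 40}
```

Both hypothetical hitters have the same OBP, but `wins_per_pa` ranks the slugger higher, which is the distinction the slide draws.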
Characterizing Player Ability • Need a metric that is indicative of a player’s ability. • For example, a pitcher’s win-loss record depends heavily on run support and fielding. • Solution: adjust outcome rates to reflect the degree to which they are indicative of a player’s abilities (regression toward the mean). • For example, a hitter retains about 40% of his single-hitting rate from year to year, compared with under 20% for a pitcher.
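Regression toward the mean can be sketched as shrinking an observed rate toward the league average. The sketch assumes that "retains about 40%" means 40% of a player's deviation from average carries over. The league rate and observed rate below are hypothetical.

```python
def regress_to_mean(observed_rate, league_rate, retention):
    """Shrink an observed rate toward the league rate.

    `retention` is the fraction of the deviation from average that carries
    over from year to year (1.0 = pure skill, 0.0 = pure noise).
    """
    return league_rate + retention * (observed_rate - league_rate)


LEAGUE_SINGLE_RATE = 0.155  # hypothetical league-average singles per PA
observed = 0.185            # hypothetical observed singles per PA

hitter_estimate = regress_to_mean(observed, LEAGUE_SINGLE_RATE, 0.40)
# The slide gives "under 20%" for pitchers; 0.20 is used here as an upper bound.
pitcher_estimate = regress_to_mean(observed, LEAGUE_SINGLE_RATE, 0.20)
```

With these inputs the same observed rate is pulled further toward average when it belongs to a pitcher than when it belongs to a hitter.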
Player Career Trajectories • [Figure: career trajectories of selected players listed in the Mitchell Report]
Mitchell Report Sample • The Mitchell Report identified players suspected of steroid use, as well as specific years in which purchases could be tracked. • Do players show better performance in these seasons, compared with their career baseline? • Do the listed players show more deviation than average over their careers?
Mitchell Report: Single Seasons • 32 hitters played 63 seasons with 300+ PA • Average improvement = 3.4% ± 1.2% • The only statistically significant sample came from the BALCO-tied players, who averaged about a 10% increase in production. • 16 pitchers played 35 seasons with 300+ PA • Average improvement = 3.3% ± 1.5%
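The quoted improvements can be turned into sigma levels by dividing each value by its uncertainty. The sketch assumes the ± figures are one-sigma Gaussian standard errors, which the slide does not state. The helper names are illustrative.

```python
import math


def sigma_level(value, uncertainty):
    """Number of standard errors separating `value` from zero."""
    return value / uncertainty


def two_sided_p(z):
    """Two-sided tail probability of a standard normal beyond |z|."""
    return math.erfc(abs(z) / math.sqrt(2))


hitters_z = sigma_level(3.4, 1.2)   # average improvement and its uncertainty, in percent
pitchers_z = sigma_level(3.3, 1.5)
```

Under that assumption the hitters sit at about 2.8 sigma and the pitchers at 2.2 sigma. `two_sided_p` converts either figure to a tail probability.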
Mitchell Report: Single Seasons • Problems with this analysis: • The Mitchell Report specifically identifies years in which players purchased drugs from particular sources, not the entire period of use. • Significant performance swings can be masked by the statistical uncertainties, even with a full season of data. • There is likely a correlation between injuries and steroid usage that needs to be accounted for.
Mitchell Report: Careers • Do players listed in the Mitchell Report have larger-than-average variation from the typical career trajectory? • Hitters: variation/avg = 1.10 ± 0.08 • Again, the only statistically significant sample comes from the BALCO players • Pitchers: variation/avg = 1.09 ± 0.12
Spotting Unusual Players from the Statistical Record • Instead of looking at specific players for signs of improvement, what if we look for players based on unusually large deviations from the average career trajectory? • This helps avoid selection biases. • Criteria: • A three-year period with performance significantly better than both the career average and the previous three years • Significant dispersion compared with the average career trajectory.
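The three-year-window criterion can be sketched as a scan over a player's seasons. This is a simplified illustration, not the talk's actual procedure: it assumes one adjusted value per season with a common standard error, ignores the uncertainty of the career mean itself, uses a hypothetical 3-sigma threshold, and does not implement the separate dispersion criterion.

```python
import math
from statistics import mean


def unusual_windows(values, sigma, window=3, threshold=3.0):
    """Return start indices of windows that stand out from the rest of a career.

    `values` holds one adjusted performance number per season, in order;
    `sigma` is the assumed standard error of a single season's value.
    """
    career = mean(values)
    window_err = sigma / math.sqrt(window)   # error of a window mean
    diff_err = window_err * math.sqrt(2)     # error of a difference of two window means
    hits = []
    for start in range(window, len(values) - window + 1):
        current = mean(values[start:start + window])
        previous = mean(values[start - window:start])
        beats_career = (current - career) / window_err >= threshold
        beats_previous = (current - previous) / diff_err >= threshold
        if beats_career and beats_previous:
            hits.append(start)
    return hits


# Hypothetical career: flat for six seasons, then a three-season jump.
career_values = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.3, 1.3, 1.3, 1.0]
flagged = unusual_windows(career_values, sigma=0.08)
```

Requiring the window to beat both the career mean and the preceding three seasons keeps a steady, gradual improvement from being flagged.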
Players with Unusual Profiles, 1975-2007 • Batters: Rod Carew (1975-1977), Ken Caminiti (1996-2000), Jason Giambi (2000-2002), Sammy Sosa (2000-2002), Bret Boone (2001-2003), Barry Bonds (2001-2004) • Pitchers: Mike Scott (1986-1989), Greg Maddux (1993-1998), Pedro Martinez (1999-2003), Jason Schmidt (2002-2004) • Baseline of ~1 “positive” per decade prior to 2000 • The 2000s appear to be a very different era
MLB Average Dispersions from Baseline • Overall, players show average tendencies. • [Figure: two panels, Batters and Pitchers]
Summary • The player seasons implicated for steroid use in the Mitchell Report were better than career baseline at about the 2-sigma level. • There has been a large number of significant deviations from career baseline over the last 10 years, especially among hitters. • League-wide, players are generally within historical norms of baseline performance; thus it is unlikely that a large number of players are achieving a significant benefit from steroid use.