Quantifying the Volatility of Starting Pitchers. (Sort of). Bill Petti SABR Analytics Conference March 2014 Phoenix, AZ. Let’s Make This a Little Interactive: Presentation Cliché Game. See How Many You Hear/See Today. “Needed to start somewhere” “Not sure what to make of the results”
Quantifying the Volatility of Starting Pitchers (Sort of) Bill Petti SABR Analytics Conference March 2014 Phoenix, AZ
Let’s Make This a Little Interactive: Presentation Cliché Game
See How Many You Hear/See Today “Needed to start somewhere” “Not sure what to make of the results” “Take the results with a grain of salt” “Results directional, but not definitive” “More questions than answers” “Lots of work to be done”
Motivating Questions Are there differences in how volatile starting pitchers are over the course of a season? Are certain types of pitchers more volatile than others?
Why Study Volatility in Baseball? • It’s my unicorn; what’s a unicorn? • My first published baseball research focused on David Wright and whether he was volatile** • Basically, I haven’t been able to let it go “Fabled creature? You know, the horse with the horn? Impossible to capture?”* *Gone in 60 Seconds **http://www.beyondtheboxscore.com/2011/1/4/1908646/player-volatility-the-case-of-david-wright
Better Reason • We know less about Volatility than other subjects, e.g. aging • There is some evidence that Volatility in run scoring and run prevention matters for teams • How teams distribute their runs can impact their expected win percentage • Sal Baxamusa* showed that the increase in win probability becomes more marginal as teams score more than 5 runs *The Hardball Times, 2007, http://www.hardballtimes.com/consistency-is-key/
Why Study Volatility in Baseball? • Some evidence that Volatility at the team level helps teams beat their Pythagorean Expectation* • Run Scoring (RS) and Runs Allowed (RA) Volatility were both negatively correlated to total wins • However, RS and RA Volatility were positively correlated to wins above expectation *FanGraphs, 2012, http://www.fangraphs.com/blogs/does-consistent-play-help-a-team-win/
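For readers unfamiliar with "wins above expectation", a minimal sketch of the calculation follows. It assumes the classic Pythagorean form with an exponent of 2; the cited FanGraphs article may have used a different exponent, and the team totals below are hypothetical.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected win percentage from season run totals.

    The exponent of 2 is the classic form and is an assumption here.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def wins_above_expectation(wins, games, runs_scored, runs_allowed, exponent=2.0):
    """Actual wins minus the wins implied by the Pythagorean expectation."""
    expected = games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)
    return wins - expected


# Hypothetical team: scored and allowed 700 runs, yet won 85 of 162 games.
extra_wins = wins_above_expectation(85, 162, 700, 700)
print(extra_wins)
```

A positive value means the team won more games than its run totals alone would suggest, which is the quantity the slide says was positively correlated with RS and RA Volatility.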
Volatility is not the same as Streakiness • Streakiness is about how extreme positive and negative performances lump together over the course of a season • Volatility is about the overall distribution of a player’s daily performance relative to their average (i.e. central tendency) [Figures: two charts, each labeled “Average wOBA”]
Volatility and Hitters • Developed a metric for quantifying the volatility of hitters (VOL) and examined what types of hitters may be more prone to inconsistency* VOL = STD(daily_wOBA) / Yearly_wOBA^.52, where: VOL = Seasonal Volatility; STD(daily_wOBA) = the standard deviation of a player’s daily batting performance, measured by wOBA; Yearly_wOBA^.52 = a player’s seasonal wOBA, raised to the .52 power • Only games where a player had three or more plate appearances are included *The Hardball Times, 2014, http://www.hardballtimes.com/what-kind-of-hitters-are-volatile/
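The hitter VOL formula on this slide can be sketched in a few lines. This is an illustration only: the daily wOBA values are hypothetical, the use of the sample (rather than population) standard deviation is an assumption, and games with fewer than three plate appearances are assumed to have been dropped already.

```python
import statistics


def hitter_vol(daily_woba, yearly_woba, exponent=0.52):
    """VOL = STD(daily_wOBA) / Yearly_wOBA ** .52, as defined on the slide.

    Uses the sample standard deviation, which is an assumption.
    """
    if len(daily_woba) < 2:
        raise ValueError("need at least two games to compute a standard deviation")
    return statistics.stdev(daily_woba) / (yearly_woba ** exponent)


# Hypothetical daily wOBA values for two hitters with the same average (.325).
steady = [0.300, 0.350, 0.320, 0.340, 0.310, 0.330]
boom_or_bust = [0.000, 0.700, 0.100, 0.650, 0.050, 0.450]

steady_vol = hitter_vol(steady, 0.325)
boom_or_bust_vol = hitter_vol(boom_or_bust, 0.325)
print(steady_vol, boom_or_bust_vol)
```

Both hitters have the same seasonal wOBA in this toy data, so any difference in VOL comes entirely from how their daily performances are spread around it.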
Volatility and Hitters (cont.) • This method ensured that VOL was not biased in favor of inferior players and not simply a function of players with high PA/G • VOL has a year-to-year correlation of .4 (n=435) • Some evidence it’s a repeatable skill, but one that fluctuates much like BABIP or batting average • High VOL hitters tended to be high strikeout, fly ball slugging hitters, while low VOL hitters tended to be ground ball, high contact, high on-base hitters • Some evidence that hitters might be “structurally volatile”, but not all performance explained by this • Phrase borrowed from Matt Swartz and his work on LHHs and clutch performance
What About Pitchers? • There have been some attempts to quantify consistency in pitchers • David Gassko used Quality Starts as a proxy for consistency* • Controlling for ERA, pitchers don’t retain their consistency year to year • But an inconsistent pitcher is preferable to a consistent pitcher of similar talent • Eric Seidman used the Flake statistic at Baseball Prospectus** • I briefly looked at a pitcher version of volatility, consistent with Flake*** • Pitching creates unique challenges to this type of metric *The Hardball Times, 2006, http://www.hardballtimes.com/what-kind-of-hitters-are-volatile/ **2009, http://www.baseballprospectus.com/article.php?articleid=8579 ***Beyond the Box Score, 2011, http://www.beyondtheboxscore.com/2011/9/8/2404007/pitcher-volatility-part-i
What About Pitchers? • First, hitters generate a larger number of observations for study over the course of a season • Roughly 5x as many observations as starting pitchers • Second, this makes outliers much more of a problem for pitchers • Third, managers create the biggest problem, since starters don’t control when they will exit a game • Tends to accentuate the outlier issue
Decisions, decisions, decisions… • Could continue to use a standard deviation-based metric • But the distribution of game performance is not quite normal, and outliers can play havoc with individual scores • Another option is the interquartile range (IQR), adjusted for the median (i.e. quartile coefficient of dispersion, or IQR CoD) • Similar to hitter VOL, which uses the coefficient of variation • Still not perfect, but IQR CoD is a robust measure that handles outliers better
Decided to use IQR CoD • E.g. Buzz Capra, 1974 • 27 starts in 1974 with an ERA- of 59 • 2.80 RA9 in those starts, but a few key outliers • Single-game RA9s of 22.5 and 405.0, both in outings that lasted fewer than 2 IP • Using the IQR CoD method does mitigate the impact of outliers
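To see why a single short outing matters so much, the sketch below compares a standard deviation-based measure (coefficient of variation) with the IQR-based measure on a made-up set of game RA9 values, before and after adding one 405.0 game. Only the 405.0 figure comes from the slide; the other values are hypothetical and are not Capra’s actual game log. The quartile method ("inclusive") is also an assumption, since the slides do not say how quartiles were computed.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (sample standard deviation)."""
    return statistics.stdev(values) / statistics.mean(values)


def iqr_cod(values):
    """Half the interquartile range divided by the median, as on the slides."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return ((q3 - q1) / 2) / statistics.median(values)


# Hypothetical single-game RA9 values for ten ordinary starts.
typical = [0.0, 1.5, 1.5, 3.0, 3.0, 3.0, 4.5, 4.5, 6.0, 7.5]
# The same starts plus one disastrous short outing.
with_blowup = typical + [405.0]

cv_before = coefficient_of_variation(typical)
cv_after = coefficient_of_variation(with_blowup)
cod_before = iqr_cod(typical)
cod_after = iqr_cod(with_blowup)
print(cv_before, cv_after)
print(cod_before, cod_after)
```

In this toy data the coefficient of variation grows several times over once the blowup is added, while the IQR-based value moves only slightly, because the quartiles and median barely shift.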
Volatility for Pitchers • Data from 2009-2013; only pitchers who started >= 20 games in a season were used • There was no limit placed on the number of innings for a start • Tough decision, but had to start somewhere RA9VOL = (IQR_daily_RA9 / 2) / Median_daily_RA9, where: IQR_daily_RA9 = Interquartile Range of a pitcher’s daily RA9; Median_daily_RA9 = Median of a pitcher’s daily RA9 FIPVOL = (IQR_daily_FIP / 2) / Median_daily_FIP, where: IQR_daily_FIP = Interquartile Range of a pitcher’s daily FIP; Median_daily_FIP = Median of a pitcher’s daily FIP
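The two formulas and the 20-start filter could be applied to game logs roughly as follows. This is a sketch under stated assumptions: the input layout (one tuple per start), the function names, the "inclusive" quartile method, and the handling of a zero median are all illustrative choices, and the game values are made up.

```python
import statistics
from collections import defaultdict

MIN_STARTS = 20  # threshold stated on the slide


def vol(values):
    """(IQR / 2) / median; returns None when the median is zero (assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    median = statistics.median(values)
    if median == 0:
        return None
    return ((q3 - q1) / 2) / median


def season_vol(starts, min_starts=MIN_STARTS):
    """starts: iterable of (pitcher, season, game_ra9, game_fip), one per start."""
    grouped = defaultdict(lambda: ([], []))
    for pitcher, season, ra9, fip in starts:
        grouped[(pitcher, season)][0].append(ra9)
        grouped[(pitcher, season)][1].append(fip)
    results = {}
    for key, (ra9s, fips) in grouped.items():
        if len(ra9s) >= min_starts:  # drop pitcher-seasons below the threshold
            results[key] = {"RA9VOL": vol(ra9s), "FIPVOL": vol(fips)}
    return results


# Hypothetical game logs: pitcher A has 20 starts, pitcher B only 5.
starts = [("A", 2013, ra9, fip)
          for ra9, fip in zip([1.5, 3.0, 4.5, 6.0] * 5, [3.0, 3.5, 4.0, 4.5] * 5)]
starts += [("B", 2013, 4.5, 4.0)] * 5

results = season_vol(starts)
print(results)
```

Pitcher B is excluded by the filter, and pitcher A’s made-up FIP values are less spread out than the RA9 values, so FIPVOL comes out lower than RA9VOL.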
Comparing RA9 and FIP VOL • At a population level, the volatility of RA9 has a much larger spread than the volatility of FIP • RA9VOL: Mean = .65, Standard Deviation = .23 • FIPVOL: Mean = .36, Standard Deviation = .09 [Figures: RA9VOL and FIPVOL charts, each marking the Average]
Contrasting RA9 and FIP VOL • E.g. Dillon Gee, 2013, 32 GS • RA9 3.89 / FIP 4.00
Contrasting RA9 and FIP VOL: Ryu vs. Strasburg 2013 • Both pitchers posted a 3.00 ERA, 30 GS, and ~6 IP/GS • Ryu: 3.24 FIP; Strasburg: 3.21 FIP • While very similar in their seasonal outcomes, Ryu was the more consistent starter, both in terms of runs allowed and FIP
Hypotheses on Causes of RA9VOL • Pitchers with high K%s will have lower VOL • Pitchers with high LOB% will have lower VOL • Pitchers with high BB% will have higher VOL • Pitchers with high HR/FB rates will have higher VOL • Pitchers with high BABIP will have higher VOL • Pitchers with low GB/FB rates will have higher VOL
Testing the Hypotheses • Four of the six hypothesized variables were statistically significant, but the magnitude of the relationship was small • None of the relationships were in the hypothesized direction
What to Make of This? • While the relationships were significant, the directionality is hard to explain • When taken together, it’s hard to decipher • High K, high LOB = higher VOL; but • High HR/FB, high BABIP = lower VOL • Often, pitchers with high K rates are also more home run prone (throw more fastballs, attack the zone, fly ball pitchers, etc.) • Pitchers with lower BABIPs tend to strand more runners, not fewer
Is VOL a Talent? • A quick read on whether something is a talent or skill is to see if it is repeatable, year over year • Previous research suggested that, whatever measure you use, VOL or consistency was not repeatable in consecutive years • And that appears to be the case with my measure as well
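A repeatability check of this kind is usually a correlation between each pitcher’s value in one season and the next. The sketch below shows one way to set that up; the dictionary layout, function names, and VOL values are hypothetical, and Pearson correlation is an assumption since the slides do not name the statistic used.

```python
import math


def year_over_year_pairs(vol_by_pitcher_season):
    """Pair each (pitcher, season) value with the same pitcher's next season."""
    pairs = []
    for (pitcher, season), value in vol_by_pitcher_season.items():
        following = vol_by_pitcher_season.get((pitcher, season + 1))
        if following is not None:
            pairs.append((value, following))
    return pairs


def pearson_r(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical RA9VOL values keyed by (pitcher, season).
vols = {
    ("A", 2012): 0.5, ("A", 2013): 0.7,
    ("B", 2012): 0.8, ("B", 2013): 0.6,
    ("C", 2012): 0.6, ("C", 2013): 0.9,
    ("D", 2013): 0.4,  # no adjacent season, so no pair
}

pairs = year_over_year_pairs(vols)
r = pearson_r(pairs)
print(len(pairs), r)
```

A correlation near zero (or negative, as in this toy data) would indicate the measure does not carry over from one season to the next.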
Is VOL a Talent? • It’s possible that VOL is simply a descriptive statistic that captures the variances from pitcher to pitcher in how their performances randomly distributed themselves over 30-ish starts • It’s also possible that VOL is a statistic that needs more time to stabilize, much like BABIP • Need to look at multiple seasons averaged to get a better sense of a pitcher’s consistency • Finally, because of some of the inherent problems with trying to measure VOL, it may be best used to compare pitchers with similar outcome metrics • E.g. compare pitchers with similar ERAs, K%, etc. • Provides another data point to consider • Or, Occam’s Razor: the metric isn’t that great
Summing Up • There appear to be measurable differences in how pitchers distribute their runs allowed and FIP over the course of a season • And those differences are normally distributed over the course of a season • FIP appears to be generally less volatile than RA9 across the league • However, VOL itself seems quite inconsistent, year to year, at the individual level • It does appear to stabilize a bit when looking at multiple seasons (akin to clutch ability) • Also, it’s not clear VOL is structurally determined, somewhat like clutch hitting
So Where Do We Stand? • I’m not in love with this metric, currently, and recommend anyone use it with a big, fat grain of salt • It could be that parks impact pitcher VOL more than hitters • Need to split the data by home and away starts (hat tip Vince Gennaro) • Quality of opponent could also play a role (hat tip Sean Forman) • There is also the possibility that inconsistency in mechanics throughout the year impacts VOL more than other metrics • Adjustments to mechanics, or just inability to repeat mechanics, or injury could be what drives VOL (hat tip Jeff Zimmerman) • It could also be that there is no way around the IP/GS issue • Two pitchers that give up 4 runs over 8 innings could arrive there differently; one gives up 4 runs over the last 2 innings, the other over the first 3. High odds the latter doesn’t make it to 8 innings • Bottom line: more work to be done
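The IP/GS issue can be made concrete with the standard RA9 arithmetic (runs allowed times 9, divided by innings pitched). The sketch assumes, purely for illustration, that the pitcher who allows his 4 runs in the first 3 innings is removed right after the third.

```python
def ra9(runs, innings):
    """Runs allowed per nine innings for a single outing."""
    return 9 * runs / innings


# Pitcher who allows 4 runs late and completes 8 innings.
full_outing = ra9(4, 8)
# Pitcher who allows the same 4 runs early and is pulled after 3 innings
# (the early exit is an assumption for illustration).
early_hook = ra9(4, 3)
print(full_outing, early_hook)
```

The same four runs produce a single-game RA9 of 4.5 in one case and 12.0 in the other, so the manager’s decision alone changes how volatile the pitcher’s game log looks.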