In Defense of WAR

The baseball blog community is abuzz today with news of blogger Hippeaux’s piece at It’s All About the Money, Stupid, called “Is WAR the new RBI?” In this piece, Hippeaux posits that shortcomings in the formula for Wins Above Replacement (particularly FanGraphs’ version) make it not much more valuable for evaluating players than the archaic Runs Batted In.

As a fan of sabermetrics in general and FanGraphs’ WAR in particular, I must admit that I was shaken by the article. Some of Hippeaux’s criticisms of WAR are valid and provocative, and I almost immediately found myself regretting my recent declaration that Dustin Pedroia has been the most valuable player in the AL this year, a declaration founded almost exclusively on fWAR.

Before I visit several points from Hippeaux’s piece in greater detail, I should note that there are two popular and opposing sentiments about WAR that are (almost) equally irresponsible. The first is the idea that WAR wraps everything a player does on the field into such a neat little package, with such a perfect little bow, that the WAR leaderboard is a facsimile of an ideal MVP ballot. As the piece in question points out again and again, fWAR’s margin of error may be as much as 15%, which renders the difference between, say, Troy Tulowitzki’s fWAR and Joey Votto’s essentially meaningless.

Even more irresponsible is the sentiment that because the WAR formula is too complicated for the average person to calculate herself, it should be dismissed on the assumption that it probably doesn’t make any sense. The elements of fWAR (weighted Runs Created plus, Baserunning Runs Above Average, Ultimate Zone Rating) are some of the most thorough, accurate, and commonly trusted metrics available. That the formula isn’t perfect just means that WAR, like every other statistic, should be used in conjunction with other measures to evaluate a player’s success.

With this in mind, I would like to address a few key points from Hippeaux’s manifesto:

1. “What if Granderson played behind Ian Kennedy and Daniel Hudson?: UZR & Flyball Rates”
Hippeaux suggests that fWAR’s defensive component, Ultimate Zone Rating, like RBI and wins, is largely based on context. He cites a correlation between pitching staffs who allow more fly balls than an average staff and outfielders whom UZR credits with superior ability to catch fly balls. This is indeed a fair point. Since UZR’s invention, it’s been commonly noted that one year of UZR tends not to provide a sample size sufficient to judge a player’s true defensive ability.

My question is this: is there a better way to account for a player’s defensive contributions over one season than to factor UZR into a combined offense/defense/baserunning formula? Some WAR critics suggest replacing one year of UZR with three years to smooth out the year-to-year variations. It certainly isn’t fair, though, to include last year’s outcomes in a measurement of this year’s value. Let’s say, for example, that two center fielders each average five runs saved above average over a two-year span. In the third year, player X devotes his offseason workouts to getting in better shape and increasing his range, while player Y puts on muscle and loses flexibility. If player X’s RSAA improves to +11 and player Y’s shrinks to -4, should we really say that the difference between them is only five runs per year (the gap between their three-year averages of +7 and +2), rather than the 15-run margin in the year in question? Furthermore, aggregating three years of data is inequitable to slick-fielding rookies and unfairly kind to aging players whose skills are declining.
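For anyone who wants to check the arithmetic in that example, here is a minimal sketch in Python. The function name is mine and purely illustrative; the inputs are just the made-up figures for player X and player Y from the paragraph above. It shows how much of the single-season gap disappears once two earlier, identical seasons are folded in.

```python
def multi_year_average(prior_average, prior_seasons, current_season):
    """Average runs saved per season across the prior seasons plus the current one."""
    total = prior_average * prior_seasons + current_season
    return total / (prior_seasons + 1)


# Both players averaged +5 runs saved over the previous two seasons.
player_x = multi_year_average(5, 2, 11)   # (5 + 5 + 11) / 3
player_y = multi_year_average(5, 2, -4)   # (5 + 5 - 4) / 3

single_year_gap = 11 - (-4)               # what happened this season
three_year_gap = player_x - player_y      # what the smoothed number reports

print(f"Player X three-year average: {player_x:+.1f}")
print(f"Player Y three-year average: {player_y:+.1f}")
print(f"Single-year gap: {single_year_gap} runs")
print(f"Three-year-average gap: {three_year_gap:.1f} runs")
```

The smoothed gap is one third of the single-year gap, because the two shared seasons contribute nothing to the difference between the players.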

An alternate solution would be to regress defensive numbers toward a mean value (probably zero), again to smooth out year-to-year fluctuations. This may make more sense, but like past metrics available to MVP voters, it downplays the difference between an excellent fielder and a poor one. If Peter Bourjos hits almost as well as BJ Upton over a full season and plays a much better center field, should we regress the difference in their fielding values based on the possibility that there’s noise in the fielding data, inferring that the two players are equally valuable? Of course not. Runs saved on defense are worth exactly the same amount to a baseball team as runs created on offense, and the statistic should reflect this.
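To make the regression idea concrete, here is a rough sketch of what shrinking toward a mean of zero might look like. Everything in it is an assumption for illustration: the function, the 0.5 reliability weight, and the two fielding totals are hypothetical, and they are not real UZR figures for Bourjos, Upton, or anyone else.

```python
def regress_to_mean(observed, reliability, mean=0.0):
    """Shrink an observed value toward the mean.

    reliability is the share of the observed deviation we choose to trust,
    from 0 (ignore the observation) to 1 (take it at face value).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return mean + reliability * (observed - mean)


# Hypothetical single-season fielding runs, chosen only for illustration.
RELIABILITY = 0.5          # assumed weight, not an established value
strong_fielder = 15.0
weak_fielder = -5.0

raw_gap = strong_fielder - weak_fielder
regressed_gap = (regress_to_mean(strong_fielder, RELIABILITY)
                 - regress_to_mean(weak_fielder, RELIABILITY))

print(f"Raw gap: {raw_gap:.1f} runs")
print(f"Regressed gap: {regressed_gap:.1f} runs")
```

With these assumed inputs, the gap between the two fielders is cut by exactly the reliability weight, which is the downplaying effect described above.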

UZR isn’t a perfect measure of skill, but no statistic is. One player may face an easier slate of opposing pitchers over the course of a 162-game season, resulting in a one-year on-base percentage higher than that of a superior player who faced Roy Halladay five times. That doesn’t mean we should stop using OBP as a measure of how good a hitter is at getting on base, only that we should take all statistics in context and recognize that a bigger sample size will always yield more accurate results.

2. I can’t play several positions. (or “The Adam Dunn Effect”)
In this section, Hippeaux posits that WAR is overly friendly to players like Carlos Lee who play multiple positions, not because they’re particularly versatile, but because their bats demand a place in the lineup while their gloves don’t fit well anywhere. I can’t speak intelligently to the effect of positional flexibility in the WAR formula, but I will note that if a player is not skilled enough to play a position he is assigned to play, his deficiencies will show up in the numbers, unless of course no balls are hit to him. In that case, the numbers don’t accurately assess his potential value, but they do properly state that he did not cost his team any runs by manning that position.

3. WAR Hates Sluggers
I imagine the header here was intentionally and somewhat ironically glib, but behind the sophomoric rhetoric, Hippeaux makes a salient point about lineup construction. WAR favors all-around athletes who hit well, run the bases well, and field important positions without embarrassing themselves. As such, a rounded player like Ian Kinsler tends to grade higher than a one-dimensional slugger like Prince Fielder. Hippeaux opines that the positional adjustments embedded in fWAR dilute the impact of a superstar slugger on a team’s strategy and results.

While it’s true that home runs always score the batter and often score other runners, all of this is baked into the runs created formula that feeds fWAR’s Batting Runs Above Average. This dates all the way back to Pete Palmer’s Linear Weights, which determine, on average, how many runs a home run provides a team as compared to a single or a groundout. Every time Prince Fielder hits a three-run home run, he’s provided immense value, but two batters in front of him in the lineup deserve credit for getting on base, extending the inning, and forcing the pitcher into the stretch so that Prince can drive them in. It’s true that Fielder’s presence in the on-deck circle may impact the pitches that Ryan Braun sees in front of him, but shouldn’t a pitcher be just as wary of putting Braun on base in front of any hitter who regularly gets on base and keeps the inning going?

The negative adjustment fWAR makes to account for innings played at lower-leverage positions reflects the heart of WAR (and the namesake of this blog): replacement level. If Prince Fielder bolts Milwaukee for greener pastures this offseason (which is quite likely), the Brewers will promote Mat Gamel or bring in another slugging first baseman to take his place. If Kinsler should leave Texas, the Rangers would be hard-pressed to replace not just his offensive production, but his slick glove and astute baserunning. A team full of Kinslers snaring would-be gappers out of the air and consistently pounding out singles would win more games than a team full of Fielders hitting the occasional home run, but never turning a double play on defense. Sure, we would all prefer a mix of both types, but Kinsler is truly harder to replace than Fielder.

Hippeaux’s piece certainly didn’t fall on deaf ears. Both versions of WAR have their shortcomings and their share of critics. I personally find that the difference in sophistication and thoroughness between WAR and just about any other available player value metric is so extreme that it’s easy to use WAR as a one-stop shop for value determination. Hippeaux’s points (and I certainly recommend you read the whole article, as I didn’t cover them all) will give me pause the next time I make a fervent declaration of one player’s superiority to another based exclusively on WAR. However, as the author notes, “as yet, it is probably as good a singular statistic as is widely available.”

And it sure as hell beats RBI.


4 Responses to In Defense of WAR

  1. hk says:

    “Some WAR critics suggest replacing one year of UZR with three years to smooth out the year-to-year variations. It certainly isn’t fair, though, to include last year’s outcomes in a measurement of this year’s value.”

    You are right in that you should not use past years’ outcomes in this year’s WAR when using WAR descriptively to determine value in the past year. However, if you are using WAR (as a GM) to assess a free agent’s value or (as a fan) to assess a team’s free agent signing in the off-season, it may make sense to use three years of UZR data or to make an adjustment to the player’s defensive rating if the most recent year seems like an outlier.

    • Bryan says:

      Thanks for commenting, hk. You’re right that I was defending WAR primarily as a starting point in MVP arguments, and that it could be used differently in assessing a free agent’s value. As for using three years of UZR value, why not just use three years of WAR? I get that UZR fluctuates more because most players get far more plate appearances than defensive chances, but offensive numbers fluctuate too. To wit, Ben Zobrist’s wOBA the past three years: .408, .323, .360. If I’m a GM thinking about signing Zobrist, I don’t want to overpay based on a .408 wOBA or pass on him based on a .323. Bigger samples are always more reliable, but fWAR is the best 162-game summary we’ve got.

  2. JoeM says:

    The problem is that the replacement level is not set appropriately on a position by position basis.

    Rowand pulled a +.7 WAR thru 100 games this year. This is all the evidence anyone should need to know that what the metric assumes of a replacement level center fielder is far too low.

    • Eric R says:

      Well one thing is that the positional adjustment is set based on historical levels, so if there is a glut of great players at one position, it’ll make it appear that replacement level is too high.

      That said, of the top 30 players [by PA] that FanGraphs lists as CFers, here are the five with the lowest WARs:

      Rasmus 1.3
      Pagan 1.0
      Rowand 0.7
      Patterson 0.1
      Rios -0.9

      So, it seems that things are about right there. The worst “regulars” are all near replacement level. Granted, the median player in that top 30 is a bit higher than other positions:

      CF 2.8
      RF 2.5
      1B 2.4
      SS 2.1
      3B 2.0
      CA 2.0
      2B 1.7
      LF 1.2

      Not sure if this means that LFers are getting a bit shafted this year with the positional adjustment and CF/RF/1B are making out like bandits?
