THE BOOK--Playing The Percentages In Baseball

Wednesday, November 26, 2008

The History of the wOBA, part 1

Why does wOBA exist?

Here’s an extremely long post that hopefully will answer all your questions.  And if it doesn’t, well, I guess that’s why I called this thread “part 1”.


When we were working on The Book, it became clear very quickly that one thing we’d need to do is establish the statistical significance of the data we were seeing.  The best tool available to us was the binomial distribution applied to a rate stat.  The binomial distribution requires that each event be binary (i.e., safe or out).  While you may think of batting average (BA), the better rate stat is on base percentage (OBP).
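To make the binomial idea concrete, here is a minimal Python sketch, not the method used in The Book. It treats each plate appearance as a safe-or-out trial and asks how far an observed OBP sits from an expected one, in standard deviations. The function names and the numbers (.340 expected, .380 observed, 600 PA) are invented for illustration.

```python
import math


def binomial_sd(p: float, n: int) -> float:
    """Standard deviation of an observed rate over n binary trials,
    each succeeding (batter safe) with probability p."""
    return math.sqrt(p * (1.0 - p) / n)


def z_score(observed: float, expected: float, n: int) -> float:
    """How many standard deviations the observed rate is from the expected one."""
    return (observed - expected) / binomial_sd(expected, n)


# Hypothetical inputs: a true .340 OBP talent observed over 600 PA.
sd = binomial_sd(0.340, 600)
z = z_score(0.380, 0.340, 600)
print(f"sd of OBP over 600 PA: {sd:.4f}")
print(f"z-score of a .380 OBP: {z:.2f}")
```

This only works cleanly because OBP has PA in the denominator and counts every safe event as exactly 1, which is the property the rest of this post tries to preserve.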

BA takes the nonsensical view that a walk is a non-event.  Well, perhaps for what it tries to accomplish, it might be perfectly legitimate.  What we need is something that has plate appearances (PA) in the denominator, and that includes walks.  So, OBP becomes the stat of choice.  It describes each PA rather clearly: safe or out.  And that’s what baseball is about at its core.  OBP, while not as popular as BA yet, has staying power.  While the continued existence of BA is iffy (a large part of its being is simply inertia), OBP will always exist.  If it didn’t exist, it would have been invented.  BA enjoys no such fundamental truth. 

Anyway, while OBP initially satisfied our objective for The Book, it became quite clear very early that we simply couldn’t always treat a walk and a HR equally.  In OBP, each safe event counts as “1”.  So, we needed something else, something that could align itself to OBP to make comparisons straightforward in The Book, be binomial-like, and better weight each event.  We needed something that was almost interchangeable with OBP.

Enter Linear Weights.  Linear Weights contains the simple truths of baseball run creation.  I knew that I needed to somehow get Linear Weights onto a “rate” scale, so that I could do what I needed to do for The Book.  The overriding constraint is that the average weight of each safe event must be exactly equal to 1.  Since OBP has each safe event as “1”, and since my new rate stat will have the same denominator as OBP (i.e., PA), then the numerators had to match.

This means that if I underweight the walk, then I need to overweight the extra base hits.  Overall, the average of these coefficients had to be exactly 1.  With a bit of work, I came up with the logical basis to convert Linear Weights into a rate stat.

This is why the stat is called a WEIGHTED On Base Average.  It keeps the basis of OBP, which is safe divided by PA, except it tweaks the weight of each safe event to better match its actual impact to scoring runs, all while being centered, overall, to a weight of exactly 1.
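As a rough illustration of that constraint, the Python sketch below uses the five coefficients quoted later in this post (.72 BB, .90 1B, 1.24 2B, 1.56 3B, 1.95 HR) and ignores the smaller events. The stat line is made up and is not any real player or league total. The function names are hypothetical and come from neither The Book nor the published SQL.

```python
# Coefficients as quoted in this post; the smaller events are left out.
WOBA_WEIGHTS = {"BB": 0.72, "1B": 0.90, "2B": 1.24, "3B": 1.56, "HR": 1.95}


def obp(counts: dict, pa: int) -> float:
    """Simplified OBP: every safe event counts as exactly 1."""
    return sum(counts[event] for event in WOBA_WEIGHTS) / pa


def woba(counts: dict, pa: int) -> float:
    """Simplified wOBA: same denominator as OBP, weighted numerator."""
    return sum(w * counts[event] for event, w in WOBA_WEIGHTS.items()) / pa


def average_weight(counts: dict) -> float:
    """Average weight per safe event; close to 1 for an ordinary mix of events."""
    safe = sum(counts[event] for event in WOBA_WEIGHTS)
    return sum(w * counts[event] for event, w in WOBA_WEIGHTS.items()) / safe


# Invented stat line, for illustration only.
line = {"BB": 60, "1B": 100, "2B": 30, "3B": 3, "HR": 20}
pa = 650

print(f"OBP  = {obp(line, pa):.3f}")
print(f"wOBA = {woba(line, pa):.3f}")
print(f"average weight per safe event = {average_weight(line):.3f}")
```

For an ordinary mix of walks and hits the average weight comes out near 1, so wOBA and OBP land on nearly the same number. A hitter whose safe events lean toward extra-base hits will show a wOBA above his OBP, and a walk-heavy hitter the reverse.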

So, weighted On Base Average (wOBA) exists to serve a particular purpose, one which was leveraged as best we could, in The Book.  We were able to use wOBA and OBP at the same time, treat them almost in the same manner, and see results along the same scale.  This is the reason for its initial existence.

Since then, it has gained a bit of traction.  And we get the natural questions, the first being: why not park adjust it?  Well, most of the time we didn’t need to.  For the purposes of The Book we were dealing with large groups of data, where the park effects cancel out (the data is not biased, so applying adjustments really just obfuscates things).  Sometimes, we DID need to park adjust it.  And, in those times in The Book, we did so.  There’s nothing inherently difficult about park adjusting a stat.  Everyone does it.  For the purposes of The Book, the park adjustments didn’t need to be that strict, since even the times we needed them, the bias was not strong enough for us to worry whether a park factor should really be 1.04 and not 1.03.

But, when wOBA is released into the wild, these minor adjustments become more important, especially when you deal with Coors or Petco hitters, and you are dealing with individual hitters.  In The Book, we were worried about groups of data, not really worried about individual hitters.  These adjustments are no longer minor.  There is nothing inherent about wOBA that would prevent you from making the park adjustment.  Just because I haven’t done it yet doesn’t make it a design flaw; it simply means that someone out there is free to do it.  As Bill James once said, “I can’t do all this myself”.  This is why most of my work is fairly open and reproducible, so that some people can roll up their sleeves.

Responding to Rob

Rob Neyer, friend of The Book Blog, wrote a blog entry about wOBA, and his readers posted their thoughts and questions.  I will highlight some parts of that thread in the hopes of casting some light on the matter.

According to wOBA, Albert Pujols was 67 runs better than average; according to BRAA, he was 82 runs better.

Clay Davenport at Baseball Prospectus shows Albert Pujols with:
102 Runs above replacement, 72 Runs Above Position, and 88 Runs above replacement position.  He also shows him with 98 Batting runs above replacement, 82 batting runs above average.  I’m not entirely sure exactly what each one does, nor why the two “runs above replacement” figures would differ by 4 runs.

Keith Woolner, at the same site, shows Pujols with 99 runs above replacement, 75 runs above positional average, and 91 runs above league average.

Clearly, just within one site, there are some head-scratchers.  Baseball-Reference has Pete Palmer’s park-adjusted Linear Weights, and Pujols is +77 runs above average.

What does unadjusted wOBA say?  Actually, it doesn’t say anything, since I haven’t published those numbers.  You can try to work them out, and the answer is +72 runs. (The figure Rob cites, +67, I can get close to as +68, if I treat the IBB as a non-event.)

There are two main issues with Pujols: how to treat the IBB (which is enormous for great hitters), and (as it applies to everyone) how to handle the park and league adjustment.

In any case, you have five different results from four different sources, all using different methodologies, and applying different adjustments.  Mine is completely open, and indeed, I published the exact SQL on my site.  I would hope that showing my work is actually a good thing, and that the black box systems don’t get a pass.

It’s weighted on-base average … except it’s not, at all. It’s really linear weights on a scale that looks like on-base percentage

I’ll disagree slightly, but not vehemently.  I explained at the top of this thread why it can be considered a weighted on-base average.  Rejecting the argument basically says that it is impossible to even have a weighted on-base average, that the very idea is nonsensical.  Again, I’ll disagree slightly, but not vehemently.

Now, let’s jump ahead and say that two or three years down the line, the big mistake was discovered internally. Would BP announce to the world that all those numbers over the previous three years had been wrong? Or would the guys running the show decide that the loss of credibility (and potentially, revenues) isn’t balanced by the loss of integrity?

Indeed, I did find problems.  I announced it to the world via my blog.  Clay Davenport was receptive to the arguments, and agreed that changes needed to be made.  He’s made bug fixes that he announced, and he made other changes, but certainly not all of them.  Others at BP were not so receptive to my arguments.

What I usually want to know isn’t how good a hitter someone is. What I really want to know is how good a player he is, and WARP, by combining hitting and fielding, tells us this.

This is really outside the wOBA issue.  You can’t criticize a metric for not doing what it wasn’t supposed to do.  “Paris Hilton isn’t funny.”  Guess what, she’s not supposed to be!  While Rob wasn’t necessarily criticizing wOBA, his argument is out of place in the wOBA discussion.

Responding to Rob’s readers

I know it’s become chic to criticize BP for this kind of stuff, but it really seems like nitpicking.

If you study the differences between Woolner and Davenport, it is not nitpicking at all.  Over the summer, I showed a 15-run gap between how Woolner sees ARod/Mauer and how Davenport sees the same two players at the same time.

Trust but verify.

Rob, while I know you didn’t develop wOBA, can you nonetheless explain to me the 1.15 factor involved in the computation? What is the significance?

In the “logical basis” and “exact SQL” links above, I describe it in detail.
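The linked posts remain the authoritative explanation. As a rough sketch of where the factor shows up in practice: wOBA is Linear Weights stretched by roughly 1.15 to sit on the OBP scale, so the usual conversion back to runs divides the gap to league wOBA by 1.15 and multiplies by PA. The function name and the inputs below are invented for illustration and are not taken from the published SQL.

```python
WOBA_SCALE = 1.15  # the factor the reader asks about


def runs_above_average(woba: float, league_woba: float, pa: int,
                       scale: float = WOBA_SCALE) -> float:
    """Convert a wOBA gap into batting runs above average (no park or league adjustment)."""
    return (woba - league_woba) / scale * pa


# Hypothetical hitter: .400 wOBA against a .335 league, over 600 PA.
runs = runs_above_average(0.400, 0.335, 600)
print(f"{runs:+.1f} runs above average")
```

The result moves with the league baseline you choose and with how you treat the IBB, which is part of why the figures quoted above for Pujols do not all line up.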

Is there actually a peer reviewed journal for baseball (or sports in general) analysis?

No offense, but of the peer-reviewed journals I have read (and even done peer-review for), I was less than overwhelmed.  The best peer-review (for sabermetrics) is done by bloggers and readers of blogs.

I like the sound of wOBA, but if it’s not park-adjusted, I’m not sure I’d really be able to rely on it.

Agreed, which is why I support use of EqA or Palmer’s “BtnRuns” at Baseball-Reference.

Why not just translate into runs and have done? Also, I think the lack of park-adjustment is a serious flaw. I’d almost rather stick with OPS+.

This was answered in the main thread.  Palmer’s Linear Weights at Baseball-Reference has the batting runs, park-adjusted.  OPS+ should die sooner rather than later.

Enough already with another stat! We are acting like ivory tower intellectuals trying to analyze and objectify everything.

I take it this is a bad thing to do?

Also, since wOBA isn’t park-adjusted… isn’t the next step for someone to figure out how to do that? It was just introduced—nobody expects it to be perfect yet.

Right, exactly.

wOBA, huh? Every time I see this stat, or even think about it, I will always think of this:

http://www.youtube.com/watch?v=maYnqbdo2jw&feature=related

Busted. 

You are the first person to mention this to me.  I watched a lot of Sesame Street with my boy, and that song was *definitely* a reason that I named it wOBA.  I wanted to give the name something light.  I used to call it lwtsOBA or lwtsOBP (for linear weights), and I was thinking of also wOBP.  But, wOBA worked for the name, and making it match to the song was my little secret.

...so scaling this stat to OBA just seems like a foolish effort. I can’t think why they didn’t scale it to batting average, except perhaps to avoid comparison to EqA, which would not be a good reason.

I hope now you know the real reason for the scale.

As to what’s already been probably said, wOBA looks like a poor man’s EqA.

It would be better described as an unadjusted, and far less complicated, EqA.

At its core, EqA is this:
(H + TB + 1.5*(W + HB + SB) + SF + SH)/(AB + W + HB + SH + SF + SB + CS)

If we can strip out all the extras, and focus on the big elements, we are left with this:
(H + TB + 1.5*W)/PA

So, the weights are:
1.5 BB
2.0 1B
3.0 2B
4.0 3B
5.0 HR

The denominator of wOBA is similar to EqA, so all we want to do is compare the numerators:
.72*BB + .90*1B + 1.24*2B + 1.56*3B + 1.95*HR

If we multiply my numbers by 2.4, we get:
1.7 BB
2.2 1B
3.0 2B
3.7 3B
4.7 HR

So, we see that “generally” they agree.
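The arithmetic behind that comparison can be reproduced in a few lines of Python. This is only a check of the numbers quoted above; the dictionary names are made up for the example, and the 2.4 multiplier is the one used in this post.

```python
# Stripped-down EqA weights from (H + TB + 1.5*W)/PA, as listed above.
EQA_WEIGHTS = {"BB": 1.5, "1B": 2.0, "2B": 3.0, "3B": 4.0, "HR": 5.0}

# wOBA numerator coefficients, as listed above.
WOBA_WEIGHTS = {"BB": 0.72, "1B": 0.90, "2B": 1.24, "3B": 1.56, "HR": 1.95}

SCALE = 2.4  # multiplier used in the post to put wOBA on the EqA scale

# Scale each wOBA coefficient and round to one decimal, as in the post.
scaled = {event: round(w * SCALE, 1) for event, w in WOBA_WEIGHTS.items()}

for event in EQA_WEIGHTS:
    gap = scaled[event] - EQA_WEIGHTS[event]
    print(f"{event}: EqA {EQA_WEIGHTS[event]:.1f}  scaled wOBA {scaled[event]:.1f}  gap {gap:+.1f}")
```

On this scale wOBA gives a little more credit than EqA to the walk and the single and a little less to the triple and home run, with the double matching.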

Clay does have some construction issues, and the conversion of EqR to EqA is fairly intensive.  I’ve spoken to him about it.  I think if Clay (and Bill James) were more involved with the grass-roots work that the sabermetric community is doing, we’d get closure on this pretty quickly.

...but reading the comments it appears to me that people who took the “OBA” part of the name seriously didn’t realize this. I think it’s too misleading.

It worked for what we needed.

I’ve never really understood the necessity of having one single metric that defines a player’s value.

You can’t fault something for not being what it wasn’t supposed to be.

...why sabermetricians don’t use a stat that would simply measure total bases per plate appearances? Something like TB + BB / AB + BB. Not quite Boswell’s total average, but similar.

We do.  We discussed it and it’s described here.
