Monday, November 24, 2008
wOBA year-by-year calculations
Here is the full specs for calculating wOBA for each season, based on the database from the Baseball Databank:
=======
Step 1
=======
Create a VIEW (or QUERY) named LeagueRunsPerOut based on this SQL:
SELECT
Pitching.yearID
, Sum([R])/Sum([IPouts]) AS RperOut
, Sum(Pitching.R) AS totR
, Sum(Pitching.IPouts) AS totOuts
FROM
PrimPos
INNER JOIN
Pitching
ON PrimPos.yearID = Pitching.yearID
AND PrimPos.playerID = Pitching.playerID
WHERE PrimPos.PosPrim=“P”
GROUP BY Pitching.yearID
;
The purpose here is simply to create a run environment for each season. I exclude all nonpitcher’s pitching numbers.
=======
Step 2
=======
Create a VIEW (or QUERY) named RunValues based on this SQL:
SELECT
Batting.yearID
, RperOut
, [RperOut]+0.14 AS runBB
, [runBB]+0.025 AS runHB
, [runBB]+0.155 AS run1B
, [run1B]+0.3 AS run2B
, [run2B]+0.27 AS run3B
, 1.4 AS runHR
, 0.2 AS runSB
, 2*[RperOut]+0.075 AS runCS
FROM
LeagueRunsPerOut
INNER JOIN
(
Batting
INNER JOIN
PrimPos
ON (Batting.yearID = PrimPos.yearID)
AND (Batting.playerID = PrimPos.playerID)
)
ON LeagueRunsPerOut.yearID = Batting.yearID
WHERE PrimPos.PosPrim <> "P"
GROUP BY
Batting.yearID
, RperOut
;
(The “FROM” clause can be rewritten clearer, but then it won’t work in Access.)
I set the run value of the walk as +.14 runs above the value of runs per out. While it is not necessarily exactly that all the time, it’s basically that for various run environments in MLB over the last fifty years. You can see the evidence here:
http://www.insidethebook.com/ee/index.php/site/article/linear_weights_by_run_environment/
In each run environment, the difference between the run value of the walk and runs per out (or RperI divided by 3) is between .134 and .143. Close enough for us.
The other batting run values work similarly. They are further double-checked here:
http://www.insidethebook.com/ee/index.php/site/comments/actual_wins_retrosheet_years/#4
The run value of the SB is fixed at .20, and the CS is set with a bit of a fudge, but works fairly well.
So, that’s we have the Linear Weights values for each event, for each season.
You could do the same thing with BaseRuns. I chose not to, only for simplicity’s sake. You could try to do it yourself.
=======
Step 3
=======
Create a VIEW (or QUERY) named RunValues2 based on this SQL:
SELECT
RunValues.yearID
, RunValues.RperOut
, RunValues.runBB
, RunValues.runHB
, RunValues.run1B
, RunValues.run2B
, RunValues.run3B
, RunValues.runHR
, RunValues.runSB
, RunValues.runCS
, Sum([runBB]*([BB]-nz([ibb]))+[runHB]*nz([HBP])+[run1B]*([H]-[2b]-[3b]-[HR])+[run2B]*[2b]+[run3B]*[3b]+1.4*[HR]+[runSB]*nz([SB])-[runCS]*nz([CS]))/Sum([ab]-[h]+nz([SF])) AS runMinus
, Sum([runBB]*([BB]-nz([ibb]))+[runHB]*nz([HBP])+[run1B]*([H]-[2b]-[3b]-[HR])+[run2B]*[2b]+[run3B]*[3b]+1.4*[HR]+[runSB]*nz([SB])-[runCS]*nz([CS]))/Sum([BB]-nz([IBB])+nz([HBP])+[H]) AS runPlus
, Sum([BB]-nz([IBB])+nz([HBP])+[H])/Sum([AB]+[BB]-nz([IBB])+nz([HBP])+nz([SF])) AS wOBA
, 1/([runPlus]+[runMinus]) AS wOBAscale
, ([runBB]+[runMinus])*[wOBAscale] AS wobaBB
, ([runHB]+[runMinus])*[wOBAscale] AS wobaHB
, ([run1B]+[runMinus])*[wOBAscale] AS woba1B
, ([run2B]+[runMinus])*[wOBAscale] AS woba2B
, ([run3B]+[runMinus])*[wOBAscale] AS woba3B
, ([runHR]+[runMinus])*[wOBAscale] AS wobaHR
, [runSB]*[wOBAscale] AS wobaSB
, [runCS]*[wOBAscale] AS wobaCS
FROM
RunValues
INNER JOIN
(
Batting
INNER JOIN
PrimPos
ON Batting.playerID = PrimPos.playerID
AND Batting.yearID = PrimPos.yearID
)
ON RunValues.yearID = Batting.yearID
GROUP BY
RunValues.yearID
, RunValues.RperOut
, RunValues.runBB
, RunValues.runHB
, RunValues.run1B
, RunValues.run2B
, RunValues.run3B
, RunValues.runHR
, RunValues.runSB
, RunValues.runCS
ORDER BY
RunValues.yearID DESC
;
I could have merged these last two VIEWS, but that’s not important.
Notes:
- runMinus sets the run value for the missing events, which is AB minus H plus SF; if we had reached base on error, then we’d update these last two views accordingly; it is basically the run value of the batting out
- runPlus determines the average run value of the safe batting events (walks, hitbatters, hits)
- there is a wOBA calculation, which you will see is actually an OBP calculation; they are interchangeable at the league level
- the wOBAscale is the multiplier that we will be applying to get the run values into a wOBA scale; it also lets you convert from wOBA to runs per PA (while The Book says to use 1.15, you actually use whatever this value is for the season in question)
- for all the batting events, we take the run value of each event, add in the run value of the outs, and then multiply by the wOBAscale factor; play around with why and how I am doing this to see if this makes sense to you; if this makes total sense to you, then make a short post for your fellow readers; otherwise, I’ll have to make a long boring post to that effect
- for the running events, we only apply the multiplier; once you understand the previous point, you will understand the reason for this point
***
Since 1956, the weighted average is:
wobaBB 0.71
wobaHB 0.74
woba1B 0.90
woba2B 1.28
woba3B 1.63
wobaHR 2.10
wobaSB 0.25
wobaCS 0.51
Remember, this is what is used for the Baseball Databank. If you use different events, like Reaching base on error for example, things will change a bit. It also depends whether you do, or do not, want to include SB/CS. In The Book, we were almost always interested only in the batter/pitcher matchup, and so, the SB/CS numbers would not make sense. Here is the full output of the above:
http://tangotiger.net/bdb/lwts_woba_for_bdb.txt
Anyway, I hope this makes the entire LWTS ~ wOBA relationship clearer. If not as clear-as-crystal, more than clear-as-mud.
Wow… that’s really cool. Thanks for sharing the results of a lot of hard work. Very generous. I can’t wait to try it myself when I get some time.