Filed under:

Baseball Databank Data Dump, I: FIP Constants and Runs Per Win

Hi. I'm Matt. I write here (occasionally) now. 

Perhaps I could have picked something a bit more exciting for my first post, but I kept missing stuff, getting busy with other things, and so on. But I think this might be of utility to some people -- I know there were times I've wished I could easily access stuff like this. 

Like many people, I originally got into SQL databases for baseball thanks to a few posts by Colin Wyers at the sadly-departed Statistically Speaking blog (Colin has since posted new versions of his introductions at The Hardball Times.)

This isn't about SQL, though. It's about putting some data out there for people who want to use FIP scaled to a particular season or need a runs-to-win conversion but don't want to mess with the database stuff themselves. I'll also publish the queries down below for those curious (they are very simple), which I came up with for use in MySQl and the pitching table in the freely available Baseball Databank.


FIP Scaled Historically

You probably already know about FIP (Fielding Indepedent Pitching),  created by Tom Tango. (If not, click here for a good introduction), the simple and beautiful stat that gives us an idea of how well a pitcher performed independently of batted ball luck and the fielders behind him by focusing on things he can control. The basic formula is:

((13*HR)+(3*(BB+HBP-IBB))-(2*K))/IP+constant

Where the "constant" is usually around 3.2, but is adjusted for each season/entity to that the leagues season/entity ERA is the same as the season/entity FIP. Obviously, it's a pain to calculate each season individually, but that's why computers were invented. So I came up with an easy query for that. Not only that, but if you're like me and many others, you hate the "earned run" rule for various reasons I can't get into here. Not only that, but you realize that when, say, calculating Wins Above Replacement, using a stat scaled to ERA undervalues a pitcher relative to position players. So, instead of scaling FIP to ERA, why not to RA? Well, I did both with this query. And here are the results for easy bookmarking:

Historical FIP Scales, 1871-2009

The numbers in the "FIP-ERA" and "FIP-RA" columns are the "constants" as listed above that you need to use to scale them to the ERA/RA for the season. I have two separate sheets, one that groups all the major leagues by season, and the other that groups by season and league (i.e., separates AL and NL).

 

Runs Per Win

You can read the link in the heading if you aren't sure what this means. In the contemporary baseball era, you're pretty safe assuming that 10 runs = one win. But sometimes you want to be more precise, so I've come up with a table for that, too, comparing the results of three different runs per win estimators

  1. Simple Runs Per Game (or lgRA times two).
  2. Tom Tango's simple formula: lgRA*1.5+3
  3. Patriot's PythagenPat version: (league Runs Per Game^(1-z)) times two, where z is between 0.27 and 0.29. I use 0.287 because that's what I read originally and I'm too lazy to figure out if I should change it.

Here is the table:

Runs Per Win, 1871-2009

As noted in the first link in this section, the Pythagorean win expectation is the heart of the runs-to-win conversion, and since PythagenPat is the most accurate win expectation estimator across all contexts, so will its run to win conversion be the most accurate (although, again, you're pretty safe just going with 10 in a pinch). But I actually still use Tango's version in a lot of my SQL's because it's easier to work with quickly, and I'm not sure the loss in precision is worth all the effort, although I will eventually change all my SQLs over to PythagenPat. And, as you can see from the table, in the contemporary era there really isn't all that much of a difference, and even RPG isn't that far off.

I hope this is useful to someone out there, and if not, well, I hope I'm more interesting next time. I'll spare you the SQLs on the front page, but I do have them after the break for those that are interested.

These are for MySQL with the Baseball Databank. Both of these are the "group by league" versions, but if you take out the lgID in the "select" and "group" lines, it will group by all the major leagues of that year.

FIP Constants

SELECT p.yearID, p.lgid , SUM(COALESCE(IPOuts,0)/3) AS IP ,((SUM(COALESCE(ER,0))/SUM(COALESCE(IPOuts,0)/3))*9)AS lgERA , ,((SUM(COALESCE(ER,0))/SUM(COALESCE(IPOuts,0)/3))*9) - ((SUM(COALESCE(HR,0)*13)+(SUM(COALESCE(BB,0))+SUM(COALESCE(HBP,0))- SUM(COALESCE(IBB,0)))*3-SUM(COALESCE(SO,0))*2)/(SUM(COALESCE(IPOuts,0))/3)) AS FIPERScale , ,((SUM(COALESCE(R,0))/SUM(COALESCE(IPOuts,0)/3))*9) AS lgRA , ,((SUM(COALESCE(R,0))/SUM(COALESCE(IPOuts,0)/3))*9) - ((SUM(COALESCE(HR,0)*13)+(SUM(COALESCE(BB,0))+SUM(COALESCE(HBP,0))- SUM(COALESCE(IBB,0)))*3-SUM(COALESCE(SO,0))*2)/(SUM(COALESCE(IPOuts,0))/3)) AS FIPRAScale FROM bdb.pitching p GROUP BY p.yearID, p.lgid ORDER BY p.yearID DESC

Runs Per Win

 

SELECT yearID, lgID

, (((SUM(R)/SUM(IPOuts/27))*1.5)+3) AS RPWTango

, (SUM(R)/SUM(IPOuts/27))*2 AS RPG

, 2*(POW(SUM(R)/SUM(IPOuts/27)*2, 0.713)) PythPat

FROM bdb.pitching

GROUP BY yearID, lgID

ORDER BY yearID DESC

;