Tangotiger Blog


Friday, November 19, 2021

The New Cy Young Predictor?

The purpose of the Cy Young Predictor is to predict behaviour. Since the voters are human, that behaviour naturally changes over time. The initial Cy Young predictor from Bill James, with its extra emphasis on relief pitchers and wins, worked well up through 2005. But by 2009, the voting mindset had changed. That’s where my Cy Young Predictor picks it up, working spectacularly well from 2006 through 2020.

As a reminder, the model is ridiculously straightforward:

  • Cy Young Points = IP/2 - ER + SO/10 + W
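As a minimal sketch, the formula above can be written as a small Python function. The function name and the stat line are hypothetical, made up for illustration (not a real pitcher's season), and the sketch assumes IP is passed as a true decimal (a third of an inning is 0.333, not the box-score ".1").

```python
def cy_young_points(ip: float, er: int, so: int, w: int) -> float:
    """Cy Young Points = IP/2 - ER + SO/10 + W."""
    return ip / 2 - er + so / 10 + w


# Hypothetical stat line, for illustration only.
points = cy_young_points(ip=200.0, er=60, so=220, w=15)
print(points)  # 100 - 60 + 22 + 15 = 77.0
```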

I know it looks TOO straightforward, making the concept ridiculous. How is it possible to distill the 30 voters for each award down to such a simple rule? But then again, this is how well it has done going back to 2006:

Every single Cy Young winner since 2006 finished 1st or 2nd in Cy Young points, without exception. None. And even those who finished 2nd were within striking distance (4 points) of 1st place. So the rule is straightforward enough: only the league leader in Cy Young points, and anyone within 4 points of that leader, is a candidate to win the Cy Young. This is true for the 30 awards from 2006-2020. 30 for 30.
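The rule can be sketched in Python as a filter on a league's point totals. The function name, the pitcher labels, and the totals are all hypothetical, and treating "within 4 points" as inclusive of exactly 4 is an assumption.

```python
def cy_young_candidates(points: dict[str, float], margin: float = 4.0) -> list[str]:
    """Return the leader plus anyone within `margin` points, best first."""
    leader = max(points.values())
    # Assumption: a gap of exactly `margin` still counts as "within".
    in_range = [name for name, p in points.items() if leader - p <= margin]
    return sorted(in_range, key=lambda name: -points[name])


# Hypothetical league totals, for illustration only.
league = {"Pitcher A": 80.0, "Pitcher B": 77.0, "Pitcher C": 70.0}
print(cy_young_candidates(league))  # ['Pitcher A', 'Pitcher B']
```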

2021 showed that the voting behaviour has changed. We are in the midst of a paradigm shift. Corbin Burnes was a whopping 11 points behind the leader and in 4th place. It’s clear: there’s a change.

The problem is that the NL voters and the AL voters are not aligned.

First, let me introduce the new Cy Young predictor, then I’ll explain why we are still in a haze as to where we are in terms of predicting behaviour. To the original formula:

  • Cy Young Points = IP/2 - ER + SO/10 + W

We add this:

  • IP/2 - FIPruns

What is FIP runs? We can reverse engineer ER from ERA as follows: ER = ERA/9 * IP. Similarly, we reverse engineer FIPruns as FIP/9 * IP.
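Putting the two pieces together, here is a hedged sketch of the new predictor: the original formula plus the added IP/2 - FIPruns term. The function names and the stat line are hypothetical, and IP is again assumed to be a true decimal.

```python
def runs_from_rate(rate_per_9: float, ip: float) -> float:
    """Convert a per-9-innings rate (ERA or FIP) to runs: rate/9 * IP."""
    # Same value as rate/9 * IP; multiplying first avoids a float rounding wobble.
    return rate_per_9 * ip / 9


def new_cy_young_points(ip: float, er: int, so: int, w: int, fip: float) -> float:
    """Original Cy Young Points plus the added term IP/2 - FIPruns."""
    original = ip / 2 - er + so / 10 + w
    added = ip / 2 - runs_from_rate(fip, ip)
    return original + added


# Hypothetical stat line, for illustration only.
# FIPruns = 3.00 * 180 / 9 = 60; original = 72; added = 30.
print(new_cy_young_points(ip=180.0, er=50, so=200, w=12, fip=3.00))  # 102.0
```

Written this way, the new total simplifies to IP - ER - FIPruns + SO/10 + W, so ER and FIPruns carry equal weight.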

If we do that, what do we get? In the NL, it looks like this (with Cy Young finish in paren):

  • 126 Burnes (1)
  • 125 Wheeler (2)
  • 116 Buehler (4)
  • 110 Scherzer (3)
  • 105 Gausman (6)
  • 101 Urias (7/8)
  • 99 Woodruff (5)
  • 90 deGrom (9)
  • 88 Wainwright (7/8)

We get the photo-finish for the top two, but we don’t do so well with 3rd and 4th place, and Woodruff is a problem. The Buehler/Scherzer ordering is not really correctable. And any attempt at correcting Woodruff impacts Wainwright. But the key is the top 2, and so, this does well enough. In the AL however:

  • 98 Cole (2)
  • 91 Ray (1)
  • 90 Rodon (5)
  • 81 Lynn (3) / Eovaldi (4) / Montas (6)
  • 75 Berrios (8)
  • 73 McCullers (7)
  • 72 Bassitt (9)

The issue is Cole v Ray, and Eovaldi v Rodon.

  • Ray has a substantial lead over Cole in ERA, while Cole has an even larger lead over Ray in FIP. Clearly therefore, FIP doesn’t hold as much weight.
  • On the other hand, Eovaldi has a large lead over Rodon in FIP, while Rodon has a much larger lead over Eovaldi in ERA. Clearly therefore, ERA doesn’t hold as much weight.

The issue is that we are faced with the same voters! And unlike with the photo-finish of the NL race, in the AL, Robbie Ray ran away with it. In other words, other than the showing of Eovaldi (and his AL lead in FIP), the Cy Predictor worked well enough. We can discount Rodon easily enough: he had only 132 IP, so voters clearly have extra penalties for that. But we can’t handle Eovaldi without also affecting Cole/Ray.

So, we are in a transition period here. In 2022, it’s possible the old Cy Predictor will work well enough. After all, Burnes went on an historic FIP run. And even if the voters didn’t cite FIP specifically, they’d cite the components of the “triple crown” of SO, BB, HR. And so, maybe we can explain everything as an anomaly here. We had the historic FIP run, we had Eovaldi having a huge gap between ERA and FIP, and we had Rodon with only 132 IP.

At this point, we have two models. Just as both the Bill James version and my version produced similar results during the transition period of 2006-2008, it’s possible we’re in a transition period as voters are figuring things out.

We’ll see in 2022 how things look.

