Wednesday, July 07, 2021
Statcast Lab: Distance/Time Model to Taking/Holding Extra Base
We have finally completed the Baserunning and Arm model of taking/holding the extra base. The model is intuitive and matches how a baseball fan processes a play. Let’s take the situation of a batter deciding whether to stretch a single into a double. The runner at contact needs a certain number of seconds to reach second base. The fielder at contact needs a certain number of seconds to retrieve the ball and then, based on the distance of the throw, a certain number of seconds to get the ball to second base. The more time it takes the fielder to get the ball in, the higher the probability the runner will try for two (and succeed). The less time it takes the fielder to get the ball in, the lower the probability the runner will try for two (and if he tries, the more likely he will get thrown out).
So suppose the runner needs 8 seconds to get to second base for a particular play. On that same play, let’s say the defense needs 5 seconds to get to the ball, and another 0.75 seconds to release it, another 2 seconds for the ball to travel and another 0.5 seconds to apply the tag. That’s 8.25 seconds. We take 8.25 seconds of fielder time minus 8.0 seconds of runner time, and we have a delta of 0.25 seconds. That’s how much “breathing room” the runner has. Naturally, the runner in question won’t ALWAYS take exactly 8.0 seconds. Sometimes it may be 7.8 or 8.3. Or anything in-between. As for the defense, the fielder might get to the ball quicker than normal or slower than normal. His throw might not reach its peak, and maybe the throw is a bit offline so we need more tag time.
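The arithmetic above can be written out as a short Python sketch. The function and parameter names are illustrative assumptions, not Statcast field names; the numbers are the ones from the example.

```python
def runner_buffer(runner_time, reach_ball, release, ball_flight, tag):
    """Return fielder time minus runner time, in seconds.

    A positive value is the runner's "breathing room".
    All names here are illustrative, not Statcast field names.
    """
    fielder_time = reach_ball + release + ball_flight + tag
    return fielder_time - runner_time


# Numbers taken from the example in the text.
delta = runner_buffer(runner_time=8.0, reach_ball=5.0, release=0.75,
                      ball_flight=2.0, tag=0.5)
print(f"{delta:.2f}")  # 0.25
```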
What we have is therefore an S-curve type of probability (a sigmoid function). The more negative the delta, the lower the probability. The more positive the delta, the higher the probability. This is what it looks like for the batter trying for two.
Here we see that when the Fielder Time and the Runner Time match, the runner will try and be successful about one-third of the time. The more buffer time for the runner, the more often he will try and succeed. The blue line is actual data, while the dashed line is the model.
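As a rough illustration of the shape described here, the following is a minimal sketch of a logistic curve pinned to about one-third when the two times match. The slope value is a made-up placeholder, since the post gives no fitted parameters, and the function name is hypothetical.

```python
import math


def go_and_succeed_prob(delta, slope=2.0, p_at_zero=1 / 3):
    """Logistic curve for P(runner tries and succeeds) vs. buffer time.

    delta: fielder time minus runner time, in seconds.
    slope: placeholder value only; the fitted slope is not given in the text.
    p_at_zero: probability when the two times match (about one-third).
    """
    intercept = math.log(p_at_zero / (1 - p_at_zero))  # logit of p_at_zero
    return 1 / (1 + math.exp(-(intercept + slope * delta)))


for d in (-1.0, 0.0, 0.25, 1.0):
    print(d, round(go_and_succeed_prob(d), 3))
```

Changing `slope` stretches or compresses the curve without moving the one-third point at a delta of zero.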
We’ve identified six different baserunning / arm categories:
- Batter going for two
- Batter going for three
- Runner going first to third on a single
- Runner going first to home on a double
- Runner going second to home on a single
- Runner going third to home on a sac fly
Each type of play has its own model, though they all follow the same principle. The “slope” of the curve is unique to each kind of play, but the structure of the model is the same.
With each play having a probability, we can now compare each runner to the baseline and figure out how many extra bases they are taking, how many bases they are NOT taking, and how often they are being thrown out. We can combine all that and come up with leaderboards for runners. Here it is since 2016:
Mookie Betts, Kevin Kiermaier, and Billy Hamilton are the best at taking the extra base.
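One way to picture the comparison against the baseline is to sum, for each runner, the difference between what happened and what the model expected. This is only a sketch under that assumption: the post does not spell out how bases taken and outs are weighted, and the plays below are invented placeholders, not real data.

```python
from collections import defaultdict

# Hypothetical plays: (runner, model probability of taking the base, outcome).
# outcome is "took", "held", or "out". These are NOT real Statcast records.
plays = [
    ("Runner A", 0.30, "took"),
    ("Runner A", 0.70, "took"),
    ("Runner A", 0.50, "out"),
    ("Runner B", 0.60, "held"),
    ("Runner B", 0.20, "held"),
]


def tally(plays):
    """Sum (actual - expected) extra bases and count outs per runner."""
    totals = defaultdict(lambda: {"bases_above_expected": 0.0, "thrown_out": 0})
    for runner, prob, outcome in plays:
        took = 1.0 if outcome == "took" else 0.0
        totals[runner]["bases_above_expected"] += took - prob
        if outcome == "out":
            totals[runner]["thrown_out"] += 1
    return dict(totals)


board = tally(plays)
ranked = sorted(board.items(), key=lambda kv: -kv[1]["bases_above_expected"])
for runner, row in ranked:
    print(runner, round(row["bases_above_expected"], 2), row["thrown_out"])
```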
We can also flip it to the other side and look at leaderboards from the OF Arm perspective, crediting the outfielder not only for throwing runners out, but also for holding them to their base. Here’s that leaderboard (more negative is good for the defense). Kevin Kiermaier is the leader, with Betts and Hamilton also making a strong showing.
You know all those things they say are “not in the boxscore”? They’re in the Statcast boxscore, and we’ll be showing the results of that and shining a spotlight on that “hidden game” of baseball, with Kiermaier its best representative, both as a runner and as a thrower. (We already know that Kiermaier is tremendous as a fielder.) Kiermaier is the kind of player that Statcast does its best to highlight. Eventually, this will make its way to Savant, along with a lot more breakdowns, so you can see it by each category and each season.
And what more can we do? Well, plenty. This, for example, is an Altuve play where he was thrown out by Arozarena. If you are behind the red line, that’s the no-go line. If you are ahead of the green line, that’s the go line. In this particular play, Altuve was thrown out. And we’d be able to show it, frame by frame, in video mode. We’re entering the top of the 5th.
There are a number of assumptions that I assume you are making that weren’t addressed, so let me throw them out to make them explicit.
1) For runners, you need to subset only to plays where the runner is at full effort on the play - to determine that this runner needs 8 seconds (on average) from home to second, you exclude the “easy stand-up” doubles.
2) Same idea for fielders - using only max effort plays and throws.
Now, given that you are using the runner’s own ability for determining the go/no-go line, the metric is really measuring baserunner intelligence more so than baserunner value, in the WAR sense (and same for the fielder). If we wanted to determine how much more value a player is adding from the ability to take extra bases, we should determine the time for the MLB average baserunner (probably split by hitting side) to go from home to second and, using the specific fielder on the play, determine the probability of an average runner making it to second.
On a given play, Billy Hamilton may have an 80% chance of making it to 2nd, whereas Miguel Cabrera may have a 15% chance. As I understand it, this metric is going to credit Hamilton with 20% added value of taking the extra base (assuming he goes for it and is successful). Miguel Cabrera would get 85% added value if he were to go and be successful. This is another assumption (not explicit in the article) about how you are tallying value that I am assuming. As I said above, this would be a good metric for measuring baserunner intelligence perhaps, but not actual value added. If Hamilton and Miggy both went for it and were successful, in a value metric they should both be credited with the same value added - they both made it to second base on an identical play; same batted ball, same fielder, same park (same batter handedness also let’s say).
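To make the contrast concrete, here is a small sketch of the two crediting schemes as I understand them. The 80% and 15% figures are the ones from the example above; the league-average probability is a made-up placeholder, and the `credit` function is my own illustration, not the article's stated method.

```python
def credit(success, baseline_prob):
    """Credit for one play: outcome (1 or 0) minus the baseline probability."""
    return (1.0 if success else 0.0) - baseline_prob


# Runner-specific probabilities from the example above.
own_prob = {"Hamilton": 0.80, "Cabrera": 0.15}

# Hypothetical probability for an MLB-average runner on the same play.
league_avg_prob = 0.50

for name, p in own_prob.items():
    vs_own = credit(True, p)                  # baseline = runner's own ability
    vs_avg = credit(True, league_avg_prob)    # baseline = average runner
    print(name, round(vs_own, 2), round(vs_avg, 2))
```

Against their own baselines the two runners get different credit for the same successful play; against the average-runner baseline they get the same credit.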
Similarly for the fielder, we should not use his own running/throwing abilities to determine the go/no-go line for the runner; we should use the values of the average fielder at his position.
If I’ve misunderstood, then my apologies. I wonder how much the leaderboard would change; it may not make that much of a difference.