Predicting Hall of Famers with Machine Learning
The questions of who should and who will make it into the Baseball Hall of Fame have inspired countless debates, books, articles, and statistics. From the early days of statistical milestones like 3,000 hits, 500 home runs, and 300 wins to more advanced measurements like WAR and JAWS, people in every era of baseball have tried to answer them. The discussion never really stops, but it peaks whenever a prominent player retires and during every winter ballot season. Innovations like the Hall of Fame Tracker have only added fuel to the fire.
I wanted to see if machine learning was up to the task of predicting who’ll get enshrined. I trained and evaluated a prediction model and used it to predict induction chances for current and recently retired players. I specifically wanted to see if I could get a sense of how some of the game’s younger superstars are doing, because who doesn’t want to talk about how good Juan Soto is?
In this article I discuss building and evaluating the model and show the predictions it makes. If you’re interested in the former, continue reading; if you’re interested only in the predictions, feel free to skip to the end.