Baseball-Reference at 20: After spawning a 1B page-view business, what’s next for the website of record?

By Matt Gelb
Jan 2, 2020

Two computer screens lit the corner office on the third floor at Summit Presbyterian Church, and this is where Sean Forman could see it all. His house, in the Mount Airy section of Philadelphia, is two blocks away. The grocery store is just around the corner. The coffee shop is across the street from that. Gary Sheffield’s Baseball-Reference page glowed from one of the monitors.

The radiator rumbled underneath his office, and a clicking sound filled the room. “I very much enjoy what I do,” said Forman, 48. “I feel pretty satisfied with where we’re at right now.” Soon, his little idea will turn 20 years old. This hobby became something indispensable in the baseball world, Baseball-Reference.com, and then grew into a business that generated more than a billion annual page views. Twenty years.

“This,” Forman said as he began a tour of the unassuming office space, “is my daughter’s cubicle.” Elinore is 11, and her job is to sell the company’s extra books — the random resources collected over the years that have helped to augment the digital encyclopedia. She lists the books online, boxes them up and takes them to the mailbox. All for a small cut, of course.

The company, Sports Reference LLC, now has 11 full-time employees to oversee seven websites. Improved mobile traffic and ad revenue allowed Forman to expand the operation. Often, he will ask a new acquaintance how many people they think work there. (He did, until a few years ago, serve as the office janitor.) The guesses have ranged from “half a person” to 50 employees.

“People are very surprised to find out just how small it is,” said Mike Lynch, a product manager. “It really is staggering how one man built this site.”

Forman quibbles with that notion because Baseball-Reference was built with open-source data — like Sean Lahman’s databases, Dave Smith’s tireless efforts at Retrosheet, and Sean Smith’s theories that spawned WAR.

Other websites offer more cutting-edge metrics and deeper analysis. Baseball-Reference has steered clear of editorial content. But as it embarks on its third decade, Forman’s site embodies a maxim since disrupted in the baseball research community. There is more data than ever. But teams are more protective of it than ever. It’s harder and harder to provide context through public-facing numbers.

Over the years, the statistics tracked and those most valued have changed. Typically, the changes came from a shared sense of discovery. “Everyone was very public about it,” Lynch said. “They could get feedback.” What if the newer, league-owned data become the predominant way to talk about the game?

Providing context, Forman said, has always been Baseball-Reference’s mission — from OPS+ to Similarity Scores to WAR calculations. It isn’t the prettiest website. But the business is profitable, Forman said, even if those margins are dependent on the whims of advertising algorithms. The stability offers the website’s founder some creative leeway in the third decade.

“People don’t appreciate the complexity and genius of what Sean has done,” said Lahman, whose databases were the roots for the site. “Because it looks so simple. It seems so intuitive. It’s remarkable. And that is why, for 20 years, it was the go-to site and it stayed there. Nobody has surpassed that. I don’t think anybody has even equaled it. It’s just been remarkable.”


“I feel pretty satisfied with where we’re at right now,” Sean Forman said. (Matt Gelb / The Athletic)

When Forman launched Baseball-Reference in April 2000, Lahman worked for a company called Total Sports that published 2,500-page stat books every year. The new website rendered them obsolete.

Twenty years later, Lahman still marvels at how Forman did it — how he manipulated the early world wide web and created something that has lasted. The rabbit holes that were so novel in 2003 still entice users to get lost in the numbers as they settle a debate or remember some former players.

“There were other folks who had tried to make online baseball encyclopedias before,” Lahman said. “What Sean saw that nobody else saw was you had to do more than just take that print volume and put it up as webpages. Sean understood that the web offered you an opportunity — not just to put up these pages but to organize the data in a different way. From the very beginning, the site really invited you to explore. Every page that you clicked on was aggressively hyperlinked. When you go to Henry Aaron’s page, you are a click away from seeing who else was on that 1966 Braves team with him. When he had all of those home runs, who were the guys ahead of him getting on base? It invited that kind of exploration in ways that just wasn’t possible on the printed page.

“He saw that. I’m not sure that that has ever been properly appreciated,” Lahman said. “He really saw it as a new medium and took advantage of what the technology offered.”

Like the row-summing function. It is simple, understated and not found anywhere else. “I’m proud of that one myself,” Forman said. And it was born from a source of inspiration that was as powerful 20 years ago as it is now: an internet debate.

“I’m on Usenet,” Forman said, “or I’m arguing with someone on Baseball Think Factory about something. ‘I want to see the last three years.’ Well, I have to get out my calculator. I have to add up the total bases. I have to add up the hits. I have to add up the at-bats. It became very clear that if you could do that in a relatively automated fashion, that would be a really nice feature to have.”

That comes back to the site’s foundation: Forman’s passion for baseball and statistics.

“Yeah, it was all stuff I was doing in my free time anyway,” Forman said. “The main thing that got me into it was a hyper-competitive desire to win my fantasy baseball leagues that I was in. I started out by creating prospect rankings of players and posting them on rec.sport.baseball. It’s all built upon itself.”


An ode to the past at Sports Reference’s offices. (Matt Gelb / The Athletic)

A slice of validation arrived via a recent message. Much of Baseball-Reference was created while Forman was a graduate student at the University of Iowa. He earned his Ph.D. in applied math but said it probably took a year longer than it should have. His academic advisor was skeptical of the baseball hobby.

That same advisor invited Forman to return to Iowa this spring as a speaker for a student computing conference.

While the site was still a side pursuit, Forman was a professor at St. Joseph’s University. He achieved tenure. Then, in 2006, he quit. “I was pretty confident I could at least make a living for myself doing it,” Forman said. “I didn’t necessarily envision it being 11 people and seven sports.” About a year ago, he splurged on a banner to hang on one of the church’s high third-floor walls. “ONE BILLION SERVED 2018,” it read. There will be another for 2019; Forman said the company’s websites neared 1.25 billion page views for the year.

It is a business, but Forman has a promise to keep — one based on the open data that spurred the site.

Baseball-Reference is the website of record for professional baseball. It is close to having all major-league box scores and game logs dating to 1901. The site has expanded its minor-league and collegiate databases. There are certain responsibilities.

“The No. 1 email we get now is, we have such a vast store of minor-league data, and families send us biographical updates,” Forman said. “Like, constantly. The monetary value of getting some 1950s I-League player’s handedness correct is literally less than zero. The amount of time we spend on it will cost more than any value we get out of it.

“But we are a reference site. We want to get it right. It’s important to this person and this family. So, we’ve been creating a system for streamlining that process.”

Lynch wants to add live, clutter-free box scores. They are revamping the site’s powerful Play Index tool to make it more intuitive and expansive. It’ll be rebranded as Stathead. The company wants to sell more subscriptions, and it will begin charging for the search tools on the other Sports Reference sites. Forman became obsessed with soccer — he’s a Manchester City supporter — and the company’s ambitious soccer site is a priority. It has a global audience but mustered less traffic than the Mike Trout page on Baseball-Reference until recently, Forman said.

Meanwhile, he has continued the push for more open data to incorporate into the Baseball-Reference universe.

“If they would just release two- or three-years-old data for Statcast, that would be interesting,” Forman said. “You could use that to validate the current metrics we’re using. It would be useful.”

Lahman, who now works as a data reporter for the Rochester Democrat & Chronicle, sees this as essential to Baseball-Reference’s relevance over the next decade.

“When I look at what Bill James was doing in the early ’80s, his work was a call for more and better data,” Lahman said. “We had that in the ’90s. Having that open data helped to give birth to groups like Baseball Prospectus and FanGraphs. They took that data and did interesting things with it. Those were the guys who forced the teams (to say), ‘Hey, there’s a competitive advantage to be had here if we just look at this.’

“On the other side, the piece that is missing right now — and I expect Sean Forman will be one of the people who resolves this — is how do we take that data and make it approachable for fans? I hear people say to me, ‘Why do I care about spin rates or exit velocities or launch angles? If I’m sitting at home watching the game, why do I care about that?’ And I don’t think anybody has really done a great job of explaining those things to people. Why they should care or, more importantly, what does that information say? The Baseball Savant site is really good at taking a first step toward that. But it still lacks that context.”

Forman said he wants to avoid a situation where, in 10 years, the company is sold to someone who doesn’t have the same values. He considers the stewards of Wikipedia and the Internet Archive “kindred spirits.” He’s convinced other like-minded programmers and developers to join him. At 20 years, there is a sense of permanence.

“As long as I can make a comfortable living — and I’m doing so — I feel pretty good about where we’re at,” Forman said. “I think our mission is more important than that. Fortunately, the other owners of the company are fine with not squeezing every last dollar out of it. We don’t have any venture capitalists breathing down our necks, either. We are left alone to do what we want and keep at it.”

(Top photo of Sean Forman at Sports Reference’s offices: Matt Gelb / The Athletic) 

Matt Gelb

Matt Gelb is a senior writer for The Athletic covering the Philadelphia Phillies. He has covered the team since 2010 while at The Philadelphia Inquirer, including a yearlong pause from baseball as a reporter on the city desk. He is a graduate of Syracuse University and Central Bucks High School West.