This American Data Set, Act I: Baseball Outliers

August 31, 2015 by Tyler Patterson

[Editor’s note: As a bunch of data geeks, we always enjoy getting our hands dirty exploring interesting data. This is the first of a three-part series on data sets with a story to tell. You can check out the source data for this here.]

There’s a reason that practically everything that happens in a baseball game is meticulously tracked. Interesting baseball stories are often captured beautifully in data.

For instance, a recent analysis has shown that hitters start losing their abilities as soon as their careers begin. Perhaps this is why it’s so rare and exciting to see a player who defies the typical aging curve and continues to play brilliantly far past his expected peak. These players rank among the most legendary in the game. Hank Aaron hit 40 home runs at age 39 – just seven below his career high. Ty Cobb notched the second-highest OPS of his career at 38. Stan Musial batted .330 at 41.

However, remarkable consistency isn’t the only way to subvert the aging curve. Every once in a while we see a player whose career goes off in a completely unexpected direction. To me, at least, the most fun (and bizarre) examples are the players whose careers follow a relatively predictable path, except for one absolutely bonkers season. The first name that comes to mind is Brady Anderson, whose career high for home runs in a season was 21 before he hit 50 in 1996. This was no harbinger of things to come for ol’ Brady, though: he never managed more than 24 in a year after 1996.

With the strange success of Brady Anderson in mind, I thought it might be interesting to try to find some more of these bizarre outlier seasons.

In Search of the Anomalies

So who are the players whose peak seasons far outstripped their career norms?

Below are the players with the greatest positive difference between OPS in a single season and career OPS. In a rough attempt to approximate how much each player may have benefited from luck, I’ve also added the batting average on balls in play (BABIP) for both the season and his career. Here’s a great primer on BABIP if you’re not familiar.
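For readers who want to reproduce the two metrics, here is a minimal sketch using the standard definitions of OPS and BABIP. The `BattingLine` container and the sample numbers are hypothetical, and the article does not say exactly how its figures were computed, so treat the formulas as an assumption. Older seasons may also lack some of these counting stats, which this sketch simply defaults to zero.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Counting stats for one season or a whole career (hypothetical container)."""
    ab: int        # at bats
    h: int         # hits
    doubles: int
    triples: int
    hr: int        # home runs
    bb: int        # walks
    so: int        # strikeouts
    hbp: int = 0   # hit by pitch
    sf: int = 0    # sacrifice flies


def obp(b: BattingLine) -> float:
    """On-base percentage: times on base per qualifying plate appearance."""
    return (b.h + b.bb + b.hbp) / (b.ab + b.bb + b.hbp + b.sf)


def slg(b: BattingLine) -> float:
    """Slugging percentage: total bases per at bat."""
    total_bases = b.h + b.doubles + 2 * b.triples + 3 * b.hr
    return total_bases / b.ab


def ops(b: BattingLine) -> float:
    """On-base plus slugging."""
    return obp(b) + slg(b)


def babip(b: BattingLine) -> float:
    """Batting average on balls in play (home runs and strikeouts excluded)."""
    return (b.h - b.hr) / (b.ab - b.so - b.hr + b.sf)


# Made-up season line, purely for illustration.
season = BattingLine(ab=500, h=150, doubles=30, triples=5, hr=20,
                     bb=60, so=100, hbp=5, sf=5)
print(round(ops(season), 3), round(babip(season), 3))  # 0.877 0.338
```

The same functions work for a career line if you pass in career totals, which is one reasonable way to get the "career" side of each comparison.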

The first thing you’ll probably notice is Barry Bonds’ absolute domination of this list. The man was so good that, despite possessing a top-5 OPS of all time, he still managed to also attain the largest difference between OPS in a season and career OPS, as well as a few other spots in the top 10 for good measure. If you needed any more proof that Bonds’ 2001-2004 was the most dominating stretch for any hitter ever, well, there it is.

Another striking thing is how many 19th-century players hold spots on this list. Despite less than a tenth of the seasons above my plate appearance threshold of 350 coming before 1900, 7 of the top 20 differences come from this time period. These players also tend to have extremely high BABIP gaps between season and career. It seems there was simply more variance in the game in the 1800s. With a young league and shorter seasons, a player would have a greater chance of getting lucky in a given year.
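The ranking itself can be sketched in a few lines. This is only an illustration under stated assumptions: career OPS is computed from summed career totals, plate appearances are approximated as AB + BB + HBP + SF + SH, and the field names and sample rows are hypothetical rather than taken from the source data. Only the 350 plate appearance cutoff comes from the article.

```python
from collections import Counter, defaultdict

MIN_PA = 350  # plate appearance threshold quoted in the article
STATS = ("ab", "h", "doubles", "triples", "hr", "bb", "hbp", "sf", "sh")


def ops(t):
    """OPS from a mapping of counting stats (standard OBP + SLG formulas)."""
    obp = (t["h"] + t["bb"] + t["hbp"]) / (t["ab"] + t["bb"] + t["hbp"] + t["sf"])
    total_bases = t["h"] + t["doubles"] + 2 * t["triples"] + 3 * t["hr"]
    return obp + total_bases / t["ab"]


def plate_appearances(t):
    """Approximate PA; the article's exact definition is not stated."""
    return t["ab"] + t["bb"] + t["hbp"] + t["sf"] + t["sh"]


def rank_outlier_seasons(seasons, min_pa=MIN_PA):
    """Return (player, year, season OPS - career OPS), largest gap first."""
    career = defaultdict(Counter)
    for s in seasons:  # every season counts toward the career totals
        career[s["player"]].update({k: s[k] for k in STATS})
    gaps = [
        (s["player"], s["year"], ops(s) - ops(career[s["player"]]))
        for s in seasons
        if plate_appearances(s) >= min_pa  # only full seasons are ranked
    ]
    return sorted(gaps, key=lambda g: g[2], reverse=True)


def row(player, year, *values):
    """Build one hypothetical season row from positional counting stats."""
    return {"player": player, "year": year, **dict(zip(STATS, values))}


# Made-up rows: ab, h, doubles, triples, hr, bb, hbp, sf, sh
seasons = [
    row("player_a", 2001, 500, 150, 30, 5, 20, 60, 5, 5, 0),
    row("player_a", 2002, 500, 130, 20, 2, 10, 40, 2, 4, 1),
    row("player_a", 2003, 100, 20, 3, 0, 1, 5, 0, 1, 0),  # below the PA cutoff
    row("player_b", 2001, 450, 120, 25, 3, 15, 50, 3, 3, 2),
]
ranked = rank_outlier_seasons(seasons)
for player, year, gap in ranked:
    print(f"{player} {year} {gap:+.3f}")
```

Short seasons still feed the career totals here but are never ranked themselves. Reading the same list from the bottom gives the seasons that fell furthest below career OPS.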


Analyzing the Outliers

Are these spikes in output largely skill-based, or is there an element of luck in these outlier seasons? One way to consider this question is to test how often a higher-than-normal OPS is accompanied by a higher-than-normal BABIP.

Of the 1652 seasons in which players achieved an OPS more than .100 higher than their career OPS, they also had a higher BABIP in 1546 of them — well over 90 percent. Since BABIP can be higher or lower than a player’s average in a season out of sheer luck, this points to fortune playing a big role in many—though probably not all—of these seasons. Some players could also have experienced an improvement in skill (such as the ability to hit more line drives) for their outlier season but been unable to maintain this skill throughout their careers.
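A rough sketch of that check, assuming each season row already carries season and career values for both metrics (the field names and sample rows are hypothetical). The last line just redoes the arithmetic on the two counts quoted above.

```python
OPS_GAP = 0.100  # "more than .100 higher than career OPS", per the article


def babip_agreement(seasons, ops_gap=OPS_GAP):
    """Count outlier seasons, and how many of them also beat career BABIP."""
    outliers = [s for s in seasons if s["ops"] - s["career_ops"] > ops_gap]
    higher = sum(s["babip"] > s["career_babip"] for s in outliers)
    return len(outliers), higher


# Made-up rows, purely for illustration.
sample = [
    {"ops": 0.950, "career_ops": 0.800, "babip": 0.340, "career_babip": 0.300},
    {"ops": 0.920, "career_ops": 0.780, "babip": 0.290, "career_babip": 0.310},
    {"ops": 0.820, "career_ops": 0.800, "babip": 0.350, "career_babip": 0.300},
    {"ops": 1.050, "career_ops": 0.900, "babip": 0.330, "career_babip": 0.320},
]
n_outliers, n_higher = babip_agreement(sample)
print(n_outliers, n_higher)  # 3 2

# Share implied by the article's own counts.
article_share = 1546 / 1652
print(f"{article_share:.1%}")  # 93.6%
```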

If that sounds complex, here’s a more common-sense perspective. A player who greatly outperforms his career averages but cannot sustain this performance probably had at least a few extra balls bounce his way over the course of his anomaly season. Maybe he really did figure something out and then lose it the next year, but luck probably played at least a small role in a majority of cases.

What about Poor Performers?

Just for fun, let’s also take a look at the very worst seasons of all time compared to typical career performance. These are the players who either figured something out after a season or two in the league, fell off a cliff late in their careers, or just flat-out stunk for a year and then went back to normal, leaving the rest of the league scratching their heads.

Oh, look who’s at the top of the list again: Barry freaking Bonds. The most amazing thing about this list is that Bonds’ OPS of .746 was actually above average considering the historical context and the park he played in. Let that sink in. The most uncharacteristically bad offensive season of all time was still a pretty good offensive season. Another amazing Barry Bonds fact to add to the register.

A few other interesting things to note: there aren’t nearly as many players from the 1800s on this list. This may just be because of sample size, of course, but I wonder if teams just didn’t give players a chance to play an entire year much worse than they normally did. Also note that most of these players have BABIPs much lower than in their overall career. It’s fairly unlikely that a player will have a dreadful season and go back to typical performance purely due to changes in skill, so it makes sense that some luck would be involved.

It Ain’t Over ‘Til…

So there you have it: some of the biggest season outliers of all time. The mix is an interesting one, from average players having surprise career years to long-forgotten 19th-century players whose bats gave them one magical season. Sadly, our man Brady Anderson was just a little too good a player overall, and his fluke season not quite impressive enough, to make a showing on this list, though the dramatic shift in home runs is unmatched to this day.

The real moral of this story, though? Don’t mess with Barry Bonds. He will top any batting list you could possibly think of. Abandon all hope. He is everywhere.

All data is taken from the Lahman database. Go here to access the full database, or here to interact with key Lahman tables as well as modified tables used in the article.


Feature image courtesy of Connie Ma.