By Bryan O'Rourke, September 5, 2013

Fantasies of Big Data and Baseball Part 3

In this three-week series, the first week covered the distribution of crowd members who participated, and last week we saw the point totals of each player that the average crowd member chose to start. How did the classes do as a whole, though? What do these results mean?

Here is a breakdown of how much each knowledge class scored:

Point Totals By Team

Given that there was so much overlap among the players chosen by each knowledge class, it is not surprising that all the classes scored very close to one another. What is surprising is that the ‘Little Knowledge’ class narrowly beat out all other classes. Additionally, the ‘Highest Knowledge’ class scored the least of all the knowledge classes. This fantasy contest was won and lost by just a few picks: the players that were not commonly chosen across the board. So let’s look at the same chart as above, only this time without the positions where a player was chosen 4 or 5 times in total:

Fantasy Without Players

This graph shows where ‘Little Knowledge’ separated itself from the rest of the pack. The exception is ‘Lowest Knowledge’, where the difference came down to one pick: Andrew McCutchen was the wise choice, as he had been very hot for a while and Ellsbury had not been playing well. Most groups knew this, but the ‘Lowest Knowledge’ group missed it. Where the higher knowledge levels chose players like Robinson Cano, Jose Bautista and Prince Fielder, the two lower knowledge classes chose Allen Craig, Matt Carpenter and Michael Cuddyer.
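For readers who want to reproduce the second chart, here is a minimal Python sketch of the filtering step. It assumes five knowledge classes (implied by "4 or 5 times in total") with one pick per position. The class labels, player names and point values are placeholders for illustration, not the contest data.

```python
from collections import Counter

# Placeholder picks: knowledge class -> {position: player}. Not the real data.
picks = {
    "class_1": {"1B": "Player A", "2B": "Player C", "OF": "Player E"},
    "class_2": {"1B": "Player A", "2B": "Player D", "OF": "Player E"},
    "class_3": {"1B": "Player A", "2B": "Player C", "OF": "Player F"},
    "class_4": {"1B": "Player A", "2B": "Player C", "OF": "Player F"},
    "class_5": {"1B": "Player B", "2B": "Player D", "OF": "Player E"},
}

# Placeholder fantasy points scored by each player.
points = {
    "Player A": 10, "Player B": 4, "Player C": 6,
    "Player D": 9, "Player E": 7, "Player F": 3,
}


def contested_positions(picks, threshold=4):
    """Keep positions where no single player was chosen `threshold` or more times."""
    positions = next(iter(picks.values())).keys()
    kept = []
    for pos in positions:
        counts = Counter(team[pos] for team in picks.values())
        if max(counts.values()) < threshold:
            kept.append(pos)
    return kept


def totals(picks, points, positions):
    """Sum each class's points over the given positions only."""
    return {
        cls: sum(points[team[pos]] for pos in positions)
        for cls, team in picks.items()
    }


kept = contested_positions(picks)
print(kept)
print(totals(picks, points, kept))
```

In this toy data the first-base pick is shared by four of the five classes, so it drops out and only the positions where the classes disagreed contribute to the totals.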

What are the differences between these sets of players? Robinson Cano, Jose Bautista and Prince Fielder are physical specimens with high-profile contracts who crush the ball when they make contact. These players are frequently highlighted on SportsCenter and get much of the spotlight in sports media. Allen Craig, Matt Carpenter and Michael Cuddyer are all very good players, but their trademark is consistency. These players slap base hits day in and day out. Most importantly, they were statistical leaders at their respective positions in things like Batting Average, Walks, Runs Batted In and Runs Scored.

It seems that the contributors with very little prior knowledge simply looked up who was a statistical leader for a given position and chose them for their fantasy team. It did not matter to those with less prior knowledge who got more press coverage, bigger contracts or highlight-reel home runs. These contributors gave no weight to those factors, as the factors were not present, consciously or unconsciously, in their decision-making process.
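The strategy guessed at above, picking the statistical leader at each position, can be sketched in a few lines of Python. This is only an illustration of the idea, not a record of what any contributor actually did. The player names, the stat values and the `leaders_by_position` helper are all made up for the example.

```python
# Placeholder stat lines; names and numbers are invented for illustration.
players = [
    {"name": "Player A", "position": "1B", "batting_average": 0.310},
    {"name": "Player B", "position": "1B", "batting_average": 0.275},
    {"name": "Player C", "position": "2B", "batting_average": 0.290},
    {"name": "Player D", "position": "2B", "batting_average": 0.320},
]


def leaders_by_position(players, stat):
    """Return {position: name} for the player with the highest `stat` at each position."""
    best = {}
    for player in players:
        current = best.get(player["position"])
        if current is None or player[stat] > current[stat]:
            best[player["position"]] = player
    return {pos: player["name"] for pos, player in best.items()}


print(leaders_by_position(players, "batting_average"))
```

Nothing in the function knows about contracts, press coverage or highlight reels. The only input is the stat line, which is the point of the approach.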

The shift among baseball owners and managers toward a mindset focused more on pure data than on pure athleticism has been well documented. Books like ‘Moneyball’ and ‘The Signal And The Noise’, as well as movies and documentaries, have made it common knowledge that statistics should be at the forefront of sports. Yet what seems like common knowledge hasn’t infiltrated the popular realm of fantasy baseball as much as we’d like to think. There will always be room for the hunch bet and for taking chances on exciting new prospects. However, the vast majority of the time it is safe to say that the statistical leaders in each category will probably be the statistical leaders on a smaller time scale as well.

So next time you’re trying to decide whether to start a hot new player or a consistent low-profile player, think about this article. Don’t make the same mistake I did and overanalyze or make unconsciously biased decisions. Keep it simple: those with the best meaningful statistics will perform the best. That strategy served me very well my first two years. I probably would have been wise to follow it the following two years, and from here on out I will be adhering to this rule of thumb in all future leagues.

Have any questions about my methodology? Perhaps you wish you could see more or different statistics? Maybe I reached the wrong conclusion? Let me know by leaving a comment below!



  1. This was not a scientific study. It would not hold up to the rigor of peer review by scientists and statisticians. Instead, it was meant to offer insight into a seemingly all-too-common error I have seen happen many times across all fantasy sports.
  2. Another possible explanation is that the ‘Little Knowledge’ class had by far the most participants. The “wisdom of the crowd” had a much larger pool of input here and could have come to a wiser decision because of that. It is very possible that if as many ‘High Knowledge’ contributors had participated as ‘Little Knowledge’ ones, the ‘High Knowledge’ group would have come to an even wiser decision than the ‘Little Knowledge’ contributors.
  3. I’d like to thank CrowdFlower for giving me the opportunity to explore my curiosity with numbers, baseball and all fantasy sports.
  4. I’d like to thank Keith Favreau for helping take a general idea and mold it into a practical and interesting crowd experiment.