
By Bryan O'Rourke, August 14, 2013

Fantasies of Big Data and Baseball Part 1

This is part one of a three-part weekly series. I will be exploring the relationship between big data, fantasy baseball and my own personal experience with the rise and fall of a fantasy baseball dynasty. Part one covers who participated in this research. Part two will go over the structure of the CrowdFlower task and the numerical results. Finally, part three will wrap up with what should be gleaned from this experiment.

Over the past several decades, baseball has evolved into a game dominated by statistics. This evolution has been marked by struggle and turmoil within the baseball community, but it has led to probability and statistics being understood as a pillar of success. For the most part, this new perspective on the game has seeped into fantasy baseball as well. After all, the very premise of fantasy baseball is to receive points based on a player’s statistics.

A few months ago, I began to wonder just how much data had become the prevailing wisdom in fantasy baseball. When I first played fantasy, about six years ago, I dominated the league, winning two championships in a row. Eventually, I became very attached to the pride and glory of being the definitive number one amongst my friends. To maintain this alpha position, I began to hit the books hard. I watched every game I could while doing lots of research, poring over countless webpages of stats, and reading blog after blog of fantasy advice. I did not make the playoffs that year. Nor did I make the playoffs the following year…

Fast forward a few years. Now, I am an employee at a company that has millions of contributors accessing tasks that I help build every day. I saw an opportunity to explore the correlation between fantasy baseball knowledge and performance, if any exists at all. So, I designed an experiment that split just under 1,000 members of the crowd into five groups by level of knowledge:

  1. Lowest Knowledge – Those who have not seen a baseball game in years and years and have no idea what is going on in the league.
  2. Little Knowledge – Those who have seen a couple of baseball games in the past few years, but have little-to-no current knowledge of the game.
  3. Medium Knowledge – Those who have seen several baseball games this year and know a few injuries and stats about notable players.
  4. Higher Knowledge – Those who watch games several times a week and know most teams’ key players and statistics.
  5. Highest Knowledge – Those who watch a game almost every day and know all key players’ injuries and stats.

Here is a graph of how many people fell into each category when they did this task:

Lowest had 225 contributors participate, Little had 370, Medium had 189, Higher had 126 and Highest had 88, for a total of 998.
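For readers who want to check the turnout numbers themselves, here is a minimal Python sketch that adds up the five group counts listed above and works out each group's share of the crowd. The names `counts` and `shares` are just illustrative choices of mine; this is not the code behind the actual CrowdFlower task.

```python
# Contributor counts per knowledge group, copied from the figures above.
counts = {
    "Lowest": 225,
    "Little": 370,
    "Medium": 189,
    "Higher": 126,
    "Highest": 88,
}


def shares(group_counts):
    """Return each group's share of all contributors, as a percentage
    rounded to one decimal place."""
    total = sum(group_counts.values())
    return {group: round(100 * n / total, 1) for group, n in group_counts.items()}


total = sum(counts.values())  # 998, i.e. "just under 1,000"
percentages = shares(counts)

# Print a simple table, largest group first.
for group, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
    print(f"{group:<8}{n:>5}{percentages[group]:>7}%")
print(f"{'Total':<8}{total:>5}")
```

Sorting by size makes it easy to see that the Little Knowledge group was the largest and the Highest Knowledge group the smallest.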

Next week, we will explore the structure of the task I designed and get a first glimpse of the players’ point totals.

Do you have any predictions as to who will win? Were you surprised by the turnout distribution? Leave your thoughts below as a comment!