Research & Insights

By Patrick Philips, March 4, 2011

Oscar Fever: The Sequel!

The votes are in from our Oscar crowdsourcing experiment, and the crowd successfully picked the winners in 14 of the Academy Award categories. For reference, Roger Ebert got 15 predictions correct, so we’d have to conclude that the crowd performed reasonably well at predicting the winners of this glorified popularity contest.


[Image: movie picks]

One fascinating thing about aggregating responses is that the crowd as a whole often outperforms the average worker. In this case, among the 500 people we polled, the majority of respondents picked fewer than 10 awards correctly (mean 9.6, median 9). And yet, by aggregating all the responses so that the nominee with the most “votes” in each category is predicted to win, the crowd as a whole correctly picked 14 awards. While the “wisdom of crowds” doesn’t come as much of a surprise, it’s always reassuring to see it confirmed in new applications.
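The aggregation step here is just a plurality vote per category. A minimal sketch (the data below is a toy example, not our actual poll responses) shows how the crowd pick can be right even when individual workers are wrong:

```python
from collections import Counter

def aggregate_picks(responses):
    """Plurality vote: for each award category, predict the nominee
    that received the most worker votes."""
    return {category: Counter(picks).most_common(1)[0][0]
            for category, picks in responses.items()}

# Toy data: workers 2 and 4 each miss Best Picture, but the
# plurality pick is still the actual 2011 winner.
responses = {
    "Best Picture": ["The King's Speech", "The Social Network",
                     "The King's Speech", "Inception", "The King's Speech"],
    "Best Director": ["Tom Hooper", "David Fincher", "Tom Hooper",
                      "Tom Hooper", "Darren Aronofsky"],
}
print(aggregate_picks(responses))
# → {"Best Picture": "The King's Speech", "Best Director": "Tom Hooper"}
```

Note that a nominee only needs a plurality, not a majority: even if most individual ballots are wrong, the errors tend to scatter across different wrong answers while correct votes pile up on one.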

As we noted in our earlier post, though, the more interesting question was whether workers who indicated higher confidence in their responses would outperform workers with lower confidence. Looking at the results, however, we saw no significant correlation between a worker’s predicted accuracy and actual performance.
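The check itself is straightforward: compute the correlation between each worker’s self-reported confidence and their actual score. A sketch of the Pearson correlation (the worker data shown is hypothetical, invented for illustration):

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between two sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: self-reported confidence (0-1) vs. number of
# awards picked correctly. A value of r near 0 would mean confidence
# tells us nothing about actual accuracy.
confidence = [0.9, 0.5, 0.7, 0.3, 0.8, 0.6]
correct = [8, 11, 9, 10, 9, 10]
print(pearson_r(confidence, correct))
```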


While it’s certainly possible that we didn’t offer enough of an incentive for workers to estimate their own accuracy, the more likely explanation is that predicting the winners of the Oscars is not something a person can do with any degree of certainty. Confident or not, the people we polled did not see “Inside Job” coming.

As a final exercise, we regressed accuracy on every explanatory variable we could find, including what state workers came from, what day they made their predictions, whether they made their predictions during the day or at night, how long they spent making their predictions, and even their historical accuracy on other CrowdFlower tasks. The only variable with any significance turned out to be how long they spent making their predictions, and while it was statistically significant (p = 0.001), no model we could come up with explained more than 5 percent of the total variation in accuracy.
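The “explained no more than 5 percent of the variation” claim is just the R² of the fitted model. For a single predictor like time spent, a simple least-squares sketch makes the measure concrete (the function below is illustrative, not our actual analysis code):

```python
def ols_r_squared(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return R^2,
    the fraction of variance in y explained by x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot
```

An R² of 0.05 means that even a statistically significant predictor leaves 95 percent of the variation in worker accuracy unaccounted for, which is why we read the result as noise dominating signal.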

While the wisdom of crowds seems to extend to picking Oscar winners, the more interesting experiment of having workers self-select as trustworthy is ongoing. In the future, it would be worthwhile to repeat this experiment with questions that can be answered objectively and without uncertainty (solving algebra problems seems like a good candidate), to see if any correlation emerges between predicted and actual accuracy.