A few weeks ago, I was talking about the motivations of crowdsourcing workers with Judd, who has already done a ton of great work looking at motivations for participation across a wide range of online environments. He is a recent Ph.D. from the UC Berkeley School of Information and just joined Yahoo! Research as a social psychologist and research scientist in the Internet Experiences Group, so it was no surprise that he had a great idea about how to design an experiment to better understand crowdsourcing.
The most straightforward way to ask crowdsourcing workers why they do what they do is with a survey (e.g., Panos Ipeirotis’ fascinating recent informal survey of MTurk workers). However, you might recall from one or two of my previous posts that I tend not to take survey results at face value.
Judd’s “list experiment” presents the subjects of a study with a list of several motivations and asks them to report a count of how many items in the list they agree with (rather than posing yes/no questions or using checkboxes).
Here’s what that looked like once Judd had it set up in CrowdFlower:
We presented experimental treatment groups with four other permutations of the same list — each one missing one of the items — and aggregated the results across every group. This allowed us to estimate the proportion of respondents choosing each item in the list.
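To make that estimation step concrete, here is a minimal sketch of the difference-in-means logic behind a list experiment. The counts below are simulated for illustration (they are not our data), and the group names and list size are hypothetical: subtracting the mean count of a group whose list omits an item from the mean count of the group that saw the full list estimates the share of respondents who agree with that item.

```python
import numpy as np

# Each respondent reports only a COUNT of items they agree with,
# never which items -- that is what keeps individual answers private.
rng = np.random.default_rng(0)

# Simulated counts (illustrative values only, not the study's data).
counts = {
    "full":        rng.binomial(5, 0.50, size=200),  # saw all 5 items
    "minus_money": rng.binomial(4, 0.45, size=200),  # "money" item removed
    "minus_fun":   rng.binomial(4, 0.48, size=200),  # "fun" item removed
}

def item_proportion(full_counts, reduced_counts):
    """Difference-in-means estimator: the mean count with an item present,
    minus the mean count with it removed, estimates the proportion of
    respondents who agree with that item."""
    return full_counts.mean() - reduced_counts.mean()

for item in ("money", "fun"):
    p = item_proportion(counts["full"], counts[f"minus_{item}"])
    print(f"estimated proportion agreeing with '{item}': {p:.2f}")
```

Note that the estimate comes entirely from group-level means, so no individual's answers are ever recoverable.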
The advantage of the list experiment over the traditional survey format is that it doesn’t require anybody to explicitly say, “I crowdsource because it gives me a sense of purpose.” Indeed, it preserves the anonymity of individual preferences, since the results we generate are estimates based on summaries of behavior across the different treatment groups. The questions are less obtrusive, and there’s no pressure to hide your true sentiments or conform to the expectations of others. List experiments are thus powerful tools for examining preferences that may be controversial or otherwise subject to social pressures.
Judd and I designed a pilot experiment with the list above and administered it to MTurk workers through CrowdFlower. For the sake of comparison, we also included a control condition that asked Turkers the same questions in traditional, agreement-style survey form. To simplify things, we limited the responses to US workers only.
Comparing the results from the survey condition and the list experiment revealed some mind-blowing differences:
Note the discrepancy between some of the paired bars. Whereas 97% of the Turkers in the control group agreed with the statement “I am motivated to do HITs on Mechanical Turk to make extra money,” just 60% of the Turkers in the list experiment condition expressed the same preference.
Similarly, check out the difference between the agreement-style and list-experiment results in the “for fun” category. Again, agreement statements elicit over-reporting when compared with the list experiment (although this time to a less extreme degree).
Our preliminary conclusion from this pilot study? Crowdsourcing for money and crowdsourcing for fun sound better than they actually are.
Another, slightly more science-y way to put this is that the workers in our study over-report the extent to which they are motivated by money and fun in response to agreement statements versus a list experiment, suggesting that they perceive these two factors to be socially desirable.
Understanding the cause of this social desirability bias as well as its implications for crowdsourcing across different environments will require further research. In other contexts, social desirability bias (a.k.a. “the Broadus effect”, if you read the amazing Nate Silver) has played a role in everything from elections to educational attainment. There’s no reason to believe it doesn’t affect the way people work and participate in various online environments as well.
Perhaps most interesting of all, our findings here further complicate the growing debate over how paid crowdsourcing ought to be understood and (potentially) regulated. If a substantial proportion of workers aren’t actually on MTurk for the money, does that support the claim that we should regulate crowdsourcing along the same lines that we regulate other post-industrial sectors?
These are big questions that we should continue to probe through future studies and discussion. In the meantime, Judd and I re-ran our list experiment with a few minor adjustments and a much bigger sample. We’re in the process of writing up this larger version of the study for a conference submission and will post the full paper here as soon as we can.