Breaking Monotony with Meaning: Motivation in Crowdsourcing Markets


This is a guest post written by my friend Dana Chandler on how the context of a task motivates the person working on it. He has a longer academic paper on the topic, which you can find at the bottom of this post. It once again shows how traditional economic incentives can't fully explain workers' behaviors on Mechanical Turk.

Imagine for a moment that you are a turker from either the US or India, looking at the image above. You are given the task of clicking on the blue circular objects with red borders. What you see is only a fraction of the full image. Each image has 90 blue objects to identify. If you’re as good as the average worker, you’ll complete your first image in a little over five minutes and earn 10 cents (for an hourly wage of $1.20).

After your first image, you can either quit and take your 10 cents, or identify points on another image. Over the next four hours, you’ll have the chance to label as many images as you want. But there’s a catch: you’ll only be paid 9 cents for the second image, 8 cents for the third, and so on, all the way down to 2 cents, which lowers your hourly wage even further.
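If you want to see the arithmetic, here is a minimal sketch of how that declining pay schedule translates into an implied hourly wage, assuming an average of five minutes per image (the figure implied by the 10-cent, $1.20-per-hour numbers above). The names and the exact per-image time are illustrative, not part of the study.

```python
# A minimal sketch of the declining pay schedule described above.
# Assumption: roughly five minutes per image, as reported in the post.

MINUTES_PER_IMAGE = 5.0  # assumed average time per image

def payment_cents(image_number: int) -> int:
    """10 cents for the 1st image, dropping 1 cent per image, with a 2-cent floor."""
    return max(10 - (image_number - 1), 2)

def implied_hourly_wage(images_labeled: int) -> float:
    """Average hourly wage (in dollars) after labeling `images_labeled` images."""
    total_cents = sum(payment_cents(i) for i in range(1, images_labeled + 1))
    total_hours = images_labeled * MINUTES_PER_IMAGE / 60
    return (total_cents / 100) / total_hours

for n in (1, 5, 9):
    print(f"{n} image(s): ${implied_hourly_wage(n):.2f}/hour")
# Under these assumptions: 1 image -> $1.20/hour, 5 images -> $0.96/hour, 9 images -> $0.72/hour
```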

Before you even qualify for the task, you'll have to spend five minutes watching a training video and passing a quiz. During the video, half of you will be given only basic work instructions on how to identify “objects of interest.” The other half will be given both the instructions and cues of meaning: recognition for your contribution and an explanation of your task's purpose. The reason given here? To help researchers identify cancerous tumor cells.

We posted these HITs on MTurk in January 2010. Almost 300 people from the US and India accepted the task, becoming unknowing participants in our experiment examining MTurk worker motivation. It is commonly believed (and other researchers have verified with demographic surveys) that Indian workers are motivated more by pecuniary concerns, while US turkers primarily do tasks for leisure or other non-pecuniary motives. Is this true?

In both countries, half of the turkers in the experiment were randomly assigned to label nondescript "objects of interest" without being given any context or greater purpose -- they were our zero-context group. The other half, our meaningful group, were told they were helping researchers identify cancerous tumor cells. Which group of turkers do you think worked harder? You might be surprised.

Our experiment therefore compared two groups of workers, those with and without a clear wage motivation, to see whether they responded differently to meaningfulness in their tasks.

Results

We measured three metrics: "showing up", the quantity of work, and the quality of that work. The first two metrics are straightforward. Showing up meant that you sat through our training video, passed our qualification test, and labeled at least one image. Quantity of work was simply the number of images labeled.

We repeatedly told both groups of turkers that they needed to click on every point and to click as close to each point as possible. Work quality was determined by the fraction of cells that a person clicked on (the recall) and the average distance between the “true center” of each cell and where the worker clicked (the centrality).
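To make the two quality measures concrete, here is a rough sketch of how recall and centrality could be computed for a single labeled image. The matching rule (nearest click within a pixel threshold) and the names used here are assumptions for illustration; the full paper defines the exact procedure.

```python
# A rough sketch of the two quality metrics described above (recall and
# centrality) for one labeled image. The nearest-click-within-a-threshold
# matching rule is an assumption, not the paper's exact definition.
import math

def quality_metrics(true_centers, clicks, max_dist=20.0):
    """Return (recall, centrality) given true cell centers and worker clicks.

    recall     : fraction of true cells with a click within `max_dist` pixels
    centrality : mean pixel distance from each matched cell's true center
                 to the nearest click
    """
    matched_distances = []
    for cx, cy in true_centers:
        dists = [math.hypot(cx - x, cy - y) for x, y in clicks]
        if dists and min(dists) <= max_dist:
            matched_distances.append(min(dists))
    recall = len(matched_distances) / len(true_centers) if true_centers else 0.0
    centrality = (sum(matched_distances) / len(matched_distances)
                  if matched_distances else float("nan"))
    return recall, centrality

# Example: 3 true cells, the worker clicked near two of them
print(quality_metrics([(10, 10), (50, 50), (90, 30)], [(12, 9), (48, 53)]))
# -> recall of 2/3 and a centrality of roughly 3 pixels
```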

Our most interesting finding was the extent to which a meaningful task (and giving recognition) motivated US workers, but not Indian workers, to complete a task. As any requester knows, attrition on MTurk is a real problem. We found that adding cues of meaning could motivate turkers to undergo training and label at least one image. In the US, adding cues of meaning raised the fraction of turkers who showed up (sat through our training video, passed our quiz, and labeled at least one image) to 92%, compared to only 83% in the zero-context group (see figure, which also shows standard errors). In India, there was no difference between the groups: both had a 66% completion rate (attrition being higher due to possible language barriers, slow connection speeds, hardware issues, etc.).

However, once a person did some work, the treatment and control groups did a similar quantity of work: the cues-of-meaning group labeled 6.0 images on average and the zero-context group labeled 5.7. This difference was not statistically significant, suggesting that once you get turkers to work on a task, they label just as many images irrespective of the task’s meaningfulness. Notably, among the people who worked, Indians worked longer, labeling an average of 7.3 images vs. 5.2 in the US.

Surprisingly, all workers did an equally good job identifying points, whether they had zero context or thought they were identifying tumor cells. Quality, whether measured by the fraction of points identified (the recall) or by the average pixel distance from the true center (the centrality), did not differ significantly with the task's meaningfulness.

This finding has important implications for those who employ labor in crowdsourcing markets. Companies and intermediaries should develop an understanding of what motivates the people who work on tasks. Employers must think beyond monetary incentives and consider how they can reward workers through non-monetary means, such as changing how workers perceive their task. Workers who don't know the context of their work are less likely to do it at all, and employers may find they can get more work done for the same wages simply by telling turkers why they are working.

For more details of this study, please see our full academic paper at: http://danachandler.com/index.php/research. We welcome any comments and feedback.

About the authors:
Dana Chandler is a researcher at the University of Chicago’s Becker Center, where he works with Steven Levitt, author of Freakonomics. He previously worked as a management consultant at the Boston Consulting Group and at Aureos Capital, a Colombian private equity company. He will begin his Ph.D. at MIT in the fall. Dana’s research interests include digital labor markets, development economics, and randomized experiments in companies. email: dchandler {at} uchicago {dot} edu

Adam Kapelner is currently pursuing his Ph.D. in Statistics at Wharton. Adam is the founder of dictionarysquared.com and the inventor of its vocabulary-learning technology. While working as an undergraduate researcher at Stanford University, he helped engineer the open-source software www.gemIdent.com, which enables researchers worldwide to locate cells in microscopic images. GemIdent was recently extended to use MTurk for outsourcing medical image identification. The extension, called www.distributeeyes.com, was adapted to serve as the platform for this experiment. email: kapelner {at} wharton {dot} upenn {dot} edu

Acknowledgments: We thank Professor Susan Holmes of Stanford University for allowing us to adopt DistributeEyes (funded under NIH grant #R01GM086884-02) for use in this study. We would also like to thank Panos Ipeirotis for kindly providing us with demographic and market data that we cite in our study. Lawrence Brown, Patrick DeJarnette, John Horton, Emir Kamenica, Steven Levitt, Susanne Neckermann, Jesse Shapiro, Jorg Spenkuch, Jan Stoop, Chad Syverson, Mike Thomas, Abraham Wyner, and seminar participants at the University of Chicago provided especially helpful comments.