Amazon Mechanical Turk Survey

The distributed work meetup was this past Monday, and it would be an injustice not to have a blog post on the workers who make distributed work possible. Over the weekend we decided to rerun and reexamine Panos Ipeirotis's survey of Turkers. Panos, by the way, has a great blog on crowdsourcing, Amazon Mechanical Turk, and other interesting topics; I highly recommend reading it for many Turk-related experiments and studies.

We used mostly the same questions as Panos's survey, which asked for:

  • the Turker's age (year of birth)
  • gender
  • educational level
  • income level
  • marital status
  • questions about their engagement on Turk:
      • how often they Turk
      • income from Turk
      • why they Turk

In contrast to Panos's original survey, which ran over a three-week period, we ran this survey over a 24-hour period on the weekend. Because of this abbreviated weekend run, there are most likely greater confounding factors as well as stronger selection bias, e.g. toward groups who work on weekends rather than during the week. To help mitigate the timing of the experiment and spread the surveys close to uniformly over the 24 hours, we set up a script to release only 50 surveys per hour. Responses generally came in at a steady pace, and Turkers available at each hour were represented.
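For concreteness, a minimal sketch of that kind of throttled release is below. The post_survey_hits helper is hypothetical, standing in for whatever call posts a batch of survey HITs through the Mechanical Turk API (e.g. its CreateHIT operation); the pacing loop is the part that matters.

```python
import time

SURVEYS_PER_HOUR = 50   # release 50 surveys each hour
TOTAL_HOURS = 24        # run for a full day

def post_survey_hits(count):
    """Hypothetical helper: post `count` survey HITs through the
    Mechanical Turk API. Replace with a real CreateHIT-based call."""
    print(f"posting {count} survey HITs")

for hour in range(TOTAL_HOURS):
    post_survey_hits(SURVEYS_PER_HOUR)
    time.sleep(60 * 60)  # wait an hour before the next batch
```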

Since we've examined Turker motivation before, albeit hardly rigorously, I will mostly focus in this post on the rise of India as well as other location-specific considerations.

In comparison to Panos's survey, the greatest difference in the results was in the distribution of respondents' locations. We found that India made up 46.85% of our respondents, while the US made up 42.7%. In contrast, in Panos's survey (run over a three-week period in February), 46.8% of respondents were from the US and 34.0% were from India. To determine whether this difference was due to self-reporting error, we checked self-reported location against geocoded IP location and found that they matched almost exactly. There were 31 differences out of 1016 survey responses, but these could potentially be attributed to ambiguity in the question ("Where are you from?"). Ultimately, the differences between self-reported location and geocoded IP location were not statistically significant. This is an encouraging result, suggesting Turkers are overwhelmingly honest when answering survey questions about where they are from.
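As an illustration, the consistency check itself is a one-line comparison once each response has a geocoded country sitting next to its self-reported one. The sketch below assumes a hypothetical responses.csv with those two columns already filled in; the file and column names are ours, not part of any Turk tooling.

```python
import pandas as pd

# Assumed layout: one row per survey response, with the self-reported
# country and the country derived from the geocoded IP address.
responses = pd.read_csv("responses.csv")  # hypothetical file/columns

mismatch = responses["self_reported_country"] != responses["geo_ip_country"]

# In our run, 31 of 1016 responses disagreed.
print(f"{mismatch.sum()} of {len(responses)} responses disagree "
      f"({100 * mismatch.mean():.1f}%)")
```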

Viewing the responses over time suggests that Turkers work during non-sleeping hours; the graph below shows this pattern. If we were to repeat this experiment, we'd want to run the task over a week or two instead of over 24 hours. Because of the abbreviated time period, the pattern is less evident toward the end of the job, where work rates declined across the board.

To better test the hypothesis that Turkers generally work during the non-sleeping hours of their respective countries, the graph below shows the distribution of work done for CrowdFlower on Mechanical Turk by hour of the day and by continent (specifically Asia, Europe, and the "Americas") over the course of 6 months. Over these 6 months we collected approximately 9 million judgments. In the graph, a point for Asia at 6% and 4 AM GMT means that 6% of the judgments made in Asia came in around 4 AM GMT.
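The per-hour breakdown behind that graph is just a count of judgments per (continent, GMT hour), normalized within each continent. A pandas-style sketch, with hypothetical file and column names, might look like this:

```python
import pandas as pd

# Assumed layout: one row per judgment, with the continent (from the
# geocoded IP) and a UTC timestamp of when the judgment was submitted.
judgments = pd.read_csv("judgments.csv", parse_dates=["submitted_at"])

judgments["hour_gmt"] = judgments["submitted_at"].dt.hour

# Count judgments per (continent, hour), then normalize within each
# continent so each curve sums to 100% across the 24 hours.
counts = judgments.groupby(["continent", "hour_gmt"]).size()
share = counts / counts.groupby(level="continent").transform("sum") * 100

# e.g. share.loc[("Asia", 4)] would be ~6 if 6% of Asia's judgments
# came in around 4 AM GMT.
print(share.unstack("continent"))
```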

For each continent the peaks roughly correspond to daytime and the valleys to nighttime, which is what we'd expect. Asia's peak is more of a plateau, likely due in part to the number of time zones Asia encompasses. In my post about task localization, we saw (what is intuitively obvious) that workers' locales are an important factor in assessing quality, especially for language-specific tasks. On Mechanical Turk, to reach the right workforce for a language-specific task, it is advisable to make HITs available only at certain times, limiting the number of responses from countries whose native languages are not applicable. We have to use this roundabout method on Mechanical Turk because you cannot restrict the workforce to a set of multiple countries; you can only restrict work to one country or to all but one country.
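A rough sketch of that workaround follows, assuming you simply gate batch releases on the current GMT hour; the Japanese-daytime window used here is purely illustrative, and the release call is a stand-in rather than a specific API function.

```python
from datetime import datetime, timezone

# Illustrative target: for a Japanese-language task, only add work
# while it is roughly daytime in Japan (UTC+9), i.e. about 00:00-10:00 GMT.
TARGET_WINDOW_GMT = range(0, 10)

def should_release_now():
    """Return True if the current GMT hour falls inside the window when
    the target locale's workers are most likely to be awake."""
    return datetime.now(timezone.utc).hour in TARGET_WINDOW_GMT

if should_release_now():
    # Hypothetical stand-in for whatever extends the job
    # (e.g. adding assignments or posting a new batch of HITs).
    print("release a batch of HITs now")
else:
    print("hold the batch until the target window")
```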

We've already noted that India represents a sizable and rapidly growing portion of the Turk workforce. We are particularly interested in this rate of growth and in future trends in worker locales. The next graph compares the US to India in terms of monthly volume of judgments completed on our jobs posted to Mechanical Turk in 2010. The location information comes from geocoded IP addresses.

The graph above shows that on Mechanical Turk the proportion of our workers who are Indian has increased since December. This sample was collected over a relatively short period of time, and it is definitely something we'll want to monitor in the future. Lastly, I want to emphasize that though this experiment is hardly rigorous and there are many more factors to analyze, Mechanical Turk, as well as other vendors of work (Gambit, Samasource, LiveOps, etc.), continues to evolve and grow extremely rapidly, and with it grow the potential and possibilities for distributed work. Next we'll examine a survey of Gambit, then Samasource.

John