Can the quality of crowdsourced work be linked to geography?
In short, the answer is no. Here’s why.
Our team looked at whether including a workforce from a specific region helped or hurt the completion rate and quality of a recent website categorization project. Overall, the project showed a common trend, where the quality of work improved over time, regardless of geography.
The figure above shows the overall volume of judgments (blue) plotted against trusted judgments (green). In addition to the large amount of untrusted work over the first two days, this graph shows spikes in volume that correspond with time of day. Two possible explanations for the change in quality are:
- Only good workers were able to continue working after the first few days.
- A flood of bad work came from specific geographies, which would explain the spikes in throughput at certain hours.
“Just a Taste” Workers
One of the ways that CrowdFlower maintains quality is by incorporating questions with known answers, and tracking worker performance on these units as a proxy for overall accuracy. If workers don’t maintain a minimum level of accuracy, they are prohibited from continuing work on any given task.
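As a rough illustration of this kind of check, here is a minimal sketch in Python. It is not CrowdFlower's actual implementation: the class name, the 70 percent threshold, and the minimum of four known-answer questions are all assumptions chosen for the example.

```python
from dataclasses import dataclass

# Both values are assumptions for illustration, not CrowdFlower's real settings.
MIN_ACCURACY = 0.7   # minimum share of known-answer questions answered correctly
MIN_GOLD_SEEN = 4    # known-answer questions required before judging a worker


@dataclass
class WorkerRecord:
    """Tracks one worker's performance on questions with known answers."""
    gold_seen: int = 0
    gold_correct: int = 0

    def record_gold(self, is_correct: bool) -> None:
        """Register the worker's answer to one known-answer question."""
        self.gold_seen += 1
        if is_correct:
            self.gold_correct += 1

    @property
    def accuracy(self) -> float:
        """Accuracy on known-answer questions, used as a proxy for overall accuracy."""
        return self.gold_correct / self.gold_seen if self.gold_seen else 0.0

    @property
    def trusted(self) -> bool:
        """True once enough known answers were seen and accuracy is high enough."""
        return self.gold_seen >= MIN_GOLD_SEEN and self.accuracy >= MIN_ACCURACY

    @property
    def may_continue(self) -> bool:
        """Workers are cut off only after enough evidence shows low accuracy."""
        if self.gold_seen < MIN_GOLD_SEEN:
            return True
        return self.accuracy >= MIN_ACCURACY
```

In this sketch a worker who answers three of four known-answer questions correctly stays on the task, while one who answers only one of four correctly is stopped.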
In this job, it appears that many workers were unable to meet our quality standards. This was a somewhat subtle categorization job, characterizing the content of websites according to a handful of criteria, so it’s not surprising that many workers had trouble.
As a result, many workers did a relatively small amount of work, but they could not continue because they didn’t meet our quality standards. The figure below shows the number of workers completing a specified number of judgments.
This graph demonstrates a known behavior in online tasks, where many workers attempt only a small amount of work before abandoning a given task. Our quality-control mechanism requires workers to demonstrate accuracy before it accepts their work.
While it is interesting that our quality control identified certain high-accuracy workers and allowed them to continue, this doesn’t answer the question of why there were such dramatic changes in throughput during the day.
Especially over the first two days, the relative amount of untrusted work was much greater between 10 p.m. and 10 a.m. (Pacific Time), which is when we see a preponderance of work coming from workers outside of the United States. Indeed, after filtering workers by IP address, we saw that 73 percent of all workers in this job came from India. On the other hand, the region’s workers accounted for only 46 percent of trusted workers.
However, while these workers account for a smaller proportion of trusted workers than of the overall labor pool, they did much better in terms of trusted work submitted.
This workforce accounts for fully two-thirds of the total trusted work on this job. While that is somewhat less than their overall representation in the labor pool, this suggests that we can’t dismiss these workers as low-quality.
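The three percentages compared here (share of all workers, share of trusted workers, and share of trusted work) can be computed from per-worker records. The sketch below assumes a hypothetical record layout of (country, is_trusted, trusted_judgments); the function name is an illustration and not part of any CrowdFlower tooling.

```python
from collections import Counter


def shares_by_country(workers):
    """Return three dicts mapping country to its share of all workers,
    of trusted workers, and of trusted work.

    `workers` is an iterable of (country, is_trusted, trusted_judgments)
    tuples; this layout is an assumption made for the example.
    """
    all_workers = Counter()
    trusted_workers = Counter()
    trusted_work = Counter()

    for country, is_trusted, judgments in workers:
        all_workers[country] += 1
        if is_trusted:
            trusted_workers[country] += 1
            trusted_work[country] += judgments

    def normalize(counter):
        # Convert raw counts into fractions of the total.
        total = sum(counter.values())
        return {c: n / total for c, n in counter.items()} if total else {}

    return (
        normalize(all_workers),
        normalize(trusted_workers),
        normalize(trusted_work),
    )
```

Keeping the three shares separate is the point: a country can be over-represented among untrusted workers and still supply most of the trusted work.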
Pareto Me This
All of this leads to an interesting observation. As you may have seen elsewhere [1], a relatively small minority of people often accounts for the vast majority of observed effects. This is common in crowdsourcing, just as it was in land ownership in 19th-century Italy.
In this example, the top three percent of most prolific workers provided over 40 percent of trusted work. The top 20 percent of workers provided over 80 percent of trusted work.
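A concentration figure like this can be computed by sorting workers by output and summing the top slice. The following is a minimal sketch; the function name and the choice to round the cutoff up to a whole worker are assumptions, and the job's real per-worker counts are not reproduced here.

```python
def top_share(judgments_per_worker, top_percent):
    """Fraction of all judgments contributed by the most prolific
    `top_percent` percent of workers (an integer from 1 to 100).

    The cutoff is rounded up so that at least one worker is always counted.
    """
    counts = sorted(judgments_per_worker, reverse=True)
    total = sum(counts)
    if not counts or total == 0:
        return 0.0
    # Integer ceiling of top_percent * len(counts) / 100.
    k = max(1, (top_percent * len(counts) + 99) // 100)
    return sum(counts[:k]) / total
```

With ten hypothetical workers whose trusted judgment counts are 500, 200, 100, 50, 50, 40, 30, 20, 5, and 5, calling `top_share(counts, 20)` sums the top two workers (700 of 1,000 judgments).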
Keep the Bums Out
Focusing on the top 20 percent of most prolific workers, we see that one-half came from India while another one-third came from the U.S. Other countries provided the remaining workers. While it is true that the vast majority of untrusted workers in this job, who collectively provided relatively few judgments, came from India, it is also true that the country’s workers make up half of the most prolific workers and provided two-thirds of all trusted work.
The solution to improving the efficiency of this job, then, is not the crude choice of excluding workers from certain geographies. Rather, we can discourage bad workers by increasing the burden of entry, so that only workers with an interest in completing more than a few judgments will bother with the job.
1. Ferriss, Timothy (2007), The 4-Hour Workweek, Crown Publishing.