By Tatiana Josephy, September 29, 2014

CrowdFlower Now Offering Twelve Language Crowds

This past year, our customers have used CrowdFlower to transform big data into rich data in dozens of languages. To date, we’ve offered users a combination of geographic targeting and a select group of Language Crowds to accomplish this. Today, we are thrilled to announce eight new Language Crowds, along with a revamp of our existing French, German, Portuguese, and Spanish crowds.

Why Do Language Crowds Make a Difference?

Many businesses need to enrich data in ways that require foreign language proficiency. Popular examples include tuning search relevance on ecommerce websites, sentiment analysis of social content, and text moderation. Performing this work in-house is not only painful and time-consuming; it may become impossible without access to large numbers of people with the necessary language skills. Additionally, natural language processing technology still doesn’t cut it, especially in languages other than English. And even if you were to use NLP, you’d still require training data. That’s where CrowdFlower shines, and it is a big reason why we’ve invested in broadening the scope of our platform’s language capabilities.

With this release, we have strengthened and expanded the four existing Language Crowds and are enabling users to target eight new languages: Hindi, Arabic, Indonesian, Turkish, Italian, Russian, Vietnamese, and Chinese.

Hear our contributors say “Hi” in their native language.

The sizes of the new Language Crowds are shown in the figure below as percentages of the number of our Level 3 contributors. All calculations are based on the number of contributors active in the past thirty days.

As you can see, Spanish is by far our largest Language Crowd, but you can expect the others to continue to grow over time and with increased customer demand.
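For readers who want to reproduce this kind of comparison on their own contributor data, the relative-size calculation described above can be sketched roughly as follows. This is a minimal illustration and not CrowdFlower's actual code: the record fields (`level`, `last_active`, `language_crowds`), the function name, and the sample records are all hypothetical, and the sketch assumes that each crowd is counted among contributors active in the past thirty days and divided by the number of active Level 3 contributors.

```python
from datetime import date, timedelta


def crowd_size_percentages(contributors, today, window_days=30):
    """Return each Language Crowd's size as a percentage of active Level 3 contributors.

    `contributors` is a list of dicts with hypothetical keys:
    "level" (int), "last_active" (date), "language_crowds" (list of str).
    """
    cutoff = today - timedelta(days=window_days)
    # Keep only contributors active within the window.
    active = [c for c in contributors if c["last_active"] >= cutoff]
    # Denominator: active Level 3 contributors.
    level3_total = sum(1 for c in active if c["level"] == 3)
    if level3_total == 0:
        return {}
    # Numerator: active members of each Language Crowd.
    counts = {}
    for c in active:
        for lang in c["language_crowds"]:
            counts[lang] = counts.get(lang, 0) + 1
    return {lang: 100.0 * n / level3_total for lang, n in counts.items()}


# Made-up sample records, purely for illustration.
sample = [
    {"level": 3, "last_active": date(2014, 9, 20), "language_crowds": ["Spanish"]},
    {"level": 3, "last_active": date(2014, 9, 25), "language_crowds": []},
    {"level": 3, "last_active": date(2014, 9, 1), "language_crowds": ["Spanish", "Italian"]},
    {"level": 3, "last_active": date(2014, 9, 10), "language_crowds": []},
    # Last active more than thirty days ago, so excluded from every count.
    {"level": 3, "last_active": date(2014, 7, 1), "language_crowds": ["Hindi"]},
]

sizes = crowd_size_percentages(sample, today=date(2014, 9, 29))
print(sizes)
```

With these made-up records, four Level 3 contributors fall inside the thirty-day window, so a crowd with two active members comes out at 50 percent of that base.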

As mentioned earlier, targeting work based on geography within the platform is one tactic for reaching foreign language contributors. The map below shows where contributors in each of the Language Crowds come from. Note that most of our contributors who speak Chinese come from Singapore and Hong Kong, which are not visible on the map at its current resolution.

We produced this generation of Language Crowds in collaboration with CrowdFlower’s Data Science team. Contributors were carefully selected for these new Language Crowds based on a model that takes into account the candidates’ geolocation, browser data, tasking habits, and performance history that CrowdFlower tracks for every contributor. Language Crowds identified using this method have outperformed every alternative, including crowds generated via our in-house language proficiency tests and those recommended by external providers specializing in online language skill assessment.


Once contributors make it into their respective Language Crowd, we monitor their behavior for signs of poor performance or a lack of language skills, and remove the bad seeds promptly. New contributors who meet the entry criteria for Language Crowds will be added on a regular basis once they have completed enough tasks to allow accurate estimation of their performance. We are especially excited about growing our brand-new Chinese crowd over the next few months.

CrowdFlower clients can target tasks to these new Language Crowds via the “Skills” settings in the platform today. Other languages, including Korean and Japanese, are available in consultation with your Customer Success team.