By Greg Laughlin, July 28, 2011

CrowdFlower Challenges Yelp: It’s a Nerd-Off


Dramatic Intro

It is high noon in business listing verification crowdsourcing land. We are throwing down the gauntlet. We are stepping in the ring. We are mixing our metaphors.

Undramatic Intro

Yelp engineers recently described their efforts to correct business listing data using Amazon Mechanical Turk. They tapped the services of 4,660 contributors; only 79 of them passed Yelp’s quality assurance testing (1.7% of contributors were “trusted”), and the data those contributors produced was (very roughly) 80% accurate.

This smelled funny to us. Our business listing verification service routinely returns results above 97% accuracy. In fact, some of the most recognizable names in local search and business data pay for that service. (For typical figures, see the full report on 100,000 listings we verified for a major search company.) Across the last couple dozen crowdsourcing tasks we’ve run, the lowest proportion of contributors deemed “trusted” was 34%. More importantly, our platform identifies these trusted contributors within minutes, so the best contributors get the job done quickly.
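For the curious, the mechanics behind “trusted” are simple to sketch. The snippet below is a minimal illustration, not our production code: the function names, data shapes, and thresholds (four test questions answered, 80% accuracy on them) are hypothetical stand-ins for calibration that in reality varies by task.

```python
from collections import defaultdict

# Hypothetical thresholds; real calibration varies by task.
MIN_GOLD_ANSWERED = 4
MIN_GOLD_ACCURACY = 0.80

def trusted_contributors(judgments, gold_answers):
    """judgments: iterable of (contributor_id, unit_id, answer) tuples.
    gold_answers: {unit_id: correct_answer} for the hidden test questions.
    Returns the contributor_ids whose test-question accuracy clears the bar.
    """
    seen = defaultdict(int)
    correct = defaultdict(int)
    for contributor_id, unit_id, answer in judgments:
        if unit_id not in gold_answers:
            continue  # an ordinary work unit, not a test question
        seen[contributor_id] += 1
        correct[contributor_id] += int(answer == gold_answers[unit_id])
    return {
        c for c in seen
        if seen[c] >= MIN_GOLD_ANSWERED
        and correct[c] / seen[c] >= MIN_GOLD_ACCURACY
    }
```

Because test questions are sprinkled throughout the work, a calculation like this can be rerun after every judgment, which is why trusted contributors surface within minutes rather than after the job is done.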

[Figure: URL precision numbers]

So. Why Did Yelp’s Project Struggle to Meet Enterprise Accuracy Standards?

It was not because of a lack of brains. The bios of the eight folks at Yelp who worked on this project are peppered with words like “Harvey Mudd” and “Computer Science” and “Stanford” and “PhD”. And they work at Yelp, which is, y’know, awesome.

And it was not because of a lack of good contributors. CrowdFlower has first-hand experience with well over one million contributors (many from the Mechanical Turk platform), and we’ve found that, when given the right tools and feedback, they are very accurate.

No, Really. Why?

It was in part because the Yelp team did not have the tools CrowdFlower’s crack engineering team has developed over the years:

  • Our contributors face ongoing tests as they complete work, and whenever they get an answer wrong they receive real-time feedback explaining the mistake; these tests are carefully calibrated to catch the most common types of contributor error.
  • Our contributor UIs are the products of dozens of A/B tests run through CrowdFlower’s custom A/B testing infrastructure.
  • We use digital assembly line technology, chaining together many very simple tasks to yield a complex result. Below is a (somewhat outdated) diagram of our business listing verification assembly line, where each blue box represents one discrete contributor task; a simplified sketch of the chaining follows the diagram:
[Figure: business listing verification digital assembly line]
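To make the chaining concrete, here is a minimal sketch of how an assembly line composes tiny tasks. The stage names and questions (phone, address, hours) are illustrative assumptions rather than a faithful transcription of the diagram above, and the judgment-collection and aggregation functions are left as parameters.

```python
def run_stage(listings, question, collect_judgments, aggregate):
    """Ask one simple question about every listing, gather redundant
    judgments from trusted contributors, and keep the consensus answer."""
    results = {}
    for listing in listings:
        judgments = collect_judgments(listing, question)  # e.g. 3-5 answers
        results[listing["id"]] = aggregate(judgments)     # e.g. majority vote
    return results

def verify_listings(listings, collect_judgments, aggregate):
    # Stage 1: can the phone number be confirmed for this business?
    phone_ok = run_stage(listings, "Does this phone number reach the business?",
                         collect_judgments, aggregate)
    # Stage 2: only listings with a confirmed phone move on to the address check.
    reachable = [lst for lst in listings if phone_ok[lst["id"]]]
    address_ok = run_stage(reachable, "Is the street address correct?",
                           collect_judgments, aggregate)
    # Stage 3: confirm hours only where phone and address both check out.
    confirmed = [lst for lst in reachable if address_ok[lst["id"]]]
    hours = run_stage(confirmed, "What are the business hours?",
                      collect_judgments, aggregate)
    return phone_ok, address_ok, hours
```

The point of the decomposition is that no single contributor ever sees the full verification problem; each stage asks one narrow question, and the aggregated answer from one stage decides which listings flow into the next.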

Just as important… this project probably struggled because crowdsourcing to enterprise-standard quality is incredibly hard! The business listing verification team at CrowdFlower only succeeded after a full year and tens of millions of human judgments. We worked quite a few 24-hour days, and our social skills atrophied from lack of use.

In the end, we arrived at a solution that is fast, accurate, and affordable to use, and we continue to improve it.

We did all this work so others won’t have to. If you’re contemplating going it alone, give us a call! It’s not worth it! So many people love you!

The Challenge

On behalf of our contributors, CrowdFlower, and the business listing verification team, I’d like to offer a challenge to you, friendly neighborhood Yelp engineers.

Give us 5,000 business listings. If we can raise the precision of those listings to 95%+ and beat any machine learning algorithms you can build, you give us two engineers. No, just kidding. If we can do so, you’ll write about the experience on your engineering blog.

If we lose (actually, regardless of whether we win or lose), we’ll happily sit down with you and share everything we’ve ever learned about business listing verification.

For more information and actual sample data from another business listing verification project, check out this customer report.