One of the most exciting benefits of working at CrowdFlower is the daily exposure to some of the most groundbreaking and fascinating business ideas. This blog post is a Q&A to provide you with an understanding of one of the more common use cases for crowdsourcing: sentiment analysis (or sentiment “coding”).
Recently, I spoke with Tom Sanders of Downdetector, one of the leading innovators in service outage detection technology. Downdetector identifies outages in company-provided services using a variety of methods, one of which is monitoring trends in social media. Below are the five questions I asked Tom to gain more insight into their experience with the CrowdFlower platform.
Q1. Tell us about your business. What do you do?
We started Downdetector because we felt there was a big need for instant information on outages of online services and telecommunications providers. We noticed that during outages people increasingly turn to social media to complain, shame the provider into acting, and get confirmation from their friends about the scope of the outage. In early 2012 we set out to create a service that would use those tweets and Facebook comments to automatically determine whether a certain service was suffering from an outage. We use several data sources to collect complaints and display these in a chart on our website. The chart peaks during outages, which allows our site visitors to determine whether there is an outage or not.
Around April 2012 we launched our service in The Netherlands, where the company is headquartered, and it turned into an almost immediate success. In January 2013 we started expanding into other countries, and we currently track outages in 14 countries, including the United States, the UK, India, Brazil, Russia, Germany, Spain and Mexico.
Q2. What was the problem that led you to look into crowdsourcing?
It is actually a tough task to determine whether a tweet is a complaint about an outage. When we started out, we used basic filtering and would manually remove tweets that didn’t qualify as outage tweets. Not only is that a boring job, but checking a large number of tweets took a long time and was prone to errors. We soon realized that a crowd labor platform would be a perfect solution for this.
We now feed tweets into CrowdFlower and ask workers to determine if they consider a tweet to be a complaint about a current outage or not.
Q3. How has your experience been with the CrowdFlower platform? What have you learned?
It was surprisingly easy to set up the basic task with CrowdFlower. The interface is so intuitive that anyone who can use a content management system will be able to set up a task like ours in CrowdFlower.
The tough part is quality control. We needed to educate both the workers and ourselves.
Having seen (and qualified) tens of thousands of tweets ourselves, we have a pretty good idea when a tweet should be considered a ‘problem tweet’ and when not. But now we had to explain this to the workers. To our surprise, we got poor results if we asked “Is this tweet about an outage or not?”
We had to go back to square one: we took a set of tweets that we had already judged and put it in front of the crowd workers. If they judged a tweet differently from us, we’d try to understand why. We used this information in the instructions. In the end, the instructions weren’t so much about what we were trying to accomplish as about what the workers found hard about the task.
The second part was educating the workers: a small portion of workers wouldn’t provide answers that met our quality criteria. Perhaps they didn’t understand the instructions, or didn’t bother to read or understand them. CrowdFlower’s Gold questions help us deal with that problem: the system already knows the answer to a gold question. Workers who fail too many gold questions are ‘fired’.
We got rid of all the obvious gold questions (the ones that everybody got right), instead turning the hard questions into gold.
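The gold-question mechanism Tom describes can be sketched in a few lines. This is a hypothetical illustration of the general technique, not CrowdFlower’s actual implementation; the accuracy threshold and data structures are assumptions.

```python
# Hypothetical sketch of gold-question quality control: workers whose
# accuracy on known-answer ("gold") questions falls below a threshold
# are removed ("fired") from the task.

GOLD_ACCURACY_THRESHOLD = 0.7  # assumed cutoff, not CrowdFlower's real value

def filter_workers(gold_judgments):
    """gold_judgments maps worker id -> list of (given_answer, known_answer).

    Returns a dict of trusted workers and their accuracy on gold questions.
    """
    trusted = {}
    for worker, answers in gold_judgments.items():
        correct = sum(1 for given, known in answers if given == known)
        accuracy = correct / len(answers)
        if accuracy >= GOLD_ACCURACY_THRESHOLD:
            trusted[worker] = accuracy  # keep this worker's judgments
        # otherwise the worker is dropped and their judgments discarded
    return trusted
```

Turning the hard questions into gold, as Tom suggests, makes this filter far more discriminating: a worker can’t pass by answering only the obvious cases correctly.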
Finally, we needed to figure out the best way to structure the questions. Should we have workers judge 1, 5, 10 or 20 tweets at a time? Should we have 3, 4 or 5 people judge each tweet? We simply set up lots of tests and determined the cost and quality for each.
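One way to reason about the “how many judges per tweet” question is a back-of-the-envelope majority-vote calculation: more judges raise the chance the aggregated answer is right, but multiply the cost. The per-judgment accuracy and price below are made-up numbers for illustration, not Downdetector’s figures.

```python
from math import comb

def majority_accuracy(p, n):
    """Probability that a strict majority of n independent judges,
    each correct with probability p, gives the right answer."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# Assumed numbers for illustration only.
per_judgment_accuracy = 0.85
price_per_judgment = 0.01  # dollars

for n in (3, 5):
    acc = majority_accuracy(per_judgment_accuracy, n)
    cost = n * price_per_judgment
    print(f"{n} judges: ~{acc:.1%} accurate, ${cost:.2f} per tweet")
```

In practice, as Tom notes, real workers aren’t independent coin flips with a fixed accuracy, which is why running actual tests on each configuration beats calculation alone.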
Q4. What’s been the result of using CrowdFlower? In terms of accomplishing something you couldn’t do before, saving time or saving money?
We are now able to judge tweets in any language, without having to speak or understand the language. The alternative would have been to hire a freelance worker to do this job. Using CrowdFlower we pay only 10% of what it would cost to hire freelance workers to do the same job.
And with freelancers, I wouldn’t be comfortable about the quality: sitting for hours judging tweets gets really boring, so quality is bound to suffer.
Q5. What advice would you give to others who are thinking about looking into crowdsourcing?
Start small, take your time, and document progress. Our biggest challenge was making sure that we asked the right question with the right instructions. If you run lots of different tests, you get results quickly, allowing you to tweak your settings and see the effect. Our first test had a margin of error of about 20%; a dozen tests later we had it down to 1.5% at a very good price. We’re really happy with those results.
End of Q&A
I would like to thank Tom Sanders and his company for participating in our Q&A. If you would like to know more about the services Downdetector provides, please visit Downdetector.com.