By Justin Tenuto, August 17, 2015

Catch up on our sentiment analysis webinar with Oracle

Last Thursday, we sat down with a couple of great data scientists from Oracle to learn how they use people-powered sentiment analysis. Our CEO Lukas Biewald was joined by Randall Sparks (Principal Member of Technical Staff at Oracle Data Cloud) and Pallika Kanani (Senior Research Staff Member at Oracle Labs) for the session. The folks at Oracle showed us how they create training sets and iterate on their algorithms, and explained how they handle sentiment across multiple languages. We got more questions in the Q&A than we had time for, so we’ll answer them below. To start, here’s a recording of our chat if you weren’t able to join us:

(You can also peruse the slides here)

Alright. Onto your questions:

1. What languages did you run jobs in?

Oracle ran jobs in Spanish and French, among others. CrowdFlower has fluent contributor bases in the following languages:

  • Arabic
  • Bahasa (Indonesian)
  • Chinese
  • English
  • French
  • German
  • Hindi
  • Italian
  • Japanese
  • Portuguese
  • Russian
  • Spanish
  • Turkish
  • Vietnamese

If you want sentiment in a language that’s not listed above, we recommend writing your instructions in that language and creating test questions. If people don’t understand what you’re asking, they won’t join your job and, even if they do, they won’t pass your test questions if they can’t read what you need done.

2. How many rows do you need to train a sentiment model?

This was actually asked and answered in the webinar above, but we wanted to call it out here. While there’s no silver bullet, most data scientists use at least 10,000 rows to train a sentiment algorithm. Beyond that, as Randall mentioned in the webinar, the gains from additional data flatten out a bit. You can read about the importance of having a lot of training data in a recent post we wrote, and you can even download some open data from our Data for Everyone library if you want a head start.
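
If you want to see where extra rows stop paying off for your own data, you can plot a learning curve: train on progressively larger slices of your labeled rows and measure accuracy on a fixed held-out set. The sketch below is a minimal, hypothetical example. It swaps in a simple multinomial naive Bayes classifier with whitespace tokenization purely for illustration (it is not Oracle’s model), and the function names are our own.

```python
import math
import random
from collections import Counter, defaultdict


def train_nb(rows):
    """Train a multinomial naive Bayes model on (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in rows:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log posterior for text."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        # Log prior plus add-one (Laplace) smoothed log likelihood of each word.
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def learning_curve(train_rows, test_rows, sizes, seed=0):
    """Return (n, accuracy) pairs after training on the first n shuffled rows."""
    shuffled = list(train_rows)
    random.Random(seed).shuffle(shuffled)
    curve = []
    for n in sizes:
        model = train_nb(shuffled[:n])
        correct = sum(predict_nb(model, text) == label for text, label in test_rows)
        curve.append((n, correct / len(test_rows)))
    return curve
```

With real data you would pass sizes such as 1,000, 5,000, 10,000 and 20,000 and look for the point where accuracy stops climbing much; a stronger model may flatten out at a different size.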

3. Can you walk us through the process of setting up a similar job?

Of course we can. While we could go on for ages about setting up a great sentiment analysis job on CrowdFlower, we’ll stick to a few tips and a high-level view of best practices. If you’d like to run a job of your own, we have templates for exactly this sort of thing, and you can try them out by signing up for a trial. You can also contact us and we’ll walk you through whatever you need.

1- Choose a template and upload data: You can choose from one of our sentiment templates to save a bit of time constructing your job. Then, just upload a simple .csv.
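
If your text lives somewhere other than a spreadsheet, a few lines of Python can produce the upload file. This is a minimal sketch: the `id` and `text` column names are assumptions for illustration, so match whatever columns your chosen template expects.

```python
import csv
import io


def build_upload_csv(texts, id_prefix="row"):
    """Return CSV text with one row per item to be labeled.

    The "id" and "text" column names are illustrative assumptions.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "text"])
    writer.writeheader()
    for i, text in enumerate(texts, start=1):
        # Collapse newlines and repeated spaces so each item reads cleanly.
        writer.writerow({"id": f"{id_prefix}-{i}", "text": " ".join(text.split())})
    return buffer.getvalue()
```

To save it, write the returned string to a file opened with `newline=""` and `encoding="utf-8"`, which keeps the CSV line endings intact and preserves non-English text.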

2- Determine what you want to know: Are you only looking for positive, negative, and neutral labels, or do you want to know more? Since you can ask follow-up questions on CrowdFlower, you can get more than sentiment: figure out why people feel great about your product or topic and learn what you can improve. All you’ll need to do is add a couple of follow-up questions.

3- Write instructions and create questions: Tell our contributors what you consider positive and negative and which terms to look out for, and give them a couple of examples. Then create a simple form with elements like radio buttons or dropdowns for them to select from.
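
Before building the form, it can help to write the questions down as data, including any follow-up that should only appear for certain answers. The sketch below is hypothetical: `Question`, `show_if`, and the example choices are our own illustration, not CrowdFlower’s form markup.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Question:
    name: str
    prompt: str
    choices: List[str]  # rendered as radio buttons or a dropdown
    show_if: Dict[str, str] = field(default_factory=dict)  # conditions on earlier answers


SENTIMENT_FORM = [
    Question("sentiment", "What is the sentiment of this text?",
             ["positive", "negative", "neutral"]),
    # Follow-up asked only when the text is negative, to learn what to improve.
    Question("reason", "What is the author unhappy about?",
             ["price", "quality", "customer service", "other"],
             show_if={"sentiment": "negative"}),
]


def visible_questions(form, answers):
    """Return the questions a contributor should see, given answers so far."""
    return [q for q in form
            if all(answers.get(key) == value for key, value in q.show_if.items())]


def is_complete(form, answers):
    """True if every visible question has an answer drawn from its choices."""
    return all(answers.get(q.name) in q.choices
               for q in visible_questions(form, answers))
```

Keeping the choices in one place like this also makes it easy to reuse the same labels in your instructions and test questions.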

4- Ensure quality: To make sure only contributors who understand your task can start working on your job, you’ll want to create a handful of test questions by answering some rows yourself. Contributors need to meet your accuracy threshold on those questions to be let into your job and, since we seed test questions throughout their workflow, quality stays high for the whole job. If a contributor’s accuracy drops below your threshold while tasking, we remove their judgments from your job and let someone else answer those rows.
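
Here is a simplified model of that workflow, assuming a quiz of test questions up front, hidden test questions mixed into the work afterward, and a single accuracy threshold. The class and method names and the default threshold of 0.7 are hypothetical choices for illustration; this is not how CrowdFlower implements it internally.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Contributor:
    gold_seen: int = 0
    gold_correct: int = 0
    judgments: List[Tuple[str, str]] = field(default_factory=list)  # (row_id, answer)

    @property
    def accuracy(self) -> float:
        return self.gold_correct / self.gold_seen if self.gold_seen else 0.0


class QualityGate:
    """Hypothetical test-question gate: quiz first, then hidden checks."""

    def __init__(self, gold: Dict[str, str], threshold: float = 0.7, quiz_size: int = 5):
        self.gold = gold  # row_id -> expected answer for each test question
        self.threshold = threshold
        self.quiz_size = quiz_size
        self.contributors: Dict[str, Contributor] = {}

    def is_trusted(self, worker_id: str) -> bool:
        c = self.contributors.get(worker_id)
        # Trusted only after finishing the quiz and while accuracy stays at or above threshold.
        return (c is not None and c.gold_seen >= self.quiz_size
                and c.accuracy >= self.threshold)

    def submit(self, worker_id: str, row_id: str, answer: str) -> bool:
        """Record an answer; return False if a regular judgment is rejected."""
        c = self.contributors.setdefault(worker_id, Contributor())
        if row_id in self.gold:
            c.gold_seen += 1
            c.gold_correct += int(answer == self.gold[row_id])
            return True
        if not self.is_trusted(worker_id):
            return False
        c.judgments.append((row_id, answer))
        return True

    def trusted_judgments(self) -> List[Tuple[str, str]]:
        """Judgments from contributors who are currently above threshold."""
        return [j for w, c in self.contributors.items() if self.is_trusted(w)
                for j in c.judgments]

    def rows_to_requeue(self) -> List[str]:
        """Rows whose judgments were dropped and need another contributor."""
        return [row_id for w, c in self.contributors.items() if not self.is_trusted(w)
                for row_id, _ in c.judgments]
```

Because trust is recomputed on every check, a contributor who slips below the threshold mid-job loses all of their earlier judgments, not just the ones submitted after the slip.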

5- Choose your settings and launch: We recommend how much you should pay, but it’s totally up to you. Once you’ve decided, launch your job and monitor it as it runs. You’ll see the work being completed in real time as thousands of judgments start rolling in.
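
To sanity-check a budget before launch, you can multiply out the basic quantities. This is a rough sketch under a hypothetical pricing model: rows are grouped into tasks, each row is judged by several contributors, each completed task pays a fixed amount, and `fee_rate` is a placeholder for any platform fee. It ignores the test questions mixed into each task, so real costs will differ.

```python
import math


def estimate_cost_cents(num_rows, rows_per_task, judgments_per_row,
                        pay_per_task_cents, fee_rate=0.0):
    """Rough job cost in cents under a simple, hypothetical pricing model."""
    tasks_per_pass = math.ceil(num_rows / rows_per_task)  # last task may be partial
    total_tasks = tasks_per_pass * judgments_per_row      # each row judged N times
    contributor_pay = total_tasks * pay_per_task_cents
    return contributor_pay * (1 + fee_rate)
```

With made-up figures of 10,000 rows, 5 rows per task, 3 judgments per row and 10 cents per task, that works out to 6,000 tasks, or 60,000 cents ($600) before any fee.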

6- Download your enriched data: Since our contributor base is global, people work on your job around the clock until it’s done. When it finishes, download your data with the contributors’ judgments added.
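
The results include several judgments per row, which you’ll usually want to collapse into one label before training. A simple approach is a majority vote plus an agreement score. This sketch assumes the judgments have already been read into `(row_id, label)` pairs; it is a plain majority vote, not necessarily how CrowdFlower’s own aggregated report computes its answer or confidence.

```python
from collections import Counter, defaultdict


def aggregate(judgments):
    """Collapse (row_id, label) judgments into {row_id: (label, agreement)}.

    agreement is the share of a row's judgments that chose the winning label.
    Ties go to the label seen first for that row.
    """
    by_row = defaultdict(Counter)
    for row_id, label in judgments:
        by_row[row_id][label] += 1
    result = {}
    for row_id, counts in by_row.items():
        label, votes = counts.most_common(1)[0]
        result[row_id] = (label, votes / sum(counts.values()))
    return result
```

Dropping rows with low agreement (say, below two-thirds) before training is one common way to keep ambiguous examples out of a training set.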

4. Is this Oracle’s user interface that has layered CrowdFlower data or are these now new jobs available in CrowdFlower?

The slide in question was actually Oracle’s interface, not ours. We thought it looked nice too. That said, we released a new graphical editor a week or two ago that we think looks pretty slick in its own right. If you want to see a demo, we’ll be showing it off (and our new reports functionality) in another webinar on September 10th. Register here.

5. Do you train models per topic, per language or both?

The more specific a model is (and the more data it’s learned from), the better. We covered the benefits of both in our post about MonkeyLearn, who used CrowdFlower to create training sets for industry-specific algorithms. Training separate models per language is usually best practice as well.
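
One way to act on this is to always train a separate model per language, and add a narrower per-topic model only when a topic has enough labeled rows. The sketch below assumes rows shaped as `(language, topic, text, label)` and borrows the rough 10,000-row rule of thumb from question 2 as the default cutoff; the names and the fallback design are our own illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, str, str, str]   # (language, topic, text, label)
Key = Tuple[str, Optional[str]]   # (language, topic) or (language, None)


def split_training_sets(rows: List[Row], min_rows: int = 10_000) -> Dict[Key, List[Tuple[str, str]]]:
    """Build one training set per language, plus per-topic sets that are large enough."""
    by_group = defaultdict(list)
    for language, topic, text, label in rows:
        by_group[(language, topic)].append((text, label))
    sets: Dict[Key, List[Tuple[str, str]]] = defaultdict(list)
    for (language, topic), examples in by_group.items():
        sets[(language, None)].extend(examples)  # general model for this language
        if len(examples) >= min_rows:
            sets[(language, topic)] = list(examples)  # specialized topic model
    return dict(sets)


def model_key(sets: Dict[Key, list], language: str, topic: str) -> Optional[Key]:
    """Pick the most specific training set available, or None for an unseen language."""
    if (language, topic) in sets:
        return (language, topic)
    if (language, None) in sets:
        return (language, None)
    return None
```

Keeping the per-language set as a fallback means a text about a new or small topic still gets a model trained on its own language.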

That’s it for now. If you have any additional questions that aren’t answered in the webinar or the text above, feel free to ask them in the comments. We’ll answer you there or update the post above. Thanks for watching.