In the last 10 years, there has been a powerful push for governments at all levels to open the datasets they develop to the public. In 2015, the 3rd International Open Data Conference, held in Ottawa, Canada, showcased surprisingly rapid progress in the development of principles, standards, measurement metrics and road maps for the growth of open data. With broad statements of support and participation by a rapidly growing set of national, state and civic governments, the momentum is putting increasing pressure on all government organizations to do even more. And they should.
Let's start with the good news: scientists do a ton of research. In the biomedical field alone, a million papers are published each year. And while that's a staggering amount of knowledge to accumulate, it brings us to the bad news: nobody really knows what's in all those papers.
It was 2008. I was a bright-eyed 26-year-old who was thrilled to leave a stable paycheck for a life of entrepreneurial uncertainty. I had a blank canvas in front of me and a few massive decisions to make about how to build the CrowdFlower tech stack. Oh, but how little I understood the impact those initial choices would have on the CrowdFlower of 2015.
Last Thursday, we sat down with a couple of great data scientists from Oracle to learn how they use people-powered sentiment analysis. Our CEO Lukas Biewald was joined by Randall Sparks (Principal Member of Technical Staff at Oracle Data Cloud) and Pallika Kanani (Senior Research Staff Member at Oracle Labs) for the session. The folks at Oracle showed us how they create training sets and iterate on their algorithms, and explained how they handle sentiment across multiple languages. We had a lot of questions in the Q&A that we couldn't get to, so we'll be answering those below. To start, here's a recording of our chat if you weren't able to join us:
One of the big reasons we created our Data for Everyone initiative is that there simply aren't many great open datasets out there for small businesses, startups, and academics to work with. Sure, there are plenty of small, toy-sized datasets, but those simply aren't big enough to create algorithms that anyone can trust. In fact, our founder Lukas wrote as much in his post on Computerworld: