
By Nick Gaylord, July 28, 2016

CrowdFlower’s presence at three recent data science conferences

Over the past week and a half, CrowdFlower has had the pleasure of attending three data events: twice as a conference presenter and once as an exclusive invitee. These events are always a great opportunity to network with other data professionals and learn about the challenges they are tackling. With the release of our CrowdFlower AI offering, it's especially valuable to learn how other people's work could benefit from machine learning. Here are some highlights from the events we attended.

Data Science Summit (San Francisco)

The Data Science Summit took place over two days at San Francisco’s luxurious Fairmont Hotel. CrowdFlower didn’t just participate in the summit; we were also a sponsor. The event ran three parallel sessions on topics ranging from natural language processing to analytics for the manufacturing industry.

One of my favorite talks from the summit was the keynote by Carlos Guestrin, a co-founder of Turi and a professor of computer science at the University of Washington. Two themes stood out from his presentation. The first was the importance of understanding why your machine learning model makes the predictions it does: sometimes a model is right for the wrong reasons, and you want to catch that before it becomes a problem. The second theme, which resonated through several other talks at the summit, was the importance of high-quality data and of the human role in the model-building process.

This theme was also the focus of a presentation by Lukas Biewald, CrowdFlower’s co-founder, titled “Active learning and humans in the loop.” The phrase “human-in-the-loop” is central to the value CrowdFlower delivers to our customers. It describes the many ways that human intelligence complements machine intelligence to create better, more accurate machine learning and AI solutions. Active learning is a specific example of a human-in-the-loop process: a machine learning model asks for human clarification on exactly the items it is uncertain how to handle, so it learns faster. Lukas was joined in the same session by Eric Colson of the excellent data science team at StitchFix, who described how combining human and machine intelligence lets the company deliver automatic, customized wardrobe selections to its customers.

Lukas Biewald, CrowdFlower's co-founder, presenting "Active learning and humans in the loop."

Bloomberg Technology Day

Bloomberg can be described in many ways: it is a financial company, a news company, and a technology company. At the core of all of these, though, Bloomberg is in the business of data. To power industry-leading analytics tools such as the Bloomberg Terminal, it needs to collect, process, and transmit tremendous amounts of information worldwide, and few companies, if any, face higher demands for speed and accuracy in doing so.

Bloomberg Technology Day is an invitation-only event where representatives of other organizations learn from some of Bloomberg’s best and brightest about the challenges they are tackling and the resources they need to do so. The entire event is off the record, so we can’t share what was discussed, but we were very glad to be included among other impressive attendees and to get a peek under the hood at the great work Bloomberg’s data professionals are doing.

Data Day Seattle

Immediately after the Bloomberg event, I flew to Seattle to speak at Data Day, a semi-annual data science conference whose other edition is held in Austin in the winter. It’s a single-day event that packs more than 50 hour-long talks into 7 parallel sessions. With about 800 attendees, it’s also a great size: big enough that there are plenty of people you want to talk to, but not overwhelming.

With so many parallel talks there were some hard choices to make, and I couldn’t attend everything I wanted to. The talks I did attend, however, were excellent. As a special treat, I got to see my friend and former coworker Michelle Casbon deliver one of the event’s keynotes. We worked together at Idibon, a former San Francisco startup; she now leads data science efforts at Qordoba. Michelle’s talk, “How machine learning is like cycling,” drew insightful parallels between her work as a senior engineer and her journey to becoming a serious long-distance cyclist.

Another great talk came from John Akred, CTO of Silicon Valley Data Science, on the intrinsic difficulty of assigning a monetary value to an organization’s data. Data is a tricky asset: it’s increasingly essential to many organizations’ success, but because it’s intangible, many conventional approaches to valuation don’t apply. There are other complications, too. Data depreciates differently from conventional assets, and it isn’t so much a liquid asset as something that enables other gains (which are, of course, also hard to project). The talk didn’t offer a one-size-fits-all solution to data valuation, but it did an excellent job of laying out the important considerations in that process.

Like Lukas at the Data Science Summit, I spoke about active learning. The difference between our presentations comes down to emphasis: Lukas painted a broader picture of the value of human-in-the-loop approaches, while my talk was a deeper dive into how active learning works and what benefits it brings. In a nutshell, we love active learning at CrowdFlower for two reasons: it makes model creation much more efficient by optimizing human involvement, and it helps create more balanced training data for the model, which matters a great deal for accuracy.
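For readers curious about the mechanics, uncertainty sampling is one common way to decide which items to send to people. The Python sketch below is a toy illustration under stated assumptions, not CrowdFlower’s implementation: the one-dimensional threshold “model”, the `oracle` function standing in for a human labeler, and every function name are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def uncertainty(prob_positive: float) -> float:
    """1.0 when the model is split 50/50, falling to 0.0 when it is fully confident."""
    return 1.0 - abs(prob_positive - 0.5) * 2


def select_queries(probs: Sequence[float], k: int) -> List[int]:
    """Indices of the k items the model is least sure about (uncertainty sampling)."""
    ranked = sorted(range(len(probs)), key=lambda i: uncertainty(probs[i]), reverse=True)
    return ranked[:k]


def fit_threshold(labeled: List[Tuple[float, int]]) -> float:
    """Hypothetical toy 1-D model: place the decision boundary midway between
    the largest negative example and the smallest positive example seen so far."""
    negatives = [x for x, y in labeled if y == 0]
    positives = [x for x, y in labeled if y == 1]
    if not negatives or not positives:
        return 0.5  # no evidence on one side yet; fall back to the middle
    return (max(negatives) + min(positives)) / 2


def predict_proba(x: float, threshold: float, steepness: float = 20.0) -> float:
    """Probability that x is positive: a logistic curve centred on the threshold."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - threshold)))


def active_learning(
    pool: List[float],
    oracle: Callable[[float], int],  # hypothetical stand-in for a human labeler
    seed: List[float],
    rounds: int,
    batch_size: int,
) -> Tuple[float, List[Tuple[float, int]]]:
    labeled = [(x, oracle(x)) for x in seed]
    unlabeled = [x for x in pool if x not in seed]
    for _ in range(rounds):
        threshold = fit_threshold(labeled)
        probs = [predict_proba(x, threshold) for x in unlabeled]
        # Ask the "human" only about the items the current model finds hardest.
        # Pop from the highest index down so earlier indices stay valid.
        for i in sorted(select_queries(probs, batch_size), reverse=True):
            x = unlabeled.pop(i)
            labeled.append((x, oracle(x)))
    return fit_threshold(labeled), labeled


if __name__ == "__main__":
    pool = [i / 100 for i in range(101)]
    true_boundary = 0.62  # hidden from the model; only the oracle knows it
    threshold, labeled = active_learning(
        pool, lambda x: int(x >= true_boundary), seed=[0.0, 1.0], rounds=8, batch_size=1
    )
    print(f"learned threshold {threshold:.3f} from {len(labeled)} labels")
```

Because the model always asks about the point nearest its current boundary, the queries behave much like a binary search: in this toy run, eight answers from the “human” plus two seed labels pin the boundary between 0.61 and 0.62, instead of labeling all 101 points. Real systems use richer models and other uncertainty measures (entropy, or the margin between the top two classes), but the loop has the same shape.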

One surprise from these conferences was that active learning and human-in-the-loop approaches were still quite new to most attendees. It was a real pleasure to speak with data professionals from a wide range of industries and watch the gears turn as they realized the value of incorporating human intelligence into the development of their machine learning solutions. As AI matures, the biggest gains stand to come from a synthesis of human and machine intelligence, and our conversations over the past few weeks have only underscored how many great opportunities there are.