Research & Insights

By Seth Teicher, September 17, 2014

7 Advances Pushing the Boundaries of Computer Vision

Before we can think critically about computer vision, we need to take a moment to appreciate our own human vision system. Just think what we have been able to do in our lives as humans with eyeballs! We have analyzed, sorted, and made sense of arbitrary objects in arbitrary situations; effortlessly tracked the movement of things; recalled 3D models from images; read letters, numbers, and words, whether written, typed, or even illustrated; and, since we were born, recognized thousands of faces. Our ability to organize and comprehend the masses of information that come to our eyes each day is nothing short of amazing.

For computers, however, the life behind their camera lenses and other sensors (i.e., artificial intelligence) has been less amazing and more fraught with confusion. This is now changing for the better. To give perspective, a robot couldn’t even pick up a cup independently as recently as 2006, but by 2011 it was easy, and now we have robots so coordinated that they can juggle 5 balls at a time!

Why the wait? Well, computer vision is no small challenge. For it to be useful in all its potential applications, computer scientists have needed to change how they teach their software; in short, they needed to feed it massive quantities of training data. Fortunately, thanks to data science, machine learning, artificial neural networks, and enriched training data (provided by platforms like CrowdFlower), advances in the field of computer vision have been pouring in over the past few years.

Here are 7 recent advances that should grab your attention:

1. Mega Knowledge-Graphs for Robots Become a Reality:

Built in part by humans providing direct feedback online and in part by machine learning algorithms that are constantly searching the internet, Robo Brain is an online library of information that computer vision researchers can access to give their robots real understanding of the world they see around them. Through structured deep learning, where artificial intelligence learns many layers of meaning behind a thing (such as not just how a coffee cup looks but what humans do with it), this library will allow robots to make the best choices based on data-fed 3D knowledge graphs. Robo Brain is not the only project in this area, but it’s an impressive insight into the power of machine learning coupled with a little crowdsourcing, as visitors to the website can help make corrections and additions to the library.

2. Project Adam Shows off the Power of Deep Learning for Image Recognition:

You can’t overemphasize the value of training data that is both large in quantity and high in quality for machine learning. Project Adam from Microsoft makes this abundantly clear with a computer vision program so advanced that it can tell the difference between a picture of a Pembroke and a Cardigan Corgi (dog breeds), a task that’s quite difficult even for a person. Modeled as a neural network, Project Adam’s algorithms have seen over 14 million images split up into 22,000 categories drawn from the ImageNet database. This is why its machine learning algorithms are so adept at recognizing images even in varied environments.

But dog breed identification is just the beginning. Project Adam could hold real promise for both health-conscious consumers and the blind and disabled community. Have a watch to see what Microsoft Researcher Trishul Chilimbi sees on the horizon.


3. Showing A.I. How It’s Done and Humans Controlling Robots:

When software and robots won’t or can’t do what you want, it makes sense to create browser-based software to just let humans take the controls for the work at hand. As an added bonus to this, machine learning algorithms can often learn a lot from observing their human counterparts at work. Here are three examples of this worth checking out:

  • Using the crowd to count neurons and train algorithms to detect the edges of cells, CrowdFlower teamed up with Harvard University to help their researchers find a faster way to analyze their visual data on cellular imagery. This is an example where an algorithm just couldn’t cut it, and the crowd needed to step in. (For a similar example where you can get involved as a volunteer see: Eyewire)

  • In 2012, Willow Garage set up the Heaphy Project which offloaded many of its tasks to online workers who were paid to direct PR2 robots through particular chores and tasks. During the work, the robot was able to observe and learn. Watch the entertaining video to get a full sense of what went on in this project.

  • Tell Me Dave is another example. However, the way it learns from online volunteers is through a simulated robot in virtual environments. Like a video game, volunteers can move the virtual robot through a simulated space and teach it how to do things like preparing a room for a party (putting out bowls of chips and dips, etc.) or whether to boil water with a stove or the microwave. Definitely worth trying out if you want to see what it feels like to be the robot’s overlord (for now ;). The project aims to build a crowdsourced library of instructions for the PR2 and other robots so that ultimately they can be trained to flexibly reason, make decisions and understand natural language.


PR2 at the Intelligent Autonomous Systems Group, Technischen Universität München. Photo by Jiuguang Wang

4. Satellite Imagery Is a Big Help as Big Data:

In addition to the famous search for the missing Malaysia Airlines flight, satellite imagery combined with crowdsourcing has been used to solve a number of problems, including assessing flood damage and finding indirect ways to map poverty, such as counting metal vs. thatched roofs in satellite photos.
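To make the roof-counting idea concrete, here is a minimal sketch of how several crowd workers' labels for the same satellite tile might be combined. The article does not say how any of these projects aggregate answers; majority voting is simply one common approach, and the tile IDs, labels, and the `majority_label` helper below are all hypothetical.

```python
from collections import Counter

def majority_label(votes):
    """Return (winning label, share of votes it received).

    The label is None when there are no votes or the top two labels tie,
    which flags the tile for another round of review.
    """
    if not votes:
        return None, 0.0
    ranked = Counter(votes).most_common()
    top_label, top_count = ranked[0]
    agreement = top_count / len(votes)
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None, agreement  # tie: no clear winner
    return top_label, agreement

# Hypothetical votes from crowd workers, keyed by satellite tile.
crowd_votes = {
    "tile_001": ["metal", "metal", "thatched"],
    "tile_002": ["thatched", "thatched", "thatched"],
    "tile_003": ["metal", "thatched"],
}

results = {tile: majority_label(votes) for tile, votes in crowd_votes.items()}
metal_count = sum(1 for label, _ in results.values() if label == "metal")

for tile, (label, agreement) in results.items():
    print(tile, label, round(agreement, 2))
print("tiles judged metal-roofed:", metal_count)
```

Tiles with a tie or low agreement could be sent back to more workers, which is one way a pipeline like this might trade cost against label quality.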

5. Armchair Archaeologists Uncover the History of Ancient Athens:

While fedora-sporting Indy remains the archetype for the far-flung archaeologist of our imagination, the field has transformed markedly in recent years.

The new frontier finds our hero hunting for the lost Egyptian city of Tanis from her office in Alabama or conscripting an army of armchair archaeologists to comb through satellite images in search of Genghis Khan’s tomb. In both cases, a combination of expert and programmatic imagery analysis enabled faster, cheaper and more effective exploration.

In an effort to tackle these sorts of citizen science challenges on an ongoing basis, a group of researchers at Barcelona’s Computer Vision Center have developed a crowdsourcing platform, called Knowxel, to train and supplement algorithms in both text and imagery analysis by allowing their users to tap into a global, mobile workforce.

One such Knowxel task that caught my eye was the analysis of ancient Athenian pottery. In order to determine the origin and time frame of a specific vessel, researchers need to uncover clues. To accomplish this, crowd workers are asked to draw bounding boxes around the helmets of hoplites featured on the vessels. These cropped images are then programmatically analyzed to make determinations about the history of each piece.


Preview of the Athenian pottery helmet identification task on Knowxel.
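The cropping step described above can be sketched in a few lines. This is only an illustration, assuming each bounding box arrives as (left, top, right, bottom) pixel coordinates; the article does not describe Knowxel's actual data format or code, so the names and the toy image below are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive

def crop(image: Image, box: Box) -> Image:
    """Cut the region inside `box` out of `image`, clamping to the image edges."""
    left, top, right, bottom = box
    height = len(image)
    width = len(image[0]) if height else 0
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    return [row[left:right] for row in image[top:bottom]]

def crop_all(image: Image, boxes: List[Box]) -> List[Image]:
    """Return one cropped sub-image per crowd-drawn bounding box."""
    return [crop(image, box) for box in boxes]

# Toy 4x5 "photo": each pixel value encodes its row and column (row * 10 + col).
toy_image = [[r * 10 + c for c in range(5)] for r in range(4)]

# Hypothetical helmet boxes; the second one spills past the image border.
helmet_boxes = [(1, 0, 3, 2), (3, 2, 9, 9)]

crops = crop_all(toy_image, helmet_boxes)
print(crops[0])  # [[1, 2], [11, 12]]
print(crops[1])  # [[23, 24], [33, 34]]
```

Clamping matters in practice because hand-drawn boxes often run slightly off the edge of the image; each resulting crop is what a later analysis step would receive.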

It’s fun to imagine what new discoveries the hidden past will reveal as these new techniques become more widely adopted!

6. Image Labeling Succeeds as a Service:

Sometimes, software services are a lot like us: they need to know what they are looking at before they can lend a hand. One such case is discovering new wines to drink or recalling the ones you’ve enjoyed. The Delectable app tackles exactly this. With a database of millions of wine labels, it uses machine learning to help people sort out exactly what wine they are looking at and learn more about it. To train and refine this algorithm, Delectable taps CrowdFlower’s workforce to categorize, match and transcribe bottles when app users take a picture that Delectable doesn’t recognize.

7. Facial Recognition Is Rapidly Approaching Human Abilities:

We don’t always think of ourselves as carrying around facial recognition software, but we kind of do. In fact, this process takes up a big slice of our brains because it’s an important and not-so-easy task. Thanks to training data on the scale of something like people’s photos from Facebook, however, computer vision applications have become almost as good.

The Future of Computer Vision:

The future of computer vision is bright. However, there is much work, many discoveries and a lot of research left to go. Nevertheless, one point seems to stand out: many of the breakthroughs above show that training data, developed by data enrichment platforms such as CrowdFlower or Knowxel and combined with machine learning, is what will catalyze this field’s development and propel us into a much more artificially intelligent future.