Research & Insights

By Sarit Szpiro, September 30, 2014

Cognitive Difficulty Predicts Task Performance

As a Psychology PhD student, I’m fascinated by how humans think. I’m constantly looking for new ways to understand their behavior. That’s a large part of the reason why I decided to intern at CrowdFlower. These past few months, I had the chance to explore big questions about human cognition with the world’s largest workforce. Notably, what makes a task difficult? Why would the same worker excel at one task, but perform poorly on another? Can we learn something about human cognition from crowdsourcing and data enrichment?

[Image: human brain, via cblue98 on Flickr Creative Commons]

Classifying CrowdFlower Tasks Based on Human Traits:

One famous classification in psychology is the Big Five personality traits (openness, conscientiousness, extraversion, agreeableness, and neuroticism). One application of this classification to big data comes from Five Labs, whose personality-mapping project recently went viral with the release of an app that uses NLP to analyze users’ Facebook status updates and profiles.

However, for our purposes it didn’t make sense to use personality traits to classify data enrichment tasks; it isn’t clear, for instance, how “openness” would affect a contributor’s approach to an image categorization task. Instead, I developed a new set of features representing the practical cognitive skills that apply to data enrichment tasks on CrowdFlower.

For example, some tasks require a lot of attention to detail (like noticing whether two addresses are identical), while others require noticing nuances in the sentiment of tweets. To identify which cognitive features to use as factors for my research, I analyzed a large sample of jobs posted on the CrowdFlower platform. I came up with a list of eight cognitive skills that can be used for further analysis:

  • Short-Term Memory: Tasks that rely on remembering details, like tasks with long guidelines.
  • Attention: Tasks that require patience and attention to detail, like reading a product description.
  • Sentiment: Tasks that require noticing emotion within text or an image.
  • Training: Tasks that require a lot of training to do well (usually they pay better too!).
  • Visual Complexity: Tasks that require making decisions based on an image’s visual composition, like deciding whether it is beautiful.
  • Verbal Complexity: Tasks that require advanced knowledge of English.
  • Opinion: Tasks that require the worker’s perspective or making a personal judgment.
  • Categorization: Tasks that require choosing a category from a list. Some tasks have a large number of categories (even requiring search within a category list), while others have only four options to choose from.

Some of these features relate to each other. For example, determining sentiment sometimes also means making a personal judgment (opinion), and some attention tasks require memory as well. But that’s a good thing: it allows a more complete description of tasks through these cognitive skills.

Rating Tasks Based on Cognitive Features:

The next step was to rate a variety of tasks on each of these cognitive features. Each task is then represented by eight numbers (each between zero and one) that describe how much of each cognitive skill the task requires.
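To make this concrete, here is a minimal sketch of what such a representation could look like in Python. The two tasks and all of the ratings below are invented for illustration; they are not real CrowdFlower data. The cosine similarity at the end previews one way to compare tasks by their cognitive profiles.

```python
import numpy as np

SKILLS = ["short_term_memory", "attention", "sentiment", "training",
          "visual_complexity", "verbal_complexity", "opinion", "categorization"]

# Hypothetical ratings (each between 0 and 1), invented for this example.
tweet_sentiment  = np.array([0.2, 0.5, 0.9, 0.3, 0.1, 0.7, 0.8, 0.4])
address_matching = np.array([0.6, 0.9, 0.0, 0.2, 0.1, 0.3, 0.1, 0.3])

def cosine_similarity(a, b):
    """One simple way to measure how similar two task profiles are."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(tweet_sentiment, address_matching))  # ≈ 0.56
```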

Rating tasks according to cognitive skills has two potential uses: it creates a metric of similarity between tasks, and it can be used to predict a task’s accuracy. I used the cognitive ratings to try to predict accuracy, using random forests.
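Here is a minimal sketch of that prediction step using scikit-learn’s RandomForestRegressor. The feature matrix and accuracy values below are random stand-ins (so the printed R² will be near zero); with the real cognitive ratings, the fit was much better, as the results below show.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Stand-in data: one row per task (its eight cognitive ratings) and the
# observed contributor accuracy on that task. Real ratings would come from
# the rating step described above.
rng = np.random.RandomState(0)
X = rng.rand(200, 8)   # eight cognitive-skill ratings per task, each in [0, 1]
y = rng.rand(200)      # observed accuracy per task, in [0, 1]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

print("R^2 on held-out tasks:", r2_score(y_test, model.predict(X_test)))
```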

Here are the results:

[Figure: cognitive model for predicting job accuracy (scatter plot of predicted vs. actual accuracy per task)]

The x-axis is the actual accuracy on a task, and the y-axis is the predicted accuracy. The line represents equality (where the prediction matches the true value): the closer the points are to the line, the better the prediction. And they are very close! Success!

These results are really good (R² = 0.58!). In other words, the eight cognitive features explain about 58% of the variance in task accuracy, which means they can meaningfully predict the accuracy on a task.

These results are promising and can serve as a useful guide for future task design and for allocating work to specific cohorts of contributors. For example, a task that is predicted to be only 65% accurate should probably be re-designed (maybe even split into several tasks) in order to get higher-quality results. While my internship here at CrowdFlower is only temporary, the data science team will continue to explore how information about contributors’ cognitive abilities can be used to enhance the CrowdFlower platform.
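As a toy illustration of how such predictions might feed back into task design (the task names, predicted values, and exact threshold below are invented for the example):

```python
# Flag tasks whose predicted accuracy falls below a chosen redesign threshold.
REDESIGN_THRESHOLD = 0.65  # hypothetical cutoff, echoing the example above

task_predictions = {"task_a": 0.91, "task_b": 0.63, "task_c": 0.72}  # made up

for task_id, predicted in task_predictions.items():
    if predicted < REDESIGN_THRESHOLD:
        print(f"{task_id}: predicted accuracy {predicted:.0%} -- "
              "consider simplifying or splitting this task")
```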

As I continue to explore this concept, I plan to publish a follow-up post on predicting future contributor accuracy based on contributors’ historical accuracy and their personal cognitive abilities. Keep an eye out! If you’d like to chat further, send me a message on Twitter at @SaritSzpiro.

Side note: Rating tasks myself is obviously a bottleneck, as I cannot rate the thousands of tasks that run on CrowdFlower. Luckily, we can crowdsource the task ratings (!). I designed a “job of jobs”: a task that asks workers to rate other tasks according to the cognitive skills. This work is still in progress; hopefully we’ll have more updates soon.
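If you’re curious what aggregating those crowdsourced ratings could look like, here is a small sketch: several workers rate the same task on each cognitive skill, and we average the ratings per skill to get the task’s profile. The numbers are invented for illustration.

```python
import numpy as np

# Rows = workers, columns = the eight cognitive skills, values in [0, 1].
# These ratings are made up; real ones would come from the "job of jobs".
worker_ratings = np.array([
    [0.2, 0.6, 0.9, 0.3, 0.1, 0.7, 0.8, 0.4],
    [0.3, 0.5, 1.0, 0.2, 0.0, 0.6, 0.9, 0.5],
    [0.1, 0.7, 0.8, 0.4, 0.2, 0.8, 0.7, 0.3],
])

task_profile = worker_ratings.mean(axis=0)  # aggregated profile for one task
print(task_profile)
```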