Evidence in experimental psychology suggests that most people overestimate their own ability to complete objective tasks accurately. This phenomenon, often called confidence bias, refers to “a systematic error of judgment made by individuals when they assess the correctness of their responses to questions related to intellectual or perceptual problems.” 1 But does this hold up in crowdsourcing?
We ran an experiment to test for a persistent difference between people’s perceptions of their own accuracy and their actual objective accuracy. We used a set of standardized questions, focusing on the Verbal and Math sections of a common standardized test. For each question, contributors supplied an answer along with an indication of how confident they were in it. The analysis covers the 829 individuals who answered more than 10 of these questions.
We didn’t use any Gold in this experiment. Instead, we incentivized performance by rewarding those finishing in the top 10%, based on objective accuracy.
Does Bias Exist?
To estimate confidence bias, we took each individual’s average stated confidence and subtracted the proportion of questions he or she answered correctly. If the difference is positive, the individual overestimated how well they did. Amazingly, over 75% of contributors overestimated their ability to answer multiple-choice questions correctly.
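As a rough sketch of the calculation described above (not the code used in the experiment), the bias score for one contributor is the mean stated confidence minus the fraction of answers that were correct. The function name, the record layout, and the sample numbers are all hypothetical, and confidence is assumed to be recorded on a 0 to 1 scale.

```python
from statistics import mean


def bias_score(responses):
    """Mean stated confidence minus fraction answered correctly.

    `responses` is a list of (confidence, is_correct) pairs, with confidence
    assumed to be on a 0-1 scale. A positive result means the contributor
    was overconfident; a negative result means underconfident.
    """
    if not responses:
        raise ValueError("need at least one response")
    avg_confidence = mean(conf for conf, _ in responses)
    accuracy = mean(1.0 if correct else 0.0 for _, correct in responses)
    return avg_confidence - accuracy


# Made-up contributor: fairly confident, but right on only 2 of 4 questions.
example = [(0.9, True), (0.7, False), (0.6, True), (0.8, False)]
score = bias_score(example)
print(f"bias score: {score:+.2f}")
```

Under this definition, a contributor who is right exactly as often as his or her average confidence implies scores zero, regardless of how accurate that contributor is in absolute terms.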
Are Individuals Consistently Biased?
Because our dataset consisted of Math and Verbal questions, we looked at each individual contributor’s confidence bias for both types of questions. In aggregate, people tended to have more trouble with the Verbal questions (average accuracy of 28%, compared to 41% for Math), though the average confidence score was nearly identical (63% +/-1).
The vast majority of contributors fall into the “overconfident on both” quadrant (top right), while only a handful of contributors were overconfident for one question type and underconfident for the other (top left and bottom right quadrants). Overall, there is certainly a correlation between bias scores on the two problem types, suggesting that many individuals are consistently biased on different types of problems. However, this explains only a portion of the variation.
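One way to reproduce this kind of comparison, sketched here with made-up numbers rather than the experiment's data, is to compute a Math bias score and a Verbal bias score per contributor, label each contributor by quadrant, and measure the Pearson correlation between the two scores. The helper names and the sample values are hypothetical. The squared correlation is the share of variance in one score that a simple linear fit on the other would account for, which is one reading of "explains only a portion of the variation."

```python
from math import sqrt


def quadrant(math_bias, verbal_bias):
    """Label a contributor by the sign of each bias score (positive = overconfident)."""
    over_math = math_bias > 0
    over_verbal = verbal_bias > 0
    if over_math and over_verbal:
        return "overconfident on both"
    if over_math:
        return "overconfident on Math only"
    if over_verbal:
        return "overconfident on Verbal only"
    return "overconfident on neither"


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up (math_bias, verbal_bias) pairs, one per contributor.
biases = [(0.30, 0.40), (0.10, 0.25), (-0.05, 0.10), (0.20, -0.02), (-0.10, -0.15)]

labels = [quadrant(m, v) for m, v in biases]
r = pearson([m for m, _ in biases], [v for _, v in biases])
print(labels)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
```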
Does Bias Vary Across Groups?
Given that overconfidence seems to be a consistent trait, we were curious how this trait varies across the different groups making up our contributor pool. We sliced and diced our contributors into a number of different sub-groups, which are summarized below.
There are a lot of interesting things going on here. To highlight a few, accuracy increases consistently as the contributor’s education level advances from High School to College, but so does confidence, leaving the bias score nearly unchanged. There’s a similar pattern with Age, with older contributors tending to be both more accurate and more confident.
Gender and Location also have an effect on confidence bias. Taking the two countries that supplied the most people, contributors from the US were much more accurate and slightly more confident than the average, while those from India were average in terms of accuracy but much more confident. As such, the bias score for contributors from India is nearly double that of contributors from the US. With respect to gender, confidence didn’t vary much, but women were more accurate and thus less biased than men.
Moving On
Because this was an experiment, we decided against using Gold in order to minimize any selection bias among contributors. However, this makes it difficult to apply these results to enterprise crowdsourcing, at least as practiced by CrowdFlower. In the future, it would be interesting to look at confidence bias among trusted workers only, and particularly among trusted workers with repeated experience in specific job types. We would expect these workers to have a better sense of whether their answers are correct, though it is possible (and perhaps likely) that confidence would increase along with accuracy.
1. Pallier, G., Wilkinson, R., Danthiir, V., Kleitman, S., Knezevic, G., Stankov, L., & Roberts, R. D. (2002). The role of individual differences in the accuracy of confidence judgments. Journal of General Psychology, 129, 257–299.