What kinds of questions produce the best results in crowdsourcing tasks and surveys? To answer that question, I bring you another geeked-out blog post in which I pit the multiple choice (or forced choice) question against its bitter arch-rival, the check-all-that-apply (or checkbox) question.

Both kinds of formatting can be useful when you want people to identify or categorize something(s) in a list. Check-all-that-apply seems to offer the added bonus of easily fitting an entire list into a single question, thereby requiring less mental effort from respondents and (presumably) reducing response times. How do the two kinds of questions compare on answer quality, though?

In my first post a few weeks ago, I talked about some of the reasons why response scales matter when you’re designing multiple choice questions for a survey or data collection task. In the comments, michael raised an interesting point:
Why use a scale at all? I would make those types of questions always open ended. Anyone who takes the survey has to think about how many hours they spend online anyway. That’s the first step. The second is fitting their estimate in one of the categories. Seems like unnecessary work for the participants.
You can check out the rest of the thread to see michael’s idea in context, as well as how other people (including me) replied.

The discussion got me thinking more about multiple choice questions and some of the costs and benefits they entail compared to other question types. As luck would have it, a few of the other questions I included in my original experiment can provide additional grist for the mill.
Question Format Smackdown!
In order to test how each format affects responses, I asked workers in the Crowdlabor pools one (and only one) version of the following:
As you can see, the forced choice version is a little clunky because I had to present each item as a separate question. Nevertheless, there’s no substantive difference between the two versions other than the answer choice format, which makes it possible to compare the results.

I should explain why I included some extremely popular websites (e.g., Google) among the answer options as well as some slightly less well-traveled, but still popular, sites (Times of India, New York Times). The goal was to avoid having too many people who had visited all of the sites or none of them. If a lot of responses had fallen into either extreme, it would have been impossible to estimate the extent to which the two formats affected the outcomes.

As with my response scale example, the groups that saw the two versions of the question did not vary widely on potentially confounding demographic covariates such as gender or country of residence.
Here’s a table showing the number of positive responses per format per site:
And a plot to visualize the variations per site as a percentage of total responses per question format:
With one exception (Orkut), forced choice formatting resulted in more people saying they had visited every single site in the list.
Estimating the Effect
In order to get a precise measurement of the effect of forced choice format vs. checkbox format, I reshape the data into cumulative counts and compare the distributions of total number of sites visited among people who saw the checkbox and forced choice versions respectively. Here’s the resulting table:
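The reshaping step only takes a few lines. I used R for the real analysis, but here’s a Python sketch of the same idea with hypothetical respondents (the toy numbers are chosen so the two means differ by about one site, mirroring the real result): each respondent’s per-site answers collapse into a single “total sites visited” count, and those counts are tabulated per format.

```python
from collections import Counter

# Hypothetical per-respondent answers (True = "visited"); each inner list
# is one respondent's answers across five sites. Not the study's real data.
checkbox = [[True, False, False, False, False],
            [True, True, False, False, False],
            [True, False, True, False, False]]
forced   = [[True, True, False, False, False],
            [True, True, True, False, False],
            [True, True, False, True, False]]

def totals(responses):
    """Collapse each respondent's answers into a count of sites visited."""
    return [sum(r) for r in responses]

def distribution(responses):
    """Tabulate how many respondents reported each total."""
    return Counter(totals(responses))

cb_mean = sum(totals(checkbox)) / len(checkbox)
fc_mean = sum(totals(forced)) / len(forced)
print("checkbox:", dict(distribution(checkbox)))
print("forced:  ", dict(distribution(forced)))
print(f"mean sites visited: checkbox {cb_mean:.2f}, forced choice {fc_mean:.2f}")
```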
A pair of density plots represents the same information in graphical form:
On each plot, I’ve highlighted the minimum, maximum, and mean (sparklines-style). The heavy left-leaning skew of the checkbox curve contrasts nicely with the slightly right-leaning shape of the forced choice curve.

From both the table and the density plots, it’s easy to see that the two question formats appear to have caused a substantial difference. The difference in means between the two distributions suggests that a respondent who saw the forced (multiple) choice format identified, on average, one additional site they had visited that their peers who saw checkboxes did not identify.
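The post stops at the observed difference in means. If you wanted to check how unlikely a gap like that would be under chance alone, one standard approach is a permutation test. This sketch is my own illustrative addition, not part of the original analysis, and it uses made-up totals:

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def permutation_p(a, b, n_iter=10_000, seed=0):
    """Two-sided permutation test for a difference in means: shuffle the
    pooled totals, re-split them into groups of the original sizes, and
    count how often the shuffled gap is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        if abs(mean(pooled[:len(a)]) - mean(pooled[len(a):])) >= observed:
            hits += 1
    return hits / n_iter

# Hypothetical "total sites visited" per respondent for each format.
checkbox_totals = [1, 2, 1, 2, 1, 2, 1, 1]
forced_totals   = [2, 3, 2, 3, 3, 2, 3, 2]
print("p ≈", permutation_p(checkbox_totals, forced_totals))
```

A small p-value would indicate the gap is hard to explain as sampling noise; with groups this small the test is only suggestive.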
Should I hate Checkboxes?
The demographic profile of the two groups was pretty similar, so the disparity in the results was almost certainly due to the question format. But why does the question format have such a powerful effect?

Given the opportunity, checkbox respondents either failed to notice answer choices or ignored them when they were not forced to provide a response to each one. As with numerical response scales, this is yet another example of how mental shortcuts can compromise data quality.

This time around, the solution is pretty simple: all things being equal, you’re better off using forced choice formatting when you care about precise results. That said, things are never really equal, and there will always be some reason you might want to consider making life faster or simpler for the people answering the questions. For example, if you’re asking people to choose tags or labels for something, the precision of each response might not matter very much, and checkboxes would work just fine.
I used R for all the analysis and plots. I created the first plot using Hadley Wickham’s ggplot2 package. Contact me with requests for data or code at aaron [at] doloreslabs [dot] com, and leave your questions, complaints, or suggestions below.