CrowdFlower encourages developers and researchers to use its open data to explore new possibilities for what crowdsourcing can achieve. This webpage is a repository of datasets collected or enhanced by CrowdFlower's workforce and made available for everyone to use.

A dataset of 4,000 crowd-named colors in 9 languages. It includes the RGB color, the color name in the contributor's native language, and the translated color name.
Added: November 13, 2013 by Dave Oleson | Data Rows: 4,000
This dataset is a collection of tweets related to nuclear energy, along with the crowd's evaluation of each tweet's sentiment. The possible sentiment categories are: "Positive", "Negative", "Neutral / author is just sharing information", "Tweet NOT related to nuclear energy", and "I can't tell". We also provide an estimate of the crowd's confidence that each category is correct, which can be used to identify tweets whose sentiment may be unclear.
Added: August 30, 2013 by CrowdFlower | Data Rows: 190
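The confidence estimate described above can serve as a filter for separating clear-cut tweets from ambiguous ones. The sketch below assumes the data has been exported as CSV with one label column and one confidence column holding values from 0 to 1; the column names `sentiment` and `sentiment_confidence`, the helper `split_by_confidence`, and the 0.6 threshold are hypothetical choices for illustration, not the dataset's documented schema.

```python
import csv
import io

# Hypothetical column names; check the actual CSV header before use.
LABEL_COL = "sentiment"
CONF_COL = "sentiment_confidence"


def split_by_confidence(rows, threshold=0.6):
    """Split rows into (confident, unclear) lists using the crowd's confidence score."""
    confident, unclear = [], []
    for row in rows:
        # Confidence is assumed to be stored as a number in [0, 1].
        if float(row[CONF_COL]) >= threshold:
            confident.append(row)
        else:
            unclear.append(row)
    return confident, unclear


if __name__ == "__main__":
    # Made-up rows for illustration only, not taken from the dataset.
    sample = io.StringIO(
        "tweet,sentiment,sentiment_confidence\n"
        "example tweet 1,Positive,0.92\n"
        "example tweet 2,I can't tell,0.41\n"
    )
    confident, unclear = split_by_confidence(csv.DictReader(sample))
    print(len(confident), len(unclear))  # prints "1 1"
```

Raising the threshold keeps fewer but more reliable labels; the rows that fall below it are the ones whose sentiment the crowd found unclear.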
This dataset is a collection of English sentence pairs. The crowd was asked about the truth value of the second sentence assuming the first sentence were true, and to rate how related the two sentences are on a scale of 1 to 5. The variance of the relatedness score across the crowd's judgments is also included.
Added: August 30, 2013 by Marco Baroni | Data Rows: 555
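To make the reported variance concrete, here is a minimal sketch that summarizes the 1-to-5 relatedness judgments collected for a single sentence pair. The listing does not say whether population or sample variance was used, so the choice of `statistics.pvariance` is an assumption, and `summarize_ratings` is a hypothetical helper.

```python
import statistics


def summarize_ratings(ratings):
    """Return (mean, variance) of one sentence pair's 1-5 relatedness judgments."""
    # Reject empty input and values outside the 1-5 scale used in the job.
    if not ratings or any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be a non-empty list of values from 1 to 5")
    # Population variance (assumption); statistics.variance gives the sample estimate.
    return statistics.mean(ratings), statistics.pvariance(ratings)


print(summarize_ratings([4, 5, 4, 3, 5]))  # (4.2, 0.56)
```

A low variance means contributors largely agreed on how related the pair is; a high variance flags pairs the crowd judged inconsistently.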
These are the job's instructions, which describe the dataset well:
"Contributors decide whether the word correctly describes some aspect of the image. The word could refer to an object depicted in the image ("apple"), or to an attribute such as color ("red"), shape ("round"), age ("old"), size ("small"), material ("metal"), or even to more abstract properties ("sad"), as long as they are clearly depicted in the image. Actions or postures depicted in the picture ("game", "cooking", "sitting") are also OK. Only English words have to be considered, all other words should be assigned a "no". These include foreign words ("retrato"), nonsense sequences of characters ("scyhbqwdoi") and words that should be written with spaces in between ("redcar", "newyorkcity"). Please also exclude names of people, locations, etc. ("Stockholm", "Spain", "Skinny"), and reject labels that describe or comment on the image as such, as opposed to the things it depicts (for example, "polaroid", "macro", "selfportrait", "portrait" unless the picture depicts a portrait! or even "beautiful" if it refers to the picture instead of the thing depicted)."

Contributors were asked to evaluate how similar two sets of words are on a seven-point scale, with 1 being "completely different" and 7 being "exactly the same". The pairs are
Added: August 30, 2013 by Marco Baroni | Data Rows: 6,274
In this job, contributors judged how a tweet relates to yogurt, choosing from these options:
1) Eating: mentions what they're eating or what they like (or dislike) eating.
2) Health: mentions health aspects of yogurt.
3) Part of a story: "Little Johnny has yogurt all over his face. How cute!"
4) Deal: such as a coupon or special pricing.
5) Cause: such as something for breast cancer or veterans.
6) Advertising: mentions an ad about yogurt.
7) Slang: "yogurt" is sometimes used as slang. Check out Urban Dictionary if you want more on this.
8) I can't tell: please pick this if you think the tweet might fall into a category but you just aren't sure. It may be that you would need to see the linked story, or a tweet that this one responds to, etc.
Then, if a special style of yogurt was mentioned, contributors could check any of the following options:
frozen
smoothie
snack bar
none

Contributors evaluated tweets for belief in the existence of global warming or climate change. The possible answers were "Yes" if the tweet suggests global warming is occurring, "No" if the tweet suggests global warming is not occurring, and "I can't tell" if the tweet is ambiguous or unrelated to global warming. We also provide a confidence score for the classification of each tweet.
Added: August 30, 2013 by Kent Cavender-Bares | Data Rows: 6,090
Contributors evaluated tweets about multiple brands and products. The crowd was asked whether each tweet expressed positive, negative, or no emotion toward a brand and/or product. If an emotion was expressed, they were also asked which brand or product was its target.
Added: August 30, 2013 by Kent Cavender-Bares | Data Rows: 9,093
This dataset contains all tweets mentioning Claritin from October 2012. The tweets are tagged with sentiment, the author's gender, and whether or not they mention any of the top 10 adverse events reported to the FDA. A visualization of the full dataset is available here: https://senti.crowdflower.com/datasets/857/t/91f9ed1ab4281adf. For a fuller description of the dataset, see: https://crowdflower.com/blog/2013/03/discovering-drug-side-effects-with-crowdsourcing/.
Added: November 13, 2013 by Dave Oleson | Data Rows: 4,900


