CrowdFlower encourages anyone interested in data to use its open data to enhance their research, uncover new insights, and unlock the power of rich data. This site is a repository of some of the data sets collected or enhanced by CrowdFlower's 5 million contributors and made available for anyone to use.

A collection of images of people eating fruits, cakes, and other foodstuffs. Contributors classified the images by gender (male/female), then by age (adult or child/teenager).
Added: March 9, 2015 by CrowdFlower | Data Rows: 587
An interesting language data set about the relationship of broad concepts.
All questions were phrased in the following way: "All [x] are [y]." For example, a contributor would see something like "All Toyotas are vehicles" and was then asked to say whether this claim was true or false. Contributors were also provided images, in case they were unclear as to what either concept was.
This data set includes links to both images provided, the names given for [x] and [y], and whether the statement that "All [x] are [y]" was true or false.
Added: December 22, 2014 by CrowdFlower | Data Rows: 3,536
On February 27th, 2015, the internet was briefly obsessed with the color of a dress known simply as #TheDress. We ran a survey job with 1,000 contributors, asking them what colors the dress was, and looked into a hypothesis that night owls and morning people saw the dress differently. We wrote about it here.
Added: February 27, 2015 by CrowdFlower | Data Rows: 1,000
A large data set where contributors were asked to look through a variety of political speeches, manifestos, and other texts for references to immigration or immigrant integration.
Added: February 23, 2015 by CrowdFlower | Data Rows: 31,000
A sentiment analysis of negative McDonald's reviews. Contributors were given reviews culled from low-rated McDonald's locations in random metro areas and asked to classify why the locations received low reviews. Options given were:

A year-by-year breakdown of the cover images of Time Magazine. Referenced in this blog post.
Contributors were shown images of Time Magazine covers since the late 1920s and asked to classify whether the person pictured was male or female. Data is broken down overall and on an annual basis.
Added: December 01, 2014 by CrowdFlower | Data Rows: under 100
Contributors were asked to read two sentences (the first was an image caption and the second was a shorter version) and judge whether the short sentence adequately describes the event in the first sentence (image caption).
An example might be something like:
Caption: A man with earphones working in a San Francisco cafe while drinking coffee.
Proposed sentence: Man listening to music in coffee shop
These sentences would qualify as pairs.
Data set includes sentence pairs, contributor agreement scores, and yes/no/unknown rankings.
Added: February 23, 2014 by CrowdFlower | Data Rows: 2,000
A sentiment analysis job about the problems of each major U.S. airline. Twitter data was scraped from February of 2015 and contributors were asked to first classify positive, negative, and neutral tweets, followed by categorizing negative reasons (such as "late flight" or "rude service").
Added: February 12, 2015 by CrowdFlower | Data Rows: 16,000
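The two-step labeling described for the airline tweets above (sentiment first, then a reason for each negative tweet) lends itself to a simple per-airline tally. The following is a minimal sketch only: the rows are made-up sample data, and the column names `airline`, `sentiment`, and `negative_reason` are assumptions that may not match the actual download.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows; real column names and values may differ.
rows = [
    {"airline": "Airline A", "sentiment": "negative", "negative_reason": "late flight"},
    {"airline": "Airline A", "sentiment": "negative", "negative_reason": "rude service"},
    {"airline": "Airline A", "sentiment": "positive", "negative_reason": ""},
    {"airline": "Airline B", "sentiment": "negative", "negative_reason": "late flight"},
    {"airline": "Airline B", "sentiment": "neutral", "negative_reason": ""},
    {"airline": "Airline A", "sentiment": "negative", "negative_reason": "late flight"},
]


def negative_reasons_by_airline(rows):
    """Count negative reasons per airline, skipping non-negative tweets."""
    counts = defaultdict(Counter)
    for row in rows:
        if row["sentiment"] == "negative" and row["negative_reason"]:
            counts[row["airline"]][row["negative_reason"]] += 1
    return counts


reason_counts = negative_reasons_by_airline(rows)
for airline, counter in sorted(reason_counts.items()):
    # most_common() lists reasons from most to least frequent
    print(airline, counter.most_common())
```

With a real export, the same function could be fed rows from `csv.DictReader` once the actual column names are known.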
A large data set of labeled biomedical images, ranging from x-ray and ultrasound to charts, graphs, and even hand-drawn sketches.
The major image categories are as follows:
Data set includes the image class, whether a given categorization was accurate, and a URL to the image judged.
Added: January 22, 2015 by CrowdFlower | Data Rows: 10,652
A data set concerning the race, religion, age, and other demographic details of all Oscar winners since 1928 in the following categories:
For further information on this data set, please read our resulting blog post.
Added: February 15, 2014 by CrowdFlower | Data Rows: 416
A large data set where contributors were asked to look through a variety of political speeches, manifestos, and other texts for references to immigration or immigrant integration.
Added: February 24, 2015 by CrowdFlower | Data Rows: 10,545

A data categorization job concerning what corporations actually talk about on social media. Contributors were asked to classify statements as information (objective statements about the company or its activities), dialog (replies to users, etc.), or action (messages that ask for votes or ask users to click on links, etc.).
Added: February 14, 2015 by CrowdFlower | Data Rows: 3,118
A data set where contributors classified whether certain body parts were part of other body parts. Questions were phrased like so: "[Part 1] is a part of [part 2]," or, by way of example, "Nose is a part of spine" or "Ear is a part of head."
Added: February 4, 2015 by CrowdFlower | Data Rows: 1,892
A data set containing information on hundreds of wearables. Contains data on prices, company name and location, URLs for all wearables, and the location on the body where each wearable is worn.
Added: January 31, 2015 by CrowdFlower | Data Rows: 582
Contributors were shown a large variety of images and asked whether a given word described the image shown. For example, they might see a picture of Mickey Mouse and the word "Disneyland," where they'd mark "yes." Conversely, if Mickey Mouse's pair word was "oatmeal," they would mark "no."
Data set includes image URLs, the matched word, whether the pair matched, and a confidence score for each.
Added: March 30, 2011 by CrowdFlower | Data Rows: 225,000
Contributors read strange sentences and ranked them on a scale of "implausible" (1) to "plausible" (5). Sentences were phrased in the following manner: "This is not an [x], it is a [y]."
Added: January 23, 2014 by CrowdFlower | Data Rows: 400
A sentiment analysis job about the lineup of Coachella 2015. We wrote about it here. An additional, thousand-row data set about which artists fans were most excited about can be found here. The data set listed in this entry concerns sentiment about the festival overall.
Added: February 4, 2015 by CrowdFlower | Data Rows: 3,847
A look into the sentiment around Apple, based on tweets containing #AAPL, @apple, etc.
Contributors were given a tweet and asked whether the user was positive, negative, or neutral about Apple. (They were also allowed to mark "the tweet is not about the company Apple, Inc.")
Tweets cover a wide array of topics including stock performance, new products, IP lawsuits, customer service at Apple stores, etc.
Added: December 28, 2014 by CrowdFlower | Data Rows: 3,969
Here, contributors were asked to rate image quality (as opposed to how pretty the people in the images actually are). They were given a five-point scale, from "unacceptable" (blurry, red-eyed images) to "exceptional" (hi-res, professional-quality portraiture) and ranked a series of images based on those criteria.
Data set includes a URL for each image, an averaged score (of 1-5) for image quality, and a variance rating accounting for subjective contributor disagreement.
Added: October 03, 2014 by CrowdFlower | Data Rows: 3,500
Here, contributors were asked to rate image quality (as opposed to how gorgeous the buildings in the images actually are). They were given a five-point scale, from "unacceptable" (out-of-focus cityscapes) to "exceptional" (hi-res photos that might appear in a city guide book) and ranked a series of images based on those criteria.
Data set includes a URL for each image, an averaged score (of 1-5) for image quality, and a variance rating accounting for subjective contributor disagreement.
Added: October 13, 2014 by CrowdFlower | Data Rows: 3,500
Here, contributors were asked to rate image quality (as opposed to how adorable the animals in the images actually are). They were given a five-point scale, from "unacceptable" (blurry photos of pets) to "exceptional" (hi-res photos that might appear in textbooks or magazines) and ranked a series of images based on those criteria.
Data set includes a URL for each image, an averaged score (of 1-5) for image quality, and a variance rating accounting for subjective contributor disagreement.
Added: October 15, 2014 by CrowdFlower | Data Rows: 3,500
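The three image-quality data sets above each report an averaged score and a variance rating per image. As a rough illustration of how such figures could be derived from individual 1-5 judgments, here is a minimal sketch. The URLs and scores are invented, and the use of population variance is an assumption: the source does not say how the variance rating was actually calculated.

```python
from statistics import mean, pvariance

# Hypothetical per-image judgments on the 1-5 quality scale.
judgments = {
    "https://example.com/a.jpg": [5, 4, 5, 4],  # contributors mostly agree
    "https://example.com/b.jpg": [1, 5, 2, 4],  # contributors disagree
}


def summarize(scores):
    """Return the averaged score and the population variance of the scores."""
    return {"mean": mean(scores), "variance": pvariance(scores)}


summary = {url: summarize(scores) for url, scores in judgments.items()}
for url, stats in summary.items():
    print(url, stats)
```

A higher variance flags images where contributors disagreed, which is presumably the role of the variance rating in these data sets.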
A linguistic data set concerning the certainty an author has about a certain word. For example, in the following sentence: "The dog ran out the door," if the word "ran" was asked about, the certainty that the event did or will happen would be high.
Added: February 06, 2015 by CrowdFlower | Data Rows: 13,386
Before the 2015 Super Bowl, there was a great deal of chatter around deflated footballs and whether the Patriots cheated. This data set looks at Twitter sentiment on important days during the scandal to gauge public sentiment about the whole ordeal. We wrote about it here.
Added: January 25, 2015 by CrowdFlower | Data Rows: 11,814
A data set listing the sports that have been on the cover of Sports Illustrated since 1955. Covers are grouped by year. You can see the related blog post here.
Added: February 12, 2015 by CrowdFlower | Data Rows: 32,000
A look into what skills data scientists need and what programs they use. Part of our 2015 data scientist report, which you can download.
Added: January 25, 2015 by CrowdFlower | Data Rows: 974
Contributors were given a nonce word and a real word, for example, "leebaf" and "iguana." They were given a sentence with the nonce word in it and asked to note how related the nonce word and real word were.
Here's a sample question: "Large numbers of leebaf skins are exported to Latin America to be made into handbags, shoes and watch straps."
Contributors then ranked the relation of "leebaf" to "iguana" on a scale of 1-5, from completely unrelated to very strongly related, respectively.
Added: December 01, 2014 by CrowdFlower | Data Rows: 300
A data set where business names were matched with URLs/homepages for the named businesses.
Contributors were asked to visit a provided website and determine if the site matched a given company name. They then categorized the businesses according to the following criteria:
Data set includes the given company name, URL, and categorization of each business.
Added: July 15, 2014 by CrowdFlower | Data Rows: 7,152
A Twitter sentiment analysis of users' 2015 New Year's resolutions. Contains demographic and geographical data of users and resolution categorizations. We wrote about it and produced an infographic here.
Added: January 03, 2015 by CrowdFlower | Data Rows: 5,011
Contributors read an app description, then selected the app's functionality from a pre-chosen list. Functionalities ranged from SMS to flashlight to weather to whether or not they used a phone's contacts. Contributors were allowed to select as many functionalities as applied for each app.
Data set includes a variety of applications and their selected functionalities.
Added: April 11, 2014 by CrowdFlower | Data Rows: 1,898
Contributors viewed two rather bizarre-looking images and were asked which was more "natural." Images were all computer-generated faces of people in various states of oddness.
Added: December 03, 2014 by CrowdFlower | Data Rows: 600
A large data set containing the official URLs of United States national and state parks.
Added: June 14, 2014 by CrowdFlower | Data Rows: 323
Dataset of 4,000 crowd-named colors in 9 languages. Includes the RGB color, the native language color, and the translated color.
Added: November 13, 2013 by Dave Oleson | Data Rows: 4,000
This dataset is a collection of tweets related to nuclear energy along with the crowd's evaluation of the tweet's sentiment. The possible sentiment categories are: "Positive", "Negative", "Neutral / author is just sharing information", "Tweet NOT related to nuclear energy", and "I can't tell". We also provide an estimation of the crowd's confidence that each category is correct, which can be used to identify tweets whose sentiment may be unclear.
Added: August 30, 2013 by CrowdFlower | Data Rows: 190
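The nuclear energy entry above notes that the crowd's confidence estimate can be used to identify tweets whose sentiment may be unclear. One simple way to do that is a threshold split, sketched below. The rows, the field names `text`, `sentiment`, and `confidence`, and the 0.7 cutoff are all assumptions for illustration, not values taken from the data set.

```python
# Hypothetical rows; field names and confidence values are illustrative only.
tweets = [
    {"text": "sample tweet 1", "sentiment": "Positive", "confidence": 0.95},
    {"text": "sample tweet 2", "sentiment": "Negative", "confidence": 0.55},
    {"text": "sample tweet 3", "sentiment": "I can't tell", "confidence": 0.40},
    {"text": "sample tweet 4", "sentiment": "Negative", "confidence": 0.80},
]


def split_by_confidence(rows, threshold=0.7):
    """Separate rows into confidently labeled and unclear ones."""
    clear = [row for row in rows if row["confidence"] >= threshold]
    unclear = [row for row in rows if row["confidence"] < threshold]
    return clear, unclear


clear, unclear = split_by_confidence(tweets)
print(len(clear), "clear;", len(unclear), "unclear")
```

The right threshold depends on the use case; a stricter cutoff keeps fewer but more reliable labels.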
This dataset is a collection of English sentence pairs. The crowd was asked about the truth value of the second sentence if the first sentence were true and to what extent the sentences are related on a scale of 1 to 5. The variance of this score over the crowd's judgments is included as well.
Added: August 30, 2013 by Marco Baroni | Data Rows: 555
Contributors were asked to evaluate how similar two sets of words are on a seven-point scale, with 1 being "completely different" and 7 being "exactly the same". The pairs are
Added: August 30, 2013 by Marco Baroni | Data Rows: 6,274
This job has the contributor judge how a tweet is related to yogurt using these possible options:
1) Eating - mentions what they're eating or what they like (dislike) eating.
2) Health - mentions health aspects of yogurt.
3) Part of a story - "Little Johnny has yogurt all over his face. How cute!"
4) Deal - such as a coupon or special pricing.
5) Cause - such as something for breast cancer or veterans.
6) Advertising - mentions an ad about yogurt.
7) Slang - "yogurt" used sometimes as slang. Check out Urban Dictionary if you want more on this.
8) I can't tell - please pick this if you think the tweet might fall into a category but you just aren't sure. It may be that you would need to see the linked story, or a tweet that this one responds to, etc.
A set of checkboxes then records whether a special style of yogurt is mentioned:
frozen
smoothie
snack bar
none

Contributors evaluated tweets for belief in the existence of global warming or climate change. The possible answers were "Yes" if the tweet suggests global warming is occurring, "No" if the tweet suggests global warming is not occurring, and "I can't tell" if the tweet is ambiguous or unrelated to global warming. We also provide a confidence score for the classification of each tweet.
Added: August 30, 2013 by Kent Cavender-Bares | Data Rows: 6,090
Contributors evaluated tweets about multiple brands and products. The crowd was asked if the tweet expressed positive, negative or no emotion towards a brand and/or product. If some emotion was expressed they were also asked to say which brand or product was the target of that emotion.
Added: August 30, 2013 by Kent Cavender-Bares | Data Rows: 9,093
This dataset has all tweets that mention Claritin for October 2012. The tweets are tagged with sentiment, the author's gender, and whether or not they mention any of the top 10 adverse events reported to the FDA. You can see a visualization of the full dataset here: https://senti.crowdflower.com/datasets/857/t/91f9ed1ab4281adf. For a fuller description of the dataset, see here: https://crowdflower.com/blog/2013/03/discovering-drug-side-effects-with-crowdsourcing/.
Added: November 13, 2013 by Dave Oleson | Data Rows: 4,900


