Data for Everyone

Our Data for Everyone library is a collection of our favorite open data jobs that have come through our platform. They're available free of charge for the community, forever.

Transcriptions of names from handwriting

This dataset contains links to images of handwritten names along with human contributors’ transcriptions of them; there are over 125,000 examples of first or last names. Most names are French, making this dataset of particular interest for work on handling accent marks in handwritten character recognition.
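
As a rough sketch of how this dataset might be loaded for a handwriting-recognition experiment (the file name and the image_url/transcription column names are assumptions, so check the downloaded CSV’s actual header):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical file and column names -- verify against the real download.
df = pd.read_csv("handwritten_names.csv")
df = df.dropna(subset=["image_url", "transcription"])

# Hold out 20% of the examples for evaluating a recognition model.
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
print(len(train_df), "training examples,", len(test_df), "test examples")
```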

Added: July 15, 2016 by CrowdFlower | Data Rows: 129989 Download Now

Sentiment Analysis: Emotion in Text

In a variation on the popular task of sentiment analysis, this dataset contains labels for the emotional content (such as happiness, sadness, and anger) of texts. There are hundreds to thousands of examples across 13 labels. A subset of this data is used in an experiment we uploaded to Microsoft’s Cortana Intelligence Gallery.
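
A quick way to get a feel for such a dataset is to look at the label distribution; here is a minimal sketch, assuming a hypothetical file name and emotion-label column:

```python
import pandas as pd

# Hypothetical file and column names -- the real label column may be named differently.
df = pd.read_csv("text_emotion.csv")

# Count how many examples carry each of the 13 emotion labels.
print(df["emotion"].value_counts())
```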

Added: July 15, 2016 by CrowdFlower | Data Rows: 40000 Download Now

Image categorization: dress patterns

This dataset contains links to images of women’s dresses, and the corresponding images are categorized into 17 different pattern types. Most pattern categories have hundreds to thousands of examples.

Added: July 15, 2016 by CrowdFlower | Data Rows: 15702 Download Now

Numerical Transcription from Images

Contributors looked at a series of pictures from a footrace and transcribed bib numbers of the competitors. Some images contain multiple bib numbers or incomplete bib numbers.

Added: December 8, 2015 by CrowdFlower | Data Rows: 7665 Download Now

Football Strategy

Contributors were presented a football scenario and asked to note what the best coaching decision would be. A scenario: “It is third down and 3. The ball is on your opponent’s 20-yard line. There are five seconds left. You are down by 4.” The decisions presented were punt, pass, run, kick a field goal, kneel down, or don’t know. There are thousands of such scenarios in this job.

Added: December 8, 2015 by CrowdFlower | Data Rows: 3731 Download Now

Economic News Article Tone and Relevance

Contributors read snippets of news articles. They then noted if the article was relevant to the US economy and, if so, what the tone of the article was. Tone was judged on a 9-point scale (from 1 to 9, with 1 representing the most negative). The dataset contains these judgments as well as the dates, source titles, and text. Dates range from 1951 to 2014.
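
A minimal sketch of one way to work with the tone scale, assuming hypothetical relevance and tone column names: keep only the articles judged relevant and re-center the 1-9 tone around zero so the sign indicates direction.

```python
import pandas as pd

# Hypothetical file and column names -- check the downloaded CSV for the real schema.
df = pd.read_csv("economic_news.csv")

# Keep only articles judged relevant to the US economy.
relevant = df[df["relevance"] == "yes"].copy()

# Shift the 1-9 tone scale so negative values mean negative tone.
relevant["tone_centered"] = relevant["tone"].astype(float) - 5
print(relevant["tone_centered"].describe())
```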

Added: December 8, 2015 by CrowdFlower | Data Rows: 8000 Download Now

Identifying key phrases in text

Contributors looked at question/answer pairs like, “When did Bob Marley die? 1981,” and a series of sentences surrounding that event such as phrases from a Bob Marley obituary. They marked which of those sentences spoke directly to the question such as, “Robert Nesta ‘Bob’ Marley, OM (6 February 1945 – 11 May 1981) was a Jamaican reggae singer, song writer, musician,” for a wide variety of topics.

Added: December 7, 2015 by CrowdFlower | Data Rows: 8262 Download Now

Gender classifier data

This data set was used to train a CrowdFlower AI gender predictor. You can read all about the project here. Contributors were asked to simply view a Twitter profile and judge whether the user was a male, a female, or a brand (non-individual). The dataset contains 20,000 rows, each with a user name, a random tweet, account profile and image, location, and even link and sidebar color.
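
As an illustration of the kind of model this data supports, here is a simple bag-of-words baseline sketch; the file name and the text/gender column names are assumptions, not the dataset’s documented schema.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical file and column names.
df = pd.read_csv("gender_classifier.csv").dropna(subset=["text", "gender"])

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["gender"], test_size=0.2, random_state=0
)

# TF-IDF features fed into a logistic regression as a quick baseline.
vectorizer = TfidfVectorizer(max_features=20000)
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(X_train), y_train)

preds = clf.predict(vectorizer.transform(X_test))
print("baseline accuracy:", accuracy_score(y_test, preds))
```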

Added: November 15, 2015 by CrowdFlower | Data Rows: 20000 Download Now

Sentiment analyses of single words or short phrases

Contributors looked at four words or bigrams (bigrams are just word pairs) and picked the most positive and most negative ones in each set. For example, they saw quartets like “nasty, failure, honored, females” and chose which word was the most positive and which was the most negative. Interestingly, each set was graded by eight contributors instead of the usual three. The dataset contains 3,523 rows but roughly 28,000 individual judgments.
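
A small sketch of tallying the aggregated picks across quartets, assuming hypothetical column names for the chosen words:

```python
import pandas as pd

# Hypothetical file and column names for the aggregated choices.
df = pd.read_csv("word_sentiment_quartets.csv")

# How often was each word picked as the most positive or most negative in its quartet?
print(df["most_positive_word"].value_counts().head(10))
print(df["most_negative_word"].value_counts().head(10))
```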

Added: October 19, 2015 by CrowdFlower | Data Rows: 3523 Download Now

Disasters on social media

Contributors looked at over 10,000 tweets culled with a variety of searches like “ablaze”, “quarantine”, and “pandemonium”, then noted whether the tweet referred to a disaster event (as opposed to a joke with the word or a movie review or something non-disastrous).

Added: September 4, 2015 by CrowdFlower | Data Rows: 10877 Download Now

Do these chemicals contribute to a disease?

Contributors read sentences in which both a chemical (like Aspirin) and a disease (or side-effect) were present. They then determined if the chemical directly contributed to the disease or caused it. Dataset includes chemical names, disease name, and aggregated judgments of five (as opposed to the usual three) contributors.

Added: August 18, 2015 by CrowdFlower | Data Rows: 5610 Download Now

First GOP debate sentiment analysis

We looked through tens of thousands of tweets about the early August GOP debate in Ohio and asked contributors to do both sentiment analysis and data categorization. Contributors were asked if the tweet was relevant, which candidate was mentioned, what subject was mentioned, and then what the sentiment was for a given tweet. We’ve removed the non-relevant messages from the uploaded dataset.

Added: August 11, 2015 by CrowdFlower | Data Rows: 14000 Download Now

Classification of political social media

Contributors looked at thousands of social media messages from US Senators and other American politicians to classify their content. Messages were broken down by audience (national or the tweeter’s constituency), by bias (neutral/bipartisan or biased/partisan), and finally by the substance of the message itself (options included information, announcements of media appearances, attacks on other candidates, and more).

Added: August 5, 2015 by CrowdFlower | Data Rows: 5000 Download Now

URL categorization

To create this large, enriched dataset of categorized websites, contributors clicked provided links and selected a main and sub-category for URLs. The 31,000+ sites are in a variety of languages and have been split into the following main categories (with each having multiple sub-categories as well):

Adult, Arts & Entertainment, Automotive, Beauty & Fitness, Books & Literature, Business & Industry, Career & Education, Computer & Electronics, Finance, Food & Drink, Gambling, Games, Health, Home & Garden, Internet & Telecom, Law & Government, News & Media, People & Society, Pets & Animals, Reference, Science, Shopping, Sports, Travel
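
A short sketch of tallying sites per category, assuming hypothetical file and column names (main_category, sub_category):

```python
import pandas as pd

# Hypothetical file and column names -- verify against the downloaded CSV.
df = pd.read_csv("url_categorization.csv")

# How many of the 31,000+ sites fall into each main category?
print(df["main_category"].value_counts())

# Sub-category breakdown within a single main category, e.g. Sports.
print(df[df["main_category"] == "Sports"]["sub_category"].value_counts())
```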

Added: August 5, 2015 by CrowdFlower | Data Rows: 31085 Download Now

eCommerce search relevance

We used this dataset to launch our Kaggle competition, but the set posted here contains far more information than what served as the foundation for that contest. This set contains image URLs, rank on page, a description for each product, the search query that led to each result, and more, all drawn from five major English-language ecommerce sites.
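
A sketch of summarizing judged relevance by query and by on-page rank, under assumed column names (query, rank, relevance):

```python
import pandas as pd

# Hypothetical file and column names -- check the actual header row.
df = pd.read_csv("ecommerce_search_relevance.csv")

# Average judged relevance per search query, to spot queries the sites handle poorly.
per_query = df.groupby("query")["relevance"].mean().sort_values()
print(per_query.head(20))

# Does judged relevance fall off as products appear lower on the results page?
print(df.groupby("rank")["relevance"].mean())
```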

Added: July 28, 2015 by CrowdFlower | Data Rows: 32000 Download Now

Housing and wheelchair accessibility

Here, contributors viewed 10,000 Google Maps images and marked whether they were residential areas. If they were, they noted which homes were most prevalent in the area (apartments or houses) and whether the area had proper, wheelchair-friendly sidewalks.

Added: July 14, 2015 by CrowdFlower | Data Rows: 10000 Download Now

U.S. economic performance based on news articles

Contributors viewed a news article headline and a short, bolded excerpt of a sentence or two from the attendant article. Next, they decided if the sentence in question provided an indication of the U.S. economy’s health, then rated the indication on a scale of 1-9, with 1 being negative and 9 being positive.

Added: June 25, 2015 by CrowdFlower | Data Rows: 5000 Download Now

Primary emotions of statements

Contributors looked at a single sentence and rated its emotional content based on Plutchik’s wheel of emotions. 18 emotional choices were presented to contributors for grading.

Some researchers may find the full report, with non-aggregated responses, to be of interest. The aggregated report can be downloaded with the button to the right, while the full report can be downloaded by clicking this link.

Added: June 25, 2015 by CrowdFlower | Data Rows: 2400 Download Now

Police-involved fatalities since May 2013

A data categorization job where contributors compiled a database of police-involved shootings over a two-year span. Information contained includes: race, gender, city, state, whether the victim was armed, photos of the deceased, attending news stories, and more.

Added: June 17, 2015 by CrowdFlower | Data Rows: 2355 Download Now

Twitter sentiment analysis: Self-driving cars

A simple Twitter sentiment analysis job where contributors read tweets and classified them as very positive, slightly positive, neutral, slightly negative, or very negative. They were also asked to mark if the tweet was not relevant to self-driving cars.

Added: June 8, 2015 by CrowdFlower | Data Rows: 7015 Download Now

Comparing pictures of people

In this job, contributors viewed two pictures of people walking through the same room and were asked to compare the person on the left to the person on the right. Questions centered on observable traits (like skin color, hair length, muscularity, etc.). An example:

For “Weight”, the person on the left is:

  • Much more heavy
  • More heavy
  • Same
  • More light
  • Much more light

Data set contains nearly 60,000 rows of judgments; all images are relative URLs based on the following structure http://users.ecs.soton.ac.uk/dmc1g14/biot/frames/[camera]/[img_path]
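
Because the images are given as relative paths, a short sketch of expanding them into absolute URLs using the structure above may help; the camera and img_path column names are assumptions.

```python
import pandas as pd

BASE_URL = "http://users.ecs.soton.ac.uk/dmc1g14/biot/frames"

# Hypothetical file and column names -- the real CSV headers may differ.
df = pd.read_csv("people_comparison.csv")

# Expand each relative path into an absolute image URL per the documented structure.
df["image_url"] = BASE_URL + "/" + df["camera"].astype(str) + "/" + df["img_path"].astype(str)
print(df["image_url"].head())
```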

Added: June 8, 2015 by CrowdFlower | Data Rows: 59476 Download Now

Wikipedia image categorization

This data set contains hundreds of Wikipedia images which contributors categorized in the following ways:

  • No person present
  • One person present
  • Several people present, but one dominant
  • Several people present, but none are dominant
  • Unsure

If the images were of one or several people, contributors further classified images by gender.

Added: May 22, 2015 by CrowdFlower | Data Rows: 976 Download Now

Government official database

A simple data categorization job wherein contributors viewed entries for cabinet members, ministers, ambassadors, and the like, and separated names from titles. Data set contains names, positions, and years served.

Added: May 22, 2015 by CrowdFlower | Data Rows: 5000 Download Now

Blockbuster database

A data categorization job where we asked the crowd to find out information about the ten most popular movies, each year, for the past 40 years (1975-2015). Dataset includes:

  • Movie titles
  • Poster URLs for each
  • Genre information
  • Run time
  • MPAA ratings
  • IMDB rating
  • Rotten Tomato audience/critic rating
  • Box office receipts (adjusted for inflation)

Added: May 22, 2015 by CrowdFlower | Data Rows: 410 Download Now

Progressive issues sentiment analysis

Contributors viewed tweets regarding a variety of left-leaning issues like legalization of abortion, feminism, Hillary Clinton, etc. They then classified whether the tweets in question were for, against, or neutral on the issue (with an option for none of the above). After this, they further classified each statement as to whether it expressed a subjective opinion or stated facts.

Added: May 13, 2015 by CrowdFlower | Data Rows: 1159 Download Now

Mobile search relevance

Contributors viewed pairs of mobile app searches and determined whether the two expressed the same intent. One was a short query like “music player”; the other, a much longer one like “I would like to download an app that plays the music on the phone from multiple sources like Spotify and Pandora and my library.”

Added: May 13, 2015 by CrowdFlower | Data Rows: 647 Download Now

Image attribute tagging

Contributors viewed thousands of images and categorized each based on a given list of attributes. These attributes ranged from objective and specific (like “child” or “motorbike”) to more subjective ones (like “afraid” or “beautiful”). Data set includes URLs for all images, multiple tags for each, and contributor agreement scores.

Added: May 13, 2015 by CrowdFlower | Data Rows: 3235 Download Now

Drug relation database

Contributors read color-coded sentences and determined the relationship between a drug and certain symptoms or diseases. Judgments fell into two groups. In the first, the drug either:

  • Caused side effects – [Drug] gave me [symptom]
  • Was effective against a condition – [Drug] helped my [disease]
  • Was prescribed for a certain disease – [Drug] was given to help my [disease]
  • Was contraindicated for a condition – [Drug] should not be taken if you have [disease or symptom]

The second group concerned the statement itself. Those broke down into:

  • Personal experiences – I started [drug] for [disease]
  • Personal experiences negated – [Drug] did not cause [symptom]
  • Impersonal experiences – I’ve heard [drug] causes [symptom]
  • Impersonal experiences negated – I’ve read [drug] doesn’t cause [symptom]
  • Question – Have you tried [drug]?

Added: May 7, 2015 by CrowdFlower | Data Rows: 2020 Download Now

Indian terrorism deaths database

Contributors read sentences from the South Asia Terrorism Portal and quantified them. Contributors counted the deaths mentioned in a sentence and whether they were terrorists, civilians, or security forces. Database contains original sentences, state and district in which the deaths occurred, dates of the deaths, and more. (Test questions have been removed from the database for ease of visualization.)

Added: May 7, 2015 by CrowdFlower | Data Rows: 27233 Download Now

Blurry image comparison

Contributors viewed a pair of purposely blurry or saturated images. They were then asked which image more closely matched a particular word. Data set contains URLs for all images and image pairs, aggregated agreement scores, and variance amounts. Notably, a high number of contributors were polled for each image pairing (20 in total for each, giving this data set upwards of 10,000 judgments).

Added: April 20, 2015 by CrowdFlower | Data Rows: 511 Download Now

Objective truths of sentences/concept pairs

Contributors read a sentence pairing two concepts, for example, “a dog is a kind of animal” or “captain can have the same meaning as master.” They were then asked if the sentence could be true and rated it on a 1-5 scale, with “strongly disagree” at the low end and “strongly agree” at the high end. Data set includes over 8,000 concept pairings, averaged agreement scores, and associated variances.
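
A sketch of using the variance column to surface the pairings contributors disagreed about most, with hypothetical column names (sentence, mean_agreement, variance):

```python
import pandas as pd

# Hypothetical file and column names.
df = pd.read_csv("concept_truth_ratings.csv")

# The highest-variance pairings are the ones contributors found most contentious.
contested = df.sort_values("variance", ascending=False)
print(contested[["sentence", "mean_agreement", "variance"]].head(10))
```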

Added: April 17, 2015 by CrowdFlower | Data Rows: 8227 Download Now

Is-A linguistic relationships

Contributors were provided a pair of concepts in a constant sentence structure, namely: [Noun 1] is a [noun 2]. They were then asked to note whether the sentence was true or false. Data set contains all nouns and aggregated T/F judgments.

Added: March 27, 2015 by CrowdFlower | Data Rows: 3297 Download Now

News article / Wikipedia page pairings

Contributors read a short article and were asked which of two Wikipedia articles it matched most closely. For example, a brief biography of Mel Gibson could be paired with Gibson’s general Wikipedia page or Lethal Weapon; likewise, Iran election results could be paired with a Wikipedia page on Iran in general or the 2009 protests. Data set contains URLs for both Wiki pages, the full text contributors read, and their judgments on each row.

Added: March 27, 2015 by CrowdFlower | Data Rows: 3000 Download Now

Free text object descriptions

Contributors viewed a pair of items and were asked to write sentences that described and differentiated the two objects. In other words, if viewing an apple and an orange, they could not write, “this is a piece of fruit,” twice, but needed to note how they were different. Image pairings varied so that the same image would appear in different pairs, and the second image was always smaller. Data set contains URLs of images and three sentences written per item, per image.

Added: March 27, 2015 by CrowdFlower | Data Rows: 1225 Download Now

Smart phone & tablet names database

Contributors viewed a particular model code (like C6730 or LGMS323), then searched for the name of the device itself (Kyocera C6730 Hydro or LG Optimus L70), then noted whether the device was a phone or tablet.

Added: March 27, 2015 by CrowdFlower | Data Rows: 1600 Download Now

Image sentiment polarity classification

This data set contains over fifteen thousand sentiment-scored images. Contributors were shown a variety of pictures (everything from portraits of celebrities to landscapes to stock photography) and asked to score the images on typical positive/negative sentiment. Data set contains URL of images, sentiment scores of highly positive, positive, neutral, negative, and highly negative, and contributor agreement.

Added: March 27, 2015 by CrowdFlower | Data Rows: 15613 Download Now

Image classification: People and food

A collection of images of people eating fruits, cakes, and other foodstuffs. Contributors classified the images by male/female, then by age (adult or child/teenager).

Added: March 9, 2015 by CrowdFlower | Data Rows: 587 Download Now

Weather sentiment evaluated

Here, contributors were asked whether the crowd had correctly graded the sentiment of a particular weather-related tweet. The original job (below this one, called simply “Weather sentiment”) involved 20 contributors noting the sentiment of weather-related tweets. In this job, we asked 10 contributors to check that original sentiment evaluation for accuracy.

The button to the right is the aggregated data set. You can also download the non-aggregated, full data set.

Added: March 9, 2015 by CrowdFlower | Data Rows: 1000 Download Now

Weather sentiment

This job is best viewed alongside the preceding job, “Weather sentiment evaluated.”

Here, contributors were asked to grade the sentiment of a particular tweet relating to the weather. The catch is that 20 contributors graded each tweet. We then ran an additional job (the one above) where we asked 10 contributors to grade the original sentiment evaluation.

The button to the right is the aggregated data set. You can also download the non-aggregated, full data set.

Added: March 9, 2015 by CrowdFlower | Data Rows: 1000 Download Now

McDonald’s review sentiment

A sentiment analysis of negative McDonald’s reviews. Contributors were given reviews culled from low-rated McDonald’s locations in random metro areas and asked to classify why those locations received low reviews. Options given were:

  • Rude Service
  • Slow Service
  • Problem with Order
  • Bad Food
  • Bad Neighborhood
  • Dirty Location
  • Cost
  • Missing Item

Added: March 6, 2015 by CrowdFlower | Data Rows: 1500 Download Now

The colors of #TheDress

On February 27th, 2015, the internet was briefly obsessed with the color of a dress known simply as #TheDress. We ran a survey job with 1,000 contributors, asked them what colors the dress was, and looked into a hypothesis that night owls and morning people saw the dress differently. We wrote about it here.

Added: February 27, 2015 by CrowdFlower | Data Rows: 1000 Download Now

Corporate messaging

A data categorization job concerning what corporations actually talk about on social media. Contributors were asked to classify statements as information (objective statements about the company or its activities), dialog (replies to users, etc.), or action (messages that ask for votes or ask users to click on links, etc.).

Added: February 14, 2015 by CrowdFlower | Data Rows: 3118 Download Now

Sports Illustrated covers

A data set listing the sports that have been on the cover of Sports Illustrated since 1955. Covers are grouped by year. You can see the related blog post here.

Added: February 12, 2015 by CrowdFlower | Data Rows: 32000 Download Now

Airline Twitter sentiment

A sentiment analysis job about the problems of each major U.S. airline. Twitter data was scraped from February of 2015 and contributors were asked to first classify positive, negative, and neutral tweets, followed by categorizing negative reasons (such as “late flight” or “rude service”).

You can download the non-aggregated results (55,000 rows) here.
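
As a sketch of the kind of breakdown this data supports, the snippet below computes sentiment share per airline and the most common complaints; the column names (airline, airline_sentiment, negativereason) are assumptions here.

```python
import pandas as pd

# Hypothetical file and column names -- verify against the downloaded CSV.
df = pd.read_csv("airline_twitter_sentiment.csv")

# Share of positive / neutral / negative tweets per airline.
sentiment_share = (
    df.groupby("airline")["airline_sentiment"]
      .value_counts(normalize=True)
      .unstack(fill_value=0)
)
print(sentiment_share)

# Most common reasons cited in the negative tweets.
negative = df[df["airline_sentiment"] == "negative"]
print(negative["negativereason"].value_counts().head())
```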

Added: February 12, 2015 by CrowdFlower | Data Rows: 16000 Download Now

Language: Certainty of Events

A linguistic data set concerning the certainty an author has about the event described by a specific word. For example, in the sentence “The dog ran out the door,” if the word “ran” were asked about, the certainty that the event did or will happen would be high.

Added: February 6, 2015 by CrowdFlower | Data Rows: 13386 Download Now

Coachella 2015 Twitter sentiment

A sentiment analysis job about the lineup of Coachella 2015. We wrote about it here. An additional, thousand-row data set about which artists fans were most excited can be found here. The button to the right concerns sentiment about the festival overall.

Added: February 4, 2015 by CrowdFlower | Data Rows: 3847 Download Now

Body part relationships

A data set where contributors classified if certain body parts were part of other parts. Questions were phrased like so: “[Part 1] is a part of [part 2],” or, by way of example, “Nose is a part of spine” or “Ear is a part of head.”

Added: February 4, 2015 by CrowdFlower | Data Rows: 1892 Download Now

Sound detection and classification

Contributors listened to short audio clips and identified background noise events like coughing, dropped keys, and barking dogs. They also tried to identify the scene, such as an office, cafe, or supermarket, and rated the difficulty of each individual row. Audio clips range from about five to ten seconds.

Added: February 1, 2015 by CrowdFlower | Data Rows: 8000 Download Now

Wearable technology database

A data set containing information on hundreds of wearables. Contains data on prices, company name and location, URLs for all wearables, as well as the location of the body on which the wearable is worn.

Added: January 31, 2015 by CrowdFlower | Data Rows: 582 Download Now

Relevancy of terms to a disaster relief topic

Contributors viewed a topic and a term and rated the relevancy of the latter to the former on a five point scale (1 being very irrelevant, 5 being very relevant). The topics all center around humanitarian aid or disaster relief and each topic was defined for contributors. They were also asked if the term was a specific person or place and whether it was misspelled.

Added: January 28, 2015 by CrowdFlower | Data Rows: 7566 Download Now

The data behind data scientists

A look into what skills data scientists need and what programs they use. A part of our 2015 data scientist report which you can download.

Added: January 25, 2015 by CrowdFlower | Data Rows: 974 Download Now

New England Patriots Deflategate sentiment

Before the 2015 Super Bowl, there was a great deal of chatter around deflated footballs and whether the Patriots cheated. This data set looks at Twitter sentiment on important days during the scandal to gauge public sentiment about the whole ordeal. We wrote about it here.

Added: January 25, 2015 by CrowdFlower | Data Rows: 11814 Download Now

Biomedical image modality

A large data set of labeled biomedical images, ranging from x-ray and ultrasound to charts, graphs, and even hand-drawn sketches. The major image categories are as follows:

  • Radiology (MRIs, X-rays, etc.)
  • Visible light photography (pictures of skin, organs, etc.)
  • Printed signals and waves (electromyography, etc.)
  • Microscopy (various microscopic images)
  • Generic biomed illustrations (tables, charts, graphs, sketches, etc.)

Data set includes the image class, whether a given categorization was accurate, and a URL to the image judged.

Added: January 22, 2015 by CrowdFlower | Data Rows: 10652 Download Now

Hate speech identification

Contributors viewed short texts and identified whether each a) contained hate speech, b) was offensive but without hate speech, or c) was not offensive at all. Contains nearly 15K rows with three contributor judgments per text string.

Added: January 11, 2015 by CrowdFlower | Data Rows: 14442 Download Now

2015 New Year’s resolutions

A Twitter sentiment analysis of users’ 2015 New Year’s resolutions. Contains demographic and geographical data of users and resolution categorizations. We wrote about it and produced an infographic here.

Added: January 3, 2015 by CrowdFlower | Data Rows: 5011 Download Now

Apple Computers Twitter sentiment

A look into the sentiment around Apple, based on tweets containing #AAPL, @apple, etc.

Contributors were given a tweet and asked whether the user was positive, negative, or neutral about Apple. (They were also allowed to mark “the tweet is not about the company Apple, Inc.”)

Tweets cover a wide array of topics including stock performance, new products, IP lawsuits, customer service at Apple stores, etc.

Added: December 28, 2014 by CrowdFlower | Data Rows: 3969 Download Now

“All oranges are lemons,” a.k.a. Semantic relationships between two concepts

An interesting language data set about the relationship of broad concepts.

All questions were phrased in the following way: “All [x] are [y].” For example, a contributor would see something like “All Toyotas are vehicles” and was then asked to say whether this claim was true or false. Contributors were also provided images, in case they were unclear about what either concept was.

This data set includes links to both images provided, the names given for [x] and [y], and whether the statement that “All [x] are [y]” was true or false.

Added: December 22, 2014 by CrowdFlower | Data Rows: 3536 Download Now

Naturalness of computer generated images

Contributors viewed two rather bizarre looking images and were asked which was more “natural.” Images were all computer generated faces of people in various states of oddness.

Added: December 3, 2014 by CrowdFlower | Data Rows: 600 Download Now

Judge the relatedness of familiar words and made-up ones

Contributors were given a nonce word and a real word, for example, “leebaf” and “iguana.” They were given a sentence with the nonce word in it and asked to note how related the nonce word and real word were.

Here’s a sample sentence: “Large numbers of leebaf skins are exported to Latin America to be made into handbags, shoes and watch straps.”

Contributors then ranked the relation of “leebaf” to “iguana” on a scale of 1-5, from completely unrelated to very strongly related, respectively.

Added: December 1, 2014 by CrowdFlower | Data Rows: 300 Download Now

Gender breakdown of Time Magazine covers

A year-by-year breakdown of the cover images of Time Magazine. Referenced in this blog post.

Contributors were shown images of Time Magazine covers since the late 1920s and asked to classify if the person was male or female. Data is broken down overall and on an annual basis.

Added: December 1, 2014 by CrowdFlower | Data Rows: 100 Download Now

How beautiful is this image? (Part 3: Animals)

Here, contributors were asked to rate image quality (as opposed to how adorable the animals in the images actually are). They were given a five-point scale, from “unacceptable” (blurry photos of pets) to “exceptional” (hi-res photos that might appear in text books or magazines) and rated a series of images based on those criteria.

Data set includes a URL for each image, an averaged score (of 1-5) for image quality, and a variance rating accounting for subjective, contributor disagreements.

Added: October 15, 2014 by CrowdFlower | Data Rows: 3500 Download Now

How beautiful is this image? (Part 2: Buildings and Architecture)

Here, contributors were asked to rate image quality (as opposed to how gorgeous the buildings in the images actually are). They were given a five-point scale, from “unacceptable” (out-of-focus cityscapes) to “exceptional” (hi-res photos that might appear in a city guide book) and rated a series of images based on those criteria.

Data set includes a URL for each image, an averaged score (of 1-5) for image quality, and a variance rating accounting for subjective, contributor disagreements.

Added: October 13, 2014 by CrowdFlower | Data Rows: 3500 Download Now

Company categorizations (with URLs)

A data set where business names were matched with URLs/homepages for the named businesses.

Contributors were asked to visit a provided website and determine if the site matched a given company name. They then categorized the businesses according to the following criteria:

  • Automotive
  • Consumer Packaged Goods
  • Financial Services
  • Retail
  • Travel
  • Other

Data set includes the given company name, URL, and categorization of each business. You can download the non-aggregated data set here or the aggregated one by clicking the download button to the right.

Added: July 15, 2014 by CrowdFlower | Data Rows: 7152 Download Now

National Park locations

A large data set containing the official URLs of United States national and state parks.

Added: June 14, 2014 by CrowdFlower | Data Rows: 323 Download Now

Smart phone app functionality

Contributors read an app description, then selected the app’s functionality from a pre-chosen list. Functionalities ranged from SMS to flashlight to weather to whether or not they used a phone’s contacts. Contributors were allowed to select as many functionalities as applied for each app. Data set includes a variety of applications and their selected functionalities.

Added: April 11, 2014 by CrowdFlower | Data Rows: 1898 Download Now

Agreement between long and short sentences

Contributors were asked to read two sentences (the first an image caption, the second a shorter version of it) and judge whether the shorter sentence adequately described the event in the caption.

An example might be something like:

Caption: A man with earphones working in a San Francisco cafe while drinking coffee.

Proposed sentence: Man listening to music in coffee shop

These two sentences would qualify as a matching pair.

Data set includes sentence pairs, contributor agreement scores, and yes/no/unknown rankings.

Added: February 23, 2014 by CrowdFlower | Data Rows: 2000 Download Now

Academy Awards demographics

A data set concerning the race, religion, age, and other demographic details of all Oscar winners since 1928 in the following categories:

  • Best Actor
  • Best Actress
  • Best Supporting Actor
  • Best Supporting Actress
  • Best Director

For further information on this data set, please read our resulting blog post.

Added: February 15, 2014 by CrowdFlower | Data Rows: 416 Download Now

Sentence plausibility

Contributors read strange sentences and ranked them on a scale of “implausible” (1) to “plausible” (5). Sentences were phrased in the following manner: “This is not an [x], it is a [y].”

Added: January 23, 2014 by CrowdFlower | Data Rows: 400 Download Now

Claritin Twitter

This dataset has all tweets that mention Claritin from October 2012. The tweets are tagged with sentiment, the author’s gender, and whether or not they mention any of the top 10 adverse events reported to the FDA. You can see a visualization of the full dataset here: https://senti.crowdflower.com/datasets/857/t/91f9ed1ab4281adf. For a fuller description of the dataset, see here: https://crowdflower.com/blog/2013/03/discovering-drug-side-effects-with-crowdsourcing/.

Added: November 13, 2013 by Dave Oleson | Data Rows: 4900 Download Now

Colors in 9 Languages

Dataset of 4,000 crowd-named colors in 9 languages. Includes the RGB value, the color name in its native language, and the translated color name.

Added: November 13, 2013 by CrowdFlower | Data Rows: 4000 Download Now

Judge Emotion About Brands & Products

Contributors evaluated tweets about multiple brands and products. The crowd was asked if the tweet expressed positive, negative, or no emotion towards a brand and/or product. If some emotion was expressed they were also asked to say which brand or product was the target of that emotion.

Added: August 30, 2013 by Kent Cavender-Bares | Data Rows: 9093 Download Now

Sentiment Analysis – Global Warming/Climate Change

Contributors evaluated tweets for belief in the existence of global warming or climate change. The possible answers were “Yes” if the tweet suggests global warming is occurring, “No” if the tweet suggests global warming is not occurring, and “I can’t tell” if the tweet is ambiguous or unrelated to global warming. We also provide a confidence score for the classification of each tweet.

Added: August 30, 2013 by Kent Cavender-Bares | Data Rows: 6090 Download Now

Similarity judgment of word combinations

Contributors were asked to evaluate how similar two sets of words were, on a seven-point scale, with 1 being “completely different” and 7 being “exactly the same.”

Added: August 30, 2013 by Marco Baroni | Data Rows: 6274 Download Now

Decide whether two English sentences are related

This dataset is a collection of English sentence pairs. The crowd was asked about the truth value of the second sentence if the first sentence were true and to what extent the sentences are related on a scale of 1 to 5. The variance of this score over the crowd’s judgments is included as well.

Added: August 30, 2013 by Marco Baroni | Data Rows: 555 Download Now

Judge emotions about nuclear energy from Twitter

This dataset is a collection of tweets related to nuclear energy along with the crowd’s evaluation of each tweet’s sentiment. The possible sentiment categories are: “Positive,” “Negative,” “Neutral/author is just sharing information,” “Tweet NOT related to nuclear energy,” and “I can’t tell.” We also provide an estimate of the crowd’s confidence that each category is correct, which can be used to identify tweets whose sentiment may be unclear.

Added: August 30, 2013 by CrowdFlower | Data Rows: 190 Download Now

Image descriptions

Contributors were shown a large variety of images and asked whether a given word described the image shown. For example, they might see a picture of Mickey Mouse and the word Disneyland, where they’d mark “yes.” Conversely, if Mickey Mouse’s pair word was “oatmeal,” they would mark no.

Data set includes image URLs, the matched word, whether the pair matched, and a confidence score for each.

Added: March 30, 2011 by CrowdFlower | Data Rows: 225000 Download Now