As anyone who follows political races knows, different sources can report the same event in very different ways. We took nearly six thousand articles about Clinton and Obama from the past month and sent them through a data enrichment workflow on Mechanical Turk to be classified as favorable or unfavorable for the respective candidates. We scraped the articles from Google News, restricted to several sources, and threw in front page headlines from Digg.
Here is the graph for favorability scores, aggregated by source. We found that Digg was far and away the most favorable for Obama.
The next graph tracks overall news favorability by date. To provide some context, we compared it with the change in Obama stock on the Intrade prediction market.
More details after the jump:
We created our data set by doing two separate searches, one for “Barack Obama” and one for “Hillary Clinton”. This did a pretty good job of ensuring that the results from Google News or Digg’s search facility were actually about the given candidate. For each article, we showed the headline, search result snippet, and link to several Turkers. They reported whether it was positive, neutral, or negative toward the candidate.
The favorability metric was created by averaging the ratings across articles. Pro-Obama and anti-Hillary articles were both worth 1 point; anti-Obama and pro-Hillary both worth -1, and neutrals 0.
Therefore, if all articles are either positive towards Obama or negative towards Hillary, the rating is +100%; and vice-versa for -100%.
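To make the scoring concrete, here is a minimal sketch of the metric in Python. The function names, the tuple layout, and the toy ratings are made up for illustration, and it assumes one rating per article; how ratings from several Turkers on the same article were combined isn't specified here.

```python
from collections import defaultdict

# Points per (candidate, rating), following the scoring described above.
SCORES = {
    ("obama", "positive"): 1,
    ("clinton", "negative"): 1,
    ("obama", "negative"): -1,
    ("clinton", "positive"): -1,
}

def article_score(candidate, rating):
    """Score one article: +1 favors Obama, -1 favors Clinton, 0 is neutral."""
    if rating == "neutral":
        return 0
    return SCORES[(candidate, rating)]

def favorability_by_source(ratings):
    """Average the scores per source and express the result as a percentage.

    `ratings` is an iterable of (source, candidate, rating) tuples
    (a hypothetical layout, not the original data format).
    """
    scores = defaultdict(list)
    for source, candidate, rating in ratings:
        scores[source].append(article_score(candidate, rating))
    return {src: 100.0 * sum(vals) / len(vals) for src, vals in scores.items()}

# Toy ratings, invented for illustration only.
sample = [
    ("source_a", "obama", "positive"),
    ("source_a", "clinton", "negative"),
    ("source_b", "obama", "negative"),
    ("source_b", "clinton", "neutral"),
]
print(favorability_by_source(sample))
# {'source_a': 100.0, 'source_b': -50.0}
```

In the toy data, source_a only has pro-Obama and anti-Clinton articles, so it lands at +100%; source_b has one anti-Obama article and one neutral one, so it averages out to -50%.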
The data is very noisy. The question of favorability is extremely tricky: it includes a combination of expectations, sentiment, and the objective events a newspaper chooses to report. All of these are hard to reliably assess or even define. (And whether anything you measure constitutes “media bias” is another complicated question!)
Despite all this philosophical intractability, we did find one statistically sound result: the difference between Digg and the others was statistically significant (t-test, p<.001). The differences within the mainstream media were not statistically significant.
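For readers curious what a comparison like this involves, here is a rough sketch of a two-sample t statistic in Python. It uses Welch's unequal-variance form, which is an assumption; the post only says "t-test". The function name and the scores are made up for illustration and are not the real data. Turning the statistic into a p-value additionally requires the t distribution, which is left out here.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t statistic, degrees of freedom) for Welch's t-test.

    `a` and `b` are lists of per-article scores (+1, 0, -1) for two groups.
    """
    # Squared standard error of each group's mean.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Invented scores, for illustration only.
digg_scores = [1, 1, 0, 1, 1, 0, 1, 1]
other_scores = [0, -1, 1, 0, -1, 0, 1, -1]

t, df = welch_t(digg_scores, other_scores)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A large positive t here would mean the first group's average score is well above the second's relative to the noise in both.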