Pew Internet Research put out an analysis of Twitter conversation and compared it with its own public opinion polling. The result? From the headline: “Twitter Reaction to Events Often at Odds with Overall Public Opinion.”

Pew’s search and analysis partner, Crimson Hexagon, took a sample of up to three days of tweets that contained words or phrases relevant to a given hot-button news item and analyzed them for positive or negative terms, as described in their methodology:

The data on Twitter comes from an analysis of all publicly available Tweets. The time period for each event varied, but none included more than three days worth of reaction. For each subject, multiple search terms were used to identify appropriate tweets. For example, to find messages commenting on President Obama’s 2013 State of the Union Speech, Tweets were included if they appeared in the four hours following the start of his speech and used the words “state” and “union,” or “Obama,” or “SOTU.”

Unlike most human coding, CH does not measure each post as a unit, but examines the entire discussion in the aggregate. To do that, the algorithm breaks up all relevant texts into subsections. Rather than dividing each Tweet, paragraph, sentence or word, CH treats the “assertion” as the unit of measurement. If 40% of a story fits into one category, and 60% fits into another, the software will divide the text accordingly. Consequently, the results are not expressed in percent of Tweets, but rather the percent of assertions out of the entire body of stories identified by the original Boolean search terms.
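To make that description concrete, here is a minimal sketch of the two steps the methodology outlines: selecting tweets with a Boolean keyword search inside a time window, then reporting each category as a share of the whole conversation rather than as a share of tweets. This is not Crimson Hexagon's actual algorithm. The function names, the speech start time, and my reading of the search as ("state" AND "union") OR "Obama" OR "SOTU" are all assumptions, and the per-tweet category proportions are simply handed in rather than computed.

```python
import re
from datetime import datetime, timedelta

# Placeholder start time, assumed for illustration; the methodology gives no timestamp.
SPEECH_START = datetime(2013, 2, 12, 21, 0)
WINDOW = timedelta(hours=4)


def matches_query(text):
    """One reading of the quoted search: ("state" AND "union") OR "obama" OR "sotu"."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {"state", "union"} <= words or "obama" in words or "sotu" in words


def in_window(posted_at, start=SPEECH_START, window=WINDOW):
    """True if the tweet was posted within the window after the speech began."""
    return start <= posted_at <= start + window


def aggregate_shares(per_text_proportions):
    """Sum each category's fractional weight across all texts, then normalize.

    Each item is a dict such as {"positive": 0.4, "negative": 0.6}, i.e. the
    portion of one text assigned to each category (hypothetical input).
    """
    totals = {}
    for proportions in per_text_proportions:
        for category, share in proportions.items():
            totals[category] = totals.get(category, 0.0) + share
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: value / grand_total for category, value in totals.items()}
```

Under these assumptions, a tweet that is 40% positive and 60% negative counts toward both categories, which is why the results come out as a percentage of the conversation and not as a percentage of tweets.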

But while we can argue about the efficacy of their methods (more on that later), the media seems to be willfully getting the results wrong. Check out a quick sample of the headlines:

Sample of conservative reactions by Twitterverse, at odds with the Daily Caller’s myopic understanding of reality. Source: Pew

This list even includes a majority of tech-savvy websites. The Daily Caller (ever the picture of reliable reportage) even took to interpreting the report as calling Twitter “a liberal, myopic, negative place.” This, despite the fact that the report clearly says that the Twitterverse occasionally breaks conservative when public sentiment is liberal. But there is a big difference between opinion on Twitter being “at odds” with general public opinion and not being a “reliable” indicator.

For a start, when 16% of Americans all share a common demographic bond – our affinity for Twitter – it should not be at all surprising that we share a common set of opinions. Neither should it be surprising that those opinions differ from a wider sample of the public.

Moreover, public opinion changes. It changes as people learn more about things and as facts present themselves. For a lot of people, that very often takes more than three days. Because Twitter is heavily weighted toward breaking news, tweeps have a tendency to be ahead of the curve.

We tweeps tend to “watch” the news unfold more or less together in real-time, so social reaction must also play its part. Twitter users have also been shown to be “influencers,” meaning we tend to voice our opinions to our friends more often than the average bear, you might say. It would be interesting to run the same sample twice, once over the three days after a story breaks and again over the following three days, to see whether the gap between popular and Twitter sentiment changes.

But all of this presumes that Pew’s research is accurate. This is a very dicey affair, as indeed all public opinion polling is. But in this case, instead of speaking directly with tweeps, they’re using aggregation and analysis software to decide what is “positive” vs. “negative” or “conservative” vs. “liberal.” We are nowhere near a level of confidence in “Big Data” analysis of this type to consider this analysis anything other than hugely questionable.

The algorithms Crimson Hexagon uses would need to judge whether each tweet is really relevant to a given topic, whether the tweet is being sarcastic or using some other form of humor, and whether the “negative” words are a function of genuine negativity or simply a reflection of language. Buffalo alone would be enough to give coders cold sweats, trying to interpret all that negativity.
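A toy example shows how easily word counting goes wrong. The scorer below is a deliberately naive sketch, not anything Crimson Hexagon is known to use; the word lists, the function name, and both sample tweets are made up for illustration.

```python
import re

# Tiny hypothetical lexicons, invented for this example.
POSITIVE_WORDS = {"great", "love", "brilliant", "good"}
NEGATIVE_WORDS = {"awful", "miserable", "terrible", "hate", "cold", "disaster"}


def naive_score(text):
    """Count positive words minus negative words, ignoring context entirely."""
    words = re.findall(r"[a-z]+", text.lower())
    positives = sum(1 for word in words if word in POSITIVE_WORDS)
    negatives = sum(1 for word in words if word in NEGATIVE_WORDS)
    return positives - negatives


# Sarcasm: a human reads this as a complaint, but the word count says positive.
sarcastic = "Oh great, another brilliant plan"

# Affection dressed up as griping: a human reads this as fond, the count says negative.
fond_gripe = "Buffalo is cold, miserable and buried in snow, and I love it"

print(naive_score(sarcastic), naive_score(fond_gripe))
```

Both tweets land on the wrong side of zero, which is the whole problem: the words are a reflection of how people talk, not necessarily of how they feel.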

And of course, it needs to be pointed out: Pew’s opinion polls do not reflect public sentiment any more accurately than Twitter, simply because Pew says they do. I am a big fan of Pew’s work – I cite it a lot, especially on (irony alert) Twitter. But by no means does this study reflect any kind of scientific fidelity.