For DD and Starbucks, stereotypes play out in online conversation

Our office here in Cambridge, MA is blessed with talent from all parts of the country.  As a result of our mixed geographical heritage, few topics are as hotly debated in the office this time of year as pro football and coffee chains.  After a recent debate on the latter topic in which every side claimed that ‘most people’ agreed with them, we decided to take a peek at the online conversation and see if our assumptions were reflected in reality.

Deciding to limit ourselves for the time being to Dunkin’ Donuts and Starbucks, we analyzed over 230,000 blog and forum posts from June 1 to yesterday in an attempt to settle the debate over whether the internet’s perceptions of these brands matched our own. We analyzed content every day but summarized the results, as the conversation was fairly stable over that time. Dunkin’ Donuts, as a mainly regional player, had about 10% of the volume of its much larger peer.

As those who have grown up in the Northeast know, DD is a much-beloved local institution, and 14% of the conversation consisted of general exclamations of love. Those who had moved away from the region and missed their local shop were especially passionate in their praise for the chain. Matching our own experience in the office, a large number of people (17%) mentioned Dunkin’ as a regular fixture of their daily routine, particularly for morning coffee runs.

Dunkin' Donuts

A number of Dunkin’ customers weren’t as satisfied with the experience of grabbing their coffee: complaints about sloppy service emerged quickly as a theme. However, the regional devotion to DD’s syrupy take on coffee far outweighed the complaints. We knew about the coffee, but in training our analysis algorithm, we were surprised at the amount of love for DD’s low-cal flatbread sandwiches and praise for their taste-to-calorie ratio.

Not everyone was impressed by DD’s commitment to their health, with 18% complaining about the caloric content of their baked goods.

Starbucks managed to avoid major health complaints despite its fattening drinks, which owned the cravings of many posters. Their regular coffee, on the other hand, did not feature nearly as prominently or as well, with a category of negative posts specifically about the coffee and only a scattered few posts in praise of it (we rolled these compliments into the ‘Other Positive’ category). Despite the lack of enthusiasm about the basic coffee, a significant slice of the conversation mentioned Starbucks as part of a daily routine as well. For us, its position less than a block away trumps any concerns for pleasing our taste buds during our caffeine fixes.

Starbucks Conversation

Commenters also complained somewhat about the price of Starbucks’ offerings, long a symbol of casual decadence, but perhaps not as much as we expected. An equivalent number of people felt that Starbucks was ‘evil’ on account of its scale, competition with local businesses, and homogeneity. (The company is addressing this.) Not exactly shocking for us, as the DD proponents usually bring up the Austin Powers reference at least once per argument.

These complaints were minor compared to the biggest surprise to come out of the analysis: that such a large number of Starbucks customers are vocal about, and satisfied with, the shops’ ambiance (Good Experience). We read a number of posts by students who appreciated Starbucks as a study space and stressed-out moms who sought it as a daily sanctuary. The Seattle-based company has long emphasized this component of its brand, and it appears that its attention to the details of everything from the furniture to the music continues to pay dividends.

We haven’t managed to settle any internal debates here with this analysis, and no money changed hands on the side bets, but at least next time we’ll have proper statistics to hurl at one another.  Or possibly just stick to debating whether the Vikings are overrated.

Helping CNN.com Reveal Vox Twitter

Today CNN.com debuted what will become a regular segment… using Crimson Hexagon’s VoxTrot to understand what’s really happening in the conversation on Twitter.

Check it out, from about the 1:00 mark to 4:00…


Innovative research & real-world solutions

IQSS Stairwell

Crimson Hexagon co-founder Gary King still heads IQSS over at Harvard — and some of his innovative research design is getting some interesting press after its publication in The Lancet.

King and his colleagues designed and led a study with about 500,000 people:  the largest-ever randomized health policy experiment. The study featured innovative research designs and statistical methods that make best use of available data in a cost-effective way.

What’s next? According to HarvardScience, the approach is now being implemented in, or considered for, evaluations of many other public policy programs around the world. The ability to understand what’s important and what is moving the public policy needle, at a lower cost than other research methods, is huge, particularly when you’re talking about delivering healthcare to the world’s poorest populations. As the petabyte age becomes a reality, innovative data-driven analysis becomes a necessity.

Big news for big data: Guardian’s Open Platform


The Guardian today announced its new Open Platform, and influential technology bloggers and analysts took notice. The Guardian is providing content and data APIs to enable and encourage developers to build third-party apps. Developers can monetize their creations with advertising, but will eventually be required to join an ad network.

As anyone who has picked up a local newspaper lately can attest, the death of print media is not greatly exaggerated, but a grim fact.  The Guardian seems committed to the kind of innovation that may help it to weather the storm.

From our perspective, it’s pretty amazing to see the much-heralded age of big and open data becoming a reality. Open access to news and data sets is sure to open the floodgates for insights from new methods of quantifying and visualizing reporting.

Searching v. characterizing; needle v. haystack

Lately Google has been adding features, such as Preferred Sites and SearchWiki, that help users home in on the one result they want, or promote or demote sites in their own future searches. These features will clearly help users find the needle in the haystack, but as they get better for this purpose, we should not expect them to also improve our ability to characterize a whole set of web documents. Here’s an example where searching for the needle can be limiting: Academics are in the business of pushing forward the boundaries of knowledge. But if you don’t know where the boundaries of knowledge are, it’s easy to spend a great deal of time reinventing wheels and all manner of other existing technology.

So you’d think that the advent of search engines (including academic search engines like Google Scholar) would make our jobs much easier — and they undoubtedly do in some ways. But here’s the risk: there’s been some evidence lately that these search engines have caused academics to read and cite fewer articles (i.e., only those that appear at the top of search results) and for our articles to be less comprehensive overall.  There’s an ongoing debate about this evidence in academic and technology circles, but you can see how it might happen.  Search engines are about searching for one item; we shouldn’t expect them to be as good at describing the haystack as they are at serving up needles.

Photo credit: pierreyves0

Why Twitchboard matters

I came across Twitchboard in today’s ReadWriteWeb post on The Rise of Cloud Agents. Twitchboard goes after a problem many web users are encountering: humans just don’t scale. You sign up for a bunch of online services (my toolbar is littered with them) and then you have to remember you have them, use them, and, in an ideal world, integrate them.

    Services like Ping.fm do a terrific job helping you publish across multiple platforms, like updating your Twitter, Facebook, and LinkedIn statuses all at once. But Twitchboard takes it a level further by automating the interactions of these social web services.

    More from ReadWriteWeb:

    Blogger Chris Arkenberg says Twitchboard is a part of the “emerging class of cloud agents.” These cloud agents, as he describes them, will help us sort and search the massive volumes of data we interact with regularly. He envisions that soon we’ll have many of these cloud agents, swarming around us, working on our behalf, helping to parse the data flowing in and providing us with the information that we need, separated from the noise.

    The Apple App Store not too long ago passed 300 million downloads, and yesterday we learned that even the lowly iFart is earning over $10,000 every day. The problem is not that there are too few useful or amusing applications, but how to manage all the resulting data and their interactions. I, for one, welcome our new cloud agent overlords.

    How to avoid the flu … with data

    With a big chunk of the country getting hammered by snow, it’s definitely sinking in that the holidays have arrived. And along with all the joy and happiness the next few weeks entail, there’s the inevitable sneezing, sniffling, and general misery that more than a few of us will experience with the flu. Luckily, last month researchers at Google, after some testing with the Centers for Disease Control and Prevention, released an online tool called Google Flu Trends that can quickly and accurately detect flu outbreaks in different geographies.

    Traditionally, flu outbreaks have been identified by polling doctors at local clinics and hospitals. This data is then aggregated across the country and analyzed. If there is an unusually high number of people with flu-like symptoms in any one area, then there was probably a flu outbreak … one or two weeks ago. That might be nice to know, but it provides little solace for those of us who wind up suffering.

    Google’s novel approach is to monitor real-time search engine queries for people seeking cures for flu-like symptoms. When certain areas have a higher than usual number of queries, it’s likely the first sign of an outbreak. That means we can all wash our hands a little longer, take extra care when sneezing, and even avoid busy areas in hopes of dodging the flu this year.
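To make the idea concrete, here is a minimal sketch of flagging regions where query volume runs well above its usual level. It is not Google's actual method: the function name `flag_spikes`, the two-standard-deviation threshold, and the weekly counts are all assumptions made up for illustration.

```python
from statistics import mean, stdev

def flag_spikes(history, current, threshold=2.0):
    """Return regions whose current count sits more than `threshold`
    standard deviations above that region's historical average."""
    flagged = []
    for region, counts in history.items():
        baseline = mean(counts)
        spread = stdev(counts)
        if spread == 0:
            continue  # no variation in the history, so no meaningful score
        score = (current[region] - baseline) / spread
        if score > threshold:
            flagged.append(region)
    return sorted(flagged)

# Hypothetical weekly counts of flu-related queries (invented numbers).
history = {
    "Boston": [100, 110, 90, 105, 95],
    "Seattle": [200, 210, 190, 205, 195],
}
current = {"Boston": 180, "Seattle": 204}

spikes = flag_spikes(history, current)
print(spikes)
```

With these invented numbers only Boston is flagged, since its current count is far outside its usual range while Seattle's is within normal week-to-week variation.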

    At Crimson Hexagon, we take a similar approach to brand monitoring.  If a product or PR campaign has gone wrong, there’s no sense in waiting until it shows up in your bottom line.  Instead, by monitoring the opinion of your brand online, you can act at the earliest signs of a problem. With this type of real-time, actionable information maybe both we and our brands can stay healthy this holiday season. And that’s something we can all celebrate.

    Spore “locks out” pirates, locks in negative reviews

    We’ve written before that summarizing opinion is complex, and that understanding the meaning behind product reviews can lend more insight than averaging the “star ratings.” I encountered a terrific example of needing this context tonight.

    In a situation that any working parents reading may relate to, by mid-December each year I’ve abandoned the fantasy of gifts purchased months in advance and am racing to online shopping destinations with quick shipping. My first stop tonight was for a video game for my son, a game that I remembered was pretty popular and well received. When I checked out Spore on Amazon, I was surprised to see a 1.5 star rating with over 3,100 reviews. Did I have the wrong game? I IMed a friend for advice.

    Turns out, the game itself is pretty cool. But the DRM (digital rights management) designed to prevent piracy is a ludicrous opt-in, leading gamers to review it negatively. Here’s a sample:

    First of all, the game incorporates a draconian DRM system that requires you to activate over the internet, and limits you to a grand total of 3 activations. If you reach that limit, then you’ll have to call EA in order to add one extra activation. That’s not as simple as it sounds, since when you reach that point EA will assume that you, the paying customer, are a filthy pirating thief.

    So, the bad reviews make sense: there’s a strong negative associated with the game, but it’s not about the graphics or the gameplay, features that might matter to me (or the gamer). Getting to the why (without having to read an adequate sample of the thousands of reviews) was vital. Automating the process might have been even better.

    Oh, and the punchline? Apparently EA managed to deter potential customers, annoy their existing base, and still end up with Spore as the most pirated game of 2008.

    On complex opinions

    We’ve often noticed that people tend toward extremes when assigning online product ratings. For Amazon products with a theoretically average (3 star) rating, more than 65% of all ratings lump into either the best or worst score, a ‘bimodal distribution’ in stats-speak. There are many potential reasons: a lack of clear criteria for different ratings, a desire to influence the displayed score, shameless promotion, spite, and so on. Whatever the reason, the end result is that many rating systems are essentially a thumbs-up, thumbs-down proposition and often give misleading information.
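A small worked example shows how an average can hide this kind of polarization. The rating counts below are invented for illustration, and `summarize_ratings` is a hypothetical helper, not part of any real ratings API.

```python
def summarize_ratings(counts):
    """Return (average stars, share of ratings that are 1 or 5 stars)."""
    total = sum(counts.values())
    average = sum(stars * n for stars, n in counts.items()) / total
    extreme_share = (counts.get(1, 0) + counts.get(5, 0)) / total
    return average, extreme_share

# Hypothetical product: 100 ratings, mostly 1-star and 5-star.
counts = {1: 40, 2: 5, 3: 10, 4: 5, 5: 40}

average, extreme_share = summarize_ratings(counts)
print(f"average: {average:.1f} stars, share at 1 or 5 stars: {extreme_share:.0%}")
```

Here the average works out to exactly 3 stars even though 80% of the raters chose the best or worst score, which is why the displayed average alone can mislead.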

    Fortunately, the lack of sophistication doesn’t carry over from quantitative ratings to text-based reviews. From what we’ve seen in the distribution of ratings, we might expect text-based reviews similarly to espouse straightforward points of view. Instead, most text-based product reviews actually contain ‘complex opinion’— writing with at least one observation contrary to the overall argument. (There are some notable exceptions to this trend.)

    As an example, in this Amazon listing for a Casio camera, 61% of the 46 English-language reviews mention both a positive and negative aspect of their experience. Considering the average review is only about four lines long, the amount of even-handedness is surprising. Other merchants, such as Best Buy, include designated fields in their review systems for Pros and Cons to encourage more considered opinions.
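As a rough illustration of what counting such reviews could look like, the sketch below flags a review as complex when it contains at least one positive and one negative cue word. This is a deliberately naive stand-in and not the analysis used for the figures above: the cue lists, the `is_complex` function, and the sample reviews are all made up.

```python
import re

# Tiny, hand-picked cue lists (assumptions for this example only).
POSITIVE_CUES = {"great", "love", "excellent", "sharp", "easy"}
NEGATIVE_CUES = {"poor", "slow", "disappointing", "flimsy", "blurry"}

def is_complex(review):
    """True if the review has both a positive and a negative cue word."""
    words = set(re.findall(r"[a-z']+", review.lower()))
    return bool(words & POSITIVE_CUES) and bool(words & NEGATIVE_CUES)

# Invented sample reviews.
reviews = [
    "Great pictures, but the autofocus is slow.",
    "Love it. Easy to use.",
    "Flimsy battery door and blurry shots.",
]

flags = [is_complex(r) for r in reviews]
complex_share = sum(flags) / len(reviews)
print(flags, f"{complex_share:.0%}")
```

Only the first sample review mixes praise and criticism, so it is the only one flagged. Real reviews would need far richer handling (negation, sarcasm, domain vocabulary) than a word list can provide.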

    We find posts with complex opinion to be particularly valuable for three reasons:

    • The poster has nuanced thinking, and therefore a relatively more insightful perspective
    • The contrary point is important enough to mention, despite the poster’s overall impression
    • The poster is engaged enough with the topic to write a complex post

    By looking at what draws fire from fans and compliments from critics we can track trends and derive insight into opinion far beyond simple positive/negative assessments. Interestingly, brands are starting to see value in allowing negative reviews as well, believing that permitting the negative lends an air of authenticity more credible than all-positive marketing speak.

    It’s been said that, “We learn our virtues from our friends who love us; our faults from the enemy who hates us.” We like it the other way around, too.

    Election eve

    Election fever is everywhere. Online, you can feed your fixation with an endless stream of polls, pro bloggers, YouTube videos, or Twitter election updates. And whatever the result tomorrow, the punditry has only just begun.

    Here at Crimson Hexagon we’ve been monitoring what blog RSS feeds are telling us about people’s perception of the candidates. We’ve searched and pulled down a whole bunch of feeds, determined categories based on those results, and then filtered the opinion to find: What does the blogosphere like least about John McCain / Barack Obama?

    Here’s what we found:

    McCain

    Obama


    When we tracked foibles over time, whether McCain’s or Obama’s, we noticed that campaign tactics were something our monitors flagged for the negative watch list. We can also see how real-time events influenced people’s opinions: the launch of the Britney/Paris celebrity ad resulted in a big backlash against McCain’s negative campaign tactics that has only escalated. Obama’s much-ballyhooed infomercial? It generated a huge amount of discussion, 3x that of any recent prior event for him, but the sentiments in those discussions were more of the same.

    The New York Times is declaring that technology and Web 2.0 capabilities were a palpable force in this campaign. We believe that we’re just beginning to see the impact of distilling the opinion from a wide range of voices.