Social Media Analysis Finds Unexpected Reasons for China's Censorship

Professor Gary King Uses ForSight™ to Examine Censorship in China

On August 22, 2014, Science published an article I wrote on Chinese censorship with Jen Pan and Molly Roberts, two of my (incredibly talented) graduate students at Harvard. In that article, and in another published last spring in the American Political Science Review, we reported our discovery that the Chinese government’s enormous social media censorship program is not aimed at what everyone, including us, had thought.
We had all assumed that its goal was to stop criticism of the government, its leaders, and their policies. This turned out to be completely false. Remarkably, if you write a social media post in China that criticizes the government, it is no more likely to be censored than if you write one supporting the government. What we discovered instead is that the censors target any effort at collective action: any attempt by someone outside the government to mobilize people. If many people write that the political leaders of their town are stealing money, keeping mistresses, and so on, that is no problem. But if you add “and let’s go protest,” and what you say is connected to some real-world event, your post will be censored. In fact, if you write that the leaders of some other town are doing such a great job that we should all meet in the town square to hold a rally for them, that post will also be censored. The Chinese government doesn’t care what you say about it; it cares only whether you have the power to move crowds, and if you do, you will be stopped. China’s leaders, it seems, have little to worry about from the US and other countries. Instead, like the leaders of any country in some respects, they worry most about dissent from their own people.
We also had many other findings, many of which completely surprised us and apparently others. Instead of retelling that story, which you can find in our articles, I’ll explain how we came to study this topic in the first place, since that was also a surprise.
Chinese censorship of social media is the largest effort to selectively suppress human communication by any government in recorded history. How this enormous system works – with many tens of thousands of human censors reading and censoring posts by hand, operating within thousands of social media companies and governments at all levels – what its goals are, and what it accomplishes, are of enormous interest to the academic community, the people of China, and governments and commercial enterprises worldwide.

The Chinese censorship decision tree: the pictures shown are examples of real (and typical) websites, along with our translations.

However, when we started, Jen, Molly, and I had no intention of studying Chinese censorship or anything remotely close to it. Instead, we were developing new methods of automated text analysis, similar to the technologies Crimson Hexagon originally licensed from Harvard to form the company and the many it has developed since. After developing some new methods, we decided to stress test them, pushing them until they broke so that we could discover how to improve them further. We thought a good way to do this might be to try the methods in Chinese, a language for which our methods had not been designed and which works very differently from English. Molly and Jen also happen to know Chinese. Our tentative plan was to publish a technical statistics paper about our new methods and perhaps another about how to do automated text analysis in Chinese.
So we downloaded a database of Chinese-language social media posts from Crimson Hexagon’s data library and went to work. We learned a lot about our methods and began to improve them further. Then we decided to go back to the websites from which Crimson Hexagon had collected the posts. We wanted to understand the posts in context and to verify, for example, that Crimson was pulling the text of each post rather than the ads.
Jen, Molly, and I met in my office to talk about the results. They told me they thought something might be wrong with the Crimson Hexagon database. When they clicked on the URLs, some posts showed the same text as in the database. Others showed nothing at all. For still others, the website said something about the post being investigated.
A social media post being investigated? We looked at each other in amazement as we realized that there was nothing wrong with Crimson Hexagon’s data collection procedures. Instead, Crimson Hexagon was downloading posts from nearly 1,400 different social media sites in China before the Chinese government could read them and decide which ones to censor! Through Crimson Hexagon, we had access to the text of the entire corpus of censored Chinese social media posts! The Chinese people could not read these posts because government censors had removed them from the web, but we could. It seemed like a good time to change the objectives of our research.
Oh, remember that nice study we were going to do on methods of automated text analysis in Chinese? We may get back to it one day.
