Using Social Media Data to Understand Censorship in China

The explosive rise of social media in the last decade represents an unprecedented increase in the expressive capacity of the human race. Billions of people now have the ability to post an idea on the Web that billions of others can read, and billions now do post. However, since everyone is speaking at the same time, very few individuals wind up being heard. What we need is more than a way to hear what any one person is saying or a search to find the needle in the haystack; we need a way to hear what everyone is saying, to characterize the haystack, to understand the word on the street and the wisdom in the crowd. This is what Crimson Hexagon aims to do and where its unique technology excels: It gives voice to the people about every product, ad campaign, service, policy, candidate, government, action, event, and movement, in a way that can be heard and understood by anyone who wants to know.

Since we are in the unusual position of being able to hear what so many have to say, it should be no surprise that we are also keenly interested in the forces in the world attempting to stop anyone from hearing what others have to say. Among these, the Chinese government looms large. It controls the most extensive effort to selectively censor human expression ever implemented.

So what is the Chinese government after?

What is the point of this program? What do censorship practices indicate about the intentions and priorities of the Chinese government? These are the questions that my coauthors, Jennifer Pan and Molly Roberts, and I examine in our recent article, just published in the May 2013 issue of the American Political Science Review. To answer these questions, we conducted the first large-scale analysis of this enormous, secretive program. We downloaded millions of social media posts before the Chinese government could read and censor those they deemed inappropriate, and we then (carefully) went back to each post to see if it was censored. We then applied Crimson Hexagon technology, and new analytical methods we developed here, to all these Chinese-language posts. We used this analysis to understand how the content of the censored posts, which the Chinese people were prevented from reading but which we had, differed from the content of those that were not censored.
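As a rough illustration of the comparison described above (a minimal sketch, not the pipeline used in the article), the following Python snippet computes the share of censored posts within each content category. The `Post` record, the category labels, and the `censorship_rates` helper are hypothetical names introduced here for illustration, and the sample records are invented toy data, not results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Post:
    post_id: str
    category: str   # hypothetical content label assigned during analysis
    censored: bool  # True if the post was gone when it was revisited


def censorship_rates(posts):
    """Return the fraction of censored posts in each category."""
    totals = defaultdict(int)
    removed = defaultdict(int)
    for post in posts:
        totals[post.category] += 1
        if post.censored:
            removed[post.category] += 1
    return {cat: removed[cat] / totals[cat] for cat in totals}


# Toy records, invented purely for illustration (not real data).
sample = [
    Post("a1", "group_x", True),
    Post("a2", "group_x", False),
    Post("b1", "group_y", False),
    Post("b2", "group_y", False),
]
rates = censorship_rates(sample)
```

Comparing such per-category rates is one simple way to ask whether one kind of content is removed more often than another; the actual analysis would also need to account for how posts are assigned to categories in the first place.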

[Figure: The Fractured Structure of the Chinese Social Media Landscape]

Previously, almost everyone thought that the Chinese government was censoring criticism of its government, leaders, and policies. This seemed obvious, the presumed result being that the remaining posts were more positive about the state. But as we learned, this received wisdom was completely wrong: Chinese social media includes millions of criticisms, many of them vitriolic, of the Chinese state, the people who run it, and what they do. Those posts are no more likely to be censored than those that applaud the state and its leaders.

So if they are not pruning social media to make themselves look better, why do they have tens of thousands of people reading and censoring individual social media posts?

It turns out that the Chinese leadership lets criticism through but censors all expressions that display, or are perceived to display, “collective action potential”: any attempt to spur collective action, whether opposed to, in support of, or irrelevant to the state. Want to write a social media post that explains that the leaders of your town are slimes, stealing money, and incompetent? No problem. But if you also say that we should all go protest as a result, you’ll be censored. Indeed, if instead you say that the leaders are terrific people with terrific policies, and we should go have a big party for them, you’ll be censored. Any attempt to direct the movement of people, protest, or other collectivity will be nipped in the bud and removed from the Internet.

[Figure: Social Media Events with Highest and Lowest Censorship Magnitude]

The Chinese censorship apparatus is huge but, like a giant elephant tiptoeing around trying to suppress information, it leaves big footprints.  Because we were the first to obtain an aerial viewpoint of their entire program, we could see the footprints.  This meant that in addition to understanding what they censor, we could also clearly see government intention to act outside the Internet ahead of time.  For example, several days in advance we could see that they were about to arrest a dissident, sign a peace treaty, or demote a political leader.

The results of our research would seem to provide critical and unique information for academic researchers, public policy professionals, government actors, and area experts on the coordinated actions of the Chinese government.  Many are following up.

As a founder and chairman of the board of Crimson Hexagon, and also a professor at Harvard, I find it gratifying to see how often the sometimes divergent goals of academic researchers and industry experts can come together to produce new data, novel statistical methods, and genuine knowledge about the world that could never have emerged from work solely within a university or within any single company. The synergies that result from working together to cross these borders are fun to see and productive to be part of.

Gary King

Written by

Gary King is the Albert J. Weatherhead III University Professor at Harvard University, Harvard’s most distinguished faculty position. He also serves as Director of the Institute for Quantitative Social Science. He is a member of the National Academy of Sciences, Fellow of the American Statistical Association, Fellow of the American Association for the Advancement of Science, and Fellow of the American Academy of Arts and Sciences, and has won more than 30 “best of” awards for his articles, books, conference papers, and software, among others.
