
Carnegie-Mellon's Cold Fusion China Microblog Project

Censorship (or lack thereof) in China's Social Media


A commentary on "Censorship and Deletion Practices in Chinese Social Media" by David Bamman, Brendan O’Connor, and Noah A. Smith, of Carnegie-Mellon's Computer Science Grad Studies Dept.

Article by 龙信明.

  • Prologue


  • Students at the IT Grad Studies Department at Carnegie-Mellon University recently published a paper titled "Censorship and deletion practices in Chinese social media", a paper which may be unique in delivering not only less than promised at the beginning, but also less than claimed at the end.

    Our Three Musketeers were looking to prove a thesis already well-entrenched in the West about censorship in China, but appear to have come away more or less empty-handed. It might have been better had they gone away quietly rather than publish a paper with embarrassing conclusions.

    Rather than providing factual evidence of microblog censorship in China, the report appears to prove the opposite, and reads more like a cautionary tale about knowing what you're doing before you begin making claims (cf. cold fusion).

    The Three Musketeers of Carnegie-Mellon's Cold Fusion China Microblog Project: David Bamman (top left), Brendan O'Connor (bottom), and Noah Smith (right)
  • Ideology and Agenda


  • In this study, as unremarkable for intent as for usefulness, the authors begin with an ideology and an agenda, and end with a strong odor of people twisting facts to fit policy.

    In case we've forgotten, that's how the West invaded Iraq for its WMDs.

    This paper cannot but be seen in the context of publicized efforts of the US and its allies to effect global Internet hegemony and undermine nations' sovereignty over their own cyberspace.

    This is similar to the way that extraterritorialism was practiced by colonizing countries against their colonies in the past.

    It's also apparent that, despite its academic peer-reviewed, pseudo-objective and pseudo-scientific veneer, the paper is ideologically loaded and part and parcel of the US government's long-term anti-socialist, anti-communist campaign to discredit and demonize Chinese leadership in the eyes of the Anglosphere.
  • The High Moral Ground and Academic Integrity


  • It would be remiss not to point out that Carnegie-Mellon's Grad Studies Department - at least the Computer division of it - does a good imitation of being an ideological hotbed of China-hate.

    Randal Bryant, the Dean of the Carnegie-Mellon department that produced this landmark study of ideology-afflicted research.
    In fact, the place may be an incubator for this, which would lead one to believe that one or more influential staff members have corrupted the students and the department.

    Reading between the lines and considering the ideological closed-circuitry of the Department, the funding, and the intricate relationships related to this study, one might almost surmise this grand institution serves as a training-ground for spooks.

    So far as we're aware, the high moral ground of academic research standards and the rigor of peer review were meant as defensive tools in the search for truth, not as aggressive weapons to piss all over people you don't like or who don't share your point of view.

    When one looks at the past work of the authors, the journal itself, the references cited, and especially the funding, we see that we're basically in the hands of a US government / CIA / quasi-NGO / Western propagandist daisy chain, trying to 'catapult the propaganda' (to use G.W. Bush's famous phrase) into the realm of "respectable science".

    The study is fundamentally a pseudo-scientific echo chamber of US government propaganda, and that entity's ideological symbiotes.

    There are clear scientific failings in the methodology. Using Twitter as an objective data source is clearly not acceptable.

    The Achilles heel of the paper is its origin in US government / US-centric funding, which also underlies a large part of the reference materials, and serves to render dubious the objectivity of the work.

    Further, it could also be hypothesized that this paper is a kind of probe of China's cyber-defence systems, and/or an analysis of the efficacy of US intelligence agencies' cyber-warfare against China.
  • The Political Overcast


  • The study, perhaps due to the pervasive colonialist spirit, failed to recognise that what China does within its own borders to ensure public stability is, in fact, nobody else's business - and certainly not that of Carnegie-Mellon or Mr. Bamman.

    "Twitter and Facebook were blocked in China in July 2009 after riots in the Western province of Xinjiang."

    Indeed, and that was because the source of much of the organisation of the terrorist riots was traced to Rebiya Kadeer's NED/CIA-financed World Uighur Congress in Washington. Sufficient evidence has been released to remove any doubt on this matter.

    The study ignores the very real and constant (and freely admitted) external meddling by the US in China's internal affairs. It pretends ignorance of the CIA's new "sock puppet" program of flooding social internet media with fake personas (astroturfing), in attempts to stir dissension - as it did with its stillborn 2011 "Jasmine Revolution" in China. See footnote references.

    It ignores the very real reasons many countries institute censorship or blockage of sites like Twitter and Facebook, as the UK also recently did, to prevent those media from being used as vehicles to further inflame dangerous public disturbances.

    It pretends ignorance of the fact that these external influences all emanate from the US, which is responsible for the unrest in Tibet and XinJiang, to say nothing of all the "Color Revolutions" around the world, and the destabilisation of Egypt, Iraq, Libya, Tunisia, Yemen, Syria, Iran.

    It ignores the fact that the US, being the perpetrator rather than the target, doesn't suffer from this handicap.

  • The Language Problem


  • Bamman and his group clearly had great difficulties understanding the Chinese language. They recognised that single words change meaning when grouped, but had no way to deal with that effectively. Nor would they be aware of current slang.

    To emphasise this, go to Sina Weibo, extract some posts, and run them through Bing or Google translate. You will usually obtain only unintelligible garble, caused by the use of slang, idioms, the vernacular, abbreviated statements and word combinations.

    Bing and Google are effective with simple sentences and good grammar that avoid slang, idioms and expressions. Bamman had no way to deal with any of this, and therefore his examinations could have been nonsense in many, or even most, cases.

    Even with the ability to identify the so-called "suppressed political terms", "blocked terms", "politically sensitive terms", the study team's total lack of familiarity with the language would make it impossible for them to appreciate the interplay of all terms contained in a message.

    Only by actually reading all of the deleted messages could they make intelligent assumptions about deletion-prone content, but even then they had no ability to identify the persons making the deletions. One can be forgiven for concluding Bamman was well beyond his depth.

  • Another, and Better, Study


  • Three researchers from the HP Social Computing Lab in Palo Alto recently published a similar study titled "Trends in Chinese Social Media", which compares China's Sina Weibo with the US's Twitter.

    This one appears more useful, and certainly more informative, and seems to have been done to a higher academic standard in that it is refreshingly free of any pre-ordained ideology.

    This one is research done to see what is there, rather than (as in Bamman's case at Carnegie-Mellon) looking for facts to prove something we already believe - and apparently unwilling to let go when we're proven wrong.

    Trends in Chinese Social Media - Louis Yu, Sitaram Asur, Bernardo Huberman; Social Computing Lab; HP Labs; Palo Alto, California. Original Article (pdf).

  • Some major general criticisms:


  • Funded by the US government as well as by a Cold War-originated NGO "science" organization ("ARCS") specifically geared to US global technological supremacy, not pure academic research.

  • "First Monday" Journal has a professed agenda of "Internet freedom", hardly an objective or scientific concept. Their concept of Internet Freedom is most likely one of the US government being free to use it to destabilise nations in preparation for regime change.

  • Hypotheses based on a set of data which is maintained by a documented collaborator with US government political destabilisation campaigns (Twitter).

  • Twitter data set would comprise a domain undoubtedly compromised by US government agencies' publicized policies regarding manipulation of social media, namely through fake personas / sock puppets / astroturfing.

  • Researchers' apparent lack of substantial Chinese language knowledge

  • Complete lack of feedback / input from Chinese government or engineers of alleged censorship

  • The reference materials for what is essentially an analysis of use or suppression of Chinese language within a particular environment are 100% English-sourced

  • Reference material contains numerous examples of US government / NED-funded anti-China propaganda outlets, e.g. China Digital Times, Journal of Democracy, etc., making the paper a kind of closed ideological feedback loop.

  • The main source of references is corporate newspaper articles, virtually all of which have a demonstrable editorial bias against China. Overall, an apparent failure to use reference materials with a neutral or objective framework regarding Chinese Internet control.

  • Implicit criticism and politicization of Chinese internet policy, e.g. inclusion of a reference from "Human Rights Watch" which implies Chinese cyberspace is a human rights issue, and therefore helping to damn the subject of the study regardless of the findings.

  • Omission of documentation that the Chinese mainland public overwhelmingly supports Internet filtering / censorship, and supports the CPC government as the agency to handle that.

  • Apparently no lab work / research within PR China itself.

  • A Brief Description of a Failed Study


  • Bamman's main conclusion is that by comparing Twitter and Weibo messages, one might obtain a dynamic list of censored terms. But if Twitter merely mirrors Western propaganda and media bytes (as claimed), then differences would not necessarily show censorship, only differences in taste of Chinese and Westerners (i.e. Chinese in general don't buy Western propaganda).

    But even looking past that, the group made dozens of mini conclusions (disclaimers, in fact) throughout the paper, conclusions they don't emphasize in their final statements.

    For example, these disclaimers made it clear that:

  • The authors could not conclude that any above-average deletion rates were due to censorship
  • Many above-average deletions were unrelated to 'politically-sensitive' terms, and
  • Geographically-distinct deletions showed no strong correlation with 'politically-sensitive' terms

    However, the authors essentially ignored these disclaimers in their conclusions, on too many occasions cleverly leaving readers to surmise on their own that "deletion" was synonymous with "censorship". We find this approach distasteful and manipulative and, while I'm sorry to say this, bordering on outright dishonesty.

    1. By building tools that access APIs provided by Weibo, the researchers were able to store all messages posted on Weibo between 6/27 and 9/30. They were then able to track messages that were subsequently deleted.

    However, because of API limits on the number of searches done for deleted messages, they could only query a small subset of messages to get a sense of what messages were deleted. From that, the researchers compiled a list of terms that seemed to be deleted above and beyond the norm.

    2. The researchers, through statistical analysis of a large sample of Weibo messages posted between 6/27 and 9/30 and subsequently deleted, were looking for signs of censorship.

    They found that on average some 16.25% of all messages had been deleted - most of which appear not to be correlated with any politically sensitive terms.

    The researchers could conclude only that the deletions were a result of spam deletions, personal deletions, or (perhaps) political censorship.

    3. The authors could not find any statistically relevant signs of censorship.

    Using a list of known censor terms, the researchers found that the messages containing those terms were deleted only at a slightly higher rate - 17.4% - than normal. Even for the top 17 terms that were deleted at a statistically higher rate, it is unclear whether they were the result of political censorship.

    4. For example, while messages containing terms such as "Fang Binxing," "Ministry of Truth," "Falun Gong," or "communist bandit" were deleted at close to 100%, only 19 such messages were deleted. This is hardly a significant number when you note that over 56 million messages were collected.

    5. When we get to terms that are found in a few hundred messages (still a minuscule sample), the deletion rate dropped to the 25-35% range, much closer to the 16.25% background norm. Considering the small sample, and the fact that 16.25% of the messages are probably typically deleted for non-political reasons, there is no evidence here of censorship, pervasive or otherwise. (A rough sketch of this sort of rate comparison follows this list.)

    6. Looking at a list of terms that are known to be blocked by the GFC and Sina's search interface, the authors found (and appeared quite shocked to find) that messages containing these known blacklisted terms were actually not deleted more often (in any statistically significant way) than messages lacking these terms.

    7. Hypothesizing that perhaps the censors were looking only to block the influential messages, the authors failed to find any statistically significant number of deletions targeted against messages that were reposted more often or whose authors had more followers.

    8. The authors then hypothesized that perhaps the censors were actually reading the messages and were deleting only the unfavorable messages. They manually checked a few messages that contained Ai Weiwei's name and found that the deletions appeared to be random.

    9. The authors finally hypothesized that perhaps the censors operated by targeting messages based on region of origin.

    10. However, and this is one place that ideology clearly raised its head, the authors admitted that this personal information provided when registering an account could easily be entirely false, but then proceeded to claim they could still identify places of origin.

    11. Looking at the originating region of the posts, the authors found that messages from "outlying" provinces were deleted at a much higher rate than in regions such as Beijing.

    However, even the authors cautioned against jumping to the conclusion "that higher rates of deletion in these areas may be connected with their histories of unrest (especially in Tibet, Qinghai, Gansu, and Xinjiang)."

    Censorship is not a sensible conclusion because the deletion rates do not change in any statistically relevant fashion for politically sensitive vs. non-politically sensitive terms.

    12. Thus in Tibet for example, while 53% of all the messages observed were deleted, only 50% of the messages that contained politically sensitive terms were deleted. The authors speculated that perhaps outlying regions might get more deletions because there are fewer messages originating there.

    13. If censor resources are distributed equally across provinces, the provinces with fewer messages might receive more attention. Again, if this were so, we would expect the deletion rate of politically sensitive terms in Tibetan messages to be much higher than that of non-sensitive ones, which is not observed.

    14. In grasping for an explanation, the authors compiled a list of distinctive terms that are unique to each province and hoped to find hyper-sensitive terms that are associated with "outlying" regions.

    Unfortunately for the authors, most unique terms relate to local locales, with politically sensitive terms correlating only weakly with each region.

    15. My conclusion: the authors for the most part thus came away empty handed. Chinese Weibo is relatively free of censorship. It allows the circulation of messages that contain even terms known to be blacklisted on Sina search and the GFW.

    16. The authors recognised differences between Weibo and Twitter messages, and proposed to use this as a gauge of Chinese censorship. But this is hardly scientific; this could merely reflect differences in the political outlook between Chinese people and American media institutions.
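
    For readers who want to see what the arithmetic in points 1 through 5 above actually involves, here is a minimal sketch - emphatically not the authors' code - of the two steps: re-checking previously collected message IDs to see which have since vanished, then asking whether messages containing a given term vanish more often than the 16.25% background rate. The lookup function is a stand-in for whatever public API a real collector would use; the toy data at the bottom is invented purely to show the mechanics.

```python
"""Sketch of deletion tracking and per-term rate comparison (assumptions noted above)."""

from collections import defaultdict
from math import comb

BACKGROUND_RATE = 0.1625  # overall deletion rate reported in the paper


def mark_deletions(sample, lookup):
    """sample: dict of message_id -> text collected earlier.
    lookup(message_id) returns the current text, or None if the service
    reports that the message no longer exists ("target weibo does not exist").
    Returns a list of (text, was_deleted) pairs."""
    return [(text, lookup(mid) is None) for mid, text in sample.items()]


def per_term_deletion_counts(statuses, terms):
    """For each term of interest, count how many sampled messages contained
    it and how many of those have since been deleted."""
    counts = defaultdict(lambda: [0, 0])  # term -> [containing, deleted]
    for text, deleted in statuses:
        for term in terms:
            if term in text:
                counts[term][0] += 1
                counts[term][1] += int(deleted)
    return counts


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): how likely k or more deletions
    among n messages would be if deletions occur at the background rate."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def flag_elevated_terms(counts, alpha=0.01):
    """Terms whose deletion count is implausible under the background rate
    alone. The p-value says nothing about WHO deleted the messages or WHY."""
    flagged = []
    for term, (n_containing, n_deleted) in counts.items():
        if n_containing == 0:
            continue
        p_value = binom_upper_tail(n_deleted, n_containing, BACKGROUND_RATE)
        if p_value < alpha:
            flagged.append((term, n_deleted / n_containing, n_containing, p_value))
    return sorted(flagged, key=lambda row: row[3])


if __name__ == "__main__":
    # invented toy data, purely to show the mechanics
    collected = {1: "salt rumor again", 2: "it's raining again", 3: "salt panic"}
    still_online = {2: "it's raining again"}  # messages 1 and 3 now "deleted"
    statuses = mark_deletions(collected, still_online.get)
    counts = per_term_deletion_counts(statuses, ["salt", "raining"])
    for term, rate, n, p in flag_elevated_terms(counts, alpha=0.2):
        print(f"{term}: {rate:.0%} of {n} deleted (p={p:.3f})")
```

    Even a "significant" result from such a test only measures departure from the background rate; attributing that departure to a censor, a spam filter, or the posters themselves is a separate question the counts alone cannot answer.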

    Additional Editorial Note:

    During our examination of Bamman's study and the writing of our critique we had a nagging feeling that something was not quite right. Upon reflection, that feeling is now stronger and more focused.

    There is something about this entire study that appears odd, in the claims made, in the kinds of data reported, some of the 'proofs' offered, all relating to the choices of apparent supporting evidence.

    Our conclusion is that there is a likelihood the authors 'cherry-picked' the data - selecting 10 or 20 'politically sensitive' terms that purport to show high deletion rates, but in fact chosen from potentially thousands of terms that may have had much higher deletion rates but were not 'politically sensitive' and were therefore useless as proof of the authors' ideological agenda.

    If this is true, and our suspicion is strong because this would explain many anomalies, it would mean the authors went through their list of deleted terms and chose those that would support their apparent intent to "prove" censorship in China even if their own facts did not support this conclusion.

    Again, if true, such an ethical breach should result in the invalidation of the study, the retraction of its publication, and a letter of apology to China from CMU.

    This is a sufficiently serious issue that CMU should release the data, making it available for independent examination. This is the only way to dispel such a serious concern and, if the authors have nothing to hide, there should be no objection to this.


  • Methodological Errors, Unjustified Conclusions and More


  • The study focused on the deletions of Sina Weibo posts, on the rates of such deletions, and on the content of deleted posts, in an attempt to prove that Sina Weibo operates under a substantial cloud of censorship.

    ". . . we present here the first large–scale analysis of political content censorship in social media, i.e., the active deletion of messages published by individuals."

    Actually, and while we didn't intend it, what we really present here is the first large-scale analysis of the active deletion of messages by the same people who posted them. The relevance of these results to our obtaining a Ph.D. in Computer Science should be obvious.

    ". . . we are interested in politically–driven message deletions."

    This may indeed be the curiosity of the study sponsors, but it would be apparent to any 10-year-old that if you don't know who deleted the posts and have no contact with those doing the deletions, it is impossible to determine motivation. You then have no empirical argument, but only wishful thinking.

    Again, whether from hubris or ideology, the study was done in apparent ignorance of, and isolation from, the subject being studied. Motivation appears to be attributed based on prior bias and expectation.

    A statement of the main thesis of the study: ". . .the stream of information from Chinese domestic social media provides a case study of social media behavior under the influence of active censorship."

    They did study the "social media stream" but clearly failed to understand its behavior and discovered little to no evidence of "active censorship". The above grand claim is pathetic nonsense.

    It seems to me that what Carnegie-Mellon and Bamman appear to actually provide is a case study in the infiltration of a university grad studies department by what would be the military-intelligence sections of the US government, and the subsequent corruption of grad students by American imperialist ideology.

    "We focus here on leveraging a variety of information sources to discover and then characterize censorship and deletion practices in Chinese social media. By examining the deletion rates of specific messages by real people, we can see censorship in action."

    Unless, of course, the deletions are done by the posters themselves. In which case, we see nothing in action except people emptying their out-boxes. Ideology already rears its ugly head; Bamman is equating message deletion with government censorship. More later.

    And again, if you don't know who deleted the posts (and Bamman doesn't), or why they were deleted (and he doesn't), you have nothing.

    "Reports of message deletion on Sina Weibo come both from individuals commenting on their own messages (and accounts) disappearing (Kristof, 2011a), and from allegedly leaked memos from the Chinese government instructing media to remove all content relating to some specific keyword or event (e.g., the Wenzhou train crash) (China Digital Times, 2011)."

    The above comments are shameful. I would characterise them as misleading, dishonest, intended only to deceive. A perfect example of academic research corrupted by ideology and immaturity.

    We do not have a "sample" of Chinese individuals commenting on their messages and accounts disappearing. We have only the juvenile and provocative posts made by Nicholas Kristof - "delete my post if you dare". Surely the epitome of mature academic research.

    The reference to the "allegedly-leaked memos from the Chinese government" - as reported by the China Digital Times - is especially disingenuous.

    Readers may be interested to know that the China Digital Times is a propaganda website in Hong Kong, financed by the NED/CIA and, like its VOA sibling, has the encouragement of sedition of China's population as its main focus.

    This so-called "news site" has been implicated in more than one supposed leakage that proved to be a fabrication. It was also reported as instrumental in the US-sponsored (but stillborn) 2011 "Jasmine Revolution" in China.

    "If we look at a list of terms that have been previously shown to be blacklisted (with respect to the GFC), we see that many of those terms are freely used in messages on Sina Weibo, and in fact still can be seen there at this writing.

    Table 3 lists a sample of terms from Crandall, et al. (2007) that appear in over 100 messages in our sample and are not deleted at statistically higher rates. Many of these terms cannot be searched for via Sina’s interface, but frequently appear in actual messages.

    The existence of such (undeleted, sensitive-topic) messages may suggest a random component to deletion (e.g., due to sampling); but here again, we cannot establish an explanation for why some messages containing politically sensitive terms are deleted and others are not."


    Of course you can suggest an explanation. The obvious one is that the Chinese government does NOT automatically delete messages containing terms that Carnegie-Mellon decides are "politically sensitive".

    Again, stubborn ideology corrupting research. Bamman and his group appear unable to admit that posters may simply have deleted their own messages for personal reasons.

    "The 136 terms from the Twitter/Sina comparative LLR list that are blocked on Sina’s search interface are inherently politically sensitive by virtue of being blocked."

    There is no evidence to support this statement. It may indeed be true in some cases, but blocked terms might equally qualify on grounds other than "political sensitivity". Being ignorant of Chinese culture and uninformed about the rationale, the authors ignore this.

    The study also mangles the definition of "political sensitivity". According to Bamman, "iodized table salt" was blocked, leaving us to conclude this is a "politically-sensitive" term. I personally doubt it would qualify as such.

    Also, the rumors about Jiang Zemin's illness and death would hardly be in this category. Though the government might well want to kill false rumors, it would be a stretch to attribute these to political sensitivity.

    "With a set of known politically sensitive terms discovered through the process above, we can now filter those results and characterize the deletion of messages on Sina Weibo that are due not to spam, but to the presence of known politically sensitive terms within them."

    Again, this is not true. Deletions, by whatever entity, are not a two-dimensional matrix. The study appears to confuse its ideology-based assumptions with real evidence.

    There is no evidence presented that Bamman's terms were the cause of any government-sponsored deletion, if in fact any such deletions occurred. Posts can be deleted for a host of reasons and by different people.

    As an obvious counter-indication, countless millions of Weibo posts contain photo or video attachments which Bamman, being apparently ignorant of his subject, did not examine. These may have prompted deletion either by Sina or the posters themselves.

    Furthermore, China, unlike the US, has laws against obscenity, which might indeed prompt deletion by Sina, unrelated to other text content. Since Bamman was almost certainly ignorant on this point, and couldn't read Chinese in any case, this wasn't considered.

    "In the absence of external corroborating evidence (such as reports of the Chinese government actively suppressing salt rumors, as above), these results can only be suggestive, since we can never be certain that a deletion is due to the act of a censor rather than other reasons.

    In addition to terms commonly associated with spam, some terms appear frequently in cases of clear personal deletions".


    If Bamman and his group had not been wearing their "China is evil" cloaks, they might have realised this beforehand and performed a more useful study on how effectively the CIA sock puppets infect Twitter and US cyberspace generally.

    This latter would be especially useful in gauging the efficacy of the current US media barrage in attracting public support for World War III which the US wants to begin soon in Iran.

  • Comparison of Sina Weibo with Twitter


  • There are vast differences between China and the West, and between Weibo and Twitter, virtually all of which appear to have escaped the authors of the study who can neither read Chinese nor have any appreciation of Chinese society.

    We mentioned photo and video attachments, and cultural differences, but there is much more. See the reference below to the HP study.

    Weibo, unlike Twitter, is tailored to Chinese users and is far more expressive, with its embedded emoticons, photos, video and lyrics. As well, while both platforms have a limit of 140 characters, one character is one word in Chinese - and therefore it is possible to communicate much more on Sina Weibo.

    In the early days of the internet, a favorite webmaster trick was to collect the most common search terms or popular topics, and insert these onto their web pages as "keywords" - regardless of the actual purpose of the website or the page content.

    The process was so widespread that Google began disregarding the keywords altogether (and mostly still does), drawing its content conclusions from the page text.

    In China, companies were quick to recognise the marketing value of Weibo, hiring thousands of young people to open Weibo accounts and create chatty posts in which they happen to mention this brand of cosmetics or that mobile phone. To attract readers from searches, and re-posts, they would include some text relating to the most popular topics of the day.

    Countless Weibo posts contain so-called "sensitive" words with completely unrelated message content. A post that is really a paid commercial announcement will include a brief, sometimes provocative, comment about Guo Mei Mei, simply gaming the system to get product names and so-called 'consumer reports' out into cyberspace. Self-deletions or spam deletions would naturally follow.

  • The HP Study Again


  • http://www.hpl.hp.com/research/scl/papers/chinatrends/china_trends.pdf

    It included this observation: "We observe that there are vast differences between the content that is shared on Sina Weibo when compared to Twitter.

    To study the dynamics of trends in social media, we have conducted a comprehensive study on trending topics on Twitter. ... we found that the content that trended was largely news from traditional media sources, which are then amplified by repeated retweets on Twitter to generate trends. Chinese tweets on twitter may follow that trend.

    In China, people tend to use Sina Weibo to share jokes, images and videos and a significantly large percentage of posts are reposts. The trends that are formed are almost entirely due to the repeated reposts of such media content.

    This is contrary to what we observe on Twitter, where the trending topics have more to do with current events and the extent of retweets is not as large. We also observe that there are more unverified accounts among the top 100 trend-setters on Sina Weibo than on Twitter and most of the unverified accounts feature discussion forums for user-contributed jokes, images and videos."


    So if Weibo is more social than political, using the differences between Weibo and Twitter messages to make any conclusions on political censorship is a nonstarter.

    Bamman and his group appear totally ignorant of the social differences between China and the US. The Chinese are apolitical; they do not believe that politics (government, actually) is a kind of "team sport" where everyone can play. Culture and traditions are as different as night and day, certainly affecting the content and disposition of these micro-communications.

  • The Case of Jiang Zemin - Rumors of Death Greatly Exaggerated


  • "Jiang Zemin again provides a focal point: a trend analysis reveals a dramatic increase in the frequency of mention of Jiang’s name on Twitter and a much smaller increase on Sina.

    At the height on 6 July, Jiang’s name appeared on Twitter with a relative document frequency of 0.013, or once every 75 messages, two orders of magnitude more frequently than Sina (once every 5,666 messages).

    Twitter is clearly on the leading edge of these rumors, with reports about Jiang’s declining health appearing most recently on 27 June and the first rumors of his death appearing on 29 June.

    We note the same pattern emerging with other terms that have historically been reported to be sensitive, including 艾未未 (Ai Weiwei) and 刘晓波 (Liu Xiaobo)."


    This is to be expected, since these rumors or event reports originated in the US rather than China. If it is true as claimed that Twitter users post comments related to current and popular US news stories, we would expect a much higher frequency in the US.

    Since the (for sure "politically-motivated") rumor about Jiang's death originated in the US, was celebrated in the major media and flogged on Twitter, by what stretch of reasoning would we attribute to censorship the absence of its appearance in another country?

    Given the miraculous timing of the rumor and the total lack of any rational reason for such a rumor to appear - especially to originate in the US - the most likely hypothesis is that the story was a plant orchestrated by the CIA who, filled with deep regrets for missing the Wenzhou train crash, created this convenient contribution to Bamman's "study".

    "Messages can of course be deleted for a range of reasons, and by different actors: social media sites, Twitter included, routinely delete messages when policing spam; and users themselves delete their own messages and accounts for their own personal reasons.

    But given the abnormal pattern exhibited by Jiang Zemin we hypothesize that there exists a set of terms that, given their political polarity, will lead to a relatively higher rate of deletion for all messages that contain them."


    When we consider that unknown American persons fabricated and propagated a false rumor, we see no abnormal pattern whatever. What happened was what we would expect to happen, both in the US (propagation and flogging) and in China (kill false rumor).

    Bamman seems to be using this case as one of his prime examples of China's censorship, but the progression appears to us as the mature response of an adult (China) to the stupidity of a juvenile (USA).

    And as an admittedly snotty aside, what exactly is the purpose of the inclusion of a few Chinese characters in the study? To impress the peasantry? Appear scholarly? Be cute? Pretend the authors understand Chinese? 愚蠢的美国学生 ("foolish American students").

  • Liu Xiaobo and Ai Weiwei


  • "This suggests a hypothesis: whatever the source of the difference, the objective discrepancy in term frequencies between Twitter and the domestic social media sites may be a productive source of information for automatically identifying which terms are politically sensitive in contemporary discourse online."

    Not true. The frequency discrepancy can easily be explained by a lack of interest in a name or a topic; it is unnecessary (and biased) to resort to an assumption of political sensitivity.
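
    Whatever one makes of that hypothesis, the "frequency discrepancy" machinery itself - the comparative LLR list mentioned earlier - is not mysterious. Below is a minimal sketch of the common log-likelihood (G²) comparison of term counts from two bodies of text; this is the generic Dunning/Rayson-style statistic, not necessarily the authors' exact implementation, and the counts are invented purely for illustration. Note that a high score measures only a difference in usage between the two platforms; by itself it cannot say whether that difference stems from censorship, from differing interests, or from anything else.

```python
"""Sketch of a Twitter-vs-Sina term-frequency comparison (assumptions noted above)."""

from math import log


def log_likelihood(count_a, total_a, count_b, total_b):
    """G^2 for one term: how surprising the split of its occurrences between
    corpus A and corpus B is, given the relative sizes of the two corpora."""
    expected_a = total_a * (count_a + count_b) / (total_a + total_b)
    expected_b = total_b * (count_a + count_b) / (total_a + total_b)
    g2 = 0.0
    if count_a:
        g2 += count_a * log(count_a / expected_a)
    if count_b:
        g2 += count_b * log(count_b / expected_b)
    return 2.0 * g2


def terms_overused_on_twitter(twitter_counts, sina_counts):
    """Rank terms that Twitter uses more heavily, relative to corpus size,
    than Sina Weibo does. A high score measures only a usage difference."""
    total_twitter = sum(twitter_counts.values())
    total_sina = sum(sina_counts.values())
    scored = []
    for term, t_count in twitter_counts.items():
        s_count = sina_counts.get(term, 0)
        if t_count / total_twitter > s_count / total_sina:
            scored.append((term, log_likelihood(t_count, total_twitter,
                                                s_count, total_sina)))
    return sorted(scored, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # invented toy counts, for illustration only
    twitter = {"weather": 900, "movie": 800, "protest": 300}
    sina = {"weather": 5000, "movie": 7000, "protest": 40}
    for term, score in terms_overused_on_twitter(twitter, sina):
        print(f"{term}\t{score:.1f}")
```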

    More than this, the question arises as to why a computer science student in the US would have any interest in identifying terms which might be politically sensitive in another country. Spook training? What exactly is the funding path for this study?

    Persons like Ai Weiwei and Liu Xiaobo are almost totally unknown in China, and few people have either interest in, or respect for, these US-sponsored "dissidents". It is only because the US government flooded the media with these stories that they were amplified on Twitter.

    Bamman and his group ignored the documented evidence that Liu Xiaobo was the President of the Independent Pen Center in NYC, an NGO funded by the CIA through the NED. He lived in Beijing with a high lifestyle on the CIA payroll for about 15 years - reliably reported at US$35,000 p.a. to encourage his seditious rantings.

    When he had finally gone too far and got himself tossed into prison, the US acted immediately to get him the Nobel Peace Prize - and then abandoned him to his 15-year imprisonment fate. He had served his purpose, as they all do.

    In fact, the US government is on record as stating it had been looking for about two years to find a suitable Chinese dissident to whom to award the Nobel Peace Prize as a way to discredit China.

    The US encourages and finances so-called "dissidents" in many countries. Ai Weiwei is in the same category, having received a permanent US Green Card and other benefits in return for a program of slandering his home country.

    These "dissident stories" can almost always be traced to the CIA or the US State Department, and are intended to further demonise China and attempt to create pressure on China's government.

    Bamman and his group either knew, or should have known, these facts.

  • Geographic distribution


  • " . . . messages on Sina Weibo are attended with a range of metadata features, including free–text categories for user name and location and fixed-vocabulary categories for gender, country, province, and city.

    While users are free to enter any information they like here, true or false, this information can in the aggregate enable us to observe large–scale geographic trends both in the overall message pattern and in rates of deletion."


    Just so we're clear, users can register totally false personal information as to location, but "in the aggregate" Bamman can still observe true geographic trends.
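
    For what it's worth, the "aggregate" bookkeeping the quoted passage describes amounts to nothing more than grouping messages by their self-reported province field and computing a deletion rate per group - something like the minimal sketch below (the records are invented for illustration). Nothing in such a tally can distinguish a truthfully reported location from a false one, which is exactly the point.

```python
"""Sketch of per-province deletion-rate aggregation (assumptions noted above)."""

from collections import defaultdict


def deletion_rate_by_province(records):
    """records: iterable of dicts carrying a self-reported 'province' string
    and a boolean 'deleted' flag. Returns province -> (rate, message_count)."""
    tallies = defaultdict(lambda: [0, 0])  # province -> [messages, deleted]
    for rec in records:
        province = rec.get("province") or "unspecified"
        tallies[province][0] += 1
        tallies[province][1] += int(rec["deleted"])
    return {
        province: (deleted / total, total)
        for province, (total, deleted) in tallies.items()
    }


if __name__ == "__main__":
    # invented toy records, purely to show the mechanics
    toy = [
        {"province": "Beijing", "deleted": False},
        {"province": "Beijing", "deleted": True},
        {"province": "Tibet", "deleted": True},
        {"province": "Tibet", "deleted": True},
        {"province": "", "deleted": False},
    ]
    for province, (rate, total) in sorted(deletion_rate_by_province(toy).items()):
        print(f"{province}: {rate:.0%} of {total} messages deleted")
```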

    "Messages that self–identify as originating from the outlying provinces of Tibet, Qinghai, and Ningxia are deleted at phenomenal rates: up to 53 percent of all messages originating from Tibet are deleted . . .

    We might suspect that higher rates of deletion in these areas may be connected with their histories of unrest but there are several possible alternative explanations."

    Ever the cynic, my first thought is that if I were the CIA and had contacts in Tibet and XinJiang (which they obviously do), it would be quite easy for me to flood Weibo with posts from those locations that the Chinese government would damned well want to censor. And I might have even more fun if my contacts were not in those areas but claimed to be.

    In any case, NingXia, QingHai and Gansu have no history of unrest, as the authors are surely aware.

    "An alternative explanation is that users themselves are self–censoring at higher rates."

    The above double-talk suggests Bamman and his group are so overcome by a foolish ideology as to render this study worthless. They appear determined to equate deletion with censorship, so that if I delete messages from my "sent" box, I am "censoring" myself.

    If this is true, my Outlook Express must be having a fit.

    Since there is no evidence that either Sina or the Chinese government are involved, why is self-deletion only an "alternative explanation" at the end, instead of the main one? Right-Wing ideology driving research and attempting to fit facts to the policy.

    The areas mentioned as being subject to higher message deletion rates are poor areas containing comparatively few high-powered computers, and having slower internet connections. Deletion rates could be entirely technology-based.

    With Macau being a separate entity in a sense, why would its deletion rate be so apparently high? Why would Yunnan, containing 46 of China's 56 ethnic minority groups, have a comparatively low deletion rate? Why would Sichuan, Shaanxi and Tianjin have lower deletion rates than those posts originating from outside China?

    Why would posts originating in Taiwan be in the top ten for deletion rates? According to Bamman and his group, this is a sure sign of government censorship. Why didn't the authors or the CIA care to investigate Taiwan in this light?

  • The Authors' Conclusion


  • "In this large–scale analysis of deletion practices in Chinese social media, we showed that what has been suggested anecdotally by individual reports is also true on a large scale: there exists a certain set of terms whose presence in a message leads to a higher likelihood for that message’s deletion."

    This seems an unworthy statement. The study's data do not at all appear to perform as stated, but Bamman appears to use the word "deletion" as synonymous with "censorship", when his own study concludes the posters themselves may well have deleted their posts.

    The myriad other terms in the messages could each or in combination be responsible for deletion, or may carry no responsibility. If, and this is a big if, the posts were deleted by Sina or a government branch, the study produced no hard evidence that these so-called "forbidden terms" carried any responsibility, noting that deletion rates were not substantially different with or without these terms.

    Besides potential censorship, there are other factors that might explain such differences - local interests, ideologies, cultural sensitivities, demographics, institutional interferences, etc.

    But more importantly, the deletions needn't be related to terms at all. Some people see high numbers of posts as a badge of honor; others delete anything not personally valuable or historically significant.

    Many Weibo posts are of the order of "It's raining again" or "I don't know which dress to wear today". These trivia are often deleted, as are many other kinds.

    It's unfortunate the authors searched only for terms that were of interest to the CIA rather than to the Chinese. If they had been awake, they might have discovered countless millions of references to things like, "Did you see the fake photo of the Nepalese military that CNN said was a photo of Chinese policemen beating civilians"?

    "While a direct analysis of term deletion rates over all messages reveals a mix of spam, politically sensitive terms, and terms whose sensitivity is shaped by current events, a comparative analysis of term frequencies on Twitter vs. Sina provides a method for identifying suppressed political terms . . ."

    This conclusion is nonsense just on the face of it. Only a few of the dozens of potential variables have even been identified, much less examined, and there is no evidence that the deletion frequencies are related to "suppressed political terms". They could just as easily be related to social preference or even vulgarity as to politics.

    Further (with the notable exception of Nicholas Kristof's juvenile posts), there is no data, no evidence or documentation, to support deletion by anyone other than the original poster. With this, any claim of "political suppression" is supported only by stubbornness.

    "More importantly, by being blocked on Sina’s search interface, these terms are confirmed to be politically sensitive."

    The statement is clearly untrue, as the report itself admitted. Many terms can be blocked without being politically sensitive: iodized salt, false rumors of the death of a famous person.

  • Never attribute to malice that which can be adequately explained by stupidity


  • "By revealing the variation that occurs in censorship both in response to current events and in different geographical areas, this work has the potential to actively monitor the state of social media censorship in China as it dynamically changes over time."

    No, it does not. For one thing, message deletion by parties unknown - quite possibly, as the authors admit, by the senders themselves - has suddenly morphed into "social media censorship" in China.

    Given that many so-called "suppressed words" and "politically sensitive" terms have wildly varying deletion rates, the facts do not support any claim of censorship.

    The study appears only to have informed us of the deletion rates of microblog posts in China - we can scarcely imagine a more useful contribution to the sum of knowledge.

    In the study's own words, "In the absence of external corroborating evidence (such as reports of the Chinese government actively suppressing salt rumors, as above), these results can only be suggestive, since we can never be certain that a deletion is due to the act of a censor rather than other reasons."

    "Here again, we cannot establish an explanation for why some messages containing politically sensitive terms are deleted and others are not."


    Well, maybe I can help, but this will be a difficult concept for Mr. Bamman to grasp and he may have to think for a while:

    Maybe the posters deleted those messages they didn't care to retain, and kept the rest. And maybe that's the whole story.

    Maybe, instead of looking for bugs to squash, Mr. Bamman and his friends could have just come to China and asked people why they deleted their messages. That would almost certainly have been less costly than all the computer time and data manipulation. And they would have spoken to real live people and obtained facts from the source. And had some great food.

    But that is the problem with ideology; it blinds you to the obvious; it turns neutral people into enemies; it prevents you from seeing things as they are.

  • A Question for the Author


  • Mr. Bamman: If I cared to learn about microblog message deletion in another country - say the US or France - and asked your advice, what would you tell me?

    Most likely you would suggest I just go there and talk to people, ask questions, learn and understand. Why didn't that methodology occur to you with this China study?

    Because China is "evil". It's "communist". It's "a dictatorship". Everyone would lie to you; people would be afraid to talk to you. You would be arrested and tortured, just like in Guantanamo Bay, although we know that Americans never torture anyone.

    And if you believe all of that, you really need to grow up.

    As a further point on "Censorship in China", the website you are now visiting contains thousands of articles dealing with China, politics, foreign affairs, US Imperialism and so on.

    It contains many articles in both English and Chinese on all the "forbidden topics" (at least, according to you): Tibet, Tiananmen Square, Freedom, Democracy, Human Rights, Corruption, Ai Weiwei et al, and even censorship itself.

    Yet this website has always been accessible in China. In light of your extensive experience in researching China's active censorship policy, can you explain this?

  • Our Conclusion


  • All things considered, a shoddy and pointless piece of work. A politically and ideologically-loaded endeavor conducted in isolation and examined with apparent bias, producing essentially useless - and possibly deliberately libellous and slanderous - conclusions.

    The most likely and encompassing explanation is that neither Carnegie-Mellon's IT Grad Studies Department, nor Mr. Bamman and his group, have the slightest clue about China's culture, traditions, values or methods of dealing with issues generally.

    The vast differences in society, laws, culture and traditions, attitudes and interests, prevailing points of view, between the US and China were either unknown or ignored. And it is in this atmosphere of all-too-apparent cultural ignorance that these parties have presumed to "research" China.

    Bamman and his group could have known this, had they only bothered to ask. But when you're a child in America and infected with the thrill of "spying on an enemy", it's a foregone conclusion that much of what you do will be merely stupid.

    A well-known US-based columnist recently wrote that Americans get their religion from the same place they get everything else - from their ignorance and simple-mindedness. We may now safely speculate whether they obtain their education from the same source.

    It shouldn't be necessary to say this, but pervasive jingoism and simple-mindedness have a way of obscuring truth.

    If I form opinions, suspicions, expectations, about a topic, but then research refuses to support expectation, I am forced to conclude that my original assumptions were incorrect and I must be badly misunderstanding what I am seeing.

    But if, after these results, I persist in holding to my original assumptions, it would seem I am stubbornly driven by ideology or racism, or worse.

    Whatever are the stated conclusions in this report, I hold a strong suspicion that the authors have not at all altered their original assumptions or beliefs.

    "(The authors) thank the anonymous reviewers for helpful comments and Y.C. Sim and Justin Chiu for helpful advice about the Chinese language. The research reported in this article was supported by U.S. National Science Foundation grants IIS–0915187 and CAREER IIS–1054319 to N.A.S. and by an ARCS scholarship to D.B."

    This may be too obvious to mention, but it seems likely that Mr. Y.C. Sim and Mr. Justin Chiu were rather less helpful than presumed.

    It's unfortunate these gentlemen are such strangers in their own land that they couldn't have brought their young students to China to do their research in a spirit of friendship and cooperation.

    It would have produced a useful and accurate report, and contributed significantly to the students' education. Instead, it appears to have produced a poisoned ideology and created enemies where none existed before. My compliments on at least that one success.

    In the past, I had held Carnegie-Mellon in rather high regard. But this study and other recent events have served to badly tarnish that image.

    I would go so far now as to state that I will make a point of broadcasting - fittingly, on my Sina Weibo account - my assessment of this institution as no longer being a suitable place for Chinese students to spend huge amounts of money in the hope of furthering their education.

  • Epilogue


  • Westerners stubbornly refuse to accept that China is a different place, with different customs, culture and values. And Americans particularly, in the best Judeo-Christian spirit, cannot abide without condemnation any culture or tradition that differs from theirs.

    It so often occurs that we see something, then proceed to interpret it in light of what it would mean if it happened in our country and our culture. And of course we misread the event, misinterpret the meaning, then draw incorrect and sometimes senseless conclusions.

    For those readers not yet fatally infected by Carnegie-Mellon's high academic standards:

    On a daily basis, in everything of consequence that affects daily life, China is just as free as the US, and in countless small ways more so. And as added features, China has no tipping, no parking meters, they don't sexually molest you at the airport, and China doesn't have millions of people living in tent cities or in their little cardboard boxes under the overpass.

    As well, 25% of China's population is not living on food stamps, and 50% of China's people don't have mortgages below or near the waterline.

    Readers must surely have taken note of the Washington Post's recent "Top Secret America" series where they documented that ISPs and similar corporations intercept, scan and forward to the CIA/FBI/NSA more than 1.7 billion messages daily - emails, SMS messages, mobile phone calls, Skype calls, Twitter and Facebook posts . . . all perpetually stored.

    Given a choice, I think many of us would rather be censored in China than spied on in America.


    The Byron Spice "News Report":


    We have an additional feature for you, as a fitting end to this sorry tale - the "News" report by Byron Spice, who is the Director of Media Relations for the CMU School of Computer Science.

    His article, reporting on this Bamman study, reads like something Kristof or the NYT editorial staff would have written - a bit too happy to have outed China - combined with a curious attitude toward factual accuracy.

    Mr. Spice's news commentary lends much credence to our conviction that the entire CMU computer studies department is infected with this repugnant China-hate ideology which might exist throughout the institution.

    That's not comforting. Students in China hold many US universities in high regard, and CMU has been one of these. It would seem that regard was undeserved.

    As stated earlier, I will now make a point of broadcasting - fittingly, on my Sina Weibo account - my assessment of this institution, with its blind, and possibly racist, ideology, as no longer being a suitable place for Chinese students to spend huge amounts of money in the hope of furthering their education.

    Anyway, it seems appropriate to point out to readers some of the more noteworthy factual errors on Mr. Spice's News page:

    "the researchers found that oft-censored terms included well-known hot buttons . . ."

    No. What they found was that certain messages were deleted, but they found no statistical evidence that messages containing these terms were deleted more often than any others.

    Mr. Spice is also incorrect in attributing deletion to censorship, and he must surely be aware of that since the authors themselves stated it was just as likely the messages were deleted by those who posted them.

    "The CMU study also showed high rates of weibo censorship in certain provinces. The phenomenon was particularly notable in Tibet, a hotbed of political unrest, where up to 53 percent of locally generated microblogs were deleted."

    Again, we can see how tricky Mr. Spice is with his conclusions. Yes, 53% were deleted, but what the study found was a high rate of deletions totally unrelated to "sensitive topics" and no attribution to censorship.

    We would have to conclude that Mr. Spice is too clever by half, and either (1) really careless, (2) blind, or (3) just dishonest.

    Sorry Byron, but if you read the report you would have seen that "sensitive topics" in Tibet were deleted at a lower rate than the non-sensitive ones.

    As to Spice's offensive comment about the "hotbed of unrest": that is in large part due to foreign agitators spreading their messages on the Web through Twitter. That's also why it was blocked in the UK recently - as I'm sure he knows.

    "The so-called Great Firewall of China, which prevents Chinese residents from accessing foreign websites such as Google"

    Untrue. Google has always been accessible in China, and is now.

    "anecdotal evidence is overwhelming that certain messages are targeted."

    Anecdotal evidence may be whatever it wants, but this purports to be a "scientific study" and, by the authors' own admission, all they found was deletion with no evidence to prove censorship. Perhaps Mr. Spice should have read the study before reporting on it.

    "Nicholas Kristof opened an account on a Chinese microblog site; within an hour of sending a message about Falun Gong, his account was shut down."

    True, but it was the study's only example, and if Mr. Spice wanted to be perfectly honest he would have pointed out that Kristof posted juvenile, smart-assed, provocative, flame-baiting messages directed at the Chinese government, including one that said, "Delete this if you dare."

    "Sina Weibo, a domestic Chinese microblog site . . . that has more than 200 million users."

    Mr. Spice should be made aware that Weibo has more than 300 million users.

    "If a weibo was deleted, Sina would return what the researchers came to regard as an ominous message: "target weibo does not exist.""

    The man may be over-reacting, and misleading his readers here. Deletion for any reason, including by the original poster, produces the same message. That's about as ominous as a busy signal.

    "Jiang's name appeared in one out of every 75 tweets, but just one out of every 5,666 messages on Sina Weibo - another indication that the Jiang conversations on Sina Weibo are suppressed."

    The rumors originated in the US and received all their play in the US. The absence of them in China indicated neither censorship nor suppression. Lack of knowledge of the rumor, and lack of interest in false rumors is a better explanation than Mr. Spice's accusation of 'suppression'.

    "Not all deletions are necessarily state-instigated censorship, the researchers noted. Spam and pornographic messages also are subject to deletion. ."

    True, but when they noted it, they further stated that the deletions could easily have been made by the original posters, unrelated to censorship, spam, pornography or anything else.

    In Mr. Spice's understandable eagerness to inform, he seems to have missed this point and continued breathlessly to suggest 'excusable' censorship due to spam or pornography - but still censorship. That was not the point the researchers made, and Mr. Spice would surely be aware of that.

    All in all, the dedication of Mr. Spice to accuracy and truth would seem to be cut from the same cloth as those who did the research in question.

  • Our Assessment of Mr. Spice's Jubilant "News" Release:


  • All things considered, a shoddy and pointless piece of work. A politically and ideologically-loaded endeavor conducted in isolation and examined with apparent bias, producing essentially useless - and possibly deliberately libellous and slanderous - statements.

    The most likely and encompassing explanation is that Mr. Spice is terminally infected by the same racist China-hate ideology as the rest of the department, and that he too hasn't the slightest clue about China's culture, traditions, values or methods of dealing with issues generally.

  • About the authors


  • David Bamman is a Ph.D. student in the Language Technologies Institute, School of Computer Science, Carnegie Mellon University.
    Web: http://www.cs.cmu.edu/~dbamman
    E–mail: dbamman@cs.cmu.edu

    Brendan O’Connor is a Ph.D. student in the Machine Learning Department, School of Computer Science, Carnegie Mellon University.
    Web: http://brenocon.com
    E–mail: brenocon@cs.cmu.edu

    Noah A. Smith is the Finmeccanica Associate Professor in the Language Technologies Institute and Machine Learning Department, School of Computer Science, Carnegie Mellon University.
    Web: http://www.cs.cmu.edu/~nasmith
    E–mail: nasmith@cs.cmu.edu

  • Some Related References


  • Trends in Chinese Social Media- HP Study Original Article (pdf).

    Why Governments Are Terrified of Social Media: Social Media in the Crosshairs in the USA Original Article.

    US Spy Operation that Manipulates Social Media: Military Software Creates Fake Online Identities to Spread Pro-American Propaganda Original Article.

    Social Media and US Deception: USAF Creates Army of Fake Virtual People Original Article.

    The CIA and Propaganda: CIA Funding and Manipulation of US News Media Original Article.

    The Victory of Spin: US Government's Manipulation of News Original Article.

    Hillary Clinton - Internet Protector: Breathless Hypocrisy from the US State Dept Original Article.

    Ai Weiwei - "China's Conscience": And Another Dissident Bites the Dust Original Article.

    Liu Xiaobo: Noble, Noble, Nobel: The Prize we all Waited for? Original Article.

    The (Jasmine) Revolution That Wasn't - And Why it Wasn't Original Article.

    US-Funded Interference Programs in China 2008: Financing Unrest in the Name of Freedom Original Article (pdf).

    US-Funded Interference Programs in China 2009: Financing Unrest in the Name of Freedom Original Article (pdf).

    The Mass Media: China and The West: News and Propaganda, Truth and Fiction Original Article.

    The US Media - Let's Revolt: Don't Believe Everything You Read Original Article.

    An Open Letter to President Obama From the Chinese People Original Article.

    Let's Meet Some Real People: A Look at China and its People the Way They Really are Today Original Article.