Exploring “Mass Surveillance” Through Computational Linguistic Analysis of Five Text Corpora: Academic, Mainstream Journalism, Microblogging Hashtag Conversation, Wikipedia Articles, and Leaked Government Data

DOI: 10.4018/978-1-5225-2679-7.ch004


A lot of digital ink has been spilled on the issue of “mass surveillance,” in the aftermath of the Edward Snowden mass data leak of secret government communications intelligence (COMINT) documents in 2013. To explore some of the extant ideas, five text sets were collected: academic articles, mainstream journalistic articles, Twitter microblogging messages from a #surveillance hashtag network, Wikipedia articles in the one-degree “Mass_surveillance” page network, and curated original leaked government documents. These respective text sets were analyzed with Linguistic Inquiry and Word Count (LIWC) (by Pennebaker Conglomerates, Inc.) and NVivo 11 Plus (by QSR International, Inc.). Also, the text sets were analyzed through close (human) reading (except for the government documents that were treated in a non-consumptive way). Using computational text analytics, this author found text patterns within and across the five text sets that shed light on the target topic. There were also discoveries on how textual conventions affect linguistic features and informational contents.
Chapter Preview


Real truth lies, if anywhere, not in facts, but in nuance. -- John le Carré, The Pigeon Tunnel: Stories from my Life

In the past few years, the issue du jour has been the secret “mass surveillance” of citizenry in the U.S. (and other Western democracies and the world) through the collection of various types of electronic information: metadata about phone interactions, recorded phone conversations, locations of mobile phones around the world, social media data, email messaging, communicated imagery, and other electronic contents. “Mass surveillance” is broadly defined as the monitoring of a large percentage of a population, for any number of purposes.

These revelations occurred when a contractor with the National Security Agency, Edward Snowden, copied an estimated 1.5 to 1.7 million secret documents and leaked some 58,000 of these to multiple journalists in May – June 2013. These documents included Level 3 documents considered “the Keys to the Kingdom” (Epstein, 2017, p. 75). U.S. federal prosecutors filed a sealed criminal complaint against Snowden on June 14, 2013, charging him with theft of government property and unauthorized communication of classified communications intelligence (COMINT) information to unauthorized persons under the 1917 Espionage Act, and this complaint was released to the public a week later. In June 2015 came word that both Russia and China had accessed the bulk of the encrypted NSA files—which Snowden had said he’d fully “destroyed”—and these revelations led to the pulling out of various Western intelligence agents from both countries for their safety (Kelley, June 13, 2015). The fugitive was given temporary asylum in Moscow, Russia, where he had landed, after a brief stay in Hong Kong, China; up to the present, through September 2016, he remains in Moscow, with an extension of his political asylum.

In the intervening years, there has been a recognizable mass media cycle: the revelation of a new surveillance method, new outrage, and then efforts by the U.S. government to clarify and explain the general methods, the limits of those methods, and the needs of law enforcement. In democratic governance where the “consent of the governed” has to be obtained, even intelligence agencies, for whom it is anathema to share information with the general public, have complied with requests to declassify some of the authorizing documents that enabled the mass surveillance (a month after 9/11) (Subramanian, Dec. 21, 2013). The article releases have come in increments over years, in part because of the complexity of the information and the need to bring on experts to understand the details; also, news organizations have an interest in spreading out the revelations over time to ensure that the story has “legs” or endurance. Hundreds of formerly covert programs have been revealed.

One academic, in 2015, published a five-part taxonomy of the technologies as revealed, including collection programs, processing programs, attack programs, isolation programs, and database programs; there are also programs for which too little information is available. In the summary, Hu identifies bulk telephone metadata collection, uniform resource locator (URL) data, the capture of metadata and communications traveling through fiber-optic cables, hierarchy analysis through social network analysis, exploration of emails and website searches, the ability to redirect web browsers without user knowledge, spyware, programs to target heads of state and their aides, databases with collected data for various periods of time, and others (Hu, 2015, pp. 1693 - 1702). This taxonomy was compiled in part to serve as a resource for those who may be exploring the constitutionality of the surveillance.

Along with this information comes a narrative of U.S. government heavy-handedness. For example, Laura Poitras, the Academy Award-winning filmmaker of Citizenfour, has sued the U.S. government to find out why she has been stopped at the U.S. border. Heavily redacted government files on her, released through FOIA (Freedom of Information Act) requests, were the subject of a show at the Whitney Museum of American Art in New York (Williams, Feb. 17, 2016). There is a sense that there are hidden redlines to what movie makers may explore (Silverman, July 8, 2013). There is also a sense that the country that various citizens envision may be beyond acceptance for those who manage national security.
