Reliability and Validity in Automated Content Analysis

Stuart Soroka (McGill University, Canada)
Copyright: © 2014 |Pages: 12
DOI: 10.4018/978-1-4666-4999-6.ch020


In light of the research in other chapters in this volume, this chapter considers some of the important and as-yet-unresolved methodological issues in automated content analysis. The chapter focuses on DICTION in particular, but the concerns raised here apply to automated content-analytic techniques more generally. Those concerns are twofold. First, the chapter considers the importance of aggregation for the reliability of content analyses, both human- and computer-coded. Second, the chapter reviews some of the difficulties associated with testing the validity of the kinds of complex (latent) variables on which DICTION is focused. On the whole, the chapter argues that this volume and its companion reflect just some of the many possibilities for DICTION-based analyses, but that researchers must also proceed with a certain amount of caution.
Chapter Preview


If the work in this and its companion volume makes anything clear, it is that there is widespread potential for automated content analysis — not just in the study of political communication (the field in which DICTION originated), but across the social sciences. In many fields, it turns out, words matter. And large-scale quantitative analysis of these words can help us understand the nature of policy debates, the nature and structure of media content, and the direction of future shifts in the economy — to name just a few examples.

In many ways, we are in the midst of a renaissance in the quantitative study of words in the social sciences. That renaissance, spurred on by the increasing availability of digitized text, has produced a host of new content-analytic approaches. A range of new techniques classify text based not on dictionaries but on statistical classifiers. Supervised machine learning identifies features in reference texts (typically human-coded), and then searches out the same features in other texts (see, e.g., Laver, Benoit, & Garry, 2003; Purpura & Hillard, 2006). Unsupervised machine learning searches out word associations, much like factor analysis (see, e.g., Hogenraad, McKenzie, & Péladeau, 2003; Landauer & Dumais, 1997). There is also renewed interest in non-statistical, dictionary-based approaches to content analysis. There are efforts to develop new dictionaries (e.g., Young and Soroka 2012), and efforts to refine and improve on the use of a range of established ones. As the current volume makes clear, one of the most prominent of these is DICTION.1
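The dictionary-based logic described above can be sketched in a few lines of code. The sketch below is purely illustrative: the category names and word lists are invented for this example and are not drawn from DICTION or any published dictionary. The core idea — counting matches against category word lists and normalizing by document length — is common to most dictionary-based tools.

```python
import re
from collections import Counter

# Toy category dictionary, in the general style of dictionary-based
# content-analysis tools. Categories and words are illustrative only.
TOY_DICTIONARY = {
    "optimism": {"hope", "improve", "growth", "confident"},
    "uncertainty": {"maybe", "unclear", "doubt", "risk"},
}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def score(text, dictionary=TOY_DICTIONARY):
    """Return per-category match counts, normalized by token count."""
    tokens = tokenize(text)
    counts = Counter()
    for token in tokens:
        for category, words in dictionary.items():
            if token in words:
                counts[category] += 1
    n = len(tokens) or 1  # avoid division by zero for empty texts
    return {category: counts[category] / n for category in dictionary}
```

A sentence such as "There is hope that growth will improve, but some doubt remains" would score 3/11 on the toy "optimism" category and 1/11 on "uncertainty". Real dictionaries are far larger and their categories far subtler, which is precisely why the reliability and validity questions discussed in this chapter arise.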

That said, there are also ways in which this research stream is still in its early stages. There are many things that automated approaches surely miss; there are some things they will simply never capture; and some of the things they do reliably capture can be used in strange (and sometimes inappropriate) ways. Why is this the case? The answer is easy: language is complicated. But there clearly is also a lot of potential where automated content analysis is concerned.

Realizing that potential requires that we think seriously about a number of related issues. Indeed, at the present time I believe we suffer mainly from the following: our ability to measure content has far exceeded our consideration of a range of issues relating to the reliability and validity of that measurement. This is not a unique concern — reliability and validity are core concepts addressed in almost every text on content analysis (e.g., Krippendorff 1980; Neuendorf 2002). There has also been a resurgence of interest in these issues in political science more generally (e.g., Adcock and Collier 2001). But the present volume, gathering such a broad set of studies based on DICTION analyses, seems a particularly good opportunity to note some of the central and as-yet inadequately addressed methodological issues in automated content analysis.

In short, chapters in this volume use very similar methods and measures in very different ways. We should look at these critically; and we should take some time to consider which ways we like, and which leave us a little skeptical. I do not highlight my own personal tastes here. Rather, I highlight just two of what I regard as the fundamental issues that those interested in automated content analysis must consider more fully. And I do so by drawing in particular on the pioneering work of Roderick Hart, and the increasingly diverse and fascinating literature that DICTION has produced.
