Big Data and Official Statistics

Steve MacFeely
Copyright: © 2019 |Pages: 30
DOI: 10.4018/978-1-5225-7077-6.ch002


Over recent years, the potential of big data for government, for business, for society has excited much comment, debate, and even evangelism. But are big data really the panacea to all our data problems or is this just hype and hubris? This is the question facing official statisticians: Are big data worth the investment of time and resources? While the statistical possibilities appear endless, big data also present enormous challenges and potential pitfalls: legal, ethical, technical, and reputational. This chapter examines the opportunities and challenges presented by big data and also discusses some governance issues arising for official statistics.
Chapter Preview


Over recent years, the potential of big data for government, for business, for society has excited much comment, debate and even evangelism. It has been described by Pat Gelsinger (2012) of EMC as the 'new science' with all the answers, as a paradigm-destroying phenomenon of enormous potential (Stephens-Davidowitz, 2017), and as a panacea to all our data problems. Official statisticians, who already have a long history of using non-survey or secondary data, often very large in volume, must decide whether big data is really something new, or just more of the same, only more so. On the one hand, some argue that big data needs to be seen as an entirely new ecosystem and requires serious strategic rethinking on the part of the official statistical community (Letouzé & Jütting, 2015); on the other, some argue that big data is just hype and that big data are just data (Thamm, 2017). Perhaps psychologist Dan Ariely (2013) was correct when he tweeted hilariously, 'Big Data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.'

Big data is the by-product of a technological revolution. In simplistic terms, one can think of big data as the collective noun for all of the new digital data arising from our digital activities. Our increasing day-to-day dependence on technology is leaving digital footprints everywhere. Those digital footprints, or digital exhaust, offer official statisticians rich and tantalizing opportunities to augment or supplant existing data sources or generate completely new statistics. With the computing power now available, these digital data can be shared, cross-referenced, and repurposed as never before, opening up a myriad of new statistical possibilities. Big data also present enormous statistical and governance challenges and potential pitfalls: legal, ethical, technical, and reputational. They also present a significant expectations management challenge, as it seems many hold the misplaced belief that accessing big data is straightforward and that their use will automatically and dramatically reduce the costs of producing statistical information. As yet, the jury is out on whether big data will offer official statistics anything especially useful. Beyond the hype of big data, and hype it may well be, statisticians understand that big data are not always better data and that more data doesn't automatically mean more insight. In fact, more data may simply mean more noise. As Boyd and Crawford (2012: 668) eloquently counsel, 'Increasing the size of the haystack does not make the needle easier to find.'

This chapter will examine big data from the perspective of official statistics and outline some of the opportunities, challenges and governance issues that they present. To understand the governance issues involved with big data from the unique perspective of official statistics, it is useful to first define what we mean by big data, administrative data and official statistics. Thereafter the chapter will look at sources of big data, before examining the opportunities and challenges presented by big data. Before concluding, the chapter will briefly outline some of the governance structures that National Statistical Offices (NSOs) and International Organisations (IOs) might need to consider putting in place if they intend to harvest big data for the purposes of compiling official statistics.

Key Terms in this Chapter

OGD: Open government data.

AVL: Automatic vehicle location.

SDG: Sustainable development goals.

UNSD: United Nations Statistics Division.

CPD: Continuous professional development.

GDP: Gross domestic product.

ATM: Automatic teller machine.

IRS: Internal Revenue Service.

GDPR: General data protection regulation.

NSO: National Statistical Office.

NSA: National Security Agency.

IO: International organization.

NSS: National statistical system.
