Using Excel and Excel VBA for Preliminary Analysis in Big Data Research

Paul John Blayney (University of Sydney, Australia) and Zhaohao Sun (Papua New Guinea University of Technology, Papua New Guinea)
Copyright: © 2019 | Pages: 27
DOI: 10.4018/978-1-5225-7277-0.ch007

Abstract

Can big data research be effectively conducted using spreadsheet software (i.e., Microsoft Excel)? While a definitive response might be closer to “no” than “yes,” this question cannot be unequivocally answered. As spreadsheet scholars, the authors' inclination is to answer in the positive. In this regard, the chapter looks at how Excel can be used in conjunction with other software and analytical techniques in big data research. This chapter also discusses where and how to use spreadsheet software to conduct big data research. A focal argument of this chapter is that the key behind big data driven research is data cleansing and big data driven small data analysis. The proposed approach in this chapter might facilitate the research and development of intelligent big data analytics, big data analytics, and business intelligence.
Chapter Preview

1. Introduction

Trained programmers are competent with programming methods and design, but they are not generally proficient in Big Data analysis (Raffensperger, 2001, p. 62). On the other hand, Big Data researchers do not necessarily possess programming skills. Spreadsheet software (e.g. Excel) provides a means for the non-programmer to conduct analysis that could previously be performed only by a trained programmer or analyst.

Most Big Data analysts will be skeptical of the use of Excel for Big Data analysis. They will highlight that the volume, velocity and variety of Big Data far exceed the capacity of spreadsheet software (Sun, Sun, & Strang, 2018). They’re right from several perspectives. However, this chapter attempts to address the following research questions:

  1. Have spreadsheets a place in Big Data research?
  2. How can programming with Excel VBA contribute?
  3. How can the Excel Power Pivot add-in contribute?

This chapter does not suggest that spreadsheet software can replace the advanced analytical techniques used for Big Data analytics. However, it does propose that spreadsheets have a place in Big Data in the same way that the apps on your phone have a place in your everyday life. Modern day apps are useful because they’re easy to use and readily available. They provide you with useful information on a real-time basis (i.e. when you don’t have time to investigate properly or talk to an expert).

The research demonstrates that Excel (especially when supplemented with its Power Pivot add-in) can fill the same role in Big Data analytics. Spreadsheet analysis can provide valuable insights as to what advanced analytics are appropriate.

For example, preliminary testing can be conducted on a subset of the 50 terabytes (TB) of web server logs that an Internet Service Provider CEO wants looked at for the latest trends in customer demands for the company’s products (Department of Communication and the Arts, 2018).
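As a minimal sketch of how such a subset might be drawn before the data ever reaches Excel, the following Python script uses reservoir sampling to pick a fixed number of lines uniformly at random from a log stream of unknown length. The use of Python, the function names, the single-column output layout and the choice of sampling method are all assumptions made for illustration; the chapter itself does not prescribe a sampling technique. The only fixed constant is the worksheet limit of 1,048,576 rows in Excel 2007 and later.

```python
import csv
import random
from typing import Iterable, List

# Worksheet row limit in Excel 2007 and later; a sample must stay below it.
EXCEL_MAX_ROWS = 1_048_576


def reservoir_sample(lines: Iterable[str], k: int, seed: int = 0) -> List[str]:
    """Return up to k lines chosen uniformly at random from a stream.

    The stream is read once and never held in memory as a whole, so it can
    be far larger than anything a spreadsheet could open.
    """
    rng = random.Random(seed)  # fixed seed makes the sample repeatable
    sample: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k lines.
            sample.append(line)
        else:
            # Keep the new line with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = line
    return sample


def write_sample_csv(sample: List[str], path: str) -> None:
    """Write the sampled lines to a one-column CSV file that Excel can open.

    The column name "log_line" is a hypothetical choice for illustration.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["log_line"])
        for line in sample:
            writer.writerow([line.rstrip("\n")])


if __name__ == "__main__":
    # Stand-in for a real log file: a generated stream of 100,000 lines.
    fake_log = (f"request {n}" for n in range(100_000))
    subset = reservoir_sample(fake_log, k=1_000, seed=42)
    print(len(subset))
```

In practice the generated stream would be replaced by an open file handle on one log file, and `k` would be set comfortably below `EXCEL_MAX_ROWS` so that the header row and any later additions still fit on one worksheet.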

Skilled use of Excel allows better use of a scarce resource, the Big Data analyst: valuable analyst time is not wasted exploring “futile” data (i.e. data without significant relationships).

A word of caution is warranted prior to using spreadsheet software for preliminary Big Data (e.g. small data) analysis. While the benefits of spreadsheet use (in business and elsewhere) are substantial, even immeasurable, the costs of spreadsheet errors are also huge and well publicised. For example, see Butler (2018).

In this regard, frightening or entertaining reading (depending on your point of view) is provided by the European Spreadsheet Risks Interest Group (EuSpRIG, n.d.-b). This non-profit organisation of academics and business professionals proclaims its website (www.eusprig.org/) as “the World’s premier site for information, action, conferences and dialogue on Spreadsheet Risk Management”. One of the links provided on the EuSpRIG homepage is “Horror Stories” (EuSpRIG, n.d.-a).

Many of the spreadsheet errors cited in these stories can be attributed to human error. However, it can also be argued that spreadsheet software has been largely responsible for enabling the human error to take place. As Ray Panko (1998) explains, most spreadsheets contain errors: “the issue is how many errors there are, not whether an error exists”. Therefore, the task for the Big Data analyst is to apply standard organisational program development principles to their use of spreadsheets.
