Overview of Big Data and Its Visualization


Richard S. Segall (Arkansas State University, USA) and Gao Niu (Bryant University, USA)
DOI: 10.4018/978-1-5225-3142-5.ch001


Big Data refers to data sets so voluminous and complex that traditional data processing application software is inadequate to deal with them. This chapter discusses what Big Data is, its characteristics, how the information revolution of Big Data is transforming our lives, and the new technologies and methodologies that have been developed to process data of such huge dimensionality. It also covers the components of the Big Data stack interface, categories of Big Data analytics software and platforms, and descriptions of the top 20 Big Data analytics software packages. Big Data visualization techniques are discussed using real data from the Fatality Analysis Reporting System (FARS) managed by the National Highway Traffic Safety Administration (NHTSA) of the United States Department of Transportation. Web-based Big Data visualization software, both JavaScript-based and user-interface-based, is also discussed. Finally, the chapter discusses the challenges and opportunities of using Big Data and presents a flow diagram of the 30 chapters within this handbook.
Chapter Preview

What Is Big Data?

Big Data is defined as collections of datasets whose volume, velocity, or variety is so large that it is difficult to store, manage, process, and analyze the data using traditional databases and data processing tools (Bahga & Madisetti, 2016). According to an estimate by IBM, 2.5 quintillion bytes of data are created every day, and 90% of the data in the world today has been created in the last two years alone (IBM, 2017).

In 2012, the United States (US) government committed $200 million to “Big Data” research and development investment (The White House, 2012). Big Data applications are estimated to be worth $300 billion for the US health care industry and €250 billion for Europe’s public sector administration (Manyika, Chui, Brown, Bughin, Dobbs, & Roxburgh, 2011). So what is Big Data? The numerical definition of Big Data is evolving with the development of technology. A dynamic definition is that data which exceeds the capacity of commonly used hardware and software tools to capture, store, and analyze it within a tolerable elapsed time is considered Big Data (Franks, 2012). Clegg (2017) authored a book on how the information revolution of Big Data is transforming our lives.

According to Marr (2016), examples of Big Data in practice include Walmart (how Big Data is used to drive supermarket performance), Netflix (how Netflix used Big Data to give us the programs we want), Rolls-Royce (how Big Data is used to drive success in manufacturing), and Facebook (how Facebook uses Big Data to make customer service more personal). Table 1 below lists other multifaceted applications of Big Data, each covered in an individual chapter of Marr (2016), which describes how forty-five successful companies used Big Data to deliver extraordinary results.

Table 1.
Successful applications of Big Data analytics by organizations and companies around the world
Organization/Company | Big Data Application
Amazon | How predictive analysis is used to get a 360-degree view of customers
Caesar’s | Big Data at the casino
Dickey’s Barbecue Pit | How Big Data is used to gain performance insights into one of America’s most successful restaurant chains
Experian | Using Big Data to make lending decisions and to crack down on identity fraud
Fitbit | Big Data in the fitness arena
John Deere | How Big Data can be applied on farms
LinkedIn | How Big Data is used to fuel social media success
Ralph Lauren | Big Data in the fashion industry
Terra Seismic | Using Big Data to predict earthquakes
Transport for London | How Big Data is used to improve and manage public transportation in London, UK
Twitter | How Twitter and IBM deliver customer insights from Big Data
Uber | How Big Data is at the center of Uber’s transportation business
US Olympic Women’s Cycling Team | How Big Data analytics is used to optimize athletes’ performance
Walt Disney Parks and Resorts | How Big Data is transforming our family holidays
ZSL and London Zoo | Big Data in the zoo and to protect animals

[Derived from book by Marr (2016).]

Key Terms in this Chapter

Circle Packing Graph: Circle packing is the study of the arrangement of circles (of equal or varying sizes) on a given surface such that no overlapping occurs and so that all circles touch one another (Circle Packing Graph, 2017).

MapReduce: A programming model that divides a dataset into pieces and maps their elements to key-value pairs, then shuffles the pairs by key and distributes them across the nodes of a computing cluster, where they are reduced to results, for big data processing.
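
The map, shuffle, and reduce steps named in this definition can be illustrated with a minimal single-machine sketch. It is not how Hadoop or any particular framework implements MapReduce; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the word-count task are assumptions chosen only for illustration, and a real cluster would run the map and reduce steps in parallel on different machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values under their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: collapse each key's list of values into one count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run map over every document, then shuffle and reduce the pairs."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


# Hypothetical input: each string stands in for one split of a large dataset.
counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both steps could in principle be spread over many machines, which is the property the definition relies on.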

Parallel Coordinate Plot: A parallel coordinate plot maps each row in the data table as a line, or profile. Each attribute of a row is represented by a point on the line. This makes parallel coordinate plots similar in appearance to line charts, but the way data is translated into a plot is substantially different (Tibco, 2017).

Hadoop: Hadoop is an open-source software framework for the storage and processing of large datasets on a cluster of machines.

Data Measurement: Units of measurement used to indicate the volume of data in modern computer storage devices.
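
As a rough illustration of such units, the sketch below converts a raw byte count into a readable unit. It assumes decimal (SI) prefixes, where each unit is 1,000 times the previous one; the helper name `format_bytes` and the `base` parameter are hypothetical, and passing `base=1024` gives the binary convention instead.

```python
# Unit names from bytes up to yottabytes, each a factor of `base` apart.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def format_bytes(num_bytes, base=1000):
    """Scale a byte count down until it fits the largest suitable unit."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop when the value fits this unit, or when no larger unit exists.
        if value < base or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= base


# 2.5 quintillion bytes (2.5 x 10^18), the daily figure attributed to IBM.
daily_volume = format_bytes(2_500_000_000_000_000_000)
print(daily_volume)
```

Under the decimal convention, 2.5 quintillion bytes corresponds to 2.5 exabytes.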

Heatmap: A heat map (or heatmap) is a graphical representation of data where the individual values contained in a matrix are represented as colors (Heatmap, 2017).

Schema on Write: A data analysis strategy in which new data is transformed into a structured, predefined format as it is written to storage.

Big Data: Data that exceeds the capacity of commonly used hardware and software tools to capture, store, and analyze it within a tolerable elapsed time. The three main characteristics of big data are volume, variety, and velocity.

Schema on Read: A data analysis strategy in which new data is stored without a predefined format, and a plan or schema is applied only when the data is read.
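
The contrast between the two schema strategies defined in this list can be sketched in a few lines. This is only an illustrative toy, not a description of any specific database or Hadoop tool: the schema, the field names (`id`, `speed_mph`, `note`), and all function names are hypothetical, and plain Python lists stand in for storage.

```python
import json

# Hypothetical schema: field name -> type used to coerce the raw value.
SCHEMA = {"id": int, "speed_mph": float}


def conform(record, schema=SCHEMA):
    """Keep only schema fields, coercing each to its declared type."""
    return {field: cast(record[field]) for field, cast in schema.items()}


def write_structured(store, raw_line):
    """Schema on write: parse and conform the record before storing it."""
    store.append(conform(json.loads(raw_line)))


def write_raw(store, raw_line):
    """Schema on read, write side: store the line exactly as received."""
    store.append(raw_line)


def read_with_schema(store, schema=SCHEMA):
    """Schema on read, read side: apply the schema only when querying."""
    return [conform(json.loads(line), schema) for line in store]


raw_lines = [
    '{"id": "1", "speed_mph": "54.5", "note": "wet road"}',
    '{"id": 2, "speed_mph": 61}',
]
structured_store, raw_store = [], []
for line in raw_lines:
    write_structured(structured_store, line)
    write_raw(raw_store, line)

print(structured_store)
print(read_with_schema(raw_store))
```

Both paths yield the same structured records here, but the raw store still holds the `note` field that the schema discarded, so a different schema could be applied to it later. The structured store pays the conversion cost once at write time and cannot recover fields that were dropped.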

Streammap: A streamgraph, or stream graph, is a type of stacked area graph which is displaced around a central axis, resulting in a flowing, organic shape (Streammap, 2017).

Machine Learning: A process that gives a machine the ability to learn without being explicitly programmed.
