Big Data Visualization Tools and Techniques

Obinna Chimaobi Okechukwu (Arkansas State University, USA)
DOI: 10.4018/978-1-5225-3142-5.ch017

Abstract

This chapter discusses the latest tools and techniques available for Big Data visualization. These tools, techniques and methods must be properly understood in order to analyze Big Data. Big Data is a new paradigm in which huge sets of data are generated and analyzed, characterized by their volume, velocity and variety. Conventional data analysis methods are incapable of processing data of this dimension; hence, it is fundamentally important to be familiar with new tools and techniques capable of processing these datasets. The chapter illustrates tools that analysts can use to process and present Big Data sets in ways that support appropriate decisions. Some of these tools (e.g., Tableau, RapidMiner and R Studio) have phenomenal capabilities to visualize processed data in ways traditional tools cannot. The chapter also aims to explain the differences between these tools and their utility in different scenarios.

Introduction

Business decisions have always relied on available information. Without the right type of information at the right time, business decisions can be flawed and, in some cases, catastrophic. Managers and senior executives alike rely on data, facts and historical records to take actions that solve a problem, avoid a potential business problem or even create new business opportunities. In a recent research study conducted among 600 medium-sized British firms, insufficient information and information barriers were identified as among the biggest constraints on management efficiency (Bloom, Lemos, Qi, Sadun, & Reenen, 2011).

It is argued that the visual representation of data (data visualization) is perhaps one of the most important aspects of data analysis. Decision makers can relate better to a visual representation of the information given to them than to textual information. Through visual perception and cognitive processes, data can be made easier to understand and better business insight can be obtained from it. Let us consider an example.

Figure 1.

Visual navigation map showing vehicular route from Hauppauge to Long Island (Google, 2015)

Figure 2.

Textual description of the vehicular route from Hauppauge to Long Island. (Google, 2015)

The example above illustrates how a graphical visualization can convey information better than text. Suppose an individual wants to determine the geographic position of Hauppauge relative to Long Island. Figure 1 conveys the relative positions of both locations more readily than Figure 2 does. This illustrates the effectiveness of visual data presentation over textual data.

Visualization Techniques

In every business organization, and even in people’s personal lives, there is a constant flow of data visualizations. These come in several forms, such as bar charts, pie charts, line graphs and scatter plots. However, not every graph or chart is suitable for displaying the results of every type of data. Several parameters or factors determine which visual reporting tool is most appropriate for reporting the results of a given set of data. Some of these parameters are:

  • The characteristics of the data set: numeric, alphanumeric, graphical, etc.

  • The volume of the data: a few records or a large number of records.

  • The dimension of the data: a few data attributes or a large number of data attributes.

  • The relationship between the attributes of the data.

  • The number of variables in the data set: univariate, bivariate or multivariate.

  • The data source, etc.

Another factor that can affect which reporting tool should be used is the data type. A set of data can be discrete or continuous in nature (Soukup & Davidson, 2002). These data types are referred to as discrete variables and continuous variables respectively. Discrete variables can be:

Key Terms in this Chapter

Term Frequency/Inverse Document Frequency: A numerical statistic intended to indicate how important a word is to a document within a corpus of text. It is mostly used as a weighting factor in information retrieval.
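The definition above can be made concrete with a small sketch. The chapter preview does not give a formula, so the code below assumes one common variant: term frequency is the term's share of the document's tokens, and inverse document frequency is the natural logarithm of the number of documents divided by the number of documents containing the term. The function name `tf_idf` and the sample documents are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Assumed weighting (one common variant):
    tf = count / document length, idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already tokenized documents.
docs = [
    ["big", "data", "visualization"],
    ["big", "data", "tools"],
    ["treemap", "visualization"],
]
scores = tf_idf(docs)
```

Under this variant, a term that occurs in every document receives a weight of zero, while a term confined to one document (such as "tools" above) is weighted more heavily than terms shared across documents.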

Series Charts: Also commonly known as time series graphs, series charts are a type of chart that uses a set of data points, known as markers, to plot a connecting line, typically along an x-axis. The main purpose of these charts is to show a continuous trend in data over a period of time.

Naïve Bayes Classification Model: A probabilistic model based on applying Bayes’ theorem with strong (naïve) independence assumptions between features.
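As a rough illustration of the definition above, the following sketch implements a multinomial Naïve Bayes text classifier. It is not taken from the chapter: the add-one (Laplace) smoothing, the function names `train` and `predict`, and the toy training samples are all assumptions made for the example. The independence assumption shows up in the loop that simply adds the log-probability of each token, as if tokens were independent given the class.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Build a model from (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict(model, tokens):
    """Return the label with the highest posterior log-probability."""
    label_counts, word_counts, vocab = model
    n_samples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        # Log prior of the class.
        score = math.log(count / n_samples)
        # Add-one smoothing (an assumption) avoids log(0) for unseen words.
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokens:
            # Independence assumption: per-token log-likelihoods are summed.
            score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data.
samples = [
    (["chart", "bar", "axis"], "viz"),
    (["plot", "axis", "chart"], "viz"),
    (["server", "cluster", "disk"], "infra"),
    (["cluster", "node", "disk"], "infra"),
]
model = train(samples)
label = predict(model, ["chart", "axis"])
```

Working in log space keeps the product of many small probabilities from underflowing, which is a common design choice for this kind of model.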

Visualization: Visualization is any technique for creating images, diagrams or animations to communicate a message. Visualization through visual imagery has been an effective way to communicate both abstract and concrete ideas since the dawn of mankind (Wikipedia, 2016).

Treemaps: A chart style that depicts the hierarchical data relationships of data subsets as a set of nested rectangular graphical blocks.

Tokenization: The process of breaking strings of text into smaller pieces, which are referred to as tokens. Another key task in tokenization is discarding characters that may not provide valuable information, such as punctuation.
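A minimal sketch of this process is shown below. It assumes that tokens are lowercase runs of letters and digits, optionally containing an internal apostrophe, and that everything else (punctuation, symbols, whitespace) is discarded. The function name `tokenize` and the sample sentence are hypothetical; real tokenizers often apply more elaborate, language-specific rules.

```python
import re

# Assumed token pattern: letters/digits, with an optional internal
# apostrophe so that words like "don't" stay in one piece.
TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:'[a-z0-9]+)?")


def tokenize(text):
    """Lowercase the text and return its tokens, dropping punctuation."""
    return TOKEN_PATTERN.findall(text.lower())


tokens = tokenize("Big Data, visualized: tools & techniques!")
```

Here the commas, colon, ampersand and exclamation mark are discarded, leaving only the word tokens.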

Corpus: A collection of written texts, literary works or aggregated data on a particular subject matter or the entire textual aggregation of works by a specific author.

Pearson’s Correlation: A measure of the linear correlation between two variables A and B, widely used to quantify the degree of linear dependence between them.
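The coefficient can be computed directly from its standard definition: the covariance of A and B divided by the product of their standard deviations. The sketch below follows that definition in plain Python; the function name `pearson` and the sample values are hypothetical, and the choice to raise an error for constant input (where the coefficient is undefined) is an assumption of this example.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    # Sums of cross-deviations and squared deviations; the shared 1/n
    # factors cancel in the ratio, so they are omitted.
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical sample data: B is exactly twice A.
r = pearson([1, 2, 3, 4], [2, 4, 6, 8])
```

The result ranges from -1 (perfect negative linear relationship) through 0 (no linear relationship) to +1 (perfect positive linear relationship).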
