Introduction to Big Data and Business Analytics

Dineshkumar Bhagwandas Vaghela (Shantilal Shah Government Engineering College, India)
DOI: 10.4018/978-1-5225-4999-4.ch015

Abstract

The term big data arose from the rapid generation of data in various organizations. In big data, “big” is the buzzword: the data are so large and complex that traditional database applications are inadequate to process them. Big data are usually described by the 5Vs (volume, velocity, variety, variability, veracity) and can be structured, semi-structured, or unstructured. Big data analytics is the process of uncovering hidden patterns and unknown correlations in large and complex data sets, and of predicting future values from them. This chapter covers the following topics in more detail: the history of big data and business analytics, big data analytics technologies and tools, and big data analytics uses and challenges.

Introduction

Modern technologies generate very large amounts of complex, unstructured data. Sources include RFID data, web logs, sensor devices, Internet searches, machinery, social networks such as Facebook and Twitter, vehicle sensors, portable computers, cell phones, call center records, and GPS devices. These technologies are used in many different types of applications. In sentiment analysis, for example, opinions on a specific topic are collected in large volume from different sources. Sentiments about a product, a movie, or a person can be gathered from official sites or from social media sites such as Twitter, Facebook, and Instagram. Politicians and governments often use sentiment analysis to understand how people feel about them and their policies. Figure 1 shows the sources of big data.

Figure 1. Sources of Big Data

Rapidly generated, large-volume data has the 5 V’s characteristics: Volume, Velocity, Variety, Variability, and Veracity. The 5 V’s of big data are illustrated in Figure 2.

Figure 2. 5 V’s of Big Data

5V’s (Volume, Velocity, Variety, Variability, Veracity)

Volume

Volume refers to the amount of data generated per unit of time. The data can take any form, such as emails, sensor data, video clips, photos, and Twitter messages, that people generate and share within that period. Such data amount to hundreds of zettabytes or brontobytes. On Facebook, approximately 10 billion messages are sent per day, the “like” button is clicked 4.5 billion times, and more than 350 million photos are uploaded every day (He et al., 2013). The amount of data generated per unit of time is increasing exponentially: the same amount of data that was generated in the world between the beginning of time and 2008 will soon be generated every minute (Chen et al., 2012). Storing and analyzing such a huge volume of data with traditional database technologies is a challenge for researchers. This problem has been addressed with distributed systems, in which parts of the data are stored at different geographical locations and are either brought together by software or processed where they are stored, with the software combining the results.
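The idea of processing parts of the data separately and then combining the results can be illustrated with a minimal sketch. It is not a real distributed system: the partitions are ordinary in-memory lists standing in for storage nodes, and the function names (split_into_partitions, count_words, combine) and the sample messages are assumptions made for this illustration.

```python
from collections import Counter
from functools import reduce


def split_into_partitions(records, n_partitions):
    """Assign records round-robin to n_partitions lists (stand-ins for nodes)."""
    partitions = [[] for _ in range(n_partitions)]
    for i, record in enumerate(records):
        partitions[i % n_partitions].append(record)
    return partitions


def count_words(partition):
    """Local work on one node: count the words in its share of the data."""
    counts = Counter()
    for message in partition:
        counts.update(message.lower().split())
    return counts


def combine(partials):
    """Merge the per-node partial counts into one global result."""
    return reduce(lambda a, b: a + b, partials, Counter())


# Hypothetical sample data, for illustration only.
messages = [
    "big data is big",
    "data moves fast",
    "big volume of data",
    "fast data",
]

partitions = split_into_partitions(messages, 3)
partials = [count_words(p) for p in partitions]  # each node works independently
total = combine(partials)                        # results are combined afterwards
```

Because each partial count depends only on its own partition, the local step could in principle run on separate machines; only the small partial results need to be brought together.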

Velocity

Velocity refers to the speed at which new data is generated and the speed at which data moves around. Examples include social media messages that go viral within a fraction of a second, credit card transactions that are checked for fraudulent activity as they occur, and trading systems that take milliseconds to decide whether to buy or sell shares by analyzing social media networks. Big data technologies make this possible by allowing users and decision makers to analyze the data as it is generated, without first putting it into a database as traditional database processing requires.
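Analyzing data as it arrives, without storing it first, can be sketched as follows. This is a toy example under stated assumptions, not a real fraud-detection method: the function name flag_unusual, the rule "flag an amount larger than three times the running mean", the warm-up length, and the sample amounts are all hypothetical.

```python
def flag_unusual(amounts, factor=3.0, warmup=3):
    """Yield (amount, is_flagged) for each amount as it arrives.

    Only a running count and a running mean are kept, so the history of
    transactions is never stored.
    """
    count = 0
    mean = 0.0
    for amount in amounts:
        # Decide immediately, using only the summary of what was seen so far.
        flagged = count >= warmup and amount > factor * mean
        yield amount, flagged
        # Update the running mean incrementally (flagged values are included).
        count += 1
        mean += (amount - mean) / count


# Hypothetical stream of transaction amounts, consumed one item at a time.
stream = iter([20.0, 25.0, 22.0, 24.0, 400.0, 21.0])
flags = [flagged for _, flagged in flag_unusual(stream)]
```

The generator consumes one value at a time, so the same loop would work on an unbounded stream; memory use stays constant regardless of how many values pass through.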

Variety

Variety refers to the different types of data: structured, semi-structured, or unstructured. Traditional data processing technologies use structured data that fits into tables or relational databases. At present, however, 80% of the world’s data is semi-structured or unstructured and hence cannot easily be put into tables. Examples include photos, video sequences, and social media updates. With big data technology, we can now harness different types of data (structured and unstructured), including messages, social media conversations, photos, sensor data, and video or voice recordings, and bring them together with more traditional, structured data.
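Bringing semi-structured data together with structured data can be sketched in a few lines. The example assumes a hypothetical table row and a hypothetical social media post in JSON form; the field names and the helper flatten are invented for this illustration, and flattening nested fields into dotted keys is only one of several possible ways to align the two.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into a single-level dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Structured data: fits a fixed table schema (hypothetical columns).
structured_row = {"customer_id": 17, "amount": 42.5}

# Semi-structured data: nested JSON with no fixed schema (hypothetical post).
social_post = '{"customer_id": 17, "post": {"text": "Great product", "likes": 3}}'

# Combine both into one flat record keyed by field name.
combined = {**structured_row, **flatten(json.loads(social_post))}
```

Truly unstructured data such as photos or voice recordings would need a feature-extraction step before it could be combined in this way; that step is outside the scope of this sketch.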

Veracity

Veracity refers to the messiness or trustworthiness of the data. Because the data are available in many forms, their quality and accuracy are hard to control. The large volume often makes up for the lack of quality or accuracy, and big data and analytics technologies make it possible to work with such data.

Value

Value is the most important aspect of big data. Having access to big data is good, but the data are useless unless they can be turned into value. In business, analytics applied to good-quality big data supports decision making and helps generate more business.
