Big Data: The Data Deluge

Jayshree Ghorpade-Aher (University of Pune, Pune, India), Reena Pagare (University of Pune, India), Anita Thengade (University of Pune, Pune, India), Santaji Ghorpade (IBM India Pvt. Ltd., India) and Manik Kadam (Allana Institute, India)
DOI: 10.4018/978-1-4666-8737-0.ch001

We live in the computer era, in which data is growing exponentially, and managing such huge volumes of data is a challenging job. Amid this explosive growth of global data, the term big data is mainly used to describe enormous datasets. This chapter discusses the state of the art of big data, aiming to give readers a comprehensive, big-picture overview of this research area. It covers the different models and technologies for big data and introduces big data storage. Big data has become an important topic in research fields and application areas such as healthcare, the public sector, retail, manufacturing, and personal data.
Chapter Preview

I. Introduction

Over the last two decades, a huge amount of data has been collected, and we need ways to handle it. As part of this development, we have moved from files to databases, data warehouses, and data centers, and now to Big Data. Big data is data that exceeds the processing capacity of conventional database systems: the data is too big, moves too fast, or doesn’t fit the structures of existing database architectures. One must choose an appropriate way to gain value from this huge data. According to a 2011 report from International Data Corporation (IDC), the overall volume of data created and copied in the world was 1.8 ZB (≈10²¹ B), which increased by nearly nine times within 5 years (Min, Shiwen, Yunhao, 2014). Such data is generated by many sources: online transactions, emails, videos, audio, images, click streams, logs, posts, search queries, health records, social networking interactions, scientific data, sensors, and mobile phones and their applications. According to a recent survey, a personal computer holds about 500 terabytes. Other sources, such as Facebook (over 10 PB of log data per month), Google (processes 100 PB of data), Twitter (12 TB), electricity boards, the telecom industry, other organizations, railways, airlines (black boxes), and stock exchanges, add to this huge amount of data. About 5 EB of data was created by humans until 2003; today this amount of information is created in just a day. Figure 1 shows the various components of big data. Conventional databases and other software tools have not proven efficient at handling such large datasets.
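To put the units quoted above in perspective, the following is a minimal back-of-the-envelope sketch. It assumes decimal (SI) prefixes, which the chapter does not specify, and the helper names `to_bytes` and `annual_growth_factor` are illustrative only.

```python
# Decimal (SI) byte units. This is an assumption: the chapter does not say
# whether it uses decimal or binary prefixes.
UNITS = {"TB": 10**12, "PB": 10**15, "EB": 10**18, "ZB": 10**21}


def to_bytes(value, unit):
    """Convert a value expressed in `unit` to a number of bytes."""
    return value * UNITS[unit]


def annual_growth_factor(total_factor, years):
    """Constant yearly factor that compounds to `total_factor` over `years`."""
    return total_factor ** (1.0 / years)


# 1.8 ZB, the 2011 figure attributed to IDC, expressed in bytes.
world_2011 = to_bytes(1.8, "ZB")
print(f"{world_2011:.1e} bytes")

# Average yearly growth implied by "nearly nine times within 5 years".
print(f"x{annual_growth_factor(9, 5):.2f} per year")

# How many times larger 1.8 ZB is than the 5 EB said to exist by 2003.
print(f"{world_2011 / to_bytes(5, 'EB'):.0f} times")
```

Under these assumptions, a ninefold increase over five years corresponds to data volume growing by roughly half again each year.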

Figure 1. The 3 V’s of Big Data

  • Definition: Big data is a term for massive data sets with a large, varied, and complex structure that are difficult to store, analyze, and visualize for further processing or results.

In 2010, Apache Hadoop defined big data as datasets which could not be captured, managed, and processed by general computers within an acceptable scope.

In May 2011, McKinsey & Company, a global consulting agency, announced Big Data as the next frontier for innovation, competition, and productivity.

Doug Laney, an analyst at META (now Gartner), described the challenges and opportunities brought about by increasing data with a 3Vs model, shown in Figure 1 [Seref & Duygu, 2013]: growth in Volume (the size of data, increasing day by day), Velocity (the speed at which data is processed), and Variety (the range of forms data takes). With respect to variety, structured data enters a data warehouse already tagged and is easily sorted; unstructured data is random and difficult to analyze; and semi-structured data does not conform to fixed fields but contains tags to separate data elements.
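The three variety categories can be illustrated with a small sketch. The sample records, field names, and the regular expression below are hypothetical and chosen only for illustration; the point is how much more work it takes to recover the same value as the structure becomes looser.

```python
import json
import re

# Structured: fixed fields known in advance, like a row in a warehouse table.
structured_row = {"customer_id": 42, "product": "phone", "rating": 4}

# Semi-structured: no fixed fields, but tags (keys) separate the elements.
semi_structured = (
    '{"customer_id": 42, '
    '"review": {"product": "phone", "stars": 4, "tags": ["battery"]}}'
)

# Unstructured: free text with no tags; the value must be inferred.
unstructured = "Bought the phone last week, I'd give it 4 out of 5. Battery is great."


def rating_from_structured(row):
    """Direct field lookup: the schema guarantees where the value lives."""
    return row["rating"]


def rating_from_semi_structured(text):
    """Parse the tagged document, tolerating missing optional elements."""
    doc = json.loads(text)
    return doc.get("review", {}).get("stars")


def rating_from_unstructured(text):
    """Heuristic pattern match; returns None when no rating is found."""
    match = re.search(r"(\d) out of 5", text)
    return int(match.group(1)) if match else None


print(rating_from_structured(structured_row))
print(rating_from_semi_structured(semi_structured))
print(rating_from_unstructured(unstructured))
```

The structured lookup cannot miss, the semi-structured one must allow for absent tags, and the unstructured one depends on a fragile heuristic, which is why unstructured data is the hardest of the three to analyze.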

  • High-Volume: It is the amount and size of the data. Enterprises are awash with ever-growing data of all types, moving from TBs to even PBs of information.

For example, an enterprise might turn the 12 terabytes of Tweets created each day into improved product sentiment analysis, or convert 350 billion annual meter readings into better predictions of power consumption.
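The volumes in this example also imply a sustained arrival rate. The sketch below converts them to per-second figures, assuming decimal terabytes, a 365-day year, and perfectly even arrival, none of which the chapter states; `sustained_rate` is an illustrative helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumes a non-leap year


def sustained_rate(total, period_seconds):
    """Average amount per second if `total` arrives evenly over the period."""
    return total / period_seconds


# 12 TB of Tweets per day, with 1 TB taken as 10**12 bytes (an assumption).
tweet_bytes_per_second = sustained_rate(12 * 10**12, SECONDS_PER_DAY)

# 350 billion meter readings per year.
readings_per_second = sustained_rate(350 * 10**9, SECONDS_PER_YEAR)

print(f"{tweet_bytes_per_second / 10**6:.0f} MB/s of Tweets")
print(f"{readings_per_second:.0f} meter readings/s")
```

Under these assumptions the Tweets arrive at roughly 139 MB per second and the meter readings at about 11,000 per second, which shows how volume and velocity are linked.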
