Importance of Big Data

Seema Ansari, Radha Mohanlal, Javier Poncela, Adeel Ansari, Komal Mohanlal

Source Title: Handbook of Research on Trends and Future Directions in Big Data and Web Intelligence

DOI: 10.4018/978-1-4666-8505-5.ch001


Abstract

Combining vast amounts of heterogeneous data and increasing the processing power of existing database management tools is an emerging need of the IT industry in the coming years. The complexity and size of the data sets that need to be acquired, analyzed, stored, sorted, or transferred have spiked in recent years. Because the volume of data of multiple types is increasing so rapidly, creating Big Data applications that can extract valuable trends and relationships for further processing, or derive useful results, is a challenging task. Companies, corporate organizations, and government agencies all need to analyze and implement Big Data to pave new paths of productivity and innovation. This chapter discusses Big Data, an emerging technology of the modern era, with a detailed description of the three V's (Variety, Velocity, and Volume). Further chapters will help the reader understand the concepts of data mining and big data analysis, and the potential of Big Data in five domains: healthcare, the public sector, retail, manufacturing, and personal location data.

Chapter Preview


I. Introduction

Due to advances in science, engineering, and technology in recent years, human endeavors and social and economic activities have been generating a tremendous amount of data, which is referred to as Big Data. The word ‘Big’ in Big Data refers to the tremendous amount of data being generated in this modern era. Sources of Big Data (Figure 1) include online transactions, scientific experiments, research, emails, videos, audio, images, logs, events, genomic investigations, web posts, search engine queries, health records, surveillance, geospatial data, social networking interactions, texts and mobile phone applications, RFID scans, and sensors (Eaton, Deroos, Deutsch, Lapis, & Zikopoulos, 2012; Schneider, 2012).

Figure 1. A summary of various dimensions of Big Data (Du, Z. 2013)

Traditional databases alone will not solve all aspects of the Big Data problem (Madden, 2012). More robust machine learning algorithms that are easier to apply are needed. A data management ecosystem also needs to be developed so that users can enforce consistency properties over their data while visualizing and understanding the results of their algorithms.

Many domains and economic sectors can benefit from the big data push: life and physical sciences, medicine, education, healthcare, location-based services, manufacturing, retail, communication and media, government, transportation, banking, insurance, financial services, utilities, environment, and the energy industry (Manyika, Chui, Brown, Bughin, Dobbs, Roxburgh, & Byers, 2011).


II. Background

The phenomenon of Big Data becomes more intense and more diverse with each passing year. Creating Big Data applications that extract information from torrents of data has therefore become a challenging task for IT specialists and data analysts. Big Data analysis has become fertile ground for the advancement of knowledge, innovation, and enhanced decision making. In the view of Dannah Boyd et al. (Du, 2013), modern society is in the era of Big Data.

IBM indicates that 2.5 exabytes of data are created every day. Almost 90% of the world's existing data has been produced in the last two years. It is interesting to note that 5 exabytes of data were created in total up to 2003, but this amount is now generated in merely two days. In 2012, the digital world of data expanded to 2.72 zettabytes. It is predicted to double every two years, reaching about 8 zettabytes by 2015 (Intel IT Center, 2012).
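The growth figures quoted above can be checked with simple arithmetic. The following is a minimal Python sketch, assuming a constant two-year doubling period starting from the 2012 value; the function name `projected_size_zb` and its parameters are illustrative and do not come from the chapter.

```python
def projected_size_zb(base_zb: float, base_year: int, target_year: int,
                      doubling_period_years: float = 2.0) -> float:
    """Project data volume assuming exponential growth with a fixed doubling period."""
    elapsed_years = target_year - base_year
    return base_zb * 2 ** (elapsed_years / doubling_period_years)


# 2.72 zettabytes in 2012, doubling every two years (figures from the text).
size_2015 = projected_size_zb(2.72, 2012, 2015)
print(f"Projected 2015 volume: {size_2015:.2f} ZB")

# 5 exabytes at a creation rate of 2.5 exabytes per day (figures from the text).
days_for_5_eb = 5 / 2.5
print(f"Days to create 5 EB: {days_for_5_eb:.1f}")
```

Under these assumptions the 2015 projection comes to roughly 7.7 zettabytes, which is consistent with the "about 8 zettabytes" cited, and 5 exabytes at 2.5 exabytes per day corresponds to two days.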

Mobile subscriptions worldwide have reached a massive figure of 6 billion. Every day, 10 billion text messages are sent. With growing technology, it is predicted that by 2020 about 50 billion devices will be connected to the internet. Statistics from social networking media help convey the scale of Big Data. For instance, Facebook has 955 million monthly active accounts using 70 different languages and 140 billion friend connections; every day, 30 billion pieces of content and 2.7 billion likes and comments are posted. Twitter receives 1 billion tweets every 72 hours from more than 140 million active users. Google alone has more than one million servers around the world. It is predicted that within the next decade the amount of information will increase 50 times. However, the number of data analysts and IT specialists will not increase at the same pace; it is expected to grow only 1.5 times (Tankard, 2012). Many people think that Big Data is about Hadoop, but in reality Hadoop is just a subset of Big Data (Figure 2).

Figure 2. Big Data is not just Hadoop (Martin Pavlik, 2013)

Key Terms in this Chapter

Sensors: Devices that act as transducers, transforming energy in the physical environment into electrical signals. They can “sense” a physical change in the surrounding environment, or in another characteristic that changes due to some excitation such as heat or force, convert it into an electrical signal, and convey the information to controlling stations.

Database Management System (DBMS): A collection of software programs that enables a computer to perform database tasks such as storing, retrieving, adding, removing, and altering data, while maintaining the safety, security, and reliability of the data in a database.

Big Data: A collection of data so large that it becomes challenging to process with conventional data processing tools and applications. It involves high-volume, high-velocity, and high-variety information resources that require economical, productive, and new methods of information processing for improved decision making.

Big Data Analysis: Big Data Analysis involves seven phases: data acquisition and recording, information extraction, cleaning and annotation, integration, aggregation and representation, analysis and modeling, and interpretation of the data.

Distributed Systems: A collection of independent machines connected by a network and equipped with software that enables them to manage and coordinate their activities and to share resources such as hardware, software, and data. To the user, the system appears as a single integrated computer.

Hadoop: Open source software that facilitates distributed processing of huge data sets across clusters of servers with a high degree of fault tolerance. It can identify and handle failures at the application level.

Cloud Computing: Cloud computing involves a network of remote servers on the internet that facilitates storing, managing, and processing data. It is a model for on-demand access to a shared pool of computing resources such as networks, servers, and storage.

Data Warehouse: An accumulation of huge amounts of data from various sources inside a company that supports management in decision making. It is a relational database intended for handling queries and analysis rather than transaction processing.

Data Mining: The sorting of huge quantities of data to find useful information. It is used to discover patterns and correlations in large relational databases.
