Big Data Problem, Technologies and Solutions


Hoda Ahmed Abdelhafez (Suez Canal University, Egypt)
Copyright: © 2014 |Pages: 13
DOI: 10.4018/978-1-4666-5202-6.ch031

Chapter Preview



Relational database systems came into use in the 1970s, and Structured Query Language (SQL) became the standard way of working with data in relational form, as shown in Figure 1. The spread of mainframes and personal desktops created data warehouses that could manage data from multiple databases. The development of the data warehouse in the 1990s introduced thousands of applications across industry domains such as purchasing, shipping, enterprise resource planning (ERP), and supply chain management (SCM). In 2001 XML (Extensible Markup Language) technology was born, and the demand for content management systems led enterprises to analyze unstructured and semi-structured data. Today, the Internet creates and distributes data of all types in multiple formats. The ability to manage the volume, velocity, and variety of data, and to find analytical ways to provide better information precisely when it is needed, is the evolution called big data (Zikopoulos, et al., 2012).

Figure 1.

From relational database to big data


Big data refers to data sets whose sizes range from a few dozen terabytes to many petabytes in a single set. Consider Web logs (a Web log records millions of visits a day), retail data (thousands of stores, tens of thousands of products, millions of customers, and billions of individual transactions a year), and cell phone databases (which store the time and location of each of a few million phones every 15 seconds) as sources of big data. Recently, a wide variety of technologies such as Hadoop and MapReduce have been developed and adapted to aggregate, manipulate, analyze, and visualize big data in order to help organizations derive value from it (Manyika, Chui, Brown, Bughin, Dobbs, Roxburgh, & Byers, 2011).
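To make the MapReduce model mentioned above concrete, the following is a minimal sketch of its two phases applied to a word-count task. It is written in plain Python rather than against the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`) are illustrative and not part of any framework.

```python
from collections import defaultdict

def map_phase(documents):
    """Emit (word, 1) pairs from each document, as a mapper would."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Group intermediate values by key, as the framework's shuffle step does."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the counts for each word, as a reducer would."""
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data needs big tools", "data tools scale"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # each word counted across all documents
```

In a real Hadoop cluster the map and reduce phases run in parallel on many machines and the shuffle moves data over the network; the data flow, however, is exactly the one sketched here.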

A related technology is the NoSQL (or "Not Only SQL") database, which overcomes the scaling limitations of relational databases, manages unstructured data, and distributes work across multiple locations. NoSQL databases are schema-free by design, so applications can quickly change the structure of their data without table rewrites while still supporting data integrity and validity at the data management layer. NoSQL systems replicate and partition data over many servers to support large numbers of simple read/write operations per second (Cattell, 2011).
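The partitioning described above is commonly implemented by hashing each key to one of the available servers. The sketch below illustrates the idea in Python with a hypothetical three-node cluster; the node names and the `node_for_key` helper are illustrative, not a real NoSQL client API.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical servers

def node_for_key(key: str) -> str:
    """Pick the server that owns a key by hashing it.

    A stable hash (MD5 here) ensures the same key always maps
    to the same node, so reads find what writes stored.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# Every client computes the same placement independently,
# so no central coordinator is needed for simple lookups.
for key in ["user:1", "user:2", "cart:9"]:
    print(key, "->", node_for_key(key))
```

Production systems typically refine this with consistent hashing so that adding or removing a server relocates only a small fraction of the keys, but the basic key-to-server mapping is the same.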

Key Terms in this Chapter

NoSQL Database: Is a non-relational database, also called "Not Only SQL." It is an approach used to manage large sets of distributed data.

Cloud Computing: Is a model for delivering on-demand services, infrastructure, and application software using the network.

MapReduce: Is a programming model for processing large data sets.

Data Velocity: Means the speed of the data generation and the speed of the data delivery.

Apache Hadoop: Is an open source software framework that supports data intensive distributed applications.

Data Variety: Means the diverse sources of data resulting from the explosion of sensors, smart devices, and social collaboration technologies.

Big Data: Is data that exceeds the processing capacity of conventional database systems.
