Emergence of NoSQL Platforms for Big Data Needs

Jyotsna Talreja Wassan (Maitreyi College, University of Delhi, India)
Copyright: © 2014 | Pages: 12
DOI: 10.4018/978-1-4666-5202-6.ch074

Chapter Preview



Big data is revolutionizing the world in the age of the Internet. A wide variety of areas, such as online business, electronic health management, social networking, demographics, geographic information systems and online education, are gaining insight from big data principles. Big data comprises heterogeneous datasets that are too large to be handled by traditional relational database systems. An important reason for the explosion of interest in big data is that storing large volumes of data has become cheap while computing capacity has risen sharply. To extract valuable patterns from big data, one needs to choose the right platform for capturing, organizing, searching and analyzing voluminous data, in combination with traditional enterprise database management systems.

Many software organizations offer platforms that support big data management and make these services easy to use. These platforms focus mainly on data storage, management, processing and distribution, and on data analytics. Various NoSQL data stores, such as Cassandra, MongoDB and Apache HBase (built on Hadoop), are in use today to acquire, manage, store and query big data. NoSQL databases are typically schema-less and permit records to have a variable number of fields, which distinguishes them from other non-relational databases such as hierarchical and object-oriented databases. They are highly scalable and well suited to dynamic data structures. NoSQL stores are commonly characterized by BASE properties: they are basically available, keep soft state and are eventually consistent. Frameworks such as MapReduce and Dryad support parallel processing of large amounts of data and hence the management and analysis of big data. Tools such as GNU R and Apache Mahout are also useful for exploring big data to find relevant, valuable patterns. This article gives an overview of the rationale behind the NoSQL movement and of various big data platforms useful in today’s competitive world.
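To make "schema-less" concrete, the following minimal Python sketch models a collection as a plain list of dictionaries, so that records in the same collection carry different numbers of fields. It does not use the API of Cassandra, MongoDB or HBase; the sample records and the `find` helper are hypothetical, chosen only for illustration.

```python
from typing import Any

# A hypothetical "collection": each record is a dict, and records need not
# share the same set of fields (no fixed schema is enforced).
users: list[dict[str, Any]] = [
    {"_id": 1, "name": "Asha", "email": "asha@example.com"},
    {"_id": 2, "name": "Ravi", "phone": "555-0100", "tags": ["admin"]},
    {"_id": 3, "name": "Meera"},
]


def find(collection: list[dict[str, Any]], **criteria: Any) -> list[dict[str, Any]]:
    """Return records whose fields match all criteria.

    A record lacking a queried field simply does not match, rather than
    raising an error as a missing column would in a fixed schema.
    """
    return [
        doc
        for doc in collection
        if all(field in doc and doc[field] == value for field, value in criteria.items())
    ]


# Adding a record with a brand-new field requires no schema change.
users.append({"_id": 4, "name": "Kiran", "city": "Delhi"})

print([doc["_id"] for doc in find(users, name="Ravi")])  # [2]
print(sorted(len(doc) for doc in users))  # [2, 3, 3, 4]
```

A relational table would instead need a column (often NULL) for every field any record might have, and an `ALTER TABLE` before the `city` field could be stored.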


Main Focus

Major real-world applications, such as business analytics operating on big data, cannot store or process all of their data on a single machine. The data must be stored, distributed and processed in parallel for computations to complete efficiently. Various platforms are making big data management and processing more effective and form the basis of a current research theme in the era of Big Data. The main focus of this article is to discuss NoSQL big data storage platforms that could support parallel processing of the massive data volumes expected in the future.
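Distributing data across machines is commonly done by partitioning records by key. The sketch below assigns each key to one of a fixed number of hypothetical nodes by hashing it; `NUM_NODES`, `node_for` and `partition` are illustrative names, not part of any real store. Production systems such as Cassandra use more elaborate schemes (consistent hashing with replication), so this shows only the basic idea.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Map a key to a node index using a stable hash.

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and so would not be stable across machines.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(keys: list[str], num_nodes: int = NUM_NODES) -> dict[int, list[str]]:
    """Group keys by the node responsible for storing them."""
    shards: dict[int, list[str]] = defaultdict(list)
    for key in keys:
        shards[node_for(key, num_nodes)].append(key)
    return dict(shards)


keys = [f"user:{i}" for i in range(10)]
shards = partition(keys)
for node in sorted(shards):
    print(node, shards[node])
```

Because the mapping depends on `num_nodes`, adding a node under this simple modulo scheme relocates most keys; consistent hashing was designed to reduce that movement.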

Key Terms in this Chapter

Directed Acyclic Graph (DAG): A directed acyclic graph (DAG) is a directed graph (i.e., a set of vertices connected by directed edges) that contains no directed cycle; that is, no directed path starts and ends at the same vertex.
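The acyclicity condition can be checked mechanically. A minimal sketch using Kahn's topological-sort algorithm: a directed graph is a DAG exactly when all of its vertices can be removed one by one, each time taking a vertex with no remaining incoming edges. Encoding the graph as an adjacency dictionary, and the `is_dag` name, are assumptions for illustration.

```python
from collections import deque


def is_dag(graph: dict[str, list[str]]) -> bool:
    """Return True if the directed graph contains no directed cycle.

    `graph` maps each vertex to the list of vertices it has edges to.
    """
    # Collect every vertex, including ones that appear only as edge targets.
    vertices = set(graph)
    for targets in graph.values():
        vertices.update(targets)

    in_degree = {v: 0 for v in vertices}
    for targets in graph.values():
        for t in targets:
            in_degree[t] += 1

    # Repeatedly remove vertices that have no incoming edges.
    queue = deque(v for v in vertices if in_degree[v] == 0)
    removed = 0
    while queue:
        v = queue.popleft()
        removed += 1
        for t in graph.get(v, []):
            in_degree[t] -= 1
            if in_degree[t] == 0:
                queue.append(t)

    # Vertices on a cycle never reach in-degree zero, so they are never removed.
    return removed == len(vertices)


pipeline = {"read": ["map"], "map": ["reduce"], "reduce": ["write"]}
print(is_dag(pipeline))  # True
print(is_dag({"a": ["b"], "b": ["c"], "c": ["a"]}))  # False
```

The removal order produced along the way is a topological order, which is how a DAG-based engine such as Dryad can schedule a stage only after all of its inputs are ready.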

Semi-Structured Data: Semi-structured data does not conform to the formal structure of the data models associated with traditional relational databases or other forms of data tables, but it contains tags or other markers that separate semantic elements and enforce hierarchies of records and fields within the data.
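JSON is a common example of a semi-structured format: keys act as markers that name each element, and nesting expresses the hierarchy of records and fields without a fixed table layout. The sketch below uses Python's standard `json` module; the sample document and the `flatten` helper are hypothetical. Note that the two visits carry different fields.

```python
import json

raw = """
{
  "patient": {
    "name": "A. Kumar",
    "visits": [
      {"date": "2013-05-02", "diagnosis": "flu"},
      {"date": "2013-09-17", "notes": "follow-up"}
    ]
  }
}
"""


def flatten(node, path=""):
    """Yield (path, value) pairs for every leaf of a parsed JSON value.

    The keys (markers) and list positions together form each path,
    exposing the hierarchy that is implicit in the document.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from flatten(item, f"{path}[{index}]")
    else:
        yield path, node


doc = json.loads(raw)
for path, value in flatten(doc):
    print(path, "=", value)
```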

Distributed System: A distributed system consists of autonomous machine nodes connected by a network that communicate, share resources and coordinate their activities by passing messages in order to achieve a common goal.
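The message-passing pattern in this definition can be simulated on one machine. In the minimal sketch below, threads stand in for autonomous nodes and a `queue.Queue` stands in for the network; the `worker` and `coordinator` roles and the data partitions are hypothetical. Real distributed systems must also cope with node failures, latency and lost messages, which this sketch ignores.

```python
import queue
import threading


def worker(node_id: int, local_data: list[int], coordinator_inbox: queue.Queue) -> None:
    """An autonomous node: computes on its own data, then reports by message."""
    coordinator_inbox.put((node_id, sum(local_data)))


def coordinator(num_workers: int, inbox: queue.Queue) -> int:
    """Wait for one message per worker and combine them toward the common goal."""
    total = 0
    for _ in range(num_workers):
        _node_id, partial = inbox.get()  # blocks until a message arrives
        total += partial
    return total


partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
inbox: queue.Queue = queue.Queue()
threads = [
    threading.Thread(target=worker, args=(i, part, inbox))
    for i, part in enumerate(partitions)
]
for t in threads:
    t.start()
result = coordinator(len(partitions), inbox)
for t in threads:
    t.join()
print(result)  # 45
```

The nodes share no variables; the only coordination is through messages placed in the coordinator's inbox.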

Database Management System (DBMS): A Database Management System (DBMS) is a set of programs that enables storing, adding, deleting, accessing, modifying, updating and analyzing data held in a database.

Data-Intensive Computing: Data-intensive computing is a class of parallel computing applications that use a data-parallel approach to process large volumes of data, typically terabytes or petabytes in size, commonly referred to as Big Data.
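A data-parallel approach applies the same operation independently to partitions of the data and then combines the partial results. A minimal single-machine sketch, assuming Python's `concurrent.futures`; worker threads stand in for cluster nodes, and because of CPython's global interpreter lock they illustrate the partitioning pattern rather than true CPU parallelism. The function names and chunk size are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(data: list[int], size: int) -> list[list[int]]:
    """Split data into consecutive partitions of at most `size` items."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_sum_of_squares(part: list[int]) -> int:
    """The same operation, applied independently to each partition."""
    return sum(x * x for x in part)


def parallel_sum_of_squares(data: list[int], size: int = 1000, workers: int = 4) -> int:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(partial_sum_of_squares, chunks(data, size))
        # Combine the per-partition results into the final answer.
        return sum(partials)


data = list(range(10_000))
print(parallel_sum_of_squares(data) == sum(x * x for x in data))  # True
```

Swapping in `ProcessPoolExecutor`, or a cluster framework, keeps the same split/apply/combine structure while running partitions on separate CPUs or machines.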

Online Transaction Processing (OLTP): OLTP refers to a class of systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval.
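OLTP workloads consist of many short transactions, each of which must complete entirely or not at all. A minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the `accounts` table and the `transfer` function are hypothetical. Using the connection as a context manager commits the transaction on success and rolls it back if an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Move `amount` between accounts as a single atomic transaction."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            # Overdrawing violates the CHECK constraint and raises IntegrityError,
            # which also undoes the credit applied above.
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn: sqlite3.Connection) -> dict[int, int]:
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


print(transfer(conn, 1, 2, 30), balances(conn))   # True {1: 70, 2: 80}
print(transfer(conn, 1, 2, 500), balances(conn))  # False {1: 70, 2: 80}
```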

MapReduce: MapReduce is a parallel programming model introduced by Google for distributing computation across clusters of computers in order to process large data sets.
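The programming model is usually illustrated with word counting. The sketch below simulates the map, shuffle (group by key) and reduce phases in a single Python process; it is not the API of Google's MapReduce or of Hadoop, and the function names are hypothetical. On a cluster, the framework would run many map and reduce tasks on different machines and perform the shuffle over the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Emit an intermediate (word, 1) pair for every word in a document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


docs = ["big data needs NoSQL", "NoSQL stores scale", "big clusters process big data"]
print(word_count(docs))
```

Because each map call sees only one document and each reduce call sees only one key, both phases can be spread across machines without shared state.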
