Performance Analysis of Structured, Un-Structured, and Cloud Storage Systems

Anindita Sarkar Mondal (School of Mobile Computing, Jadavpur University, Kolkata, India), Madhupa Sanyal (Department of Information Technology, Jadavpur University, Kolkata, India), Samiran Chattapadhyay (Department of Information Technology, Jadavpur University, Kolkata, India) and Kartick Chandra Mondal (Department of Information Technology, Jadavpur University, Kolkata, India)
Copyright: © 2019 | Pages: 29
DOI: 10.4018/IJACI.2019010101

Abstract

Big Data management is an interesting research challenge for all storage vendors. Because data can be structured or unstructured, a variety of storage systems have been designed to meet the storage requirements of different organizations. This article focuses on different kinds of storage systems, their architectures, and their implementations. The first part of the article describes examples of structured (PostgreSQL) and unstructured (MongoDB, OrientDB, and Neo4j) databases, along with their data models and a comparative performance analysis. The second part focuses on cloud storage systems, using Google Cloud Storage as the example and discussing mainly its implementation details. The aim of the article is not to eulogize any particular storage system but to point out clearly that every storage system has a role to play in the industry. It is up to the enterprise to identify its requirements and deploy storage systems accordingly.

Introduction

Data refers to a collection of facts and information, and it is the source of knowledge for the entire world. Data comes from various sources, such as databases, flat files, and online sources, and the amount of data arriving from these sources is huge. Consider one such source, online websites: there are more than 2,000 tweets per second and more than ten thousand Google searches per second, and more than a million emails are being sent, coming from another million websites. The huge body of information generated in this way, popularly called Big Data (McAfee, Brynjolfsson, Davenport, Patil, & Barton, 2012), is not just about being big; the crux lies in managing the data. This is where Big Data storage systems, the storehouses of data, come in. The most common form of Big Data storage is traditional storage such as RAM (Random Access Memory) and disk drives. This paper analyses two distinct categories of Big Data storage systems: databases and cloud storage systems. Each of these storage systems has its own unique features and supports CRUD operations (Create, Read, Update, and Delete) on data.

Databases are the most common source of data. The logical structure of a database defines how its data is organized. The earliest data model was the hierarchical data model (e.g., IBM Information Management System), which was followed by hierarchical databases (Tsichritzis & Lochovsky, 1976). Further evolution of databases led to the relational model (Rumbaugh, Blaha, Premerlani, Eddy, & Lorensen, 1991), in which data is represented as tuples, or rows, that together form a relation; systems built on this model are called Relational Database Management Systems (RDBMS). An RDBMS uses Structured Query Language (SQL) as its query language.
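
As a minimal sketch of the relational model described above, the following Python snippet uses the standard-library sqlite3 module as a stand-in for an RDBMS. The table name `employee`, its columns, and the sample rows are hypothetical and chosen only for illustration; they do not come from the article.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for any RDBMS.
conn = sqlite3.connect(":memory:")

# A relation is a table with a fixed schema (hypothetical example schema).
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")

# Each inserted row is one tuple of the relation.
rows = [(1, "Asha", "IT"), (2, "Ravi", "HR"), (3, "Mina", "IT")]
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)

# SQL is the query language: select the tuples that satisfy a predicate.
it_staff = conn.execute(
    "SELECT name FROM employee WHERE dept = ? ORDER BY id", ("IT",)
).fetchall()
print(it_staff)

conn.close()
```

The query returns the matching rows as a list of tuples, which mirrors the idea that a relation is an aggregate of tuples.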

The transition from RDBMS to NoSQL is very significant (Hadjigeorgiou, 2013). RDBMS has several advantages (Jatana, Puri, Ahuja, Kathuria, & Gosain, 2012); for example, data is stored in a structured way, which helps maintain entity relationships. However, when the data volume is huge and the data context changes over time, a new kind of system becomes essential. NoSQL (Not Only Structured Query Language) databases not only store datasets but also support durability, reliability, availability, and scalability (Han, Haihong, Le, & Du, 2011). Rather than following the ACID properties (Atomicity, Consistency, Isolation, and Durability), NoSQL databases follow CAP (Consistency, Availability, and Partition Tolerance). For transaction-related applications, RDBMS is better than NoSQL databases.
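
The atomicity part of ACID can be illustrated with a small sketch, again using Python's sqlite3 module as a stand-in RDBMS. The `account` table, the `transfer` helper, and the balances are assumptions made for this example and are not taken from the article. The point is only that a failed multi-statement update leaves no partial changes behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; return True on success."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back every statement in it if an exception is raised.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # both updates are committed
failed = transfer(conn, "alice", "bob", 500)  # debit violates CHECK, credit is undone
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

In the second call the credit to `bob` is applied first and then rolled back when the debit fails, so the stored balances reflect only the successful transfer.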

NoSQL databases offer better management of structured, semi-structured, and unstructured data (Moniruzzaman & Hossain, 2013, p. 19; Leavitt, 2010). There are four types of NoSQL databases. (a) Key-Value: Data is stored in groups, and each group is identified by a unique identifier known as a key. Amazon S3 and Azure follow this storage structure to store large, voluminous datasets. (b) Document: A set of data groups with variable attributes is stored together as a document. The document is identified by a key value and represented in XML, JSON, or BSON format. CouchDB and MongoDB are examples of document NoSQL databases. (c) Graph: In a network-based system, an instance of one entity is connected to an instance of another entity, and the connection itself carries explicit meaning for the stored dataset. A graph database therefore stores the dataset together with information about how each instance is connected to the others. OrientDB and Neo4j are the most popular graph-based NoSQL databases. (d) Column-family: Data is stored column-wise rather than as horizontal tuples, which makes data operations (i.e., access and storage) faster. Cassandra and HBase are examples of column-family NoSQL databases.
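
The four data models above can be sketched with plain Python data structures. This is only an illustration of the shapes of the data, not of how any of the named products are implemented; all keys, records, and helper functions (`neighbours`, `column_average`) are hypothetical.

```python
import json

# (a) Key-value: each value is reachable only through its unique key.
key_value_store = {"user:1": "Asha", "user:2": "Ravi"}

# (b) Document: self-describing records whose attributes may differ
# from one document to the next; shown here as JSON-like dicts.
document_store = {
    "doc1": {"name": "Asha", "skills": ["SQL", "Python"]},
    "doc2": {"name": "Ravi", "city": "Kolkata"},
}

# (c) Graph: nodes plus labelled edges that record how instances are connected.
nodes = {
    "asha": {"type": "Person"},
    "ravi": {"type": "Person"},
    "ju": {"type": "University"},
}
edges = [("asha", "KNOWS", "ravi"), ("asha", "STUDIED_AT", "ju")]


def neighbours(node, label):
    """Return the nodes reached from `node` over edges with the given label."""
    return [dst for src, lbl, dst in edges if src == node and lbl == label]


# (d) Column-family, following the column-wise description in the text:
# values are grouped per column, keyed by row id.
column_store = {
    "name": {1: "Asha", 2: "Ravi"},
    "age": {1: 31, 2: 28},
}


def column_average(column):
    """Aggregate one column without touching any other column."""
    values = column_store[column].values()
    return sum(values) / len(values)


print(key_value_store["user:1"])
print(json.dumps(document_store["doc2"]))
print(neighbours("asha", "KNOWS"))
print(column_average("age"))
```

Note how the two documents carry different attributes, and how the column-wise layout lets an aggregate read a single column, which is the access pattern the column-family description refers to.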
