Storage and Bandwidth Optimized Reliable Distributed Data Allocation Algorithm

Storage and Bandwidth Optimized Reliable Distributed Data Allocation Algorithm

Hindol Bhattacharya (Jadavpur University, Kolkata, India), Samiran Chattopadhyay (Department of Information Technology, Jadavpur University, Kolkata, India), Matangini Chattopadhyay (School of Education Technology, Jadavpur University, Kolkata, India) and Avishek Banerjee (Asansol Engineering College, Asansol, India)
Copyright: © 2019 |Pages: 18
DOI: 10.4018/IJACI.2019010105

Abstract

Distributed storage allocation problems are an important optimization problem in reliable distributed storage, which aims to minimize storage cost while maximizing error recovery probability by optimal storage of data in distributed storage nodes. A key characteristic of distributed storage is that data is stored in remote servers across a network. Thus, network resources especially communication links are an expensive and non-trivial resource which should be optimized as well. In this article, the authors present a simulation-based study of the network characteristics of a distributed storage network in the light of several allocation patterns. By varying the allocation patterns, the authors have demonstrated the interdependence between network bandwidth, defined in terms of link capacity and allocation pattern using network throughput as a metric. Motivated by observing the importance of network resource as an important cost metric, the authors have formalized an optimization problem that jointly minimizes both the storage cost and the cost of network resources. A hybrid meta heuristic algorithm is employed that solves this optimization problem by allocating data in a distributed storage system. Experimental results validate the efficacy of the algorithm.
Article Preview
Top

1. Introduction

Reliability is the basic demand for any form of storage. Reliability in the context of data storage implies preserving the original form of data as it has been stored in the system by the subscriber. Loss of integrity of stored data leads to data corruption and hence low reliability. Data corruption can be attributed to both malicious as well as inadvertent corruption due to unavoidable reasons. Separate mechanisms are in place in the domain of information security to deal with malicious corruption of data. However, even in the absence of malicious attacks, data are likely to get corrupt over a period of time. So, error correction codes are used to recover from such eventual data corruption and to maintain high reliability. However, error correction codes occupy considerable storage and therefore they contribute to storage overhead. A naive error correction mechanism requires excessive amount of storage just to store such codes. Thus, efficient error correction codes are required to achieve high reliability with low cost of storage. In distributed storage systems, apart from clever error correction codes; reliable low-cost storage can be achieved by exploiting numerous other parameters, unique to such systems. Distributed storage requires data to be stored in a set of storage nodes distributed across a network. In such systems, storage options such as changing allocation pattern in nodes, option to locally or globally repair a problem, etc. become available. These options can be intelligently exploited to improve reliability without having to incur excessive storage cost. Distributed Storage Allocation is one such option available in distributed storage systems that may ensure high reliability in storage systems while minimizing the storage cost.

The cost factor in distributed storage allocation is defined in terms of total storage budget. Such cost factor is based on the core assumption that the communication links between the nodes are able to support the amount of data on the respective storage nodes. In cases where the communication link capabilities are not sufficient, total storage budget is limited and defined in terms of the communication link capacity. The limitation of existing works is that they choose one between two equally important cost factors viz. storage and link capacities. Ideally, a storage network administrator likes to minimize both the storage and communication costs together, if possible. An optimization problem which considers both of them as costs and works towards their minimization will be a novel one.

To the best of our knowledge no work has been done to study the network performance while varying the storage allocation pattern and spread. Our intuition is that network performance will vary noticeably with change in network allocation pattern and spread. A simulation-based study to explore the network performance by varying storage pattern and spread has been carried out and presented in this paper which validates this speculation. Motivated by the results of the simulation study, we have proposed a data placement algorithm which allocates data on distributed storage nodes that optimizes both storage and link capacity costs.

Based on the discussions, the objectives of the paper may be stated as follows:

  • Study network performance through simulation by varying the network allocation patterns. We verify whether allocating equal amounts of data to each node of the distributed system (symmetric allocation) gives better performance than allocating unequal amount of data to each nodes of the distributed system (non-symmetric allocation). We also investigate whether allocating data to a few nodes (minimal spread) is better in terms of network performance than allocating data to a maximum number of nodes (maximal spread);

  • Formulate an optimization problem which can jointly optimize the essential cost factors viz. required storage and link capacity; while maintaining high reliability;

  • Propose an intelligent data placement algorithm which allocates data to storage nodes such that minimum storage space and link capacity will be required; while maintaining high reliability.

Complete Article List

Search this Journal:
Reset
Open Access Articles: Forthcoming
Volume 11: 4 Issues (2020): 1 Released, 3 Forthcoming
Volume 10: 4 Issues (2019)
Volume 9: 4 Issues (2018)
Volume 8: 4 Issues (2017)
Volume 7: 2 Issues (2016)
Volume 6: 2 Issues (2014)
Volume 5: 4 Issues (2013)
Volume 4: 4 Issues (2012)
Volume 3: 4 Issues (2011)
Volume 2: 4 Issues (2010)
Volume 1: 4 Issues (2009)
View Complete Journal Contents Listing