PPHA-Popularity Prediction Based High Data Availability for Multimedia Data Center

Kuo-Chi Fang (Marshall University, Huntington, USA), Husnu S. Narman (Marshall University, Huntington, USA), Ibrahim Hussein Mwinyi (Marshall University, Huntington, USA) and Wook-Sung Yoo (Marshall University, Huntington, USA)
DOI: 10.4018/IJITN.2019010102

Abstract

Due to the growth of internet-connected devices and data analysis applications in recent years, cloud computing systems are widely used. As the utilization of cloud storage systems has grown, so has the demand for data center management. Data center management has several crucial requirements, such as increasing data availability, enhancing durability, and decreasing latency. Previous works mostly use replication techniques to meet those needs, subject to consistency requirements. However, most of these works consider full-data, popular-data, and geo-distance-based replication while accounting for storage and replication cost. Moreover, previous data popularity-based techniques rely on historical and current data access frequencies for replication. In this article, the authors approach the problem from a different angle, developing replication techniques for a multimedia data center management system that can dynamically adapt the servers of a data center by considering popularity prediction at each data access location. They first label data objects from one to ten to track the access frequencies of data objects. Then, they use the access frequencies from each location to predict the future access frequencies of data objects, determine the replication levels and the locations at which to replicate the data objects, and store related data objects on nearby storage servers. To show the efficiency of the proposed method, the authors conduct an extensive simulation using real data. The results show that the proposed method has an advantage over previous works in terms of data availability, increasing it by up to 50%. The proposed method and related analysis can help multimedia service providers enhance their service quality.

1. Introduction

Due to the growth of Internet-connected devices and data access applications in recent years, cloud computing systems are widely used. As the utilization of cloud storage systems has grown, so has the demand for data center management. One of the challenges of data center management is low performance caused by data unavailability as the scale of the data and of the data center increases. To enhance system performance, several methods have been proposed (Vdovin & Kostenko, 2014; Nagendram, Lakshmi, Rao, & Jyothihi, 2011; Arzuaga & Kaeli, 2010; Plakunov & Kostenko, 2014; Hieu, Di Francesco, & Jaaski, 2014). The authors of (Vdovin & Kostenko, 2014; Nagendram et al., 2011) proposed a resource scheduling method that increases the utilization rate of each server by considering the multi-dimensional resource requirements (e.g., CPU, memory, and storage) of applications and then scheduling these applications to different servers. System performance is expected to increase if the scheduling method does not waste any resources. However, performance may decrease if many queries are scheduled to the same servers or if some applications spend more time on their allocated servers. The authors of (Arzuaga & Kaeli, 2010; Plakunov & Kostenko, 2014; Hieu et al., 2014) considered these limitations and balanced server loads in the data center. Different load metrics have been used to analyze the load of each server node for rearrangement in the system. For example, the authors of (Arzuaga & Kaeli, 2010) calculated the load of each physical server and dynamically moved assigned tasks from highly loaded servers to less loaded ones. The authors of (Plakunov & Kostenko, 2014; Hieu et al., 2014) took a different approach to the same balancing problem: instead of moving tasks inside the data center, they assign tasks to servers from the start in a way that balances the load.
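The idea of moving tasks from highly loaded servers to less loaded ones can be illustrated with a minimal sketch. This is not the algorithm of any cited work: the load metric (summed task load divided by capacity), the `threshold` parameter, the data layout, and the function names `utilization` and `rebalance` are all assumptions made for illustration.

```python
def utilization(server):
    """Fraction of capacity in use (hypothetical load metric)."""
    return sum(server["tasks"].values()) / server["capacity"]


def rebalance(servers, threshold=0.8):
    """Greedily move tasks off servers whose utilization exceeds `threshold`.

    servers: {name: {"capacity": float, "tasks": {task_id: load}}}
    Mutates `servers` in place and returns a list of
    (task_id, source, destination) moves.
    """
    moves = []
    if len(servers) < 2:
        return moves  # nowhere to move tasks
    # Visit the most heavily loaded servers first.
    order = sorted(servers, key=lambda s: utilization(servers[s]), reverse=True)
    for name in order:
        src = servers[name]
        # Try the smallest tasks first so each move disturbs the system least.
        for task, load in sorted(src["tasks"].items(), key=lambda kv: kv[1]):
            if utilization(src) <= threshold:
                break
            # Pick the currently least loaded other server as the destination.
            target = min((s for s in servers if s != name),
                         key=lambda s: utilization(servers[s]))
            dst = servers[target]
            # Skip the move if it would push the destination over the threshold.
            if (sum(dst["tasks"].values()) + load) / dst["capacity"] > threshold:
                continue
            dst["tasks"][task] = src["tasks"].pop(task)
            moves.append((task, name, target))
    return moves
```

A sketch like this reacts only to the load observed at the moment it runs, which is the limitation that prediction-based approaches aim to address.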

All of the aforementioned approaches can solve the unbalanced load problem in the data center, but they neglect the network bottleneck problem, which may be the critical limitation on system performance in data centers. Although some researchers consider bandwidth utilization in the data center network by balancing the link utilization of the network system, some traffic issues still cannot be avoided (Botero, Hesselbach, Fischer, & De Meer, 2012; Mustafa & Nadeem, 2015). For example, traffic congestion may still arise if some server nodes in the network are essential or hold popular data objects that are accessed by many users. Therefore, the network bottleneck will persist even though the system tries to utilize bandwidth appropriately.

In addition, data replication (Zeng et al., 2017) is used to reduce the data loss caused by network bottlenecks and thereby increase the data availability rate. In replication, data popularity is critical because of storage costs and system efficiency. Popular data carries a higher load because of excessive access requests (Ananthanarayanan et al., 2011), which can result in data unavailability, whereas less popular data receives only a few access requests. Therefore, when replicating data, data diversity must be considered together with load balancing to utilize resources efficiently (Keller, Szefer, Rexford, & Lee, 2010). However, the unpredictability of future access can lead to imbalance in the long term, so a system that predicts the future popularity of data can be more efficient. The aim of this paper is thus to increase data availability in multimedia data centers by considering not only current data popularity but also likely future data popularity, in addition to bandwidth limitations and load balancing. Unlike (Arzuaga & Kaeli, 2010), our system can dynamically adapt the structure of the data center based on predictions of future access. In other words, the system can manage the data center in advance so that system performance improves.
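To make the prediction-driven replication idea concrete, the following is a minimal sketch under stated assumptions, not the authors' method. The abstract states only that data objects are labeled from one to ten and that per-location access frequencies are used to predict future access; the use of exponential smoothing as the predictor, the mapping from label to replica count, the `alpha` and `max_replicas` parameters, and all function names here are hypothetical.

```python
from math import ceil


def predict_next(history, alpha=0.5):
    """Exponentially smoothed forecast of the next period's access count.

    Exponential smoothing is an assumed stand-in predictor.
    """
    forecast = float(history[0])
    for count in history[1:]:
        forecast = alpha * count + (1 - alpha) * forecast
    return forecast


def popularity_label(predicted, max_predicted, levels=10):
    """Map a predicted frequency onto a popularity label in 1..levels."""
    if max_predicted <= 0:
        return 1
    return max(1, min(levels, ceil(levels * predicted / max_predicted)))


def plan_replicas(access_log, max_replicas=3, alpha=0.5):
    """Choose a replica count and replica locations per data object.

    access_log: {object_id: {location: [access counts per period]}}
    Returns {object_id: (label, [locations, highest predicted demand first])}.
    """
    forecasts = {
        obj: {loc: predict_next(hist, alpha) for loc, hist in per_loc.items()}
        for obj, per_loc in access_log.items()
    }
    totals = {obj: sum(f.values()) for obj, f in forecasts.items()}
    peak = max(totals.values(), default=0.0)
    plan = {}
    for obj, per_loc in forecasts.items():
        label = popularity_label(totals[obj], peak)
        # Hypothetical rule: replica count grows linearly with the label.
        n = max(1, ceil(max_replicas * label / 10))
        ranked = sorted(per_loc, key=per_loc.get, reverse=True)
        plan[obj] = (label, ranked[:n])
    return plan
```

In this sketch an object with rising demand in one location receives a high label and replicas placed where the predicted demand is greatest, while a rarely accessed object keeps a single replica near its most likely readers.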
