Introduction
Cloud computing is a form of distributed computing in which an agent server divides a task into subtasks, assigns these subtasks to several heterogeneous servers for processing and analysis, and returns the results to the user (Xia et al., 2015a; Xia et al., 2015b). Typically, a cloud works on diverse tasks within a network, but it is also capable of running specialized applications. It is designed to solve problems that are too large for a single supercomputer while retaining the flexibility to process numerous smaller problems. Clouds deliver a multiuser infrastructure that accommodates the intermittent demands of large-scale information processing (Jun et al., 2018; Jun et al., 2017; Yin, Xu, Xu et al., 2017; Yin, Yu, Xu et al., 2017).
Workflows supported by cloud infrastructures and resources are becoming increasingly popular, especially for long-running and highly computation-intensive applications (Anubhav et al., 2018; Deng et al., 2015; Jun et al., 2015; Li et al., 2018; Peng et al., 2018; Xu et al., 2018). A workflow is often described by a graph model in which multiple jobs are organized according to given precedence constraints. Scheduling workflows on distributed cloud resources refers to allocating tasks to appropriate cloud resources, e.g., virtual machines (VMs). It is widely acknowledged that such scheduling problems are NP-hard and that traditional exact algorithms can be highly inefficient in terms of time complexity. Instead, heuristic and meta-heuristic algorithms with reduced complexity can yield high-quality schedules with slight or acceptable optimality loss (Nasonov et al., 2017). Typical heuristic and meta-heuristic methods are variants of bio-inspired evolutionary algorithms. Although such methods usually yield high-quality near-optimal solutions, they demand a great deal of prior expert knowledge to develop novel encoding representations. Game-theoretic formulations (Duan, 2014; Eiman & Saeed, 2018; Lei & Wang, 2018; Wang et al., 2018) have also proven effective when applied to multi-process and multi-constraint scheduling problems (Figure 1).
Figure 1.
The framework of the reinforcement-learning-based algorithm
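To make the DAG scheduling setting above concrete, the sketch below shows a simple greedy earliest-finish-time list scheduler: tasks whose predecessors are complete are assigned, one by one, to the VM that finishes them soonest. This is only an illustrative heuristic under hypothetical task workloads and VM speeds, not the algorithm proposed in this work.

```python
def schedule(tasks, deps, vm_speeds):
    """Greedy earliest-finish-time list scheduling of a DAG workflow.

    tasks:     {task: workload}        (hypothetical units of work)
    deps:      {task: [predecessors]}  (precedence constraints; must form a DAG)
    vm_speeds: [speed of each VM]      (work units processed per time unit)
    Returns {task: (vm_index, start_time, finish_time)}.
    """
    vm_free = [0.0] * len(vm_speeds)  # earliest idle time of each VM
    plan, done = {}, set()
    while len(done) < len(tasks):
        # a task is ready once all of its predecessors have been scheduled
        ready = [t for t in tasks if t not in done
                 and all(p in done for p in deps.get(t, []))]
        for t in sorted(ready):
            # earliest start: when the last predecessor finishes
            est = max([plan[p][2] for p in deps.get(t, [])], default=0.0)
            # choose the VM that yields the earliest finish time
            best = min(range(len(vm_speeds)),
                       key=lambda v: max(vm_free[v], est) + tasks[t] / vm_speeds[v])
            start = max(vm_free[best], est)
            plan[t] = (best, start, start + tasks[t] / vm_speeds[best])
            vm_free[best] = plan[t][2]
            done.add(t)
    return plan

# hypothetical example: a diamond-shaped workflow on two VMs (speeds 1.0 and 2.0)
tasks = {'a': 2.0, 'b': 4.0, 'c': 4.0, 'd': 2.0}
deps = {'b': ['a'], 'c': ['a'], 'd': ['b', 'c']}
plan = schedule(tasks, deps, [1.0, 2.0])
print(max(f for _, _, f in plan.values()))  # makespan: 6.0
```

Even this simple heuristic respects all precedence constraints; meta-heuristic and learning-based schedulers aim to improve on its greedy choices.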
Over the past decade, machine-learning-based algorithms have shown their power and versatility in dealing with scheduling problems. Considerable research effort has been devoted to leveraging reinforcement learning (RL) and Q-learning-based strategies (Bernd et al., 2018; Cui et al., 2016; Wu et al., 2018; Yi, 2018) to identify high-quality, near-optimal solutions. However, existing works in this direction mainly target single-objective DAG planning under given quantitative constraints. Although multi-agent reinforcement learning (MARL) formulations are frequently applied to intelligent control, decentralized message routing, distributed access control, assembly-line management, and traffic control, MARL methods for cloud workflow scheduling remain rare. Motivated by this gap, we address the cloud workflow scheduling problem by formulating it as a discrete-event, multi-criteria-interaction Markov game and employing a multi-agent Deep Q-Network (DQN) strategy to identify near-optimal solutions. The proposed framework aims at reducing both workflow completion time and dwell time. The agents in the game are trained in a multi-agent reinforcement learning (MARL) scenario and adjusted with data from legacy systems, such as heuristics encoded in neural networks. Each DQN agent is assumed to be able to observe the other agents' actions and rewards; it then selects its own action from the joint action distribution as the environment updates. Workflow scheduling plans are produced through a self-learning and self-optimizing process. This design gives the proposed framework two advantages: 1) agents can be continually trained on workflows with different types of process constructs and on heterogeneous VMs with different resource configurations; and 2) the resulting scheduling solutions are obtained without human intervention or prior expert knowledge.
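To illustrate the Markov-game idea in miniature, the toy sketch below replaces the deep Q-networks with per-agent tabular Q-values over joint actions: two agents each place one task on one of two VMs, observe the joint action, and learn to avoid resource contention. The reward values, learning rate, and exploration rate are illustrative assumptions, not the configuration used in this work.

```python
import random

random.seed(0)
VMS = [0, 1]                                    # two VMs to choose from
# one Q-table per agent, indexed by the JOINT action (both agents' VM choices),
# mirroring the assumption that each agent observes the others' actions
Q = [{(a, b): 0.0 for a in VMS for b in VMS} for _ in range(2)]
alpha, eps = 0.2, 0.3                           # learning rate, exploration rate

def reward(a, b):
    # hypothetical payoff: contention on a shared VM slows both tasks down
    return (-1.0, -1.0) if a == b else (1.0, 1.0)

for episode in range(500):
    acts = []
    for i in range(2):
        if random.random() < eps:               # explore: random VM
            acts.append(random.choice(VMS))
        else:                                   # exploit: own slot of best joint action
            acts.append(max(Q[i], key=Q[i].get)[i])
    joint = tuple(acts)
    r = reward(*joint)
    for i in range(2):                          # one-step Q update on the joint action
        Q[i][joint] += alpha * (r[i] - Q[i][joint])
```

After training, each agent's best joint action places the two tasks on different VMs, i.e., the agents have learned a contention-free assignment without any hand-coded rule. The deep-network version replaces each Q-table with a function approximator over richer workflow and VM states.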
We perform extensive simulation tests with various workflow templates. The simulation results show that our proposed approach outperforms its peers in terms of both workflow turnaround time and dwell time.