1. Introduction
Scientific and business analysis applications process large volumes of data. To shorten the overall processing time, the data is divided and processed in parallel. Intermediate data are passed through multiple steps that can be managed and scheduled through process-like models, e.g., service compositions (Cao et al., 2017; Fu et al., 2018; Gao et al., 2018; Sun et al., 2018; Xia et al., 2012, 2013; Zheng et al., 2017) and workflows. Workflows have recently become popular for computation-intensive, large-scale scientific applications and for orchestrating data, e.g., in molecular biology and high-energy physics. Scientific workflows aim to integrate data and computation steps into organized operations that perform semi-automatic computational tasks for scientific applications. They generally offer graphical interfaces that integrate different techniques with effective methods for using them, thereby enhancing the working efficiency of scientists. They are typically represented as directed graphs, specifically directed acyclic graphs (DAGs), in which the nodes represent separate computing components and the edges represent the communication channels over which data and results are transmitted.
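To make the DAG representation concrete, the following is a minimal sketch, not taken from the article, of a small workflow stored as a predecessor map. The task names (`split`, `align_a`, `align_b`, `merge`) are hypothetical, and the standard-library `graphlib` module (Python 3.9+) is assumed for the ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is a task (node); each value lists the
# tasks whose results it consumes (incoming edges of the DAG).
workflow = {
    "split": (),
    "align_a": ("split",),
    "align_b": ("split",),
    "merge": ("align_a", "align_b"),
}

def execution_order(dag):
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(dag).static_order())

def parallel_stages(dag):
    """Group tasks into stages whose members do not depend on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks with all predecessors done
        stages.append(ready)
        sorter.done(*ready)
    return stages

stages = parallel_stages(workflow)
```

Tasks that fall into the same stage share no dependency, so they are the ones that could be processed in parallel on separate resources.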
Recently, cloud computing systems and platforms have been widely accepted as a promising supporting infrastructure for large-scale scientific applications. A cloud computing system provisions virtual and physical resources to one or more groups of users. The resource owners decide when and to whom a specific resource should be allotted (Deng et al., 2017; Wu et al., 2017; Xia et al., 2015; Xia et al., 2015a, 2015b).
In this way, collaboration can combine cloud resources to give users supercomputer-level computational power for their large-scale scientific applications. This model permits tenants or end users to acquire and release the required resources in a pay-as-you-go manner. Scientific applications can elastically scale the resource pool up or down at run time. The cloud management assigns only the required computational resources, aiming for the maximum utilization rate in order to reduce operating costs. Scientific workflows are generally scheduled on the cloud through the following steps: 1) to run scientific tasks, a bag of physical resources is selected from the resource pool; 2) a schedule is generated and each task is mapped onto its corresponding resource. IaaS clouds provide resources to users in the form of virtual machine (VM) instances deployed at the provider's data center.
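The two scheduling steps can be sketched as follows. This is an illustrative greedy heuristic under stated assumptions, not the method proposed in the article: the VM names, speeds, and task workloads are invented, data transfer time between VMs is ignored, and step 1 simply picks the fastest VMs from the pool.

```python
from graphlib import TopologicalSorter

def select_resources(pool, count):
    """Step 1: pick `count` VMs from the pool (assumption: fastest first)."""
    return sorted(pool, key=lambda vm: vm["speed"], reverse=True)[:count]

def build_schedule(dag, work, vms):
    """Step 2: map each task to the VM that gives the earliest finish time."""
    vm_free = {vm["name"]: 0.0 for vm in vms}  # time at which each VM is idle
    schedule = {}                              # task -> (vm, start, finish)
    for task in TopologicalSorter(dag).static_order():
        # A task may start only after all of its predecessors have finished.
        ready = max((schedule[p][2] for p in dag[task]), default=0.0)
        best = None
        for vm in vms:
            start = max(ready, vm_free[vm["name"]])
            end = start + work[task] / vm["speed"]
            if best is None or end < best[2]:
                best = (vm["name"], start, end)
        schedule[task] = best
        vm_free[best[0]] = best[2]
    return schedule

# Hypothetical inputs: a four-task workflow, workloads, and a three-VM pool.
dag = {"split": (), "align_a": ("split",), "align_b": ("split",),
       "merge": ("align_a", "align_b")}
work = {"split": 2.0, "align_a": 4.0, "align_b": 4.0, "merge": 1.0}
pool = [{"name": "vm-small", "speed": 0.5},
        {"name": "vm-a", "speed": 1.0},
        {"name": "vm-b", "speed": 1.0}]

vms = select_resources(pool, 2)
schedule = build_schedule(dag, work, vms)
makespan = max(end for _, _, end in schedule.values())
```

A greedy pass like this is fast but gives no optimality guarantee. It only illustrates how resource selection and task-to-resource mapping fit together.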
Recently, the scientific workflow-oriented cloud scheduling problem has attracted considerable research attention (Li et al., 2018; Peng et al., 2018). Since the multi-constraint, multi-objective workflow scheduling problem is widely acknowledged to be NP-hard, finding an optimal solution through traversal-based algorithms is extremely time-consuming. Existing works in this direction fall into two major categories, namely best-effort scheduling methods and QoS-constrained scheduling methods (Yu et al., 2008). The best-effort scheduling approaches aim at minimizing the workflow execution time while ignoring other objectives, e.g., cost and reliability. The QoS-constrained methods, in contrast, are capable of handling multiple quantitative objectives and constraints.
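The difference between the two categories can be illustrated with a small sketch. The candidate plans and their numbers are invented for illustration, and the functions only select among a handful of given plans. They do not address the NP-hard search for such plans.

```python
# Hypothetical candidate schedules with illustrative, made-up metrics.
candidates = [
    {"name": "plan_fast", "makespan": 40.0, "cost": 12.0, "reliability": 0.95},
    {"name": "plan_balanced", "makespan": 55.0, "cost": 7.0, "reliability": 0.99},
    {"name": "plan_cheap", "makespan": 90.0, "cost": 3.0, "reliability": 0.97},
]

def best_effort(plans):
    """Best-effort: minimize execution time and ignore every other objective."""
    return min(plans, key=lambda p: p["makespan"])

def qos_constrained(plans, max_cost, min_reliability):
    """QoS-constrained: minimize execution time among plans meeting all limits."""
    feasible = [p for p in plans
                if p["cost"] <= max_cost and p["reliability"] >= min_reliability]
    if not feasible:
        return None  # no plan satisfies the constraints
    return min(feasible, key=lambda p: p["makespan"])

fastest = best_effort(candidates)
within_budget = qos_constrained(candidates, max_cost=8.0, min_reliability=0.96)
```

With these numbers, the best-effort choice is the fastest plan regardless of price, while the constrained choice trades some execution time for a plan that respects the cost and reliability limits.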
Figure 1. Cloud computing environment