Adaptive and Convex Optimization-Inspired Workflow Scheduling for Cloud Environment

Scheduling large-scale, resource-intensive workflows on cloud infrastructure is one of the main challenges for cloud service providers (CSPs). Cloud infrastructure is most efficient when virtual machines (VMs) and other resources are used to their full potential. A major factor influencing the quality of cloud services is how workflow tasks are distributed across VMs. Scheduling tasks to VMs depends on the type of workflow and on the resource-allocation mechanism. Scientific workflows involve large-scale data transfer and consume substantial cloud resources; therefore, scheduling their tasks on VMs requires efficient, optimized workflow scheduling techniques. This paper proposes an optimized workflow scheduling approach that aims to improve the utilization of cloud resources without increasing execution time or execution cost.


INTRODUCTION
Cloud computing is a challenging, highly demanded system in which services are metered, reliable, and accessible on demand (Yang and Chen, 2010; Zhang et al., 2010; Sandhu and Lakhwani, 2022; Sorkhoh et al., 2020). Workflows (Zhao et al., 2011) have been used to model scientific applications (Barker and Van Hemert, 2007; Pietri et al., 2013; Gil et al., 2007). Scientific workflows such as MONTAGE, LIGO, SIPHT, and GENOME can have millions of tasks (Zhao et al., 2011; Vöckler et al., 2011; Kouatli, 2020). These tasks must be mapped to cloud resources as soon as they become ready, so that scheduling is efficient and consumes as few resources as possible. A variety of optimization approaches can be used to find optimal schedules (Casavant and Kuhl, 1988), and numerous general-purpose meta-heuristic algorithms are available for different optimization problems (Talbi, 2009). These algorithms provide scheduling and optimization solutions that are close to optimal (Kumar and Sivakumar, 2022; Bisht and Vampugani, 2022; Alakbarov, 2022; Farhat et al., 2020).
Meta-heuristic approaches are generally more computationally intensive and slower than heuristic approaches; however, they tend to find better schedules because they explore different solutions using a guided search. In cloud systems, using meta-heuristics to solve the workflow scheduling problem involves several challenges: modeling a theoretically unbounded number of resources, defining operations that avoid exploring invalid solutions (e.g., data-dependency violations) so as to facilitate convergence, and pruning the search space with heuristics based on the cloud resource model (Negi et al., 2013; Rajput et al., 2022; Kumar et al., 2022).
Figure 1 depicts methods of task scheduling; the meta-heuristic approach is the stronger one and is adopted in the proposed study. In mapping scientific workflows, researchers have targeted various parameters to achieve better outcomes; Figure 2 highlights these important scheduling parameters.

Time
Since the era of grid computing, time (makespan) has been the primary objective of most scheduling techniques. In the cloud computing environment, execution time plays a significant role because the cloud provider charges customers based on it. For workflow applications, execution time is defined as the time taken to complete all tasks in a workflow. Hence, reducing the execution time of workflow applications becomes a crucial factor.

Cost
Cloud providers charge clients for leasing infrastructure, including resource consumption, data transfer, and cloud storage. Because computation cost plays a dominant role in workflow scheduling, it must be minimized for effective use of the cloud platform. Thus, executing workflow applications in a heterogeneous cloud environment requires an efficient scheduling algorithm that considers these costs during resource provisioning.

Energy
Owing to the growing execution of workflows in various fields, the energy consumption of data centers has gradually increased, making energy conservation in cloud data centers a matter of concern. High energy consumption incurs high operational and maintenance costs. With an inefficient scheduling approach, even a data center with a low workload may consume a large amount of energy. In addition, if the resources on the servers are over-utilized, the cloud system is classified as energy-inefficient. An effective scheduling mechanism is therefore required to address these issues and help achieve a green environment by reducing unnecessary power consumption (Sharma and Sajid, 2021; Gupta and Gupta et al., 2018; Bhushan and Gupta, 2017; Gaurav et al., 2022; Zhang et al., 2017; Shahzad et al., 2022).

Resource utilization
Resource utilization measures how efficiently resources are used. Leased resources should be utilized efficiently to avoid unnecessary expenditure, because providers also charge for unutilized slots. Improving resource utilization benefits users through lower cost and benefits providers through higher profit and lower energy consumption. Hence, improving resource utilization is a significant factor in scheduling.

Security
Data privacy and security need to be addressed when adopting cloud computing, because workflows may contain confidential information that scientists may not wish to reveal (Alkhanak et al., 2016; Gaurav et al., 2022; Mehla R & Psannis, 2022; Kumar et al., 2022; Gupta et al., 2009; Quamara et al., 2019).

This research focuses on the significant scheduling criteria related to economic factors, such as time and cost, and to environmental factors, such as energy consumption and resource utilization, when computing workflow applications in the cloud environment.
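To make the utilization criterion concrete, the following is a minimal sketch that measures a VM's utilization as the fraction of its leased period spent executing tasks, and averages this over all leased VMs. The definition, the interval representation, and the function names `vm_utilization` and `average_utilization` are assumptions for illustration; the paper does not state its formula here.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds

def vm_utilization(leased: Interval, busy: List[Interval]) -> float:
    """Fraction of a VM's leased period spent executing tasks.

    Assumes the busy intervals lie inside the lease and do not overlap.
    """
    lease_len = leased[1] - leased[0]
    if lease_len <= 0:
        raise ValueError("lease must have positive length")
    busy_time = sum(end - start for start, end in busy)
    return busy_time / lease_len

def average_utilization(vms: Dict[str, Tuple[Interval, List[Interval]]]) -> float:
    """Mean utilization over all leased VMs (each VM weighted equally)."""
    values = [vm_utilization(leased, busy) for leased, busy in vms.values()]
    return sum(values) / len(values)

vms = {
    "vm0": ((0.0, 100.0), [(0.0, 40.0), (50.0, 90.0)]),  # busy 80 s of 100 s
    "vm1": ((0.0, 100.0), [(10.0, 30.0)]),               # busy 20 s of 100 s
}
print(vm_utilization(*vms["vm0"]))  # 0.8
print(average_utilization(vms))     # 0.5
```

Since providers bill for the whole lease, a low value such as vm1's 0.2 signals paid-for idle capacity, which is exactly what the migration step later tries to reclaim.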
Cloud computing is an internet-based computing model: all services, whether storage, applications, or servers, are delivered to users' computers over the internet. For the scientific community, the cloud has taken a major step forward by offering virtualization, and many advances have been made in this area. The primary objective of many workflow scheduling algorithms is to minimize the cost of execution, and most algorithms consider other metrics as well, as outlined in Figure 3.
During scheduling, the key objectives concern what to minimize and what to maximize over the whole process (Vecchiola et al., 2009; Shaw and Singh, 2014; Singh and Kumar, 2021; Bhardwaj et al., 2022; Liang et al., 2022). As depicted in Figure 3, makespan or total execution time (TET), total execution cost (TEC), total energy consumption, and response time should be minimized, whereas the utilization of cloud resources should be maximized.

Major Contribution
The proposed TBW (Tabu-Bayesian-Whale) method contributes to the mapping and migration of tasks under deadline constraints while minimizing time and cost. The objectives of this research are to optimize TEC, TET, and response time while migrating tasks from less-utilized VMs to other VMs. In simulation, the proposed TBW algorithm outperformed the other optimization approaches considered for cloud VM scheduling.

Paper Organization
The rest of the article is structured as follows. Section 2 presents important related work on task ranking and resource optimization. Section 3 presents the architecture and framework of the proposed methodology. Section 4 presents the experimental simulation, results, and performance analysis. Section 5 compares the outcomes with other state-of-the-art work, and Section 6 outlines significant directions for future study.

RELATED WORK
Scheduling problems are commonly addressed with meta-heuristic algorithms (Rodriguez and Buyya, 2014; Arabnejad and Barbosa, 2014; Ghose et al., 2017; Jiang et al., 2017; Bisht et al., 2022; Onyebuchi et al.). Such algorithms optimize parameters such as time, cost, response time, memory consumption, and energy consumption. In Karaboga and Basturk (2007), a hybrid tabu and ABC scheduling approach for cloud systems was defined with the goal of balancing the load on VMs; it was compared with existing scheduling methods in terms of speedup, utilization, total time, efficiency, energy consumption, and makespan. Byun et al. (2011) created an algorithm that calculates the ideal amount of resources to lease so as to minimize the cost of a workflow's execution. The algorithm runs online and also produces the mapping of tasks to resources. In every billing period (i.e., every hour), the schedule and resources are updated based on the state of the active VMs and tasks. In GA-PSO (Al-Maamari and Omara, 2015), the authors used a hybrid approach to schedule workflow tasks, obtaining optimal scheduling results with load balancing, and compared it with existing methods such as GA, PSO, WSGA, and HSGA.
The Whale Optimization Algorithm (WOA) of Mirjalili and Lewis (2016) is built on three operators modeled on the behavior of humpback whales: searching for prey, encircling (trapping) prey, and bubble-net foraging. WOA was evaluated on 26 mathematical benchmark functions, and its optimization results were compared with existing algorithms such as HGA, PSO, PSOPC, SOS, HPSO, and MBA on different design problems; WOA provided better results.
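The three WOA operators can be sketched as follows. This is a minimal, illustrative Python version of the standard WOA update rules minimizing a continuous test function (the sphere function); it uses one scalar coefficient A and C per whale as a simplification, and the function name, parameter names, and defaults are assumptions rather than details from the cited paper.

```python
import math
import random
from typing import Callable, List

def woa_minimize(f: Callable[[List[float]], float], dim: int, lower: float, upper: float,
                 n_whales: int = 20, max_iter: int = 200, b: float = 1.0,
                 seed: int = 0) -> List[float]:
    """Minimize f over the box [lower, upper]^dim with the basic WOA update rules."""
    rng = random.Random(seed)
    whales = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_whales)]
    best = min(whales, key=f)[:]
    best_val = f(best)
    for t in range(max_iter):
        a = 2.0 - 2.0 * t / max_iter              # decreases linearly from 2 towards 0
        for i, x in enumerate(whales):
            A = 2.0 * a * rng.random() - a         # scalar per whale (simplification)
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: encircle the best whale (exploitation);
                # otherwise move relative to a random whale (search for prey)
                ref = best if abs(A) < 1.0 else whales[rng.randrange(n_whales)]
                new = [ref[d] - A * abs(C * ref[d] - x[d]) for d in range(dim)]
            else:
                # bubble-net attack: logarithmic spiral around the best whale
                l_rand = rng.uniform(-1.0, 1.0)
                new = [abs(best[d] - x[d]) * math.exp(b * l_rand) * math.cos(2 * math.pi * l_rand)
                       + best[d] for d in range(dim)]
            whales[i] = [min(max(v, lower), upper) for v in new]  # keep within bounds
            val = f(whales[i])
            if val < best_val:                     # remember the best solution so far
                best, best_val = whales[i][:], val
    return best

sphere = lambda x: sum(v * v for v in x)
best = woa_minimize(sphere, dim=5, lower=-10.0, upper=10.0)
print(sphere(best))
```

On this smooth test function the printed objective should end up near zero; the linearly shrinking `a` is what shifts the search from exploration to exploitation over the iterations.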
Sossa (2016) suggested a PSO-based approach that reduces the execution cost of a single process while balancing the task load on available resources. While cost reduction is a top priority in clouds, load balancing is a more sensible goal in a non-elastic setting such as a cluster or a grid. Workflow execution time is not part of the scheduling objectives, so it may be quite high as a consequence of the cost-cutting philosophy. The authors assume that a fixed number of VMs is available and ignore the cloud's elasticity. As a result, the proposed solution resembles those used for grids: the generated schedule is a mapping between tasks and resources rather than a detailed plan specifying the number and type of resources to lease, when to acquire and release them, and in what order tasks should run on them (Elrotub et al., 2021; Singh and Kumar, 2021; Xu et al., 2021).
Liu et al. (2017) worked on deadline-constrained workflow scheduling. They used four scientific workflows, CyberShake, Montage, LIGO, and Inspiral, and evaluated both TET and TEC under user-defined deadline constraints. The crossover and mutation probabilities were also a primary concern. Performance is evaluated with a task-ranking system, and workflows are represented as DAGs. A penalty function and a penalty rule were added to CGA, yielding CGA2, which is parameter-free and also addresses premature convergence.
Reddy and Kumar (2017) discussed a whale optimization-based algorithm that mimics the hunting behavior of humpback whales. These whales hunt schools of small fish close to the surface by swimming around them in a shrinking circle. Dubey et al. (2018) proposed a task-ranking approach in which task ranks and processor assignment are the two key factors: the algorithm first builds a DAG and then processes the tasks in rank order. It uses a modified HEFT algorithm that reduces makespan and provides better resource utilization.
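For reference, the task-ordering step that HEFT-based approaches build on is the classic upward rank, rank_u(i) = w_i + max over successors j of (c_ij + rank_u(j)), where w_i is a task's average execution cost and c_ij the average communication cost of edge (i, j). The sketch below computes it for a small DAG; it shows standard HEFT, not the modified variant of Dubey et al., and the toy DAG and costs are hypothetical.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

def upward_ranks(avg_cost: Dict[str, float],
                 succ: Dict[str, List[Tuple[str, float]]]) -> Dict[str, float]:
    """rank_u(i) = w_i + max over successors j of (c_ij + rank_u(j)).

    avg_cost: average execution time of each task across processors/VMs.
    succ: task -> list of (successor task, average communication cost).
    """
    @lru_cache(maxsize=None)
    def rank(task: str) -> float:
        # exit tasks have no successors, so their rank is just their own cost
        tail = max((c + rank(child) for child, c in succ.get(task, [])), default=0.0)
        return avg_cost[task] + tail
    return {task: rank(task) for task in avg_cost}

avg_cost = {"A": 10, "B": 6, "C": 8, "D": 4}
succ = {"A": [("B", 2), ("C", 3)], "B": [("D", 1)], "C": [("D", 2)]}
ranks = upward_ranks(avg_cost, succ)
order = sorted(ranks, key=ranks.get, reverse=True)   # HEFT schedules in this order
print(ranks)  # {'A': 27.0, 'B': 11.0, 'C': 14.0, 'D': 4.0}
print(order)  # ['A', 'C', 'B', 'D']
```

Sorting by decreasing upward rank always yields a valid topological order, which is why HEFT can assign processors one task at a time.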
Alkhanak and Lee (2018) proposed a cost optimization approach for scientific workflow scheduling in cloud computing. The method employs four meta-heuristic algorithms that operate on the VM population of a cloud system, helping service providers reduce cost and time; execution cost and time are reduced compared with baseline approaches. Choudhary et al. (2018) introduced a gravitational search algorithm for workflow scheduling in the cloud environment, hybridizing GSA with HEFT to reduce cost and makespan. Performance is evaluated using two metrics: the monetary cost ratio and the schedule length ratio.

PROPOSED METHODOLOGY
According to Rodriguez and Buyya (2014), Arabnejad and Barbosa (2014), and Vecchiola et al. (2009), the mapping of workflow tasks to VMs should not be static. The proposed methodology maps tasks dynamically and has two objectives: first, effective distribution of tasks over cloud resources; second, optimal scheduling for better performance. Figure 4 presents the architectural framework of the proposed workflow scheduling and optimization technique, which consists of two phases. Phase 1 ranks the workflow tasks using the Distributed HEFT technique, and phase 2 optimizes resources using the proposed TBW method. The input to phase 1 is the collection of scientific workflow tasks; its output, the ranked tasks mapped onto cloud VMs, is further optimized dynamically in phase 2 to improve the TET and TEC of the input workflow.
The methodology steps for the complete framework are as follows:
1. Input the workflows.

Input Workflow
Five workflows are accepted as input: 1. MONTAGE, 2. CYBERSHAKE, 3. SIPHT, 4. LIGO, and 5. EPIGENOMICS. Figure 5 shows a graphical representation of these workflows. They are important scientific applications that use large-scale datasets.

Parsing and Finding Critical Path
Parsing is the next step: the input workflow is analyzed to extract its tasks and their dependencies. In the proposed cloud workflow framework, parsing occurs at the initial stage. All tasks of the input workflow are then collected and analyzed along the critical path, after which phase 1 of the system begins.
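The parsing and critical-path step can be sketched as follows. The sketch assumes a simplified XML input loosely modeled on the Pegasus DAX files in which workflows such as MONTAGE are distributed (`job` elements with `id` and `runtime`, `child`/`parent` elements for dependencies); the exact schema, the function names, and the choice to ignore data-transfer time are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict, deque
from typing import Dict, List, Tuple

def _local(tag: str) -> str:
    """Drop an XML namespace prefix, e.g. '{http://...}job' -> 'job'."""
    return tag.rsplit("}", 1)[-1]

def parse_workflow(xml_text: str) -> Tuple[Dict[str, float], Dict[str, List[str]]]:
    """Return task runtimes and parent -> children edges from DAX-style XML."""
    root = ET.fromstring(xml_text)
    runtime: Dict[str, float] = {}
    children: Dict[str, List[str]] = defaultdict(list)
    for elem in root:
        if _local(elem.tag) == "job":
            runtime[elem.get("id")] = float(elem.get("runtime", "0"))
        elif _local(elem.tag) == "child":
            for parent in elem:
                if _local(parent.tag) == "parent":
                    children[parent.get("ref")].append(elem.get("ref"))
    return runtime, dict(children)

def critical_path(runtime: Dict[str, float],
                  children: Dict[str, List[str]]) -> Tuple[float, List[str]]:
    """Longest runtime-weighted path through the DAG (data-transfer time ignored)."""
    indegree = {t: 0 for t in runtime}
    for kids in children.values():
        for k in kids:
            indegree[k] += 1
    queue = deque(t for t, d in indegree.items() if d == 0)
    start = {t: 0.0 for t in runtime}   # earliest start time of each task
    finish: Dict[str, float] = {}
    pred: Dict[str, str] = {}           # parent that determines each task's start time
    while queue:                        # Kahn's topological traversal
        t = queue.popleft()
        finish[t] = start[t] + runtime[t]
        for k in children.get(t, []):
            if k not in pred or finish[t] > start[k]:
                start[k], pred[k] = finish[t], t
            indegree[k] -= 1
            if indegree[k] == 0:
                queue.append(k)
    if len(finish) != len(runtime):
        raise ValueError("workflow graph contains a cycle")
    end = max(finish, key=finish.get)
    path = [end]
    while path[-1] in pred:             # walk back along the critical parents
        path.append(pred[path[-1]])
    return finish[end], path[::-1]

XML = """<adag>
  <job id="t1" runtime="5"/>
  <job id="t2" runtime="3"/>
  <job id="t3" runtime="7"/>
  <job id="t4" runtime="2"/>
  <child ref="t2"><parent ref="t1"/></child>
  <child ref="t3"><parent ref="t1"/></child>
  <child ref="t4"><parent ref="t2"/><parent ref="t3"/></child>
</adag>"""
print(critical_path(*parse_workflow(XML)))  # (14.0, ['t1', 't3', 't4'])
```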

Phase 1: Task Ranking
This phase determines an effective rank value for each task of the input workflow. A Distributed HEFT task-ranking algorithm is proposed for this phase; its steps are given in Algorithm 1. It works on three heuristic parameters: budget, time, and deadline. The distributed HEFT ranking method computes the correlations between these parameters and assigns the top rank to the task with the highest distributed score among all tasks. A correlation measures how closely two parameters are related and takes values between -1 and +1. The task with the highest correlation score receives the highest rank, and the task with the lowest correlation score receives the lowest rank. The ranked tasks are then scheduled on cloud resources according to their distributed-HEFT rank.

Phase 2: Cloud Optimization
This phase applies three optimization approaches to make the system more efficient. To reduce energy usage, it first migrates tasks away from any underused machines in the system. An optimization strategy that combines Tabu search, Bayesian optimization, and whale optimization is applied throughout this procedure: Tabu search identifies underutilized resources, Bayesian optimization then provides the combination of VMs best suited for task migration, and whale optimization transfers tasks from underutilized machines to other hosts while minimizing time, cost, and response time. A further goal of this study is the analysis and assessment of performance: the makespan, cost, energy consumption, and response time of workflow execution are determined after VM migration has been performed. Figure 6 represents the complete process of the proposed TBW optimization technique, and the rest of this section explains the proposed work, its algorithms, and the methodology flow in detail. Cloud-based virtual machines (VMs) are the central resources in this phase (Nigam et al., 2022; Kumar et al., 2022; Hemrajani et al., 2022; Stergiou et al., 2021; Gupta et al., 2021).
As depicted in Figure 6, the cloud-based system can be optimized using Tabu, Bayesian, and Whale optimization approaches. The input consists of tasks that have been ranked and assigned to cloud VMs using the distributed HEFT ranking algorithm. TBW is an optimization algorithm in the sense that it works step by step toward an optimal solution based on objective functions. These steps are part of the schedule and ensure that no VM is left idle for long periods and no VM is overloaded with work; the scheduler decides which task should go to which machine. The two primary processes throughout the setup are searching for underused VMs and searching for over-utilized VMs. The TBW algorithm is designed to avoid getting trapped in local optima, which makes it suitable for most practical applications, and it can be applied to constrained or unconstrained optimization problems without modification. In the proposed approach, three algorithms are used: Tabu Search to find underutilized resources, Bayesian Optimization to identify the combination of VMs best suited for task migration, and Whale Optimization to optimize the task migration.
• Tabu Search: The input to the Tabu search method is the list of VMs on which the ranked tasks are mapped. The Tabu search algorithm (Algorithm 2) is applied to find the neighbours of the current VM.
• Bayesian Optimization: Bayesian optimization determines the combination of VMs to which tasks can be shifted. Algorithm 3 presents the steps involved in determining the best combinations of VMs for mapping tasks in real time.
• Use of Whale Optimization (WO): Humpback whales can locate their prey and encircle it before engulfing it. The whale optimization algorithm is inspired by their bubble-net feeding technique, in which they release a stream of bubbles in shrinking circular and spiral patterns around their prey. Whale positions are first chosen at random and evaluated to determine the current best solution; the other whales then update their positions relative to this current best solution, as described in Algorithm 4.
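A minimal sketch of a Tabu-search step of this kind is shown below. It assumes the goal is to choose which VMs to drain: a solution is the set of VMs selected for release, a neighbour flips one VM into or out of that set, recently flipped VMs are tabu for a few iterations unless the move beats the best solution found so far (aspiration), and a solution is feasible only if the remaining VMs' total capacity can absorb the total load. The objective, the aggregate-capacity check (no per-VM packing), and all names are simplifying assumptions, not Algorithm 2 itself.

```python
from typing import Dict, FrozenSet, List, Tuple

def tabu_select_underutilized(load: Dict[str, float], capacity: Dict[str, float],
                              iterations: int = 50, tenure: int = 3) -> FrozenSet[str]:
    """Choose a set of VMs to drain (release) using tabu search."""
    vms = sorted(load)
    total = sum(load.values())

    def score(drained: FrozenSet[str]) -> float:
        kept_capacity = sum(capacity[v] for v in vms if v not in drained)
        if kept_capacity < total:                 # remaining VMs cannot absorb the load
            return float("-inf")
        moved = sum(load[v] for v in drained)
        # more released VMs first; among equals, prefer moving less load
        return len(drained) - moved / (total + 1.0)

    current: FrozenSet[str] = frozenset()
    best, best_score = current, score(current)
    tabu: List[str] = []                          # recently flipped VMs (short-term memory)
    for _ in range(iterations):
        candidates: List[Tuple[float, str, FrozenSet[str]]] = []
        for v in vms:
            neighbour = current ^ {v}             # flip one VM into or out of the drain set
            s = score(neighbour)
            if v not in tabu or s > best_score:   # aspiration criterion overrides tabu status
                candidates.append((s, v, neighbour))
        if not candidates:
            break
        s, v, current = max(candidates)           # best admissible neighbour, even if worse
        tabu.append(v)
        if len(tabu) > tenure:
            tabu.pop(0)
        if s > best_score:
            best, best_score = current, s
    return best

load = {"vm1": 90.0, "vm2": 15.0, "vm3": 20.0, "vm4": 70.0}   # e.g. busy MIPS per VM
capacity = {vm: 100.0 for vm in load}
print(sorted(tabu_select_underutilized(load, capacity)))  # ['vm2', 'vm3']
```

Accepting the best admissible neighbour even when it is worse, while the tabu list blocks immediate reversals, is what lets the search move away from a local optimum instead of cycling.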
In this research work, optimization is used to migrate tasks from underutilized machines to other machines without any increase in time, cost, or response time. Whale optimization takes its input from Bayesian optimization. Based on its objective functions, it chooses the best combination of VMs and shifts tasks onto them without increasing TEC and TET. The overall results are better than those of GA-PSO for scientific workflows. Throughout the process, time and cost management are the main factors.
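The acceptance rule above (migrate only if neither TET nor TEC increases) can be expressed as a check that a WOA-style search could use to filter candidate migrations. The sketch assumes a simple cost model: tasks are independent and run back to back on their VM, TET is the busy time of the most loaded VM, and TEC is busy time multiplied by a per-second price. The model and all names are assumptions for illustration.

```python
from typing import Dict, Tuple

Assignment = Dict[str, str]   # task id -> VM id

def evaluate(assign: Assignment, length: Dict[str, float],
             mips: Dict[str, float], price: Dict[str, float]) -> Tuple[float, float]:
    """Return (TET, TEC) for a task-to-VM assignment under a simple cost model."""
    busy: Dict[str, float] = {}
    for task, vm in assign.items():
        busy[vm] = busy.get(vm, 0.0) + length[task] / mips[vm]   # tasks run back to back
    tet = max(busy.values())                                      # busiest VM finishes last
    tec = sum(seconds * price[vm] for vm, seconds in busy.items())
    return tet, tec

def accept_migration(current: Assignment, candidate: Assignment,
                     length: Dict[str, float], mips: Dict[str, float],
                     price: Dict[str, float]) -> bool:
    """Accept a migration only if neither TET nor TEC increases."""
    tet0, tec0 = evaluate(current, length, mips, price)
    tet1, tec1 = evaluate(candidate, length, mips, price)
    return tet1 <= tet0 and tec1 <= tec0

length = {"t1": 4000.0, "t2": 3000.0, "t3": 1000.0}        # million instructions
mips = {"vm_a": 1000.0, "vm_b": 1000.0, "vm_c": 500.0}
price = {"vm_a": 2.0, "vm_b": 2.0, "vm_c": 2.0}            # cost units per second
current = {"t1": "vm_a", "t2": "vm_b", "t3": "vm_c"}        # vm_c is lightly used
drain_c = {**current, "t3": "vm_b"}                          # frees vm_c
overload_c = {**current, "t1": "vm_c"}
print(evaluate(current, length, mips, price))                      # (4.0, 18.0)
print(accept_migration(current, drain_c, length, mips, price))     # True
print(accept_migration(current, overload_c, length, mips, price))  # False
```

Moving t3 off the slow, lightly used VM keeps TET at 4.0 and lowers TEC to 16.0, so it is accepted; moving the long task t1 onto that VM raises TET to 10.0 and is rejected.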
Equations (1) to (5) define the TET and TEC measures used throughout the process. Algorithm 5, the TBW optimization algorithm, begins by calling Algorithm 2 (Tabu search), which maintains a tabu list of VMs that are not effectively utilized. Based on this tabu list, statement 6 of the TBW algorithm calls Algorithm 3 (Bayesian optimization), which provides the best combination of resources, since the target of this research is to migrate tasks from underutilized VMs to other VMs. Using this combination, statement 8 invokes Algorithm 4 (whale optimization) to actually migrate the tasks of underutilized machines to other VMs. The complete process is analyzed in terms of time, cost, response time, and energy consumption.
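The control flow of Algorithm 5 described above can be sketched as a three-stage pipeline. In this sketch the stages are pluggable functions, and the concrete stand-ins (a utilization threshold for the tabu list, least-loaded-first target selection, round-robin migration) are hypothetical placeholders used only to make the skeleton runnable; they are not the paper's Tabu, Bayesian, or whale optimization procedures.

```python
from typing import Callable, Dict, List

Placement = Dict[str, List[str]]   # VM id -> task ids

def tbw_pipeline(placement: Placement, loads: Dict[str, float],
                 find_underutilized: Callable[[Dict[str, float]], List[str]],
                 choose_targets: Callable[[Dict[str, float], List[str]], List[str]],
                 migrate: Callable[[Placement, List[str], List[str]], Placement]) -> Placement:
    """Control flow of Algorithm 5: tabu list -> VM combination -> migration."""
    tabu_list = find_underutilized(loads)           # stage 1 (Tabu search in the paper)
    if not tabu_list:
        return placement
    targets = choose_targets(loads, tabu_list)      # stage 2 (Bayesian optimization)
    if not targets:
        return placement
    return migrate(placement, tabu_list, targets)   # stage 3 (whale optimization)

# Hypothetical stand-ins, used only to make the skeleton runnable.
def below_threshold(loads: Dict[str, float], threshold: float = 0.3) -> List[str]:
    return [vm for vm, u in loads.items() if u < threshold]

def least_loaded_first(loads: Dict[str, float], tabu_list: List[str]) -> List[str]:
    return sorted((vm for vm in loads if vm not in tabu_list), key=loads.get)

def round_robin_migrate(placement: Placement, tabu_list: List[str],
                        targets: List[str]) -> Placement:
    new = {vm: list(tasks) for vm, tasks in placement.items()}  # do not mutate input
    i = 0
    for vm in tabu_list:
        for task in new[vm]:
            new[targets[i % len(targets)]].append(task)
            i += 1
        new[vm] = []                                # VM emptied and can be released
    return new

placement = {"vm1": ["t1", "t2"], "vm2": ["t3"], "vm3": ["t4", "t5"]}
loads = {"vm1": 0.7, "vm2": 0.1, "vm3": 0.5}
print(tbw_pipeline(placement, loads, below_threshold, least_loaded_first, round_robin_migrate))
# {'vm1': ['t1', 't2'], 'vm2': [], 'vm3': ['t4', 't5', 't3']}
```

Keeping the stages as parameters mirrors the algorithm's structure: any of the three placeholders can be swapped for a real optimizer without changing the surrounding flow.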

SIMULATION TOOL AND EXPERIMENTAL SETUP
This research work was implemented using the CloudSim simulator. This section describes how the simulation tool works, how it is set up, and which parameters are used for execution. CloudSim ([…] et al., 2011) is a platform for simulating cloud computing environments. It is written in the object-oriented Java programming language and was developed by the CLOUDS Laboratory at the Department of Computer Science and Engineering, University of Melbourne, Australia. CloudSim can simulate virtualization within data centers, which allows better experimentation with and evaluation of cloud computing algorithms, meta-heuristics, and protocols. Users can also simulate large-scale computing infrastructures and services, including resource allocation, data centers, brokers, allocation policies, and scheduling.

Experimental Setup
The algorithms proposed in this research work are implemented in Java using the Eclipse IDE and deployed on CloudSim toolkit version 4.0. The experiments were performed on a 64-bit operating system with a 2.60 GHz CPU and 8 GB of RAM. A data center with an x86 architecture and Linux OS was created. Each host machine is configured with a CPU capacity of 1000 MIPS, 4096 MB of RAM, and 2,000,000 MB of disk space. Hosts are divided into two groups by bandwidth: Group 1 is a set of three hosts with bandwidth values {10,000, 15,000, 20,000}, and Group 2 is a set of five hosts with values {10,000, 15,000, 20,000, 25,000, 30,000}. Each VM has one CPU, and its bandwidth is generated randomly in the ranges (5000, 10000), (5000, 15000), and (500, 20000).
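The stated configuration can be captured as plain data before it is turned into CloudSim objects. This Python sketch only mirrors the parameters listed above and does not use any CloudSim API; the class names, the fixed seed, the uniform inclusive integer draw for VM bandwidth, and the number of VMs are assumptions, and bandwidth units are left unspecified as in the setup.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class HostSpec:
    mips: int = 1000             # CPU capacity
    ram_mb: int = 4096
    storage_mb: int = 2_000_000
    bw: int = 10_000             # bandwidth; units not stated in the setup

@dataclass(frozen=True)
class VmSpec:
    pes: int                     # number of CPUs
    bw: int

GROUP_1_BW = [10_000, 15_000, 20_000]                  # three hosts
GROUP_2_BW = [10_000, 15_000, 20_000, 25_000, 30_000]  # five hosts
VM_BW_RANGES: List[Tuple[int, int]] = [(5000, 10000), (5000, 15000), (500, 20000)]

def build_hosts(bandwidths: List[int]) -> List[HostSpec]:
    """One host per bandwidth value; all other characteristics are identical."""
    return [HostSpec(bw=bw) for bw in bandwidths]

def build_vms(count: int, bw_range: Tuple[int, int], seed: int = 42) -> List[VmSpec]:
    """VMs with one CPU and bandwidth drawn uniformly (inclusive) from bw_range."""
    rng = random.Random(seed)
    low, high = bw_range
    return [VmSpec(pes=1, bw=rng.randint(low, high)) for _ in range(count)]

hosts = build_hosts(GROUP_1_BW) + build_hosts(GROUP_2_BW)
vms = build_vms(10, VM_BW_RANGES[0])
print(len(hosts), [vm.bw for vm in vms])
```

Whether the two host groups are used together or as separate configurations is not stated; the example simply builds both, and a fixed seed keeps the random VM bandwidths reproducible across runs.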

RESULTS AND DISCUSSION
Validation is performed via simulation-based experiments in CloudSim. The efficiency of TBW is analyzed by scheduling the tasks of different workflows on various VMs with the TBW algorithm. The goal of the proposed study and its implementation is to reduce the total cost and time consumed in mapping input tasks to VMs. In the cloud simulator, the TBW (Tabu-Bayesian-Whale) scheduling algorithm provided better results than existing scheduling and optimization algorithms. Table 1 shows … for the TET and RT parameters, cost (in rupees) for the TEC parameter, and energy consumption (in kWh) for the EC parameter.

CONCLUSION
Scheduling complex workflows is a challenge in cloud computing environments, and cloud resources need to be optimized while scheduling them. Scientific workflows in particular must be managed carefully on virtualized infrastructure to optimize execution time, cost, and energy consumption. In this paper, a comprehensive scientific workflow scheduling framework named TBW was designed and implemented in a simulated environment to optimize these parameters. The framework incorporates various time- and cost-related attributes to optimize the overall execution time and cost of scheduling scientific workflows, and its phased architecture performs predefined tasks in a sequential, synchronized manner. To achieve overall effectiveness, it combines three optimization approaches: Tabu Search, Bayesian Optimization, and Whale Optimization. Input workflow tasks are not mapped to virtual machines at random; they are first ranked using the distributed HEFT method and then scheduled on cloud-based machines. The optimization process then executes the TBW method, a scheduler that optimizes total execution time, total execution cost, response time, and energy consumption. The proposed approach was implemented for the MONTAGE, CYBERSHAKE, LIGO, GENOME, and SIPHT scientific workflows, and it produced better results than the Whale optimization, GA-PSO, GA, and PSO approaches.

Figure 2. Scheduling parameters

Figure 3. Objectives behind scheduling a workflow

Figure 5. Scientific workflows

Algorithm 1: Distributed HEFT task ranking
1: Begin
2: Select initialization heuristics:
3:   B as Budget, T as Time, D as Deadline
4: Gather T, B, and D of each task F
5: Find correlations as:
6:   C_TB = correlation between T and B
7:   C_TD = correlation between T and D
8:   C_BD = correlation between B and D
9: For every task at each level, compute:
10:   Distributed score = ∑ …
11: Assign the highest rank to the task with the maximum correlation score
12: Schedule the ranked tasks to VMs

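Figure 6. TBW optimization technique in cloud environment

Symbols used in Equations (1) to (5):
T_t: total time
T_R: receiving or passing time of a task
T_P: processing time of a task
T_W: waiting time of a task

$$\text{Actual Cost} = \text{Under-Deadline Total Cost} + \text{Deadline-Crossed Task Cost} \qquad (5)$$

where the deadline-crossed task cost is computed as

$$\text{Deadline-Crossed Task Cost} = \frac{\text{Process Cost} \times \text{Task Memory}}{\text{Involved VMs}}$$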
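Only the symbol definitions and Eq. (5) survive in this version, so the following worked example rests on two labeled assumptions: that a task's total time is the sum T_t = T_R + T_P + T_W, and that the deadline-crossed term in Eq. (5) is (process cost × task memory) / involved VMs for each task that misses its deadline. The data classes and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TaskTiming:
    receive: float   # T_R: receiving or passing time
    process: float   # T_P: processing time
    wait: float      # T_W: waiting time

def total_time(t: TaskTiming) -> float:
    """Assumed form: T_t = T_R + T_P + T_W."""
    return t.receive + t.process + t.wait

@dataclass
class TaskCost:
    cost: float          # cost of the task when it meets its deadline
    finish: float
    deadline: float
    process_cost: float
    memory: float
    involved_vms: int

def actual_cost(tasks: List[TaskCost]) -> float:
    """Eq. (5) as read here: cost of tasks within their deadline plus
    (process cost x memory) / involved VMs for each deadline-crossing task."""
    under = sum(t.cost for t in tasks if t.finish <= t.deadline)
    crossed = sum(t.process_cost * t.memory / t.involved_vms
                  for t in tasks if t.finish > t.deadline)
    return under + crossed

print(total_time(TaskTiming(receive=2.0, process=10.0, wait=3.0)))  # 15.0
tasks = [TaskCost(5.0, finish=10.0, deadline=12.0, process_cost=0.5, memory=8.0, involved_vms=2),
         TaskCost(4.0, finish=20.0, deadline=15.0, process_cost=0.5, memory=8.0, involved_vms=2)]
print(actual_cost(tasks))  # 7.0
```

Here the first task meets its deadline and contributes its cost of 5.0, while the second misses it and contributes 0.5 × 8.0 / 2 = 2.0 under the assumed reading.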

Figure 7. Simulation results of TET, TEC, RT, and EC parameters of scheduling LIGO workflow for different optimization algorithms

Figure 11. Simulation results of TET, TEC, RT, and EC parameters of scheduling MONTAGE workflow for different optimization algorithms

Methodology steps (continued):
2. Parse the tasks.
3. Rank the tasks.
4. Provision the virtual machines according to the ranking-based paths.
5. Initialize the optimization using Tabu search and Bayesian optimization.
6. Apply whale optimization and update the status of the fitness function.
7. Check whether the output is optimized; if yes, analyze it, otherwise initialize again.
8. Analyze the total resource utilization.