1. Introduction
The adoption of computing services has grown rapidly in recent years. Their attractiveness stems mainly from the release of IT resources, in particular the transformation of capital IT expenditure into operational expenditure, which offers the potential to reduce costs through economies of scale. A unique advantage of computing services is time- and cost-constrained task execution, in which a large number of jobs is executed within an assured budget and computing time (Chen, 2012). As IT industries handle unprecedented volumes of data every day, Hadoop, the open-source implementation of the MapReduce programming model, has become the standard technique for analyzing such petabyte-scale data in a cost-efficient way. A large number of users with data-bound workloads would be served by a simpler version of the MapReduce framework (Jonas, 2017). On average, 25,000 MapReduce jobs are hosted every day on Facebook's Hadoop clusters of 3,000 machines. These vast amounts of data create new challenges and opportunities that lead to discoveries and extraordinary new knowledge in many application domains, ranging from science and engineering to business. The challenges are bounded not only by the size of the data, but also by time and cost constraints (Rani & Vinaya Babu, 2015).
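To make the MapReduce model referred to above concrete, the following is a minimal, single-process sketch of its three phases: a map phase emitting (key, value) pairs, a shuffle grouping values by key, and a reduce phase aggregating each group. The word-count task and all function names are illustrative assumptions; Hadoop distributes these same phases across a cluster.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values (here, sum the counts)."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["the"])  # 3
print(counts["fox"])  # 2
```

In a real Hadoop deployment, the map and reduce phases run on different machines and the shuffle moves data over the network, which is where the time and cost constraints discussed in this paper come into play.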
It has been affirmed that executing the massive, non-uniform real-world data gathered through grid computing and cloud computing is becoming increasingly obscure (Hajikano et al., 2016). This can be overcome by efficient task scheduling, because optimal scheduling allocates resources efficiently to achieve a quick response time for real-time applications. Therefore, a cost-optimized task scheduling for data-intensive jobs is presented here. The motivation of this work is the execution of data-intensive jobs within time and cost constraints. The optimal selection of machines yields a significant enhancement in performance and maximum resource utilization (Abdelaziz, 2018). However, the scheduling of heterogeneous computing resources depends on the following parameters: resource capacity, resource availability, workload size, and resource utilization cost. Service level agreement (SLA) negotiation also moderates the scheduling and utilization of resources (Cheng, 2015). Therefore, these parameters have to be considered while scheduling tasks on resources to meet user expectations (Figure 1).
Figure 1.
A framework for Resource Scheduling
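A simple way to see how the four scheduling parameters interact is to select, among the available machines, the one that finishes a workload within its deadline at the lowest total cost. The `Machine` fields and the greedy selection rule below are illustrative assumptions for this sketch, not the scheduling algorithm proposed in this paper.

```python
from dataclasses import dataclass

@dataclass
class Machine:
    name: str
    capacity: float       # resource capacity: work units processed per hour
    cost_per_hour: float  # resource utilization cost
    available: bool       # resource availability

def cheapest_within_deadline(machines, workload, deadline_hours):
    """Return the (machine, total_cost) pair that processes `workload`
    within the deadline at minimum cost, or None if none qualifies."""
    best = None
    for m in machines:
        if not m.available:
            continue
        runtime = workload / m.capacity
        if runtime > deadline_hours:
            continue  # violates the time constraint
        total_cost = runtime * m.cost_per_hour
        if best is None or total_cost < best[1]:
            best = (m, total_cost)
    return best

machines = [
    Machine("fast", capacity=100.0, cost_per_hour=10.0, available=True),
    Machine("slow", capacity=25.0, cost_per_hour=2.0, available=True),
    Machine("idle", capacity=200.0, cost_per_hour=30.0, available=False),
]
# With a loose deadline the cheap, slow machine wins (4 h * 2 = 8 < 1 h * 10).
choice = cheapest_within_deadline(machines, workload=100.0, deadline_hours=5.0)
print(choice[0].name)  # slow
```

Tightening the deadline to 2 hours disqualifies the slow machine and shifts the choice to the fast one, illustrating the trade-off between the time and cost constraints that the rest of this paper addresses.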
User expectations strongly entail the desired quality of service (QoS), such as quality of results, execution time, throughput, economic cost, reliability, and trust. Moreover, timeliness of computation is achieved by allowing users to specify an absolute deadline. To attain this timeliness, several patterns are employed to model a generic flow of work. The data parallelism pattern is appropriate for modeling the embarrassingly parallel computation of a data-intensive task. This pattern leads to the concurrent execution of multiple independent data-parallel tasks on heterogeneous computing resources. However, scheduling data-parallel tasks in heterogeneous environments while satisfying QoS constraints (such as cost and execution time) is a complex problem.
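The data parallelism pattern described above can be sketched as partitioning the input into independent chunks, processing each chunk with the same function, and combining the partial results. The chunking scheme and the worker function here are illustrative assumptions; for CPU-bound work on multiple cores, `ProcessPoolExecutor` would replace the thread pool used in this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    """Independent per-chunk work; here, summing the squares of a chunk."""
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, n_chunks=4):
    """Partition `data`, process each partition concurrently, and
    combine the partial results into the final answer."""
    size = max(1, len(data) // n_chunks)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = pool.map(process_chunk, chunks)
    return sum(partials)

print(parallel_sum_of_squares(list(range(10))))  # 285
```

Because the chunks are fully independent, each could equally run on a different heterogeneous machine; the scheduling difficulty discussed above lies in deciding which chunk goes to which machine under the cost and deadline constraints.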
Nowadays, most resources have multi-core processors, meaning that two or more processing cores are placed on the same chip. A multi-core processor improves overall performance by handling more work in parallel. An efficient model is needed to exploit the performance of these multi-core resources. Data-parallel processing favors designs with many processing elements handling large amounts of data, which often yields high throughput and performance (Blake, 2009).