An Efficient MapReduce Computing Model for Imprecise Applications

Changjian Wang, Yuxing Peng, Mingxing Tang, Dongsheng Li, Shanshan Li and Pengfei You (College of Computer, National University of Defense Technology, Changsha, China)
Copyright: © 2016 | Pages: 18
DOI: 10.4018/IJWSR.2016070103

Abstract

Optimizing the map phase is important for improving MapReduce performance, and many efforts have been devoted to designing more efficient scheduling strategies for it. However, there is a class of MapReduce applications, called imprecise applications, for which imprecise results computed from only part of the map tasks satisfy the application's requirements, so the job can be completed once enough map tasks have been processed. Based on this feature of imprecise applications, the authors propose an improved MapReduce model, named MapCheckReduce, which can terminate the map phase as soon as the requirements of an imprecise application are satisfied. Compared with MapReduce, MapCheckReduce adds a Check mechanism and a set of extended programming interfaces. The Check mechanism receives and analyzes messages submitted by completed map tasks and then decides, according to the analysis results, whether to terminate the map phase. Programmers use the programming interfaces to define the termination conditions of the map phase. A data-prefetching mechanism is also designed and implemented in MapCheckReduce, and it effectively improves MapCheckReduce's performance. A MapCheckReduce prototype has been implemented, and experimental results verify the feasibility and effectiveness of MapCheckReduce.
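To make the interplay between the Check mechanism and the programmer-defined termination condition concrete, the following is a minimal sketch under stated assumptions. All names (`CheckMessage`, `CheckMaster`, `TerminationCondition`, `fraction_completed`) are hypothetical and are not MapCheckReduce's actual interfaces; the message contents and the "fraction of map tasks completed" condition are also illustrative assumptions. The sketch only shows the control flow the abstract describes: completed map tasks report to a Check component, which evaluates a user-supplied predicate and ends the map phase when it holds.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CheckMessage:
    """Hypothetical message a completed map task submits to the Check component."""
    task_id: int
    records_processed: int


# Hypothetical user-defined termination condition: given all messages received
# so far and the total number of map tasks, return True to stop the map phase.
TerminationCondition = Callable[[List[CheckMessage], int], bool]


@dataclass
class CheckMaster:
    """Sketch of a Check component that decides when to end the map phase."""
    total_map_tasks: int
    condition: TerminationCondition
    received: List[CheckMessage] = field(default_factory=list)
    terminated: bool = False

    def on_map_task_completed(self, msg: CheckMessage) -> bool:
        """Record one completion message; return True if the map phase should stop."""
        if self.terminated:
            return True  # already decided; later messages are ignored
        self.received.append(msg)
        if self.condition(self.received, self.total_map_tasks):
            # A real framework would cancel the remaining map tasks here and
            # let the reduce phase start on the map outputs collected so far.
            self.terminated = True
        return self.terminated


def fraction_completed(threshold: float) -> TerminationCondition:
    """Illustrative condition: stop once `threshold` of all map tasks have reported."""
    def cond(msgs: List[CheckMessage], total: int) -> bool:
        return total > 0 and len(msgs) / total >= threshold
    return cond


if __name__ == "__main__":
    check = CheckMaster(total_map_tasks=10, condition=fraction_completed(0.6))
    for tid in range(10):
        if check.on_map_task_completed(CheckMessage(task_id=tid, records_processed=1000)):
            # prints: map phase terminated after 6 of 10 tasks
            print(f"map phase terminated after {len(check.received)} of 10 tasks")
            break
```

Keeping the termination condition as a plain callable mirrors the separation described above: the framework owns message collection and termination, while the application only supplies the predicate.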

1. Introduction

MapReduce (Jeffrey and Sanjay 2004) is a widely used computing framework in cloud computing. A MapReduce job has two main phases: map and reduce. Reduce tasks start processing data only after all map tasks have completed, because the input data of any reduce task may come from every map task.

Improving MapReduce performance is an important research topic, and many efforts have been devoted to it. Some works focus on task scheduling policies (Ali, Matei, et al. 2011) (Kay, Patrick, et al. 2013) (Aysan and Douglas 2011) (Joel, Deepak, et al. 2010) (Matei, Dhruba, et al. 2009) (Matei, Dhruba, et al. 2010). Others target stragglers, i.e., slow tasks whose runtimes lag far behind those of most other tasks in the same job (Ganesh, Ali, et al. 2013) (Ganesh, Michael, et al. 2014) (Ganesh, Srikanth, et al. 2010) (YongChul, Magdalena, et al. 2012) (Matei, Andy, et al. 2008). Other techniques have also been developed to improve MapReduce performance, such as intermediate data caching (Yaxiong, Jie, et al. 2014) and power management (Nan, Xue, et al. 2014).

These works still assume that all tasks of a MapReduce job must be processed. However, some MapReduce applications accept imprecise results computed from only part of the input data. Once enough map tasks have completed and the map outputs reach a certain size, completing more map tasks has little influence on the accuracy of the final result. Examples include word frequency statistics and hot-word detection for Internet public sentiment, both of which analyze vast numbers of text files: once enough map outputs have been generated, the statistical results tend to stabilize, and imprecise results based on part of the map tasks can still meet users' requirements. We call such MapReduce applications imprecise applications, and their performance can be improved by terminating the map phase once enough map tasks have completed.
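The claim that word-frequency results "tend to stabilize" can be turned into a concrete termination test. The sketch below assumes, purely for illustration, that each map task emits a word-count table (modeled as a `collections.Counter`) and that the map phase may stop when merging one more map output changes the normalized frequency distribution by less than a threshold `epsilon`, measured as an L1 distance. This stability criterion and the helper names (`normalized`, `l1_distance`, `merge_until_stable`) are assumptions for this example, not the condition used in the paper.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def normalized(counts: Counter) -> Dict[str, float]:
    """Convert raw word counts into relative frequencies."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def l1_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """L1 distance between two frequency distributions over the union of their words."""
    return sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in set(p) | set(q))


def merge_until_stable(map_outputs: Iterable[Counter], epsilon: float) -> Tuple[int, Counter]:
    """Merge per-task word counts until one more output shifts the distribution by < epsilon.

    Returns the number of map outputs consumed and the merged counts.
    """
    merged: Counter = Counter()
    used = 0
    for output in map_outputs:
        before = normalized(merged)
        merged.update(output)  # Counter.update adds counts rather than replacing them
        used += 1
        # Skip the test for the first output: there is no earlier distribution to compare.
        if before and l1_distance(before, normalized(merged)) < epsilon:
            break
    return used, merged


if __name__ == "__main__":
    # Ten simulated map tasks whose outputs share the same word distribution.
    outputs = [Counter({"cloud": 5, "data": 3, "map": 2}) for _ in range(10)]
    used, counts = merge_until_stable(outputs, epsilon=0.01)
    print(used, counts.most_common(1))
```

In this toy input every map output has the same distribution, so the check fires after the second output and the remaining eight simulated map tasks are never merged. Real text collections converge more gradually, and the choice of `epsilon` trades accuracy against the amount of map work saved.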