An Efficient MapReduce Computing Model for Imprecise Applications

Changjian Wang, Yuxing Peng, Mingxing Tang, Dongsheng Li, Shanshan Li, Pengfei You
Copyright: © 2016 |Pages: 18
DOI: 10.4018/IJWSR.2016070103

Abstract

Optimizing the map process is important for improving MapReduce performance, and much effort has been devoted to designing more efficient scheduling strategies. However, there exists a class of MapReduce applications, named imprecise applications, in which imprecise results based on a subset of the map tasks satisfy the application's requirements, so the job can be completed once enough map tasks have been processed. Based on this feature of imprecise applications, the authors propose an improved MapReduce model, named MapCheckReduce, which can terminate the map process when the requirements of an imprecise application are satisfied. Compared to MapReduce, MapCheckReduce adds a Check mechanism and a set of extended programming interfaces. The Check mechanism receives and analyzes messages submitted by completed map tasks and then determines whether to terminate the map phase according to the analysis results. Programmers use the programming interfaces to define the termination conditions of the map process. A data-prefetching mechanism is also designed and implemented in MapCheckReduce to improve its performance. A MapCheckReduce prototype has been implemented, and experimental results verify the feasibility and effectiveness of MapCheckReduce.

1. Introduction

MapReduce (Jeffrey and Sanjay 2004) is a commonly used computing framework in cloud computing. MapReduce contains two main phases: map and reduce. The reduce tasks start processing data only when all the map tasks are completed, because the input data of a reduce task may come from any of the map tasks.

Improving MapReduce performance is an important research topic, and many efforts have been devoted to it. Some works focus on task scheduling policies (Ali, Matei, et al. 2011) (Kay, Patrick, et al. 2013) (Aysan and Douglas 2011) (Joel, Deepak, et al. 2010) (Matei, Dhruba, et al. 2009) (Matei, Dhruba, et al. 2010). Other works target the stragglers in MapReduce, which are slow tasks whose runtime lags significantly behind that of most tasks of the same job (Ganesh, Ali, et al. 2013) (Ganesh, Michael, et al. 2014) (Ganesh, Srikanth, et al. 2010) (YongChul, Magdalena, et al. 2012) (Matei, Andy, et al. 2008). Further techniques have also been developed to improve MapReduce performance, such as intermediate data caching (Yaxiong, Jie, et al. 2014) and power management (Nan, Xue, et al. 2014).

All of this research still assumes that every task of a MapReduce job must be processed. However, there exists a special class of MapReduce applications that permit imprecise results based on part of the input data. Once enough map tasks are completed and the map outputs reach a certain size, completing more map tasks has little influence on the accuracy of the final result. Examples include word frequency statistics and hot-word detection for Internet public sentiment, both of which need to analyze vast numbers of text files. When enough map outputs have been generated, the statistical results tend to become stable, and the imprecise results based on part of the map tasks are able to meet the users' requirements. These MapReduce applications can be named Imprecise Applications, and we can improve the MapReduce performance of imprecise applications by terminating the map process when enough map tasks are completed.
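
The idea of stopping the map phase once the statistics stabilize can be illustrated with a small, single-process sketch of the word frequency example. It is not the MapCheckReduce implementation or its programming interface: the function names (`map_task`, `top_k_share`, `is_stable`, `run_imprecise_job`), the parameters `k`, `tol`, and `min_tasks`, and the stability test itself are assumptions made only for illustration. Map tasks are simulated sequentially, and after each one completes a check decides whether the remaining tasks can be skipped.

```python
from collections import Counter


def map_task(split):
    """Map: count the words in one input split (a list of text lines)."""
    counts = Counter()
    for line in split:
        counts.update(line.lower().split())
    return counts


def top_k_share(counts, k):
    """Relative frequency of the k most common words seen so far."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.most_common(k)}


def is_stable(prev, curr, tol):
    """Hypothetical termination condition (an assumption, not the paper's):
    the top-k words are unchanged and each share moved by less than tol."""
    if prev is None or set(prev) != set(curr):
        return False
    return all(abs(prev[w] - curr[w]) < tol for w in curr)


def run_imprecise_job(splits, k=3, tol=0.01, min_tasks=2):
    """Process splits one by one and stop early once the check passes.

    Returns the top-k (word, count) pairs and the number of map tasks run.
    """
    merged = Counter()
    prev = None
    done = 0
    for split in splits:
        merged.update(map_task(split))  # a completed map task reports its output
        done += 1
        curr = top_k_share(merged, k)
        if done >= min_tasks and is_stable(prev, curr, tol):
            break  # remaining map tasks are skipped
        prev = curr
    return merged.most_common(k), done


if __name__ == "__main__":
    splits = [["a a a b b c"]] * 10
    top_words, tasks_run = run_imprecise_job(splits)
    print(top_words, tasks_run)
```

In this sketch the termination condition is an ordinary function, so a different application could substitute its own test (for example, a minimum output size) without changing the driver loop. A real framework would run map tasks in parallel and would need to decide what to do with tasks still in flight when the check passes, which this sketch does not model.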
