1. Introduction
MapReduce (Jeffrey and Sanjay 2004) is a widely used computing framework in cloud computing. It has two main phases: map and reduce. A reduce task can start processing data only after all the map tasks have completed, because its input data may come from any of the map tasks.
Improving MapReduce performance is an important research topic, and much effort has been devoted to it. Some works focus on task scheduling policies (Ali, Matei, et al. 2011) (Kay, Patrick, et al. 2013) (Aysan and Douglas 2011) (Joel, Deepak, et al. 2010) (Matei, Dhruba, et al. 2009) (Matei, Dhruba, et al. 2010). Others target stragglers, that is, slow tasks whose runtime lags far behind that of most other tasks in the same job (Ganesh, Ali, et al. 2013) (Ganesh, Michael, et al. 2014) (Ganesh, Srikanth, et al. 2010) (YongChul, Magdalena, et al. 2012) (Matei, Andy, et al. 2008). Other techniques have also been developed to improve MapReduce performance, such as intermediate data caching (Yaxiong, Jie, et al. 2014) and power management (Nan, Xue, et al. 2014).
All of these studies keep the assumption that every task of a MapReduce job must be processed. However, some MapReduce applications tolerate imprecise results computed from only part of the input data. Once enough map tasks have completed and the map outputs reach a certain size, completing further map tasks has little influence on the accuracy of the final result. Examples include word frequency statistics and hot-word detection for Internet public sentiment, both of which analyze vast numbers of text files. When enough map outputs have been generated, the statistical results tend to stabilize, and imprecise results based on a subset of the map tasks are sufficient to meet users' requirements. We call these MapReduce applications Imprecise Applications. Their performance can be improved by terminating the map phase once enough map tasks have completed.
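The idea can be illustrated with a minimal single-process Python sketch of word frequency statistics. It is not the mechanism proposed in this paper: the function names and the `min_fraction` stopping rule (stop after a fixed fraction of map tasks) are assumptions made only for illustration, since the text above does not yet specify how "enough" map tasks is determined.

```python
import math
from collections import Counter


def map_task(split):
    """Map: count the words in one input split (a string of text)."""
    return Counter(split.split())


def reduce_counts(map_outputs):
    """Reduce: merge the per-split counts from the completed map tasks."""
    total = Counter()
    for output in map_outputs:
        total.update(output)
    return total


def imprecise_word_frequency(splits, min_fraction=0.5):
    """Return relative word frequencies from only part of the map tasks.

    min_fraction is a hypothetical threshold: the map phase is cut short
    once this fraction of the map tasks has completed.
    """
    needed = max(1, math.ceil(min_fraction * len(splits)))
    outputs = []
    for split in splits:
        outputs.append(map_task(split))
        if len(outputs) >= needed:
            break  # terminate the map phase; remaining splits are skipped
    counts = reduce_counts(outputs)
    total = sum(counts.values())
    frequencies = {word: n / total for word, n in counts.items()}
    return frequencies, len(outputs)
```

For example, with four splits and `min_fraction=0.5`, only two map tasks run before the reduce step starts. If the splits have similar word distributions, the relative frequencies computed from those two splits are close to the ones the full input would give.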