In an n-dimensional space D, given two points r and s, |r, s| denotes the distance between r and s in D. In this chapter, the Euclidean distance is used as the distance: $|r, s| = \sqrt{\sum_{i=1}^{n} (r[i] - s[i])^2}$, where r[i] (resp. s[i]) denotes the value of r (resp. s) along the i-th dimension of D.
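As an illustration only, the Euclidean distance between two points can be sketched in Python as follows. The function name `euclidean_distance` and the representation of points as sequences of floats are assumptions made for this example; they do not come from the chapter.

```python
import math
from typing import Sequence


def euclidean_distance(r: Sequence[float], s: Sequence[float]) -> float:
    """Return |r, s|, the Euclidean distance between points r and s.

    Hypothetical helper: both points are assumed to be given as
    equal-length sequences of coordinates, one value per dimension.
    """
    if len(r) != len(s):
        raise ValueError("r and s must have the same number of dimensions")
    # Sum the squared per-dimension differences, then take the square root.
    return math.sqrt(sum((ri - si) ** 2 for ri, si in zip(r, s)))
```

For example, `euclidean_distance([0, 0], [3, 4])` evaluates to `5.0`.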
Published in Chapter:
Parallel kNN Queries for Big Data Based on Voronoi Diagram Using MapReduce
Wei Yan (Liaoning University, China)
Copyright: © 2016
Pages: 23
DOI: 10.4018/978-1-4666-8767-7.ch014
Abstract
In cloud computing environments, processing kNN queries over big data in parallel is an important problem. The k-nearest-neighbor query (kNN query), which finds the k nearest neighbors in a dataset S for every object in another dataset R, is a primitive operator widely adopted by many applications, including knowledge discovery, data mining, and spatial databases. This chapter proposes a parallel method for kNN queries over big data using the MapReduce programming model. First, it proposes an approximate algorithm that maps multi-dimensional data sets into two-dimensional data sets and transforms kNN queries into a sequence of two-dimensional point searches. Then, in two-dimensional space, it proposes a partitioning method that incorporates the Voronoi diagram into an R-tree. Furthermore, it proposes an efficient algorithm for processing kNN queries based on the R-tree using the MapReduce programming model. Finally, it presents the results of extensive experimental evaluations, which indicate the efficiency of the proposed approach.
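To make the query itself concrete, the following is a minimal single-machine sketch of the operation the abstract describes: for every object in R, find its k nearest neighbors in S under the Euclidean distance. It is a brute-force baseline, not the chapter's Voronoi, R-tree, or MapReduce method, and the function name `knn_join` is an assumption made for this example.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_join(R: Sequence[Point], S: Sequence[Point], k: int) -> List[List[Point]]:
    """Brute-force kNN query: for each r in R, return its k nearest points in S.

    Hypothetical baseline for illustration only. It compares every r with
    every s, so its cost grows with |R| * |S|; it does no partitioning,
    indexing, or parallel execution.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    result = []
    for r in R:
        # math.dist computes the Euclidean distance (Python 3.8+).
        nearest = heapq.nsmallest(k, S, key=lambda s: math.dist(r, s))
        result.append(nearest)
    return result
```

For instance, with `R = [(0, 0), (10, 10)]`, `S = [(1, 0), (0, 2), (9, 9), (5, 5)]`, and `k = 2`, the call returns `[[(1, 0), (0, 2)], [(9, 9), (5, 5)]]`. The quadratic cost of this baseline is the kind of overhead that partitioning and indexing schemes aim to reduce.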