Given a point r , a dataset S in space D and an integer k , the k nearest neighbors of r from S , denoted as k NN( r , s ), is a set of k point from S that ? p ? k NN( r , S ), ? s ? S - k NN( r , S ), | p , r |=| s , r |.
Published in Chapter:
Parallel kNN Queries for Big Data Based on Voronoi Diagram Using MapReduce
Wei Yan (Liaoning University, China)
Copyright: © 2016
|Pages: 23
DOI: 10.4018/978-1-4666-8767-7.ch014
Abstract
In cloud computing environments parallel kNN queries for big data is an important issue. The k nearest neighbor queries (kNN queries), designed to find k nearest neighbors from a dataset S for every object in another dataset R, is a primitive operator widely adopted by many applications including knowledge discovery, data mining, and spatial databases. This chapter proposes a parallel method of kNN queries for big data using MapReduce programming model. Firstly, this chapter proposes an approximate algorithm that is based on mapping multi-dimensional data sets into two-dimensional data sets, and transforming kNN queries into a sequence of two-dimensional point searches. Then, in two-dimensional space this chapter proposes a partitioning method using Voronoi diagram, which incorporates the Voronoi diagram into R-tree. Furthermore, this chapter proposes an efficient algorithm for processing kNN queries based on R-tree using MapReduce programming model. Finally, this chapter presents the results of extensive experimental evaluations which indicate efficiency of the proposed approach.