Article Preview
TopIntroduction
A spatial database is a database system that describes, stores, and processes spatial data and the associated attribute data in which a relational database management system (RDBMS) is used to regulate spatial data, mainly solving the data interface problem between the spatial data stored in the relational database and the application program—that is, the spatial database engine (SDE) (Sveen, 2019; Baharin, & Akunne, 2020). More precisely, spatial database technology is used to address the accessing issue of the geometric attributes of the spatial data objects in the relational database (Breunig et al., 2020; Zhang et al., 2022). In addition, the spatial database can also effectively handle the integrating, querying, and managing of complex spatial information for huge volumes of data. Compared with traditional databases, spatial databases have wider application prospects and are the core components of a geographic information system (GIS) (Choi et al., 2020; Xia et al., 2022).
With the continuous development of location-based services (LBS), spatial databases are of great significance in the fields of traffic network systems, GIS, and various decision support systems (Pant et al., 2018). The query technology of spatial data is the utmost important operation for a spatial database. Spatial data query technology refers to the process of searching for and returning of spatial objects that meet certain query constraints in a spatial database. Commonly used spatial database query operations include Exact Match Query (Jiang et al., 2019), Point Query (Wang et al., 2019), Region Query (Wan et al., 2019), and Nearest Neighbor Query (Chen et al., 2019).
The problem of nearest neighbor query is a hotspot in today’s database technology, and the query results are usually spatial data or datasets that meet the query requirements. Nearest neighbor query is the basis of the spatial database query field. Because of the continuous change of query requirements, many variants have been produced, such as k-nearest neighbor (kNN), directional nearest, clustered nearest, group nearest, constrained nearest, and privacy-preserving nearest neighbors (Wang et al., 2019). The kNN and reverse nearest neighbor (RkNN) algorithms are considered to be the most fundamental and widely used query types. The kNN algorithm can be provided as an LBS service alone, such as finding the nearest restaurant or hotel, or it can be used as a basic technology to provide support for other queries, such as RkNN and search window query (Xu et al., 2021). RkNN has a good application when it comes to the “influence range” and “influence degree” of an object on other objects, such as the location problem when building new infrastructure, such as shopping malls and hospitals, and the current popular concept of targeted advertising.
Most of the current parallel RkNN query algorithms are on the basis of the Map-Reduce framework. Ji et al. (2015) introduced a distributed RkNN query processing algorithm on the basis of the inverted grid index. García et al. (2019) conducted distributed RkNN query research on Spatial Hadoop, the spatial expansion framework of Hadoop, and proposed an RkNN query algorithm MRSLICE based on Spatial Hadoop. MapReduce framework along with open-sourced applications in Hadoop are breakthroughs in the efficient processing of large-scale datasets in which data is processed through a distributed system, and program execution will not be affected by node failures. However, researchers have found in experiments that the performance of multicore devices is limited by the single node implementation, which can easily lead to too tight coupling and low scalability. The limitation of MapReduce in the Hadoop platform will cause the start-up overhead of each round of operation in the iterative calculation process to be too large, and the overhead of disk I/O will also reduce the execution efficiency (Grolinger et al., 2014). In contrast, the Spark framework overcomes the aforementioned problems, and it accelerates the processing of distributed computing through memory computing strategies. This framework allows users to store data in memory for repeated queries, making it more practical for online, iterative, and dataflow algorithms (Meng et al., 2016).