An Efficient Association Rule Mining-Based Spatial Keyword Index

An Efficient Association Rule Mining-Based Spatial Keyword Index

Lianyin Jia, Haotian Tang, Mengjuan Li, Bingxin Zhao, Shoulin Wei, Haihe Zhou
Copyright: © 2023 |Pages: 19
DOI: 10.4018/IJDWM.316161
Article PDF Download
Open access articles are freely available for download

Abstract

Spatial keyword query has attracted the attention of many researchers. Most of the existing spatial keyword indexes do not consider the differences in keyword distribution, so their efficiencies are not high when data are skewed. To this end, this paper proposes a novel association rule mining based spatial keyword index, ARM-SQ, whose inverted lists are materialized by the frequent item sets mined by association rules; thus, intersections of long lists can be avoided. To prevent excessive space costs caused by materialization, a depth-based materialization strategy is introduced, which maintains a good balance between query and space costs. To select the right frequent item sets for answering a query, the authors further implement a benefit-based greedy frequent item set selection algorithm, BGF-Selection. The experimental results show that this algorithm significantly outperforms the existing algorithms, and its efficiency can be an order of magnitude higher than SFC-Quad.
Article Preview
Top

Introduction

The rapid development of mobile devices has produced many location-based services, and an increasing number of online objects have both spatial and textual properties. For example, social services such as Twitter and Foursquare allow users to send their tweets with location; navigation services such as Google Maps and Baidu Map allow users to search for points of interest nearby. These applications require efficient spatio-textual indexes to support numerous spatial keyword queries.

There are three common spatial keyword queries: Boolean range query (BRQ), Boolean kNN query (BkQ) and the top-k kNN query (TkQ) (Chen et al., 2013). This paper focuses on BRQ, which is fundamental to many spatial textual applications (Salgado et al., 2018; Tampakis et al., 2021; Wang et al., 2021; Wang et al., 2020). Given a collection of spatio-textual data, a spatial keyword query IJDWM.316161.m01 (comprising a spatial region IJDWM.316161.m02 and a set of keywords IJDWM.316161.m03), BRQ aims to find all objects, each of which is in IJDWM.316161.m04 and contains all keywords in IJDWM.316161.m05.

In Figure 1, a user may want to find a restaurant in a certain area (query region in a dashed rectangle) that sells hamburgers and coffee (query keywords) together. He can issue his query to a BRQ server and obtain point 1 as the returned result. Although points 3 and 4 are also in this area, they do not contain all the query keywords. For example, P2 sells hamburgers and coffee together, but they are too far away.

Figure 1.

A BRQ Example

IJDWM.316161.f01

The spatial keyword index is the heart of an efficient spatial keyword query. We can consider it a spatial and textual index combination, either in a space-first or text-first way. Based on the spatial indexes used, existing studies can be R-tree-based, grid-based, and space-filling curve-based. From the textual perspective, we can also classify these studies into inverted index-based and bitmap-based studies. Chen et al. (2013) experimentally evaluated 12 spatial-textual indexes, such as IR-Tree (Wu et al., 2012b), WIBR-Tree (Wu et al., 2012a), and SFC-Quad (Christoforaki et al., 2011), and the results show that SFC-Quad performs the best both in query time and space on BRQ.

The excellent performance of SFC-Quad is because of the simple core structure: an inverted index ordered by space-filling curves. The objects in each inverted list are sorted according to the Z-curve order, so the objects close in the original space are as close as possible in the inverted lists, thus leading to a better query performance.

Although many studies followed the study by Chen et al. (2013) into BRQ, most turn to in-memory or streaming data spatial keyword query (Chen et al., 2017; Mahmood et al., 2018), collaborative spatial keyword query (Zhao et al., 2017), semantic spatial keyword query (Qian et al., 2018; Sun et al., 2017), privacy-preserving spatial keyword query (Cui et al., 2019; Su et al., 2015), and spatial keyword query on road network (Han et al., 2015; Zheng et al., 2016). We can see from recent surveys (Chen et al., 2020; Chen et al., 2021) that SFC-Quad is still the best representative in disk-based BRQ algorithms.

Complete Article List

Search this Journal:
Reset
Volume 20: 1 Issue (2024)
Volume 19: 6 Issues (2023)
Volume 18: 4 Issues (2022): 2 Released, 2 Forthcoming
Volume 17: 4 Issues (2021)
Volume 16: 4 Issues (2020)
Volume 15: 4 Issues (2019)
Volume 14: 4 Issues (2018)
Volume 13: 4 Issues (2017)
Volume 12: 4 Issues (2016)
Volume 11: 4 Issues (2015)
Volume 10: 4 Issues (2014)
Volume 9: 4 Issues (2013)
Volume 8: 4 Issues (2012)
Volume 7: 4 Issues (2011)
Volume 6: 4 Issues (2010)
Volume 5: 4 Issues (2009)
Volume 4: 4 Issues (2008)
Volume 3: 4 Issues (2007)
Volume 2: 4 Issues (2006)
Volume 1: 4 Issues (2005)
View Complete Journal Contents Listing