A New Fast Intersection Algorithm for Sorted Lists on GPU

A New Fast Intersection Algorithm for Sorted Lists on GPU

Faiza Manseur, Lougmiri Zekri, Mohamed Senouci
Copyright: © 2022 |Pages: 20
DOI: 10.4018/JITR.298325
Article PDF Download
Open access articles are freely available for download

Abstract

Set intersection algorithms between sorted lists are important in triangles counting, community detection in graph analysis and in search engines where the intersection is computed between queries and inverted indexes. Many researches use GPU techniques for solving this intersection problem. The majority of these techniques focus on improving the level of parallelism by reducing redundant comparisons and distributing the workload among GPU threads. In this paper, we propose the GPU Test with Jumps (GTWJ) algorithm to compute the intersection between sorted lists using a new data structure. The idea of GTWJ is to group the data, of each sorted list, into a set of sequences. A sequence is identified by a key and is handled by a thread. Intersection is computed between sequences with the same key. This key allows skipping data packets in parallel if the keys do not match. A counter is used to avoid useless tests between cells of sequences with different lengths. Experiments on the data used in this filed show that GTWJ is better in terms of execution time and number of tests.
Article Preview
Top

Introduction

The calculation of the intersection between sets is an important operation in numerous application domains, such as information retrieval for text search engines, graph analytics for triangles counting and community detection and database systems. It serves to create indexes, especially inverted ones, by computing for every term, in the documents database, the set of documents in which it appears. Also, it allows search engines to find the responses to queries by computing their intersections with the global inverted index. Since 1971, when the first work has appeared, many techniques have been published in order to accelerate intersection time between sets, especially between sorted ones. At the basis, these techniques use well defined data structures and exploit hard components of machines. Logically, time execution depends on the manner by which the comparisons between the elements of the entries are done. In this context, three approaches are adopted to this end:

  • • Accelerate sequential processing time using data structures that are well adapted to requirements;

  • • Exploit graphics processors (GPUs) and multi-core processors (CPUs) to exploit advantage of the parallelism they offer; or

  • • Exploit new plate-forms or frameworks, like map-reduce, for distributing the workload.

Since current machines are equipped with GPU cards that can easily be integrated into them and since they offer a high-level parallel environment at low cost. We propose a new GTWJ (GPU Test With Jumps) algorithm for calculating the intersection between sorted lists. The objectives of our proposition are multiple. Our first objective is to reduce the number of tests by avoiding comparing parts that will not necessarily be shared between the lists in the inputs. To achieve this objective, we modify the structure of the lists to be compared in order to avoid these unnecessary tests. Our second objective is to exploit the parallelism of the graphics card to speed up calculation times by reducing redundant comparisons and distributing the workload evenly among GPU threads. Thus, we have implemented our solution with the CUDA language which provides direct access to the graphics processor programming. This programming language has a complete instruction set, such as double precision calculation, and allows thousands of threads to be simultaneously used. This solution is compared to other solutions (Hwang & Lin, 1971; Demaine et al., 2001; Inoue et al., 2014). We present and comment these solutions in this paper.

The rest of the article is structured as follows. Section 2 provides related works; so, a state of the art is given here, it covers the set of algorithms which are published since 1971. We extend this section by presenting some research papers which applied intersection between lists to other fields. In section 3, we present three important algorithms, as we will compare our proposition with them. Section 4 gives details of our solution. Experiments are presented and commented in section 5. Section 6 concludes this paper.

Complete Article List

Search this Journal:
Reset
Volume 16: 1 Issue (2024): Forthcoming, Available for Pre-Order
Volume 15: 6 Issues (2022): 1 Released, 5 Forthcoming
Volume 14: 4 Issues (2021)
Volume 13: 4 Issues (2020)
Volume 12: 4 Issues (2019)
Volume 11: 4 Issues (2018)
Volume 10: 4 Issues (2017)
Volume 9: 4 Issues (2016)
Volume 8: 4 Issues (2015)
Volume 7: 4 Issues (2014)
Volume 6: 4 Issues (2013)
Volume 5: 4 Issues (2012)
Volume 4: 4 Issues (2011)
Volume 3: 4 Issues (2010)
Volume 2: 4 Issues (2009)
Volume 1: 4 Issues (2008)
View Complete Journal Contents Listing