Sequence Graph-Based Query Auto-Suggestion (SGQAS)

DOI: 10.4018/978-1-6684-7105-0.ch018

Abstract

Query auto-suggestion, also called query auto-completion, is a query prediction service that returns suggested queries as users type in the search box. It is a search-assistant feature of almost all search engines that helps users complete a query without typing it in full. The process typically involves analyzing the user's partial query and generating a list of suggestions based on factors such as popular search terms, the user's search history, and the search context. The suggestions are then displayed to the user in real time, often in a drop-down menu or a similar interface element, allowing them to select and refine their search query easily. This chapter proposes a content-based query auto-suggestion method that uses a graph-based word sequence representation of documents stored in a knowledge graph. It treats the whole sequence of entered query terms as a path through the graph and retrieves the names of all nodes connected to the end node of that path, which are offered to the user as suggested queries.

Introduction

In recent years, search engines have become an integral part of our daily lives. We use them to find information on just about anything, from the latest news to products to purchase. As a result, search engines have been developed to be more user-friendly and efficient, with a range of features to help users find what they are looking for quickly and easily. One such feature is autosuggestion.

Web search engines make it easier to search for and retrieve information from the World Wide Web. The survival of a search engine depends on its ability to satisfy users by returning documents relevant to their queries. The user-friendliness of the search engine interface also plays a major role in which search engine users choose. Query auto-suggestion, also known as query auto-completion, is a search-assistant facility provided by almost all search engines that offers users query suggestions as they type in the search box (Tahery, 2020). Typing lengthy queries is tedious for users and can lead to typos or grammatical errors. The partial query entered by the user is compared against a stored set of target strings to find suitable matches, and these matches are suggested to the user as completions. Users can select a completion from the list of suggestions with a single click, which saves time because they do not have to type the entire query. Figure 1 shows the query auto-suggestion facility of Google, which retrieves a list of candidate strings to complete the query.
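
The comparison of a partial query against stored target strings can be sketched as a prefix lookup. The example below is a minimal illustration and not the method used by any particular search engine: the function name `complete`, the `TARGETS` list, and the `limit` parameter are all assumptions made for this example. It keeps the target strings sorted so that all strings sharing a prefix sit next to each other and can be found with a binary search.

```python
from bisect import bisect_left


def complete(prefix, targets, limit=5):
    """Return up to `limit` strings from `targets` that start with `prefix`.

    `targets` is assumed to be a sorted list of lowercase strings.
    """
    prefix = prefix.lower()
    # Binary search for the first string that is >= the prefix.
    i = bisect_left(targets, prefix)
    matches = []
    # Strings sharing the prefix are contiguous in a sorted list.
    while i < len(targets) and len(matches) < limit and targets[i].startswith(prefix):
        matches.append(targets[i])
        i += 1
    return matches


# Hypothetical target strings, for illustration only.
TARGETS = sorted([
    "graph database",
    "graph theory",
    "graphical model",
    "query auto-completion",
    "query expansion",
])

print(complete("graph", TARGETS))
print(complete("query e", TARGETS))
```

A sorted list is enough for a small, static set of targets. A production system would more likely use a trie or a similar index and would rank the matches rather than return them in alphabetical order.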

Figure 1. Google query auto-suggestion example

Some search engines use only prefix matching to complete the query, while others use both prefix matching and postfix matching. The main advantage of query auto-suggestion systems is that they reduce the number of keystrokes needed to complete a query. They also improve the quality of user queries by suggesting completions in advance, which reduces the number of typing errors (Krishnan, 2021).

Search queries are of three types:

  1. Navigational search queries
  2. Informational search queries
  3. Transactional search queries

Navigational search queries aim to reach a particular website or webpage, such as ‘YouTube’ or ‘Facebook’, rather than typing its URL. Informational search queries are ordinary queries that express an information need and expect relevant documents as results. Transactional search queries often include terms such as ‘buy’ or ‘purchase’ and aim to complete a transaction, such as purchasing a product.

The sequence structure, or word order, of content plays a major role in finding suggestion queries based on user search patterns. A graph-based representation can easily capture this sequence structure. Graph databases such as Neo4j have become increasingly popular for building knowledge graphs in this field because nodes and edges are easy to create and both can carry properties. This chapter proposes Sequence Graph-based Query Auto-Suggestion (SGQAS), which predicts completion queries from the keyword prefix sequence of the query entered by the user. It extends previous work on a graph-based index of all documents in the document pool, in which a knowledge graph captures the word order of the terms in each sentence of each document using the Word Sequence Graph (WSG) model (George, 2017).
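
The idea of following the entered terms as a path and suggesting the nodes connected to its end can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the chapter's implementation: an in-memory dictionary stands in for the knowledge graph, whitespace splitting stands in for a real tokenizer, and the function names `build_word_sequence_graph` and `suggest` and the sample sentences are invented for this example.

```python
from collections import defaultdict


def build_word_sequence_graph(sentences):
    """Map each term to the set of terms that directly follow it in a sentence."""
    graph = defaultdict(set)
    for sentence in sentences:
        # Assumption: whitespace tokenization; a real system would use a tokenizer.
        terms = sentence.lower().split()
        for current, following in zip(terms, terms[1:]):
            graph[current].add(following)
    return graph


def suggest(graph, query):
    """Walk the entered terms as a path and return the successors of its end node."""
    terms = query.lower().split()
    if not terms:
        return []
    # Each consecutive pair of entered terms must be an edge in the graph.
    for current, following in zip(terms, terms[1:]):
        if following not in graph.get(current, ()):
            return []
    # Names of all nodes connected to the end node of the path.
    return sorted(graph.get(terms[-1], ()))


# Hypothetical sentences, for illustration only.
sentences = [
    "query auto suggestion helps users",
    "query auto completion saves time",
    "graph databases store connected data",
]
graph = build_word_sequence_graph(sentences)

print(suggest(graph, "query auto"))
print(suggest(graph, "auto query"))
```

Note that this sketch only checks that each pair of adjacent query terms is an edge. It does not verify that the whole path occurs within a single sentence or document, which a fuller index could do by storing sentence or document identifiers as edge properties.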

Autosuggestion is implemented using a combination of algorithms and user data. The algorithms used by search engines analyze user search behavior to generate relevant suggestions for search queries. This includes analyzing previous searches, popular queries, and other relevant data. For example, in their research paper, Cho and Roy (2016) proposed an auto-suggestion algorithm that uses click-through data to improve suggestion relevance. The algorithm was shown to significantly improve the accuracy of suggestion rankings compared to a baseline method.
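
To make the role of user data concrete, the sketch below orders candidate suggestions by a score that combines how often each query was issued with how often its results were clicked. It is not the algorithm of Cho and Roy (2016), whose details are not given here. The function name `rank_suggestions`, the `click_weight` parameter, the scoring formula, and the sample counts are all assumptions chosen for illustration.

```python
def rank_suggestions(candidates, query_counts, click_counts, click_weight=2.0):
    """Order candidates by a weighted sum of query frequency and click counts.

    The formula and the default weight are illustrative assumptions.
    """
    def score(candidate):
        return (query_counts.get(candidate, 0)
                + click_weight * click_counts.get(candidate, 0))

    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(candidates, key=lambda c: (-score(c), c))


# Hypothetical usage data, for illustration only.
candidates = ["graph theory", "graph database", "graphical model"]
query_counts = {"graph theory": 10, "graph database": 8, "graphical model": 3}
click_counts = {"graph database": 4, "graphical model": 1}

print(rank_suggestions(candidates, query_counts, click_counts))
```

With these sample counts, "graph database" moves ahead of the more frequently issued "graph theory" because its results were clicked more often, which is the kind of effect click-through data is meant to capture.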

Key Terms in this Chapter

Search Engine: A search engine is a software system that enables users to search and retrieve information from a database or the internet based on specific keywords or phrases.

Neo4j: A highly scalable graph database management system designed to efficiently store, manage, and query highly connected data using the graph data model.

SGQAS, Sequence Graph-Based Query Auto-Suggestion: A query auto-suggestion method for document collections indexed with a Word Sequence Graph.

WSG, Word Sequence Graph: A graph-based index of documents that preserves the order of the terms in each sentence.

Stanford PTB Tokenizer: A natural language processing tool that segments text into individual words, punctuation marks, and other tokens, based on the Penn Treebank standard for syntactic annotation.

Java: A high-level, object-oriented programming language that is designed to be platform-independent, portable, and secure, used for developing a wide range of applications from desktop to web and mobile.

CERMINE: An open-source library for extracting metadata and references from scholarly articles in PDF format using machine learning and computer vision techniques.

Tika's AutoDetect Parser: A component that automatically identifies and applies the appropriate parser to extract content and metadata from a wide range of file formats without the need for manual specification.
