Natural Language Interfaces to Databases: A Survey on Recent Advances

Rodolfo A. Pazos-Rangel, Gilberto Rivera, José A. Martínez F., Juana Gaspar, Rogelio Florencia-Juárez
DOI: 10.4018/978-1-7998-4730-4.ch001

Abstract

This chapter updates a previous publication. Specifically, it describes the most decisive advances in NLIDBs of this decade. Unlike many surveys on NLIDBs, this chapter selects NLIDBs according to three relevance criteria: performance (i.e., percentage of correctly answered queries), soundness of the experimental evaluation, and number of citations. To this end, the chapter also includes a brief review of the most widely used performance measures and query corpora for testing NLIDBs.

Introduction

In recent decades, the volume of information has grown exponentially. To manage such vast amounts of information, businesses and organizations have widely adopted databases, and different types of software tools have been developed for accessing the information they store. One such type of tool is the database query language; for example, SQL, whose high expressiveness allows users to access data with ample flexibility. Unfortunately, SQL is a computer language that is difficult to use for people who are not computer professionals.

To make it easier for casual and inexperienced users to access database information, graphical form-based applications have been developed. These tools are very easy to use; however, they offer no flexibility for accessing information in ways other than those for which they were designed.

Natural language interfaces to databases (NLIDBs) are software applications that allow inexperienced users to formulate queries in natural language for obtaining information stored in databases. NLIDBs have the advantages of both types of database querying tools: they are easy to use and offer high flexibility for accessing information.

Several surveys on NLIDBs have been published; some of the most important and recent are the following:

  1. Natural language interfaces to databases - An introduction, by Androutsopoulos (1995).
  2. Natural language interface for database: A brief review, by Nihalani (2011).
  3. A survey of natural language interface to database management system, by Sujatha (2012).
  4. Natural language interfaces to databases: An analysis of the state of the art, by Pazos (2013).
  5. Natural language interface to databases: A survey, by Tyagi (2014).

The purpose of this chapter is to describe the most relevant advances in NLIDBs of this decade. Unlike many surveys on NLIDBs, this chapter selects NLIDBs according to three relevance criteria: performance (i.e., percentage of correctly answered queries), soundness of the experimental evaluation, and number of citations. To this end, the chapter also includes a brief review of the most widely used query corpora for testing NLIDBs. The focus is on approaches that translate natural language queries into SQL expressions; interfaces that target other database query languages (e.g., Porras, Florencia-Juárez, Rivera & García, 2018) are therefore out of scope.

Background

NLIDBs are software applications that allow users to formulate queries in natural language for obtaining information stored in databases. This is accomplished by translating a natural language expression into an SQL statement. Unfortunately, the translation from a natural language query to SQL is an extremely complex problem. This difficulty explains the slow development of NLIDB technology, which is summarized next.
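
As a rough illustration of the translation step described above, the following Python sketch maps one fixed question pattern to a parameterized SQL statement and runs it against an in-memory SQLite database. It is not taken from any system covered in this chapter: the employee table, its rows, the single regular-expression pattern, and the function names build_demo_db, translate, and answer are all hypothetical, chosen only to make the idea concrete.

```python
import re
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical 'employee' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, city TEXT, salary REAL)")
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [
            ("Ana", "Paris", 5200.0),
            ("Luis", "Madrid", 4100.0),
            ("Eva", "Paris", 6100.0),
        ],
    )
    return conn


# A single hand-written pattern (an assumption for this sketch):
# "Which employees live in <city>?"
PATTERN = re.compile(r"^(?:which|what) employees live in (\w+)\??$", re.IGNORECASE)


def translate(question):
    """Return (sql, params) for a supported question, or None otherwise."""
    match = PATTERN.match(question.strip())
    if match is None:
        return None
    sql = "SELECT name FROM employee WHERE city = ? ORDER BY name"
    return sql, (match.group(1),)


def answer(conn, question):
    """Translate the question and execute the resulting SQL, if any."""
    translated = translate(question)
    if translated is None:
        return None
    sql, params = translated
    return [row[0] for row in conn.execute(sql, params)]
```

A question that matches the pattern is answered, while any rephrasing or any question about a different table or column yields None. This brittleness is one small indication of why general translation from natural language to SQL is so hard.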

Chomsky (1957) published a monograph titled Syntactic Structures, which is considered a landmark of modern linguistics. He proposed a formal approach to natural language syntax, which consists of symbols and rules and is the origin of the constituency grammar approach. In the 1960s and 1970s, the first natural language querying systems were developed; they were essentially interfaces for expert systems implemented for specific domains. Two of the most famous are BASEBALL (Green, Wolf, Chomsky, & Laughery, 1961) and LUNAR (Woods, Kaplan, & Webber, 1972). Most of those NLIDBs were developed for a particular database and, consequently, could not be easily modified for querying different databases. These systems are called domain-dependent NLIDBs, and many of them achieved good results: accuracy (percentage of correctly translated queries) of around 95%.
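
The accuracy measure mentioned above, the percentage of correctly translated queries, can be computed as in the following minimal Python sketch. The function name accuracy and the representation of the evaluation outcome as a list of booleans (one per test query) are assumptions made for this illustration; how a translation is judged correct depends on the evaluation protocol of each study.

```python
def accuracy(results):
    """Percentage of evaluated queries that were marked as correctly translated.

    'results' holds one boolean per test query (True = correct translation).
    """
    if not results:
        raise ValueError("no queries were evaluated")
    correct = sum(1 for ok in results if ok)
    return 100.0 * correct / len(results)
```

For instance, a system that correctly translates 19 of 20 test queries obtains an accuracy of 95%.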
