1. Introduction
Morris & Ogan (1996) described the potential that a network of networks, i.e., the Internet, held for communication researchers and conceptualized the Internet as a mass medium for the audience. Platforms such as Facebook and Twitter are widely used for communication in English and other world languages. In the Indian context, A. Singh et al. (2017) reported that 234 million users in India use their local language to communicate on the Internet and projected that this number would reach 536 million by 2021, with a growth rate of 18% compared with 3% for English-language users.
Despite these motivating statistics, Joshi et al. (2004) report that the text composition rate for Indian languages (25 Words Per Minute (WPM)) is lower than that for English (35-40 WPM) on a QWERTY keyboard (Isokoski, 2004). An extensive character set, varied vowel symbols, and complex language syntax make text composition difficult in the Indian context (Sharma & Samanta, 2014).
Many studies have developed automatic sentence completion systems for international languages such as English (Grabski & Scheffer, 2004; Bickel et al., 2005; Nandi & Jagadish, 2007), Arabic (Al-safadi et al., 2014), European Portuguese (Garcia et al., 2014), Chinese (Z. Li & Qiu, 2014), and Japanese (Maekawa & Takano, 2017), but little or no effort has been made toward sentence completion for Indian languages, particularly Punjabi. This work is therefore the first research study on the Punjabi sentence completion task. The developed system lets the user complete a partially entered set of words by offering a list of possible succeeding sentence fragments, which saves keystrokes while typing and reduces cognitive effort. The contributions of this work can be summarized as follows:
1. A detailed formal introduction to the sentence completion task is given, together with a thorough mathematical foundation for the terminology used in this manuscript.
2. An automatic procedure for collecting and curating Punjabi news articles in the Sports genre is discussed for developing a syntactically rich Punjabi sentence dataset.
3. The developed dataset is used to perform state-of-the-art experiments with five contemporary deep Neural Network Language Models (NNLMs).
4. A novel Sentence Search Algorithm (SSA) and a patching scheme are introduced for Punjabi sentence completion using the trained NNLMs.
5. The system reports better linguistic quality when completing Punjabi sentences, as measured by metrics such as Perplexity and Distinct. Human evaluators also tested the actual ability of the system.
6. An interactive GUI has been developed for end users, enabling them to take full advantage of the completion system.
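The two automatic metrics named in the contributions can be illustrated with a minimal Python sketch. It assumes the commonly used definitions: Perplexity as the exponential of the mean negative log-probability per token, and Distinct-n as the ratio of unique n-grams to total n-grams in the generated text. The exact variants used in this manuscript are not stated in this section, so the function names `perplexity` and `distinct_n` and the toy inputs are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """Exponential of the mean negative log-probability per token.

    `token_probs` holds the probability a language model assigned to each
    observed token (assumed to be strictly positive).
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


def distinct_n(sentences: List[List[str]], n: int) -> float:
    """Unique n-grams divided by total n-grams over all tokenized sentences."""
    ngrams = [
        tuple(tokens[i:i + n])
        for tokens in sentences
        for i in range(len(tokens) - n + 1)
    ]
    # Return 0.0 when no n-gram of the requested order exists.
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


if __name__ == "__main__":
    # Toy inputs (hypothetical): four tokens, each predicted with p = 0.25.
    print(perplexity([0.25, 0.25, 0.25, 0.25]))
    # Toy tokenized completions with placeholder tokens.
    print(distinct_n([["a", "b", "a", "b"]], 1))
    print(distinct_n([["a", "b", "a", "b"]], 2))
```

Lower Perplexity indicates that the model finds the observed text more predictable, while a higher Distinct-n indicates more varied completions.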
Figure 1 gives the general architecture of the sentence completion system utilized in this manuscript.
Figure 1. General description of the system process