A Novel Sentence Completion System for Punjabi Using Deep Neural Networks

Gurjot Singh Mahi, Amandeep Verma
Copyright: © 2022 | Pages: 25
DOI: 10.4018/IJSI.293271

Abstract

Sentence completion systems, which reduce cognitive effort and improve user experience, are actively studied by many researchers. A review of the literature reveals that most of the work in this area targets English, and limited effort has been spent on other languages, especially vernacular languages. This work aims to develop a state-of-the-art sentence completion system for Punjabi, the 10th most spoken language in the world. The presented work is the outcome of experiments with various neural network language model combinations. A new Sentence Search Algorithm (SSA) and a patching system are developed to search for, complete, and rank completions of a given sub-string and return syntactically rich sentence(s). Quantitative and qualitative evaluation metrics were used to evaluate the system. The results are promising, and the best-performing model completes a given sub-string with higher acceptability. The best-performing model is used to develop the user interface.

1. Introduction

In 1996, Morris & Ogan (1996) published a paper describing the potential that a network of networks, i.e., the Internet, held for communication researchers, and conceptualized the use of the Internet as a mass medium for audiences. Platforms such as Facebook and Twitter are widely used for communication in English and other world languages. In the Indian context, A. Singh et al. (2017) reported that 234 million users in India use their local language to communicate on the Internet, a number projected to reach 536 million by 2021, with a growth rate of 18% compared to 3% for English-language users. Despite these encouraging statistics, Joshi et al. (2004) observe that the text composition rate in Indian languages (25 words per minute (WPM)) is relatively low compared to English text composition on a QWERTY keyboard, at 35-40 WPM (Isokoski, 2004). A large character set, multiple vowel symbols, and complex language syntax make text composition difficult in the Indian context (Sharma & Samanta, 2014).

Many research studies have developed automatic sentence completion systems for international languages such as English (Grabski & Scheffer, 2004; Bickel et al., 2005; Nandi & Jagadish, 2007), Arabic (Al-safadi et al., 2014), European Portuguese (Garcia et al., 2014), Chinese (Z. Li & Qiu, 2014), and Japanese (Maekawa & Takano, 2017), but little or no effort has been made toward sentence completion for Indian languages, particularly Punjabi. To the best of the authors' knowledge, this work is therefore the first research study on the Punjabi sentence completion task. The developed system lets the user complete a partially entered sequence of words by offering a list of possible succeeding sentence fragments, which saves keystrokes while typing and reduces cognitive effort. The contributions of this work can be summarized as follows:

  1. A detailed formal introduction to the sentence completion task is given, along with a thorough mathematical foundation for the terminology used in this manuscript.

  2. An automatic procedure for collecting and curating Punjabi news articles in the Sports genre is described, which is used to build a syntactically rich Punjabi sentence dataset.

  3. The developed dataset is used to perform experiments with five contemporary deep Neural Network Language Models (NNLMs).

  4. A novel Sentence Search Algorithm (SSA) and patching scheme are introduced for Punjabi sentence completion using the trained NNLMs.

  5. The system showed better linguistic quality when completing Punjabi sentences, as measured by metrics such as Perplexity and Distinct. Human evaluators further assessed the system's practical ability.

  6. An interactive GUI has been developed for end users, enabling them to take full advantage of the completion system.
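Item 5 names Perplexity and Distinct as evaluation metrics. The sketch below shows one common way to compute each. It assumes that per-token natural-log probabilities are already available from a language model. The helper names `perplexity` and `distinct_n` are hypothetical. Distinct-n is computed here as unique n-grams divided by total n-grams. Some works divide by the total number of generated tokens instead, and this section does not state which variant the authors use.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities.

    PPL = exp(-(1/N) * sum(log p(w_i | w_<i))); lower is better.
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def distinct_n(sentences: Sequence[Sequence[str]], n: int) -> float:
    """Distinct-n: unique n-grams divided by total n-grams across outputs.

    Higher values indicate more diverse (less repetitive) completions.
    Sentences shorter than n contribute no n-grams.
    """
    total = 0
    unique = set()
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    return len(unique) / total if total else 0.0


if __name__ == "__main__":
    # Four tokens, each given probability 0.25 by the model, so PPL is about 4.
    print(perplexity([math.log(0.25)] * 4))
    # Two toy completions (placeholder tokens, not the authors' data).
    outputs = [["a", "b", "c"], ["a", "b", "d"]]
    print(distinct_n(outputs, 1))  # 4 unique unigrams / 6 total
    print(distinct_n(outputs, 2))  # 3 unique bigrams / 4 total
```

Lower perplexity means the model finds the text less surprising. Higher Distinct-n means the completions repeat fewer n-grams. The two metrics therefore capture roughly complementary aspects of output quality.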

Figure 1 shows the general architecture of the sentence completion system used in this manuscript.

Figure 1. General description of the system process
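This section does not describe how SSA or the patching scheme work internally. The sketch below is therefore only a rough, hypothetical illustration of the general flow, in which a trained language model scores continuations of a typed prefix, a search procedure expands them, and the finished candidates are ranked. It uses a toy bigram model (`train_bigram`) in place of an NNLM and plain beam search (`complete`) in place of SSA. Neither is the authors' method. The tokens are romanized placeholders, not data from the authors' corpus.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

END = "</s>"  # hypothetical end-of-sentence marker

# word -> {next_word: log P(next_word | word)}
Model = Dict[str, Dict[str, float]]


def train_bigram(corpus: List[List[str]]) -> Model:
    """Toy stand-in for a trained NNLM: bigram log-probabilities from counts."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    model: Model = {}
    for prev, next_counts in counts.items():
        total = sum(next_counts.values())
        model[prev] = {w: math.log(c / total) for w, c in next_counts.items()}
    return model


def complete(model: Model, prefix: List[str], beam: int = 3,
             max_len: int = 10) -> List[Tuple[List[str], float]]:
    """Beam search: extend the prefix one word at a time, keep the `beam`
    highest-scoring partial sentences, and return finished sentences ranked
    by the log-probability of the continuation given the prefix."""
    if not prefix:
        raise ValueError("prefix must contain at least one word")
    beams: List[Tuple[List[str], float]] = [(list(prefix), 0.0)]
    finished: List[Tuple[List[str], float]] = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for word, logp in model.get(tokens[-1], {}).items():
                if word == END:
                    finished.append((tokens, score + logp))
                else:
                    candidates.append((tokens + [word], score + logp))
        # Prune to the best `beam` unfinished hypotheses.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam]
        if not beams:
            break
    return sorted(finished, key=lambda c: c[1], reverse=True)


if __name__ == "__main__":
    # Romanized toy sentences (placeholders, not the authors' dataset).
    corpus = [
        ["bharat", "ne", "match", "jitteya"],
        ["bharat", "ne", "match", "jitteya"],
        ["bharat", "ne", "match", "harya"],
        ["bharat", "ne", "series", "jitti"],
    ]
    lm = train_bigram(corpus)
    for tokens, score in complete(lm, ["bharat", "ne"]):
        print(" ".join(tokens), round(math.exp(score), 3))
```

Ranking by summed log-probability favors shorter completions. Dividing the score by the number of generated words (length normalization) is a common adjustment when candidates differ in length.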
