Entity-Extraction Using Hybrid Deep-Learning Approach for Hindi text

Richa Sharma, Sudha Morwal, Basant Agarwal
DOI: 10.4018/IJCINI.20210701.oa1

Abstract

This article presents a neural network-based approach to named entity recognition (NER) for Hindi text. The authors propose a deep learning architecture based on a convolutional neural network (CNN) and a bi-directional long short-term memory (Bi-LSTM) network. The skip-gram variant of the word2vec model is used to generate word vectors. Several deep learning models, namely a recurrent neural network (RNN), long short-term memory (LSTM), and Bi-LSTM, have been developed and evaluated as baseline systems. These baselines are then extended into the proposed model by integrating CNN and conditional random field (CRF) layers. A comparative analysis of the results shows that the proposed model (i.e., Bi-LSTM-CNN-CRF) performs well: it achieves 61% precision, 56% recall, and 58% F-measure.
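As a quick consistency check on the reported scores, the balanced F-measure (F1) is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short Python sketch below assumes that the reported F-measure is this standard F1 computed from the overall precision and recall; `f_measure` is a hypothetical helper for illustration, not code from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: weighted harmonic mean of precision and recall (beta=1 gives F1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing is predicted correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Overall precision and recall reported for Bi-LSTM-CNN-CRF in the abstract.
p, r = 0.61, 0.56
f1 = f_measure(p, r)
print(f"F1 = {f1:.4f} (about {round(f1 * 100)}%)")  # F1 = 0.5839 (about 58%)
```

With P = 0.61 and R = 0.56 this gives 0.6832 / 1.17 ≈ 0.584, which rounds to the reported 58%.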

Introduction

Language is essential for communication. To ease human-computer interaction, it is desirable for machines to comprehend natural languages, and several Natural Language Processing (NLP) techniques are used to give machines this natural language understanding capability. NLP techniques describe how machines can process and analyze natural languages. In recent years, NLP applications such as text summarization, question answering, and machine translation have been developed for many natural languages. During the development of such applications, a task called Named Entity Recognition (NER) is often performed as a preprocessing step. NER is a two-step process: it first identifies proper nouns in text and then classifies them into predefined categories such as person, location, measure, organization, and time. Integrating NER into NLP applications improves their accuracy; for example, a question-answering system (Greenwood & Gaizauskas, 2003), a machine translation system (Babych & Hartley, 2003), and text clustering (Toda & Kataoka, 2005) performed better after NER was integrated. Since these applications are now also available for Hindi, a state-of-the-art Hindi NER tool is essential to improve their performance.

Most of the available systems for Hindi NER use traditional approaches, such as handcrafted rules with gazetteers and machine learning based models. Handcrafting rules is time-consuming and requires either intensive knowledge of grammar or language experts to write the rules. Several machine learning methods have also been used to implement NER, including hidden Markov models (Chopra, Joshi, & Mathur, 2016; Morwal, Jahan, & Chopra, 2012), support vector machines (Ekbal & Bandyopadhyay, 2010), conditional random fields (Ekbal & Bandyopadhyay, 2009), and combinations of these methods.
These methods, too, rely on intensive knowledge of grammar and on handcrafted features, and handcrafting features is difficult because linguistic resources for Hindi are scarce. Additionally, Hindi poses several challenges that must be addressed when developing an NER tool, such as free word order, the absence of capitalization, a lack of labeled data, and ambiguity in proper nouns.
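Viewed as sequence labeling, the two NER steps described above (identifying an entity span, then classifying it) can be illustrated with the common BIO tagging scheme. The sketch below rests on assumptions: this section does not state the paper's tag encoding, `extract_entities` is a hypothetical helper, and the tags are hand-assigned for illustration rather than produced by any model. It also shows the capitalization challenge: Devanagari characters have no letter case, so an uppercase-letter cue that helps English NER gives no signal here.

```python
from typing import List, Tuple


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Turn BIO tags into (entity_text, category) pairs.

    Assumed scheme (BIO): 'B-X' begins an entity of category X,
    'I-X' continues it, and 'O' marks a token outside any entity.
    """
    entities: List[Tuple[str, str]] = []
    span: List[str] = []
    category = ""
    for token, tag in zip(tokens, tags):
        # An 'I-X' that does not continue a span of the same category starts a new one.
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != category)
        if starts_new:
            # Identification: a new entity span begins at this token.
            if span:
                entities.append((" ".join(span), category))
            # Classification: the category is the tag suffix, e.g. "PERSON".
            span, category = [token], tag[2:]
        elif tag.startswith("I-"):
            span.append(token)  # continue the current entity
        else:
            # 'O' tag: close any open entity.
            if span:
                entities.append((" ".join(span), category))
            span, category = [], ""
    if span:
        entities.append((" ".join(span), category))
    return entities


# "Narendra Modi gave a speech in Delhi", hand-labeled for illustration only.
tokens = ["नरेंद्र", "मोदी", "ने", "दिल्ली", "में", "भाषण", "दिया"]
tags = ["B-PERSON", "I-PERSON", "O", "B-LOCATION", "O", "O", "O"]

# Devanagari is uncased, so no character counts as uppercase.
print(any(ch.isupper() for tok in tokens for ch in tok))  # False
print(extract_entities(tokens, tags))
# [('नरेंद्र मोदी', 'PERSON'), ('दिल्ली', 'LOCATION')]
```

In an English sentence the capital letters of "Narendra Modi" and "Delhi" would hint at entity boundaries; in Hindi the model must rely on context alone, which is one motivation for learned representations such as word embeddings.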
