Implementation and Analysis of Shallow Parsing Techniques in Khasi Language

Eusebius Lawai Lyngdoh, Aiom Minnette Mitri, Goutam Saha, Arnab Kumar Maji
Copyright: © 2024 | Pages: 20
DOI: 10.4018/979-8-3693-0728-1.ch007

Abstract

In natural language processing (NLP), shallow parsing is the crucial step that follows part-of-speech (POS) tagging. The authors have developed a shallow parser for the Khasi language. The work explores techniques from both traditional machine learning (ML) and modern deep learning (DL). The ML algorithms employed are decision trees, logistic regression, support vector machines, random forest, and multinomial naive Bayes. The DL models are the vanilla recurrent neural network, long short-term memory (LSTM) network, gated recurrent units, and bidirectional LSTM, all applied to the shallow parsing objective. The crux of the effort is a comparative analysis of these techniques, and the chapter discusses their individual performances in detail.

Literature Review

Osborne (2000) proposes approaching shallow parsing as a POS tagging problem. By using POS tags to denote chunk boundaries, the method described in the article yields a shallow parsing procedure that is more accurate and efficient. The experimental findings demonstrate the effectiveness of this strategy and highlight its potential as an alternative, computationally efficient method for shallow parsing problems. The study also offers insight into the connection between POS tagging and shallow parsing.
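The idea of encoding chunk boundaries as per-token tags can be illustrated with a minimal Python sketch. It uses the widely used BIO convention (B = begins a chunk, I = inside a chunk, O = outside any chunk), which is an assumption made here for illustration and not necessarily the exact encoding used by Osborne (2000). The function names and the English example sentence are likewise hypothetical.

```python
from typing import List, Tuple

# A chunk is (label, tokens); the label "O" marks a token outside any chunk.
Chunk = Tuple[str, List[str]]


def chunks_to_bio(chunks: List[Chunk]) -> List[Tuple[str, str]]:
    """Flatten labelled chunks into one (token, tag) pair per token."""
    tagged = []
    for label, tokens in chunks:
        for i, token in enumerate(tokens):
            if label == "O":
                tagged.append((token, "O"))
            else:
                # The first token of a chunk gets B-, the rest get I-.
                prefix = "B" if i == 0 else "I"
                tagged.append((token, f"{prefix}-{label}"))
    return tagged


def bio_to_chunks(tagged: List[Tuple[str, str]]) -> List[Chunk]:
    """Rebuild chunks from a BIO-tagged token sequence."""
    chunks: List[Chunk] = []
    for token, tag in tagged:
        if tag == "O":
            chunks.append(("O", [token]))
        elif tag.startswith("B-") or not chunks or chunks[-1][0] != tag[2:]:
            # A B- tag (or a stray I- tag) opens a new chunk.
            chunks.append((tag[2:], [token]))
        else:
            # An I- tag continues the chunk that is currently open.
            chunks[-1][1].append(token)
    return chunks


# Illustrative sentence with hand-assigned chunks (not taken from the chapter).
sentence = [
    ("NP", ["The", "parser"]),
    ("VP", ["labels"]),
    ("NP", ["each", "phrase"]),
    ("O", ["."]),
]
bio = chunks_to_bio(sentence)
```

Once chunks are flattened in this way, any sequence tagger that assigns one label per token can in principle be trained to predict the chunk tags, which is what makes the tagging formulation attractive.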

Sha & Pereira (2003) focus on the use of conditional random field (CRF) models for shallow parsing tasks. In addition to their success in parsing, CRF models are discussed for their ability to capture dependencies between the labels of neighbouring words. To illustrate the promise of CRFs for precise and effective shallow parsing, the paper presents experimental findings showing that CRF-based models perform better than other techniques.

Subhashini & Kumar (2010) cover the use of shallow NLP techniques for extracting noun phrases. The study investigates various approaches to recognising and extracting noun phrases from text, including statistical and rule-based approaches. The paper evaluates how accurately these methods capture noun phrases, demonstrating the potential uses of shallow NLP for noun phrase extraction tasks.

Asopa et al. (2016) outline the creation and assessment of a rule-based chunker. With a focus on nouns, adverbs, verbs, and adjectives, the chunker was developed using hand-crafted linguistic rules for various phrases and conjuncts. The Indian Languages Chunk Tagset is employed for annotation. A total of 500 Hindi sentences were entered into the chunker and then subjected to an HMM tagger evaluation. The system attained precision, recall, and F-measure values of 79.68, 69.36, and 74.16, respectively. Although the rule-based technique proves to be less effective, it is suggested that the system’s effectiveness can be increased by producing more chunk rules.
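The three reported figures are mutually consistent if the F-measure is the balanced F1 score, that is, the harmonic mean of precision and recall. That reading is an assumption, since the summary above does not state which F-measure variant was used. A short Python check, with a hypothetical helper name, is sketched below.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported for the rule-based Hindi chunker.
f1 = f_measure(79.68, 69.36)
print(round(f1, 2))  # 74.16
```

The computed value matches the reported F-measure of 74.16 to two decimal places.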

Warjri et al. (2018) present the various parts of speech in Khasi’s grammatical structure and have also published 54 tags from the POS tag set. Their work provides a foundation and background for future computational processing of the Khasi language in machine learning.
