Semantic Similarity Using Register Linear Question Classification (RLQC) for Question Classification

Shanthi Palaniappan (Sri Krishna College of Engineering and Technology, India), Sridevi U. K. (PSG College of Technology, India) and Pathur Nisha S. (Nehru Institute of Technology, India)
Copyright: © 2020 | Pages: 11
DOI: 10.4018/978-1-7998-1159-6.ch006

Abstract

Question Classification (QC) mainly relies on syntactic parsing to find the similarity between questions. To improve classification accuracy, the semantic similarity between a question and the questions in the dataset is calculated. The semantic similarity of a question is initially obtained by syntactic parsing, which extracts the nouns, verbs, adverbs, and adjectives. Adjectives and adverbs give sentences their exact meaning, so they should also be considered when computing semantic similarity. The proposed RLQC (Register Linear Question Classification) model for the semantic similarity of questions combines the HSO (Hirst and St. Onge) measure with a gloss-based measure to improve semantic relatedness by considering nouns, verbs, adverbs, and adjectives. The semantic similarity of the question pairs for RLQC is 0.2% higher than for the HSO model. The higher semantic similarity of the proposed model yields better accuracy.

Introduction

Questions are usually 10 to 20 words long. Each question can be assigned to a level based on a taxonomy. The proposed work deals with level 2 questions (Costa, 2001). QC provides the syntactic and semantic information needed to compute the semantic similarity between concepts.

Earlier studies on levels of questions, including Costa's taxonomy and Bloom's taxonomy, show the importance of focusing on different categories of questions. To address this, a question classifier using Register Linear (RL) models for a specific domain was proposed by Shanthi (2015). The RL model assigns each input to exactly one class and handles complex questions in a linear manner. The RL classification model is shown in Figure 1, which illustrates the RL model for Costa-level questions.

The syntactic information of a question is particularly relevant when computing semantic similarity. The meaning of a sentence does not depend only on the individual words; it also depends on the structural way the phrases are combined. The question classification of Shanthi (2015) considers only nouns and verbs when computing semantic similarity with the RL method. To improve the method, adjectives and adverbs are also considered in order to capture the semantic relatedness of concepts. For classifying questions, semantic relatedness is more suitable than semantic similarity.

Figure 1. Architecture diagram of RL classification model

The evaluation uses the Stanford dataset that was used for the RL method, with the aim of improving classification performance. As stated by Shanthi (2015), question classification is essential for question answering. To improve the method further, both the semantic similarity and the semantic relatedness are calculated. The Register Linear Question Classification (RLQC) method is compared with the Syntax-based Measure for Semantic Similarity (SyMSS) to classify questions efficiently. The semantic relatedness between questions is computed for 100 question pairs and compared with the existing approach. Corpus-based methods commonly use Latent Semantic Analysis (LSA). A disadvantage of LSA is that it does not take syntactic information into account. Antonyms and negations are also not considered by LSA. For example, LSA does not distinguish “Name the fruits that are red in color” from “Name the fruits that are not red in color”. Similarly, the sentence “River passes through the lake” differs in meaning from “Lake passes through the river”, although both contain the same words. Research on improving LSA is ongoing.
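The word-order and negation examples can be illustrated with a minimal sketch. It is not LSA itself (there is no term-document matrix and no singular value decomposition); it only shows the bag-of-words counts that LSA starts from, compared by cosine similarity. The function names and the whitespace tokenization are assumptions made for illustration.

```python
from collections import Counter
import math

def bag_of_words(sentence):
    """Lowercase, split on whitespace, and count tokens (word order is discarded)."""
    return Counter(sentence.lower().split())

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)  # a Counter returns 0 for missing tokens
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Same words, different order: the two count vectors are identical.
s1 = bag_of_words("River passes through the lake")
s2 = bag_of_words("Lake passes through the river")
order_similarity = cosine(s1, s2)

# Negation adds a single token, so the vectors remain almost parallel.
q1 = bag_of_words("Name the fruits that are red in color")
q2 = bag_of_words("Name the fruits that are not red in color")
negation_similarity = cosine(q1, q2)

print(order_similarity, negation_similarity)
```

Under these assumptions the two river/lake sentences receive a similarity of approximately 1.0, and the negated question stays above 0.9 against its positive form, which is why a purely count-based representation cannot separate them.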

Earlier methods (Achananuparp, 2008; Li, 2006; Islam, 2008) use pseudo-syntactic information, which supports the view that syntactic information is essential for finding semantic similarity. Wiemer-Hastings (2004) and Li (2006) use the semantic similarity between concepts. Different measures of semantic similarity were also compared in this model, as given by Oliva et al. (2011). Syntactic information is more relevant at the sentence level for calculating semantic similarity. In the proposed RL model, syntactic information is extracted at the question level, and nouns, verbs, adverbs, and adjectives are considered in the semantic relatedness process. Pelletier (1994) stated that the meaning of each word in a sentence depends not only on the hypothesis but also on the structural way the words are combined. Sridevi (2018) proposed a comparison of information extraction in deep learning framework features of annotated framework. As stated by Pelletier (1994), the meaning of the question is also important for finding similarity, and it is considered in the proposed approach.
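The difference between keeping only nouns and verbs and keeping all four word classes can be sketched as follows. This is an illustrative filter, not the authors' implementation: the example question is tagged by hand with Penn Treebank part-of-speech tags, and the names `CONTENT_PREFIXES` and `content_words` are hypothetical. In practice the tags would come from a syntactic parser or tagger.

```python
# Penn Treebank tag prefixes for the four word classes named in the text.
CONTENT_PREFIXES = {"NN": "noun", "VB": "verb", "RB": "adverb", "JJ": "adjective"}

def content_words(tagged_tokens, keep=("noun", "verb", "adverb", "adjective")):
    """Return (word, word_class) pairs whose word class is listed in `keep`."""
    selected = []
    for word, tag in tagged_tokens:
        word_class = CONTENT_PREFIXES.get(tag[:2])  # None for other tags
        if word_class in keep:
            selected.append((word.lower(), word_class))
    return selected

# Hand-tagged example question (tags assigned manually for illustration).
tagged = [("Name", "VB"), ("the", "DT"), ("fruits", "NNS"), ("that", "WDT"),
          ("are", "VBP"), ("not", "RB"), ("red", "JJ"), ("in", "IN"),
          ("color", "NN")]

noun_verb_only = content_words(tagged, keep=("noun", "verb"))
all_four = content_words(tagged)

print(noun_verb_only)
print(all_four)
```

With nouns and verbs only, the words "not" and "red" are dropped, so the negated and non-negated fruit questions reduce to the same word set. Keeping adverbs and adjectives retains both words for the later relatedness computation.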
