A Fuzzy Logic Based Synonym Resolution Approach for Automated Information Retrieval

Mamta Kathuria (YMCA University of Science and Technology, India), Chander Kumar Nagpal (YMCA University of Science and Technology, India) and Neelam Duhan (YMCA University of Science and Technology, India)
DOI: 10.4018/978-1-7998-0951-7.ch040

Abstract

Precise measurement of semantic similarity between words is vital to many automated applications in areas such as word sense disambiguation, machine translation, information retrieval and data clustering. The rapid growth of automated resources and their diverse novel applications has further reinforced this requirement. However, accurate measurement of semantic similarity is a daunting task owing to the inherent ambiguities of natural language and the spread of web documents across various domains, localities and dialects. These issues render manually maintained semantic similarity resources (i.e., dictionaries) inadequate. This article uses the context sets of the words under consideration in multiple corpora to compute semantic similarity, and applies a fuzzy inference mechanism to provide credible and verifiable semantic similarity results that automated applications can use directly in an intelligent manner. It can also be used to strengthen existing lexical resources by augmenting them with context sets and a properly defined extent of semantic similarity.

Introduction

With the passage of time, human dependency on automated resources has increased manifold. These resources depend upon an underlying inference mechanism and stored data for their decisions and actions. Unlike humans, they have no intelligence or analytical capability of their own. It is therefore necessary that these systems be built on resources that are free from ambiguity, so that their decisions and actions are reliable and credible. When an automated system has to deal with an inherently ambiguous environment such as Natural Language Processing (NLP), two options are available to the designer to increase the reliability and credibility of the system:

  1. To deal directly with the ambiguity at the inference level by designing the knowledge base with the help of tools like fuzzy logic, which can handle ambiguity to an extent and provide a coarse (not that precise) automated solution;

  2. To make the underlying data less ambiguous and more suitable for automated applications.

In an automated system that works in an inherently ambiguous environment, both approaches may be required, but effort directed at the second approach reduces the reliance on the first and makes the system more precise.

One of the most frequently used automated systems today is the web search engine, commonly referred to simply as a search engine. It tries to provide a set of documents semantically related to the query that the user submits in natural language. To obtain better results, the search needs to be widened by using terms semantically similar to the input query. This requires unambiguous, efficient lexical resources that provide precise semantic similarity information directly usable by automated systems. The heterogeneity of web literature, which is spread across geographical boundaries, languages and dialects, combined with the other inherent ambiguities of natural language, makes this task difficult and puts it beyond the grasp of manually maintained dictionaries. To overcome this drawback, the current literature provides several empirical methods based upon web page counts and text snippet counts, as discussed in the literature survey presented in this paper. These methods fail to take into account the exact context in which the word pair is used, for the following reasons:

  • Occurrence of a pair of words within the same page does not ensure the similarity of their contexts;

  • The snippet size is sometimes increased to capture a larger number of instances, which may result in loss of context.
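
The first limitation can be illustrated with a minimal sketch. It assumes a Jaccard coefficient over page counts as a stand-in for the family of page-count measures; it is not a reproduction of any specific method from the surveyed literature. The function name `page_count_jaccard` and the three toy "pages" are invented for illustration.

```python
def page_count_jaccard(docs, w1, w2):
    """Jaccard coefficient over page counts:
    pages containing both words / pages containing either word."""
    has1 = {i for i, d in enumerate(docs) if w1 in d.lower().split()}
    has2 = {i for i, d in enumerate(docs) if w2 in d.lower().split()}
    either = has1 | has2
    return len(has1 & has2) / len(either) if either else 0.0


# Toy "pages", invented for illustration only.
pages = [
    "the bank approved the loan while we sat on the river bank watching the shore",
    "the shore of the lake is near the bank headquarters",
    "the bank raised its interest rate",
]

# "bank" and "shore" co-occur on two of the three pages, so the score is
# high even though the second page uses "bank" in the financial sense.
score = page_count_jaccard(pages, "bank", "shore")
print(round(score, 3))
```

Here the score is driven purely by co-occurrence on a page; nothing in the computation checks whether the two words share a context, which is the gap described in the first bullet above.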

To ensure better efficiency and credibility of web based automated systems such as search engines, the underlying lexical resources must be automation friendly and as unambiguous as possible. An information retrieval system that relies on crude semantic similarity taken from online lexical sources such as WordNet (2005), without considering the context, fetches a large number of undesired pages that are of little use. Therefore, there is a need to tune these lexical resources in such a manner that the derived synonyms are also provided with the following:

  • Their respective applicable context set;

  • Properly defined extent of semantic similarity.

This helps automated systems choose an appropriate synonym set.
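
As a rough sketch of what such a tuned lexical entry could look like, the following assumes that each synonym carries a context set and an extent of similarity in [0, 1], and that a synonym is selected when its context set overlaps the query context and its extent meets a threshold. The names `SynonymEntry` and `choose_synonyms`, the threshold, and the sample values are all hypothetical; the selection rule is a simple crisp filter, not the fuzzy inference mechanism of the proposed work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynonymEntry:
    synonym: str
    contexts: frozenset  # context words in which the synonym applies
    extent: float        # extent of semantic similarity, assumed in [0, 1]


def choose_synonyms(entries, query_context, min_extent=0.5):
    """Keep synonyms whose context set overlaps the query context and whose
    extent is at least min_extent; return them strongest first."""
    query_context = set(query_context)
    chosen = [
        e for e in entries
        if e.extent >= min_extent and e.contexts & query_context
    ]
    chosen.sort(key=lambda e: e.extent, reverse=True)
    return [e.synonym for e in chosen]


# Hypothetical entries for the word "bank"; all values are invented.
bank_entries = [
    SynonymEntry("shore", frozenset({"river", "water", "boat"}), 0.8),
    SynonymEntry("lender", frozenset({"loan", "money", "credit"}), 0.7),
    SynonymEntry("depository", frozenset({"money", "vault"}), 0.4),
]

result = choose_synonyms(bank_entries, {"loan", "interest", "money"})
print(result)
```

With a financial query context, only the synonym whose context set matches and whose extent clears the threshold is retained; a query context containing "river" would select "shore" instead.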

The proposed work accomplishes the above-mentioned goals by defining a credible synonym resolution process. The work intends to pave the way for the design of next generation lexical resources and shall be helpful in the areas of natural language processing, information retrieval, web mining, word sense disambiguation and automated machine translation.

The paper is organized as follows. The introduction is followed by the literature survey. Based on a review of the available literature, the objectives of the proposed work are then defined. Thereafter, the proposed work is described in detail, and the results are presented and analysed. The last section concludes the work with possible future extensions.
