Exploring Fuzzy Association Rules in Semantic Network Enrichment Improvement of the Semantic Indexing Process

Souheyl Mallat (Faculty of Science of Monastir, Tunisia), Emna Hkiri (LATICE Laboratory, Tunisia) and Mounir Zrigui (Faculty of Science of Monastir, Tunisia)
DOI: 10.4018/978-1-5225-5042-6.ch006

Abstract

With the aim of improving natural language processing applications, we focus on a statistical approach to the semantic indexing of multilingual text documents based on the conceptual network formalism. We propose to use this formalism as an indexing language to represent the descriptive concepts and their weights. Our contribution consists of two steps. In the first step, we extract index terms using the multilingual lexical resource EuroWordNet (EWN). In the second step, we move from a representation by index terms to a representation by index concepts through the conceptual network formalism. This network is generated using the EWN resource and passes through a classification step based on the association rules model. Our indexing approach can be applied to text documents in various languages: the same statistical process is applied regardless of the language in order to extract the significant concepts and their associated weights. We show that the proposed indexing approach yields encouraging results.

1. Introduction

It is well known that natural language ambiguities have a detrimental effect on the translation of query terms in cross-language information retrieval. However, research efforts to integrate word sense disambiguation techniques into machine translation (MT) have not been successful and have produced unconvincing results. In addition, our automatic translation system (ATS) (Jianfeng et al., 2001) requires high disambiguation precision in order to select the best target-language translation of ambiguous words.

The semantic disambiguation of the query in the target language relies on a document written in that language. This document is a list of relevant sentences, noted List_S: the sentences most similar to the user query, which satisfy the query and are ranked according to their degree of linguistic relevance (semantic and morphological). The construction of List_S is presented in Mallat et al. (2013) and Mallat et al. (2014). The French and English versions of these lists result from the alignment of a multilingual parallel corpus, and both versions are used as resources for the disambiguation process in query translation (Arabic-French and Arabic-English). The process matches the query against the content of List_S in order to find the target-language words of the query that best fit List_S. A key feature of the disambiguation method is that the degree of matching between each translation of an ambiguous word and List_S is expressed by a weight, and the translation with the highest weight is retained.
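To make the selection rule concrete, the following Python sketch scores each candidate translation of an ambiguous word against List_S and keeps the highest-weighted one. It assumes List_S has already been indexed into a term-to-weight dictionary; the summation score and the names `matching_weight`, `select_translation` and `list_s_weights` are hypothetical illustrations, not the authors' matching function.

```python
def matching_weight(translation, list_s_weights):
    """Hypothetical score: sum of the List_S weights of the translation's words."""
    return sum(list_s_weights.get(w, 0.0) for w in translation.lower().split())


def select_translation(candidates, list_s_weights):
    """Keep the candidate whose matching weight against List_S is highest."""
    return max(candidates, key=lambda c: matching_weight(c, list_s_weights))


# Illustrative weights for a French List_S about banking (toy data only).
list_s_weights = {"banque": 0.9, "compte": 0.6, "argent": 0.5, "rive": 0.1}
print(select_translation(["rive", "banque"], list_s_weights))  # banque
```

With `max`, ties go to the first candidate listed; a real system would need an explicit tie-breaking policy.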

Note that List_S is characterized by theme-specific features, such as semantic and morphological richness, which are assumed to best represent the relevant answers to a given query. Improving the disambiguation process therefore requires an effective method for representing and analyzing the content of this list.

In this paper, we focus on the extraction of concepts (descriptors), or index concepts, in order to associate with each document (List_S) a representation of its content by concepts and their associated weights.
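As an illustration of the target representation (a document mapped to concepts with weights), the sketch below assumes a tiny hypothetical lexicon, `TOY_LEXICON`, standing in for EuroWordNet, in which French and English words share language-independent concept identifiers; relative concept frequency is used as the weight. It shows only a frequency-based starting point; the relation-based reweighting of the conceptual network is not modeled here.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for EuroWordNet: French and English
# words point to shared, language-independent concept identifiers.
TOY_LEXICON = {
    "bank": "C_BANK", "banque": "C_BANK",
    "account": "C_ACCOUNT", "compte": "C_ACCOUNT",
    "money": "C_MONEY", "argent": "C_MONEY",
}


def index_concepts(tokens, lexicon=TOY_LEXICON):
    """Map tokens to concepts and weight each concept by its relative frequency."""
    concepts = [lexicon[t] for t in tokens if t in lexicon]
    counts = Counter(concepts)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


french = "la banque ouvre un compte en banque".split()
english = "the bank opens an account".split()
print(index_concepts(french))   # {'C_BANK': 0.6666666666666666, 'C_ACCOUNT': 0.3333333333333333}
print(index_concepts(english))  # {'C_BANK': 0.5, 'C_ACCOUNT': 0.5}
```

Because both languages map to the same identifiers, the same weighting code applies unchanged to French and English input.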

To this end, we propose a statistical approach to the semantic indexing of multilingual documents (French or English) that relies only on word-frequency calculations. We also exploit taxonomic and non-taxonomic (contextual) relations between terms. The proposed indexing approach consists of the following steps:

  1. Extracting the significant words, or index terms, associated with the concepts of a document in English or French, using the external multilingual lexical resource (thesaurus) EuroWordNet in its French and English versions (Gonzalo et al., 1998; Vossen et al., 1997). We consider EWN to be composed of a set of lexicons and a set of relations between them, designated by concepts.

  2. Building and exploiting the conceptual network of a document, which requires extracting the concept nodes obtained in the previous step and the relations between them. To extract relations, we rely on the EWN resource to identify taxonomic relations, and we add the fuzzy association rules model to identify non-taxonomic (contextual) relations between concepts. This model acts as an inference mechanism that discovers latent relations buried in List_S and carried by the semantic context. Its goal is to better represent the semantic content of the document. The novelty of this model is twofold: (1) term co-occurrence is taken into account when indexing List_S; the model's descriptors are no longer single words but sets of index terms (term-sets), which capture the intuition that semantically related terms appear near one another in List_S; (2) the importance of a word in the document is estimated not only from its frequency of occurrence but also from its semantic proximity and contextual values with respect to the other terms in List_S.

  3. Generating the index concepts, through the conceptual network formalism, with new weights that better represent the content of List_S.
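A minimal sketch of the fuzzy association rules step, under stated assumptions: each sentence of List_S is treated as a transaction, a term's fuzzy membership in a sentence is its frequency divided by the sentence's maximum term frequency, fuzzy support uses the minimum as t-norm, and confidence is support(A ∪ B) / support(A). This is a common formulation of fuzzy association rules, not necessarily the chapter's exact model; the thresholds, the helper names, and the `contextual_weights` boost (a hypothetical way to raise a term's weight from the rules pointing to it) are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def memberships(sentence_tokens):
    """Fuzzy membership of each term in one sentence: tf / max tf (assumption)."""
    tf = Counter(sentence_tokens)
    top = max(tf.values())
    return {term: n / top for term, n in tf.items()}


def fuzzy_support(termset, transactions):
    """Mean over sentences of the minimum membership of the terms (min t-norm)."""
    total = sum(min(mu.get(t, 0.0) for t in termset) for mu in transactions)
    return total / len(transactions)


def mine_pair_rules(sentences, min_support=0.2, min_confidence=0.6):
    """Return directional rules (x, y, support, confidence) between term pairs."""
    transactions = [memberships(s) for s in sentences if s]
    vocab = sorted({t for mu in transactions for t in mu})
    rules = []
    for a, b in combinations(vocab, 2):
        sup_ab = fuzzy_support((a, b), transactions)
        if sup_ab < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            # Every vocabulary term occurs in some sentence, so support(x) > 0.
            conf = sup_ab / fuzzy_support((x,), transactions)
            if conf >= min_confidence:
                rules.append((x, y, sup_ab, conf))
    return rules


def contextual_weights(base, rules, alpha=0.5):
    """Hypothetical reweighting: y gains alpha * confidence * weight(x) per rule x -> y."""
    new = dict(base)
    for x, y, _, conf in rules:
        new[y] = new.get(y, 0.0) + alpha * conf * base.get(x, 0.0)
    return new


# Toy List_S: one token list per sentence (illustrative data only).
sentences = [
    ["bank", "account", "money"],
    ["bank", "account"],
    ["river", "bank"],
    ["money", "account", "account"],
]
rules = mine_pair_rules(sentences)
for x, y, sup, conf in rules:
    print(f"{x} -> {y}: support={sup:.3f}, confidence={conf:.3f}")
print(contextual_weights({"bank": 0.4, "account": 0.4, "money": 0.2}, rules))
```

On these toy sentences the rule `river -> bank` holds with confidence 1.0 while `bank -> river` is rejected, which shows that the mined contextual relations are directional.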

The paper is organized as follows: Section 2 presents the existing problems, namely term disparity and the ambiguity faced in the indexing process. Section 3 presents the background and the state of the art in indexing methods. Section 4 details our indexing approach. Section 5 presents the experiments, a comparison, and a discussion of the results. Section 6 concludes the paper.

2. Problem Statement

Our work addresses three types of problems:
