Towards a Possibilistic Information Retrieval System Using Semantic Query Expansion

Bilel Elayeb, Ibrahim Bounhas, Oussama Ben Khiroun, Fabrice Evrard, Narjès Bellamine-BenSaoud
Copyright: © 2011 |Pages: 25
DOI: 10.4018/jiit.2011100101

Abstract

This paper presents a new possibilistic information retrieval system using semantic query expansion. The work addresses query expansion strategies based on external linguistic resources; here, the authors exploit the French dictionary “Le Grand Robert”. First, they model the dictionary as a graph and compute similarities between query terms by exploiting the circuits in the graph. Second, they use possibility theory, taking advantage of a double relevance measure (possibility and necessity) between the articles of the dictionary and the query terms. Third, they combine these two approaches using two different aggregation methods. The authors also build on an existing approach for reweighting query terms in the possibilistic matching model to improve the expansion process. To assess and compare the approaches, the authors performed experiments on the standard ‘LeMonde94’ test collection.

1. Introduction

The quasi-exponential growth of human knowledge across varied fields of interest has generated a mass of information that is increasingly difficult to manage and maintain. In this large-scale environment, characterised by both a great number of users and an immense volume of data, it becomes essential to design and develop tools that allow effective and organized access. It is crucial to develop automated interfaces that make it possible to formulate and satisfy users' information needs. Information Retrieval (IR) is a branch of computer science concerned with the acquisition, organization, storage and retrieval of information. Information Retrieval Systems (IRS) are software tools that aim to capitalize on information and to locate relevant documents. Given an information need expressed as a query, relevance is quantified by a matching model between the query terms and the documents. Whatever semantics are given to the representation of the objects (document or query) or to the definition of relevance, these models share the same general behaviour (Dos Santos & Madeira, 2010; Baloglu et al., 2010). The majority of them represent documents and queries as lists of weighted keywords. Therefore, in this query/answer setting, the relevance of the results returned by an IRS depends primarily on the query. However, the user is often unable to supply keywords that explicitly and clearly describe the intended need, which can degrade the quality of the expected results. Query expansion is one of the strategies implemented in IRS to improve their performance and better satisfy users. It consists of enriching the user's query by adding new terms that better express the need. There are two main approaches to query expansion in the literature: automatic query expansion (AQE) and interactive query expansion (IQE) (Ruthven, 2003).
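The weighted-keyword representation and the effect of query expansion described above can be illustrated with a minimal sketch. It is not the matching model used in this paper: the inner-product `score`, the `expand` function, its `damping` parameter, and all terms and weights are hypothetical, chosen only to show how an added term can let a query reach a document it previously missed.

```python
from typing import Dict

# A document or a query is a mapping from keyword to weight.
WeightedTerms = Dict[str, float]


def score(query: WeightedTerms, document: WeightedTerms) -> float:
    """Toy matching model: inner product of query and document weights."""
    return sum(w * document.get(term, 0.0) for term, w in query.items())


def expand(
    query: WeightedTerms,
    related: Dict[str, Dict[str, float]],
    damping: float = 0.5,  # hypothetical: new terms count less than original ones
) -> WeightedTerms:
    """Add terms related to the query terms, with reduced weights."""
    expanded = dict(query)
    for term, weight in query.items():
        for new_term, similarity in related.get(term, {}).items():
            if new_term in query:
                continue  # never overwrite a term the user typed
            candidate = damping * weight * similarity
            expanded[new_term] = max(expanded.get(new_term, 0.0), candidate)
    return expanded


# Illustrative data only (not taken from any test collection).
documents = {
    "d1": {"voiture": 0.8, "moteur": 0.5},
    "d2": {"automobile": 0.9, "route": 0.4},
}
related_terms = {"voiture": {"automobile": 0.9}}

query = {"voiture": 1.0}
expanded_query = expand(query, related_terms)

original_scores = {d: score(query, doc) for d, doc in documents.items()}
expanded_scores = {d: score(expanded_query, doc) for d, doc in documents.items()}
```

With the original query, `d2` scores zero because it never mentions "voiture"; after expansion it receives a positive score through "automobile", while the ranking of `d1` is unchanged.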
AQE is simpler for the user, but its performance is limited because it involves no user input. IQE does involve the user, which makes it more demanding to use but lets it handle more problems, such as ambiguous queries. An IRS can fail either by finding too few relevant documents (low recall) or by retrieving too many irrelevant documents (low precision). Historically, AQE has achieved better recall than IQE (Vélez & Weiss, 1997). Unfortunately, because the terms used to expand a query often changed the query’s meaning, AQE frequently decreased precision (Croft & Harper, 1979). The problem is that users typically consider just the first few results (Jansen & McNeese, 2005), which makes precision crucial to search performance. In contrast, IQE has balanced precision and recall, leading to an earlier uptake within search. However, like AQE, the precision of IQE approaches needs improvement. More recently, approaches have started to improve precision by incorporating semantic knowledge (Crabtree, 2009). This can be achieved by various techniques such as corpus analysis and classification (Chevallet & Nie, 1997; Claveau & Sébillot, 2004), user Relevance Feedback (RF), and integration of external linguistic resources (e.g., dictionaries, thesauri and ontologies). Our work focuses on this last interactive query expansion approach, using a dictionary as proposed in Elayeb (2009). Several query expansion experiments have been conducted, for example using the WordNet lexical database on English IRS (Voorhees, 1994; Smeaton, 1997). The data used for query expansion in these approaches is poor, uncertain and imprecise, and possibility theory is naturally suited to this kind of application. It allows phenomena of ignorance, imprecision and uncertainty to be expressed (Brini & Boughanem, 2004). Indeed, it defines two types of relevance.
On the one hand, plausible relevance, quantified by possibility, tends to eliminate terms that are not semantically similar (irrelevant ones); on the other hand, necessity relevance reinforces our belief in the terms not eliminated by the possibility measure (i.e., semantically close words useful for expansion). Ben Khiroun et al. (in press) proposed a possibilistic approach for semantic query expansion. Based on this work, we propose a new possibilistic IRS which exploits, combines and compares the possibilistic and the circuit-based approaches for semantic query expansion. Moreover, in this paper we propose and investigate the idea of a possibilistic network which models the dependencies between the query terms and the articles of a dictionary. Semantic proximity is quantified through possibility and necessity measures. Expanding a query means adding to its terms the dictionary articles that are most possibly and most necessarily relevant. This process is improved by reweighting the old and the new terms to reflect their relative importance. As a means of assessment, we first compare this approach to our circuit-based distance. We then combine these two approaches using two different aggregation methods. We also build on an existing approach for reweighting query terms in the possibilistic matching model to improve the expansion process. Experiments are carried out using the standard “LeMonde94” test collection together with the French dictionary “Le Grand Robert”. Results are evaluated and compared in terms of how much they improve the performance of the proposed IRS. This paper is structured as follows. In Section 2, we briefly recall the main concepts of possibility theory. Section 3 reviews the literature in the field. In Sections 4 and 5, we present our approaches for Semantic Query Expansion (SQE) based respectively on Hierarchical Small-Worlds Networks (HSWN) and Possibilistic Networks (PN).
Section 6 presents query expansion combining these two approaches, with an illustrative example. Section 7 briefly describes the existing approach for reweighting query terms in the possibilistic matching model to improve the expansion process. Section 8 reports a set of experiments, an analysis of the results and comparative studies. Section 9 compares our work with similar approaches and proposes the main directions for future research. Finally, Section 10 summarizes the main outcomes of this paper and concludes.
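The double relevance measure outlined in this introduction (possibility to discard irrelevant candidates, necessity to reinforce the remaining ones) can be sketched as follows. This is only an illustration under stated assumptions: the introduction does not give the formulas used to compute the scores, nor does it name the two aggregation methods, so the scores below are made-up inputs and the weighted mean and minimum are placeholder aggregators, not the paper's.

```python
from typing import Dict, List, Tuple

# term -> (possibility, necessity); values are hypothetical inputs in [0, 1].
Scores = Dict[str, Tuple[float, float]]


def select_expansion_terms(scores: Scores, k: int) -> List[str]:
    """Keep candidates with non-zero possibility, then rank by necessity."""
    # Step 1: possibility eliminates candidates judged not plausible at all.
    plausible = {t: pn for t, pn in scores.items() if pn[0] > 0.0}
    # Step 2: necessity orders the survivors; possibility, then name, break ties.
    ranked = sorted(
        plausible,
        key=lambda t: (-plausible[t][1], -plausible[t][0], t),
    )
    return ranked[:k]


def aggregate_mean(possibilistic: float, circuit: float, alpha: float = 0.5) -> float:
    """Placeholder aggregator: weighted mean of the two similarity scores."""
    return alpha * possibilistic + (1.0 - alpha) * circuit


def aggregate_min(possibilistic: float, circuit: float) -> float:
    """Placeholder aggregator: keep the more cautious of the two scores."""
    return min(possibilistic, circuit)


# Made-up candidate scores for a query term such as "voiture".
candidates: Scores = {
    "automobile": (1.0, 0.6),
    "camion": (0.8, 0.2),
    "banane": (0.0, 0.0),  # possibility 0: eliminated in step 1
    "auto": (1.0, 0.6),
}

selected = select_expansion_terms(candidates, k=2)
```

The two-step structure mirrors the roles the text assigns to the two measures; any real system would replace the hard-coded scores with values computed from the dictionary.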
