Introduction
The major objective of Information Retrieval (IR) systems is to find documents relevant to a user’s query (Grechanik et al., 2010; Zhai et al., 2015). Many IR systems are based on the traditional bag-of-words (BOW) approach, which does not take the different meanings of the query keywords into consideration and therefore leaves an ambiguity caused by polysemy. In most cases, the words contained in a query are polysemous. It is often possible to understand the meaning of a word from the set of words used around it; this is the notion of context. For example, the word “note” may mean “a notation representing the pitch and duration of a musical sound”, “a brief written record”, or “a piece of paper money”. Disambiguation lies in the capacity of the system to exhibit relevant synonyms of the concept, i.e., to determine the precise sense that the concept has in the query context.
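As a rough illustration of how context can select a sense, the sketch below scores each meaning of “note” by the number of words its gloss shares with the rest of the query (a simplified gloss-overlap heuristic in the spirit of Lesk). It is not the approach proposed in this work: the sense inventory `SENSES`, the stopword list, and the function names are hypothetical, and the glosses are simply the three meanings quoted above rather than a real WordNet lookup.

```python
import re

# Hypothetical toy sense inventory for "note"; the glosses are the three
# meanings quoted in the text, not entries fetched from WordNet.
SENSES = {
    "musical": "a notation representing the pitch and duration of a musical sound",
    "written": "a brief written record",
    "money": "a piece of paper money",
}

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in", "for", "on", "is"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def disambiguate(target, query, senses):
    """Pick the sense whose gloss shares the most words with the query context.

    Returns the winning sense label and the overlap count for every sense.
    Ties (including zero overlap) fall back to the first sense listed.
    """
    context = tokenize(query) - {target}
    overlaps = {
        label: len(context & tokenize(gloss)) for label, gloss in senses.items()
    }
    best = max(overlaps, key=overlaps.get)
    return best, overlaps


if __name__ == "__main__":
    for q in ("pitch of a musical note", "paper money note", "a brief written note"):
        print(q, "->", disambiguate("note", q, SENSES))
```

Because the score depends only on the query words and the sense glosses, the sense is chosen without consulting any retrieved document. A realistic system would need a full lexical resource and a similarity measure that is more robust than exact word overlap.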
To solve the problem of query disambiguation, several approaches have been proposed (ALMasri et al., 2016; Fernández-Reyes et al., 2018; Serizawa and Kobayashi, 2013). In order to retrieve the truly relevant documents, the majority of works on disambiguation (Hirst et al., 1998; Khan and Feng Luo, 2003; Mihalcea and Moldovan, 2000) address the problem by measuring the similarity between the initial query and the documents. This method is not optimal: the query must be disambiguated independently of the documents, because the ambiguity intrinsically linked to the concepts of the query degrades search effectiveness.
Recent works (Bobed and Mena, 2016; Yan et al., 2017; Zingla et al., 2016) on query expansion add terms similar to those initially used. These terms are suggested from the documents returned for the original query (blind expansion, relevance feedback, etc.), from a linguistic resource, or from the query logs. Applying these approaches risks introducing noise into the search results (query drift), which yields a deviation from the user’s intention. The first approach suffers from the large size of Web resources, which degrades its effectiveness. Moreover, these approaches do not help get closer to the user's desired meaning, because the disambiguation does not focus only on the original terms of the query, which obstructs the process of discovering the meaning of the query. Indeed, the factors that influence the query’s meaning and the results it produces are poorly understood because of the effect of the relative positions of the words. This is due to the interrelationship of several parameters, such as the dispersion of the concepts over the branches of the ontology, their depth, and the semantic similarity between the concepts and their predecessors.
In this work, the case of a user requiring information on a specific topic through a query (ad-hoc search) is studied. One of the main problems with this search type is detecting the meaning of the query with respect to the user's information need. WordNet (George A. Miller, 1995), one of the best-known and most widely used external information resources, is used in the proposed approach. WordNet provides a conceptual framework for the structured representation of the query’s context, in which nouns, verbs, adjectives and adverbs are organized by a variety of semantic relationships. Each concept is associated with one or more synsets (sets of synonyms), each representing one of its senses.