1. Introduction
The field of information retrieval (IR) is as old as the computer itself. According to Mooers (1950) and Savino & Sebastiani (1998), “Information retrieval is the name of the process or method whereby a prospective user of information is able to convert his need for information into an actual list of documents in storage containing information useful to him”. IR systems are useful in a large number of applications, such as search engines (Singh et al., 2013), media search, digital libraries, recommender systems, information filtering, and many others, so there is a constant need to improve them. In this context, information retrieval is an active research field in computer science.
The most critical problem for retrieval effectiveness is the term mismatch problem (Furnas et al., 1997; Xu, 1997): indexers and users often do not use the same words for the same concept or idea. One of the most feasible and successful techniques for handling term mismatch is to expand the original query (query expansion) with other words that describe the user's intention, producing a query that is more likely to retrieve only the relevant documents. To address this problem, automatic query expansion techniques are needed that can assist the user in formulating the query. Query expansion may be done in three ways: manual, interactive, and automatic. Interactive query expansion is generally considered better than automatic query expansion because both the user and the system are involved in the process. Most of the time, however, it is not feasible to involve the user in query expansion, so many researchers are trying to develop efficient techniques for automatic query expansion. Researchers commonly use co-occurrence information to expand the user query, but this approach has many drawbacks.
The concept of term co-occurrence has been used since the 1990s to identify some of the semantic relationships among terms present in text documents. According to Rijsbergen (1997), co-occurrence statistics can be used to detect some kind of semantic relation between query terms and document terms, and this relation can then be exploited to expand the user's queries. This idea is based on the following hypothesis: “If an index term is good at discriminating relevant from non-relevant documents then any closely associated index term is likely to be good at this”. The following are some well-known measures of the co-occurrence coefficient:
[Equations (1) to (3) are not reproduced in this preview.]
where t_i and t_j are the terms for which co-occurrence is to be calculated, d_i and d_j are the numbers of documents in which t_i and t_j occur, respectively, and d_ij is the number of documents in which t_i and t_j occur together.
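As an illustration of how a co-occurrence coefficient can be computed from the document counts d_i, d_j and d_ij, the following minimal sketch uses three commonly cited definitions: Jaccard, Dice, and cosine. That these are the measures given as Equations (1), (2) and (3) is an assumption, and the function names are hypothetical.

```python
import math


def jaccard(d_i, d_j, d_ij):
    """Jaccard coefficient: d_ij / (d_i + d_j - d_ij)."""
    union = d_i + d_j - d_ij
    return d_ij / union if union else 0.0


def dice(d_i, d_j, d_ij):
    """Dice coefficient: 2 * d_ij / (d_i + d_j)."""
    total = d_i + d_j
    return 2 * d_ij / total if total else 0.0


def cosine(d_i, d_j, d_ij):
    """Cosine coefficient: d_ij / sqrt(d_i * d_j)."""
    denom = math.sqrt(d_i * d_j)
    return d_ij / denom if denom else 0.0


# Example counts (made up for illustration): t_i occurs in 4 documents,
# t_j occurs in 6 documents, and both occur together in 2 documents.
print(jaccard(4, 6, 2), dice(4, 6, 2), cosine(4, 6, 2))
```

All three measures equal 0 when the terms never co-occur and 1 when they occur in exactly the same documents; they differ only in how the joint count is normalized.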
In the majority of work on pseudo-relevance-feedback-based automatic query expansion, a co-occurrence-based approach has been used to select query expansion terms. These are the terms that co-occur most frequently with the query terms. Co-occurrence can be captured in different ways. Two methods for extracting terms are used in this paper: one is based on the Jaccard coefficient of co-occurring terms, and the other is based on the contextual frequency of co-occurring terms.
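The selection of expansion terms by co-occurrence can be sketched as follows. This is a minimal illustration and not the method of this paper: it assumes that co-occurrence is counted at the document level over a small set of top-ranked (pseudo-relevant) documents, and that a candidate term is scored by summing its Jaccard coefficient with each query term. The function name `expansion_terms`, the parameter `k`, and the toy documents are hypothetical.

```python
from collections import Counter


def expansion_terms(query_terms, feedback_docs, k=3):
    """Rank candidate terms by summed Jaccard co-occurrence with the query terms.

    feedback_docs is a list of tokenized documents (lists of terms), assumed
    to be the top-ranked documents returned for the original query.
    """
    doc_sets = [set(doc) for doc in feedback_docs]
    # Document frequency of every term within the feedback set.
    df = Counter(term for doc in doc_sets for term in doc)
    query = set(query_terms)

    scores = {}
    for cand in df:
        if cand in query:
            continue  # query terms are not candidates for expansion
        score = 0.0
        for q in query:
            if q not in df:
                continue  # query term absent from the feedback documents
            d_ij = sum(1 for doc in doc_sets if q in doc and cand in doc)
            score += d_ij / (df[q] + df[cand] - d_ij)
        scores[cand] = score

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


docs = [
    ["query", "expansion", "retrieval"],
    ["query", "expansion", "feedback"],
    ["retrieval", "index"],
]
print(expansion_terms(["query"], docs, k=2))  # ['expansion', 'feedback']
```

In this toy example "expansion" appears in exactly the same documents as "query" and is therefore ranked first, while "index" never co-occurs with the query term and receives a score of zero.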
An in-depth analysis of co-occurrence-based query expansion shows mixed chances of success and failure. The major drawbacks and weaknesses of co-occurrence-based automatic query expansion are as follows (Peat & Willett, 1991):