A New Hybrid Document Clustering for PRF-Based Automatic Query Expansion Approach for Effective IR

Yogesh Gupta, Ashish Saini
Copyright: © 2020 | Pages: 23
DOI: 10.4018/IJeC.2020070105

Abstract

Automatic query expansion (AQE) improves information retrieval (IR) performance by adding terms to a user query. The pseudo relevance feedback (PRF) method commonly employed for AQE suffers from a major problem: query drift. To address this, the present article proposes a new hybrid document clustering approach for PRF-based AQE. In this approach, fuzzy logic and Particle Swarm Optimization (PSO) are used to construct document clusters. A new hybrid PSO and fuzzy logic-based term weighting approach then finds more suitable additional query terms by maximizing a weighted score of four IR evidences. In addition, a combined semantic filtering method and a query term re-weighting algorithm are used to remove noisy or semantically irrelevant terms. The proposed approach is tested and compared with other approaches on three benchmark data sets. The comparative analysis shows that the proposed approach outperforms the other tested approaches.
Article Preview

A few researchers have proposed document clustering to enhance IR performance. Lee et al. (2000) used a document clustering approach for PRF-based document selection. They treated a document as pseudo-relevant only when it has high similarity with other documents and low or no similarity with neighbor documents. This approach does not work well for large, diversified document sets. For example, if a long document contains unclear and general terms (Fall et al., 2003), it is more likely to be chosen for PRF against most queries. Moreover, if the documents are very short and contain many synonyms, the clusters will be small and these documents will have a high probability of being chosen for PRF. Therefore, existing clustering approaches do not provide an accurate distribution of documents for PRF.
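For context, the baseline PRF step that clustering-based selection tries to improve can be sketched as follows. This is a minimal illustration, not the hybrid PSO and fuzzy logic method of the article: the function name `prf_expand`, the whitespace tokenizer, the smoothed IDF formula, and the parameters `k` and `n_terms` are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def prf_expand(query, docs, k=2, n_terms=2):
    """Baseline PRF sketch: treat the top-k first-pass documents as relevant
    and add the n_terms highest-weighted new terms to the query."""
    q_terms = tokenize(query)
    doc_tokens = [tokenize(d) for d in docs]
    n_docs = len(docs)

    # Document frequency of each term, used for a smoothed IDF weight.
    df = Counter()
    for toks in doc_tokens:
        df.update(set(toks))

    def idf(term):
        return math.log((n_docs + 1) / (df[term] + 1)) + 1.0

    # First-pass retrieval: sum of TF * IDF over the query terms.
    def score(toks):
        tf = Counter(toks)
        return sum(tf[t] * idf(t) for t in q_terms)

    ranked = sorted(range(n_docs), key=lambda i: score(doc_tokens[i]), reverse=True)
    feedback = ranked[:k]  # pseudo-relevant documents (assumed, not judged)

    # Weight candidate terms from the feedback documents by TF * IDF.
    cand = Counter()
    for i in feedback:
        for term, freq in Counter(doc_tokens[i]).items():
            if term not in q_terms:
                cand[term] += freq * idf(term)

    # Break ties alphabetically so the result is deterministic.
    best = sorted(cand.items(), key=lambda kv: (-kv[1], kv[0]))[:n_terms]
    return q_terms + [term for term, _ in best], feedback
```

Because the feedback documents are only assumed to be relevant, any off-topic document that ranks in the top `k` contributes expansion terms that pull the query away from the user's intent. That is the query drift problem the abstract refers to, and it is why the choice of pseudo-relevant documents matters.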
