A Highest Sense Count Based Method for Disambiguation of Web Queries for Hindi Language Web Information Retrieval

Sanjay K. Dwivedi (Department of Computer Science, Babasaheb Bhimrao Ambedkar University, Lucknow, Uttar Pradesh, India)
Copyright: © 2012 | Pages: 11
DOI: 10.4018/ijirr.2012100101

Ambiguity in word senses has been recognized as a major challenge for information retrieval systems. Hindi language web information retrieval, like retrieval in other languages, faces the problem of sense ambiguity. Sense ambiguity degrades the performance of every natural language processing (NLP) application, and the performance of Hindi language web information retrieval is affected by it as well. In this paper, the author formalizes an approach for disambiguating senses in order to improve the performance of Hindi web information retrieval. The system performs ambiguity detection before disambiguating web queries. A test sample of 100 queries was selected. When these queries were subjected to ambiguity detection, 43% of them were found to be unambiguous. After ambiguity detection, the remaining queries are disambiguated with an approach based on HSC (Highest Sense Count). Query disambiguation is then followed by query expansion. The expanded query produces a new result set with higher precision and a higher similarity score. The 57 expanded queries were tested against 1000 test document instances. The overall improvement is 45% in average precision and 23% in interpolated average precision, along with a significant improvement in the average similarity score of the newly generated result set. The overall accuracy of the approach is 61.4%, and it improves the performance of the system by 45%.
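
The abstract reports its gains in average precision and interpolated average precision but does not say how these were computed. As a rough illustration only, the sketch below uses the standard textbook definitions for a single ranked result list: non-interpolated average precision, and 11-point interpolated average precision. The function names, the boolean relevance list, and the choice of 11 recall levels are assumptions made for this example and are not taken from the article.

```python
from typing import Optional, Sequence


def average_precision(relevance: Sequence[bool],
                      total_relevant: Optional[int] = None) -> float:
    """Non-interpolated average precision for one ranked result list.

    relevance[i] is True when the document at rank i + 1 is relevant.
    total_relevant defaults to the number of relevant documents retrieved
    (an assumption; pass the true collection count if it is known).
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    denominator = total_relevant if total_relevant is not None else hits
    return precision_sum / denominator if denominator else 0.0


def interpolated_average_precision(relevance: Sequence[bool],
                                   total_relevant: Optional[int] = None,
                                   levels: int = 11) -> float:
    """Interpolated average precision over evenly spaced recall levels.

    With levels=11 the recall levels are 0.0, 0.1, ..., 1.0. The interpolated
    precision at a level is the highest precision observed at any rank whose
    recall is at least that level (0.0 if the level is never reached).
    """
    n_relevant = total_relevant if total_relevant is not None else sum(relevance)
    if not n_relevant:
        return 0.0

    # (recall, precision) after each rank in the result list
    points = []
    hits = 0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
        points.append((hits / n_relevant, hits / rank))

    total = 0.0
    for i in range(levels):
        level = i / (levels - 1)
        total += max((p for r, p in points if r >= level), default=0.0)
    return total / levels


# Hypothetical ranked list: relevant documents at ranks 1 and 3
ranked = [True, False, True, False]
ap = average_precision(ranked)
iap = interpolated_average_precision(ranked)
```

Averaging these per-query values over a query set, once for the original queries and once for the expanded ones, would give the kind of before-and-after comparison the abstract describes, assuming the article uses these definitions.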

Some of the early research on word sense disambiguation (WSD) and its integration with information retrieval (IR) can be found in the works of Krovetz and Croft (1992), Sanderson (1994), Sanderson (2000), and Gonzalo, Peñas, and Verdejo (1999). These contributors established the significance of WSD in the area of IR. They challenged the view of earlier researchers such as Zernik (1991), Voorhees (1993), Wallis (1993), and Sussna (1997), who concluded that WSD has no significant effect on the performance of IR systems.

Most of the work done on the Hindi language has been restricted to machine translation. Among the key researchers, Bhattacharyya (Bhattacharyya, Sinha, Kumar, Pande & Kashyap, 2004) proposed a statistical approach that was very close to the Lesk (1986) approach. Another unsupervised approach for Hindi language WSD was given by Neetu Mishra (Mishra, Yadav & Siddiqui, 2009). In another work, Klapaftis and Manandhar (Klapaftis & Manandhar, 2005) used the Total Sense Score (TSS) for disambiguation.

In addition, some other researchers have used web documents for disambiguation (Gaona, Gelbukh & Bandyopadhyay, 2009; Katsiouli & Kalamboukis, 2009).
