Proximity-Based Good Turing Discounting and Kernel Functions for Pseudo-Relevance Feedback

Ilyes Khennak, Bab Ezzouar

Source Title: Information Retrieval and Management: Concepts, Methodologies, Tools, and Applications

DOI: 10.4018/978-1-5225-5191-1.ch100

Abstract

Over the last few years, it has become abundantly clear that advances in information technology have led to a dramatic proliferation of information on the web, and this, in turn, has led to the appearance of new words on the Internet. Because the meanings of these new terms, which play an essential role in retrieving the desired information, are difficult to determine, it becomes necessary to give more importance to the sites and topics where these new words appear, or rather, to give value to the words that frequently occur with them. For this purpose, in this paper, the authors propose a new robust correlation measure that assesses the relatedness of words for pseudo-relevance feedback. It is based on the co-occurrence and closeness of terms, and aims to select the words that best capture the user's information need. Extensive experiments have been conducted on the OHSUMED test collection, and the results show that the proposed approach achieves a considerable performance improvement over the baseline.

Introduction

Over the years, many different retrieval models, such as vector space models (Salton et al., 1975; Salton & Buckley, 1988), classic probabilistic models (Robertson et al., 1995; Turtle & Croft, 1991; Fuhr, 1992), and statistical language models (Ponte & Croft, 1998; Lavrenko & Croft, 2001; Zhai & Lafferty, 2001a), have been proposed and studied to address the problem of finding, in a large data source, the relevant documents that satisfy users' information needs (Van Rijsbergen, 1979). Nevertheless, it remains a great challenge to develop Information Retrieval Systems (IRSs) that are robust, effective, and efficient.

The ineffectiveness of IRSs is predominantly caused by the ambiguity, incompleteness, and imprecision of the keywords used to express the user's genuine information need. One well-known technique to overcome this shortcoming is to expand the original user query with extra terms that best characterize the actual user intent. In this regard, various approaches dealing with the proximity and interdependence of words have been implemented and tested. They assess the strength of the relationship between a candidate word and the user query in order to find the most important terms to be used as extra terms, or rather, as expansion features (Carpineto & Romano, 2012).
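To make the idea of proximity-based query expansion concrete, the following is a minimal Python sketch. It is not the measure proposed in this chapter: it assumes a plain Gaussian kernel over token distance, whitespace tokenization, and no stop-word removal, and the names `expansion_scores`, `expand_query`, `sigma`, and `k` are hypothetical choices made for illustration only.

```python
import math
from collections import defaultdict


def expansion_scores(query_terms, feedback_docs, sigma=5.0):
    """Score candidate terms by their kernel-weighted closeness to query terms.

    Illustrative assumption: each co-occurrence of a candidate term and a
    query term in the same feedback document contributes
    exp(-d^2 / (2 * sigma^2)), where d is the distance in tokens.
    """
    query = {t.lower() for t in query_terms}
    scores = defaultdict(float)
    for doc in feedback_docs:
        tokens = doc.lower().split()  # naive whitespace tokenization
        query_positions = [i for i, t in enumerate(tokens) if t in query]
        for j, term in enumerate(tokens):
            if term in query:
                continue  # query terms are not expansion candidates
            for i in query_positions:
                scores[term] += math.exp(-((i - j) ** 2) / (2 * sigma ** 2))
    return dict(scores)


def expand_query(query_terms, feedback_docs, k=2, sigma=5.0):
    """Append the k highest-scoring candidate terms to the original query."""
    scores = expansion_scores(query_terms, feedback_docs, sigma)
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return list(query_terms) + ranked[:k]


# Toy feedback documents, made up for this example.
docs = [
    "ijirr journal of information retrieval research",
    "the ijirr journal is published by igi global",
]
print(expand_query(["ijirr"], docs, k=1))  # ['ijirr', 'journal']
```

In this toy run, 'journal' is ranked first because it sits next to the query term in both feedback documents. A practical system would at least filter stop words and normalize scores by term frequency; both steps are omitted here for brevity.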

In this sense, the main goal of this work is to propose a robust correlation measure that evaluates the relatedness of words based on the co-occurrence and closeness of terms. This principle gives importance to words that frequently occur in the same context during the search process. For example, the term ‘IJIRR’ is often found on the same sites where the words ‘Journal’, ‘IGI Global’, and ‘Retrieval’ occur. The choice of this concept was not arbitrary; it follows from recent research on the growth of the World Wide Web. All of these studies have demonstrated an exponential growth of the Web and a rapid increase in the number of new pages created. In his study, Ranganathan (2011) estimated that the volume of online data indexed by Google had increased from 5 exabytes in 2002 to 280 exabytes in 2009. According to Zhu et al. (2009), this volume is expected to double every 18 months. Ntoulas et al. (2004) interpreted these statistics in terms of the number of new pages created and indicated that this number is increasing by 8% a week. The work of Bharat and Broder (1998) went further and estimated that the World Wide Web is growing at a rate of 7.5 pages every second. This revolution that the Web is witnessing has brought two points to the fore:

• The first point is the entry of new words into the Web, which is estimated, according to Williams and Zobel (2005), at about one new word in every two hundred words. Studies (Williams & Zobel, 2005; Eisenstein et al., 2012; Sun, 2010) have shown that this influx is mainly due to neologisms, acronyms, abbreviations, emoticons, URLs, and typographical errors.
• The second point is that users employ these new words when searching. Chen et al. (2007) indicated in their study that more than 17% of query terms are non-dictionary words; of these, 45% are E-speak (lol), 18% are companies and products (Google), 16% are proper names, and 15% are misspellings and foreign words (Subramaniam et al., 2009; Ahmad & Kondrak, 2005).
