Clustering of Relevant Documents Based on Findability Effort in Information Retrieval

Prabha Rajagopal, Taoufik Aghris, Fatima-Ezzahra Fettah, Sri Devi Ravana
Copyright: © 2022 | Pages: 18
DOI: 10.4018/IJIRR.315764

Abstract

A user expresses their information need in the form of a query on an information retrieval (IR) system, which retrieves a set of articles related to the query. The performance of the retrieval system is measured by the relevance of the retrieved content to the query, as judged by expert topic assessors who are trained to find relevant information. However, real users do not always succeed in finding relevant information in the retrieved list because of the time and effort required. This paper aims 1) to use findability features to determine the amount of effort needed to find information in relevant documents using a machine learning approach, and 2) to demonstrate how IR systems' performance changes when effort is included in the evaluation. The study applies a natural language processing technique and an unsupervised clustering approach to group documents by the amount of effort needed. The results show that relevant documents can be clustered using the k-means clustering approach and that retrieval system performance varies by 23%, on average.
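As an illustrative sketch only (not the authors' exact pipeline), documents described by hypothetical effort-related findability features, here assumed to be document length, the offset of the first relevant passage, and sentence count, could be grouped with k-means using scikit-learn:

# Minimal sketch: cluster relevant documents by effort-related features.
# The feature names below (document length, offset of the first relevant
# passage, sentence count) are illustrative assumptions, not the paper's
# exact findability features.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Each row: [document length in words, offset of first relevant passage, sentence count]
features = np.array([
    [1200,   50,  60],
    [3400,  900, 180],
    [ 800,   10,  35],
    [5100, 2400, 260],
])

# Scale features so no single one dominates the distance computation.
scaled = StandardScaler().fit_transform(features)

# Group documents into low/medium/high effort clusters (k = 3 is an assumption).
kmeans = KMeans(n_clusters=3, random_state=0, n_init=10).fit(scaled)
print(kmeans.labels_)  # cluster label per document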

Introduction

Information retrieval (IR) is the science of searching for information relevant to a given query within large stored document collections. The fundamental challenge of an information retrieval system (IRS) lies in matching an information requirement statement, typically expressed as a user’s query, against a collection of documents and ranking each document according to its relevance to the query.

Over the past decades, a large amount of research has gone into building ranking models that retrieve the most relevant documents. A ranking model is generally constructed with either probabilistic methods or modern machine learning methods. The algorithm is based on word frequencies, treating a document as a set of words, often called a bag of words. With these models, if a user enters a simple query such as “what is information retrieval” in a given IRS, hundreds of thousands, if not millions, of results are retrieved and ranked. However, a large amount of time is sometimes spent just to extract a small piece of information from the documents that are considered relevant. The amount of effort the user must put in determines whether the user is satisfied or dissatisfied in gaining the necessary information. It has been noted that real users tend to give up easily when searching for information in the retrieved documents (Verma et al., 2016). Therefore, relevance is no longer just a matter of ensuring that relevant information is available in a document, but also of the amount of effort needed to find it (Yilmaz et al., 2014).
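To make the word-frequency idea concrete, the following sketch ranks a small toy collection against a query with a plain bag-of-words TF-IDF model; the documents and query are made up for illustration:

# Minimal bag-of-words ranking sketch: score documents against a query by
# cosine similarity of their TF-IDF vectors. The toy documents and query
# are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "information retrieval finds relevant documents for a query",
    "machine learning models rank documents by predicted relevance",
    "cooking recipes for a quick dinner",
]
query = "what is information retrieval"

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)
query_vector = vectorizer.transform([query])

# Higher cosine similarity -> higher rank in the result list.
scores = cosine_similarity(query_vector, doc_vectors).ravel()
ranking = scores.argsort()[::-1]
print([(documents[i], round(float(scores[i]), 3)) for i in ranking])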

Two methods are widely used to evaluate the effectiveness of information retrieval systems. The first is the collection-based method, often referred to as the Cranfield approach (Cleverdon, 1991). This approach relies on a document collection (corpus); a set of topics, each containing a query, title, and description that define a user’s need; and a set of relevance judgments indicating which documents in the collection are relevant to each topic, usually judged by topic experts. To evaluate the effectiveness of an IRS, scores are generated from the ranked list of documents retrieved by the system and the relevance judgments. The scores are calculated using evaluation measures such as precision, recall, and mean average precision (Clough & Sanderson, 2013). The second method is user-based evaluation. This approach is based on the interaction between the user and the IRS, which is shaped by the user’s environment, such as educational background, context, and subject expertise, and by the user’s perspective, such as the search goal (Park, 1994).
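A small sketch of the collection-based scoring step, using made-up document IDs and judgments, shows how precision, recall, and average precision are computed from a system's ranked list and the relevance judgments:

# Minimal sketch of collection-based scoring: given a ranked list returned
# by a system and the set of documents judged relevant for the topic,
# compute precision, recall, and average precision. IDs are illustrative.
ranked_list = ["d3", "d7", "d1", "d9", "d4"]   # system output, best first
relevant = {"d3", "d1", "d8"}                  # relevance judgments (qrels)

retrieved_relevant = [d for d in ranked_list if d in relevant]
precision = len(retrieved_relevant) / len(ranked_list)
recall = len(retrieved_relevant) / len(relevant)

# Average precision: sum of precision@k at each rank k where a relevant
# document is retrieved, divided by the total number of relevant documents.
hits, precision_sum = 0, 0.0
for k, doc in enumerate(ranked_list, start=1):
    if doc in relevant:
        hits += 1
        precision_sum += hits / k
average_precision = precision_sum / len(relevant)

print(precision, recall, round(average_precision, 3))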

Comparing the two evaluation methods, system-based and user-based evaluations can match each other’s results (Al-Maskari, 2008). However, previous research has shown a broad gap between the two approaches, given that the collection-based method makes many assumptions about what a real user looks for to satisfy his/her information needs, along with further assumptions that simplify the relevance evaluation (Allan et al., 2005). The mismatch between the two evaluation methods therefore stems from the disagreement between what the expert judges consider relevant documents and what real users need to satisfy their information demand. The user’s need is characterized as document utility (Turpin & Hersh, 2001). Evaluating IR relevance by document utility from a semantic and pragmatic view was argued by Saracevic (1979), building on earlier research (Saracevic, 1975), as follows: “it is fine for IR systems to provide relevant information, but the true role is to provide information that has utility-information that helps to directly resolve given problems, that directly bears on given actions, and/or that directly fits into given concerns and interests. Thus, it was argued that relevance is not a proper measure for a true evaluation of IR systems. A true measure should be utilitarian.” Following that, Yilmaz et al. (2014) stated that relevance is about how useful the documents found by the retrieval system are.
