Introduction
Information retrieval (IR) is the science of searching large stored collections for documents relevant to a given query. The fundamental challenge of an information retrieval system (IRS) lies in matching an information need statement, typically a user's query, against a collection of documents, ranking each document according to its relevance to the query.
Over the past decades, a large body of research has focused on building ranking models that retrieve the most relevant documents. Generally, a ranking model is constructed with either probabilistic methods or modern machine learning methods. Such models score documents based on word frequencies, treating a document as an unordered set of words, often called a bag of words. With these models, if a user enters a simple query, for example, “what is information retrieval”, a given IRS retrieves and ranks hundreds of thousands, if not millions, of results. However, the user may then spend a large amount of time just to extract a small piece of information from the documents deemed relevant, and the effort this requires determines whether the user is satisfied in gaining the necessary information. Previous work has noted that real users tend to give up easily when searching for information in the retrieved documents (Verma et al., 2016). Therefore, relevance is no longer only a matter of whether relevant information is present in a document, but also of the amount of effort needed to find it (Yilmaz, 2014).
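As a minimal sketch of this frequency-based, bag-of-words view of ranking (the documents, query, and scoring function below are illustrative, not taken from the article), each document can be scored by how often the query terms occur in it:

```python
from collections import Counter

def tf_score(query_terms, doc_tokens):
    # Sum the raw term frequencies of the query terms in the document.
    tf = Counter(doc_tokens)
    return sum(tf[t] for t in query_terms)

# Toy corpus: each document reduced to its bag of words (token list).
docs = {
    "d1": "information retrieval is the science of searching information".split(),
    "d2": "ranking models are built with machine learning methods".split(),
}
query = ["information", "retrieval"]

# Rank documents by descending score; d1 contains both query terms, d2 neither.
ranking = sorted(docs, key=lambda d: tf_score(query, docs[d]), reverse=True)
```

Real systems refine this raw term-frequency score (for example with document-length normalization or inverse document frequency), but the underlying bag-of-words assumption is the same.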
Two methods are widely used to evaluate the effectiveness of information retrieval systems. The first, the collection-based method, is often referred to as the Cranfield approach (Cleverdon, 1991). It relies on a document collection (corpus); a set of topics, each containing a query, title, and description that define a user's need; and a set of relevance judgments, often produced by topic experts, that identify the documents in the collection relevant to each topic. To evaluate the effectiveness of an IRS, scores are computed from the ranked list of documents retrieved by the system together with the relevance judgments, using evaluation measures such as precision, recall, and mean average precision (Clough & Sanderson, 2013). The second method is user-based evaluation. It is grounded in the interaction between the user and the IRS, which is shaped by the user's environment, such as his/her educational background, context, and subject expertise, and by his/her perspective, such as the search goal (Park, 1994).
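To make the Cranfield-style scoring concrete, the sketch below (with hypothetical document IDs and judgments, not drawn from any real test collection) computes precision and recall at a cutoff and average precision for a single topic; mean average precision is then simply the mean of the average-precision values over all topics:

```python
def precision_recall_at_k(ranked, relevant, k):
    # Precision@k: fraction of the top-k retrieved documents that are relevant.
    # Recall@k: fraction of all relevant documents found within the top k.
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k, hits / len(relevant)

def average_precision(ranked, relevant):
    # Average the precision observed at each rank where a relevant
    # document appears in the system's ranked list.
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

ranked = ["d3", "d1", "d7", "d2"]   # system's ranked output for one topic
relevant = {"d1", "d2"}             # expert relevance judgments for that topic

p, r = precision_recall_at_k(ranked, relevant, 2)   # p = 0.5, r = 0.5
ap = average_precision(ranked, relevant)            # (1/2 + 2/4) / 2 = 0.5
```

In practice these computations are delegated to standard tooling such as trec_eval, but the inputs are exactly the two artifacts the Cranfield approach provides: a ranked run and a set of judgments.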
In principle, the system-based and user-based evaluation methods can produce matching results (Al-Maskari, 2008). However, previous research has shown a broad gap between the two approaches, given that the collection-based method makes many hypotheses about what the real user looks for to satisfy his/her information need, along with many other assumptions made to simplify relevance assessment (Allan et al., 2005). The mismatch between the two evaluation methods thus stems from the disagreement between what expert judges consider relevant documents and what real users need to satisfy their information demand; the user's need is better captured by document utility (Turpin & Hersh, 2001). Evaluating IR relevance by document utility from a semantic and pragmatic viewpoint was argued by Saracevic (1979), building on earlier research (Saracevic, 1975), as follows: “it is fine for IR systems to provide relevant information, but the true role is to provide information that has utility-information that helps to directly resolve given problems, that directly bears on given actions, and/or that directly fits into given concerns and interests. Thus, it was argued that relevance is not a proper measure for a true evaluation of IR systems. A true measure should be utilitarian.” Following that, Yilmaz et al. (2014) stated that relevance concerns how useful the documents found by the retrieval system are.