Introduction
Information retrieval (IR) is the science of searching large stored collections for documents relevant to a given query. The fundamental challenge of an information retrieval system (IRS) lies in matching an information need statement, typically a user's query, against a collection of documents, ranking each document according to its relevance to the query.
Over the past decades, a large body of research has focused on building ranking models that retrieve the most relevant documents. Generally, a ranking model is constructed with either probabilistic methods or modern machine learning methods. Such models score documents based on word frequencies, treating a document as an unordered set of words, often called a bag of words. With these models, if a user enters a simple query, for example, “what is information retrieval”, a given IRS retrieves and ranks hundreds of thousands, if not millions, of results. However, the user may then spend a large amount of time just to extract a small piece of information from the documents deemed relevant, and the effort this requires determines whether the user is satisfied in gaining the necessary information. Previous work has noted that real users tend to give up easily when searching for information in the retrieved documents (Verma et al., 2016). Therefore, relevance is no longer only a matter of whether relevant information is present in a document, but also of the amount of effort needed to find it (Yilmaz, 2014).
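As a minimal sketch of this frequency-based, bag-of-words view of ranking (the documents, query, and scoring function below are illustrative, not taken from the article), each document can be scored by how often the query terms occur in it:

```python
from collections import Counter

def tf_score(query_terms, doc_tokens):
    # Sum the raw term frequencies of the query terms in the document.
    tf = Counter(doc_tokens)
    return sum(tf[t] for t in query_terms)

# Toy corpus: each document reduced to its bag of words (token list).
docs = {
    "d1": "information retrieval is the science of searching information".split(),
    "d2": "ranking models are built with machine learning methods".split(),
}
query = ["information", "retrieval"]

# Rank documents by descending score; d1 contains both query terms, d2 neither.
ranking = sorted(docs, key=lambda d: tf_score(query, docs[d]), reverse=True)
```

Real systems refine this raw term-frequency score (for example with document-length normalization or inverse document frequency), but the underlying bag-of-words assumption is the same.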
Two methods are widely used to evaluate the effectiveness of information retrieval systems. The first, the collection-based method, is often referred to as the Cranfield approach (Cleverdon, 1991). It relies on a document collection (corpus); a set of topics, each containing a query, title, and description that define a user's need; and a set of relevance judgments, often produced by topic experts, that identify the documents in the collection relevant to each topic. To evaluate the effectiveness of an IRS, scores are computed from the ranked list of documents retrieved by the system together with the relevance judgments, using evaluation measures such as precision, recall, and mean average precision (Clough & Sanderson, 2013). The second method is user-based evaluation. It is grounded in the interaction between the user and the IRS, which is shaped by the user's environment, such as his/her educational background, context, and subject expertise, and by his/her perspective, such as the search goal (Park, 1994).
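To make the Cranfield-style scoring concrete, the sketch below (with hypothetical document IDs and judgments, not drawn from any real test collection) computes precision and recall at a cutoff and average precision for a single topic; mean average precision is then simply the mean of the average-precision values over all topics:

```python
def precision_recall_at_k(ranked, relevant, k):
    # Precision@k: fraction of the top-k retrieved documents that are relevant.
    # Recall@k: fraction of all relevant documents found within the top k.
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k, hits / len(relevant)

def average_precision(ranked, relevant):
    # Average the precision observed at each rank where a relevant
    # document appears in the system's ranked list.
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

ranked = ["d3", "d1", "d7", "d2"]   # system's ranked output for one topic
relevant = {"d1", "d2"}             # expert relevance judgments for that topic

p, r = precision_recall_at_k(ranked, relevant, 2)   # p = 0.5, r = 0.5
ap = average_precision(ranked, relevant)            # (1/2 + 2/4) / 2 = 0.5
```

In practice these computations are delegated to standard tooling such as trec_eval, but the inputs are exactly the two artifacts the Cranfield approach provides: a ranked run and a set of judgments.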
In principle, the system-based and user-based evaluation methods can produce matching results (Al-Maskari, 2008). However, previous research has shown a broad gap between the two approaches, given that the collection-based method makes many hypotheses about what the real user looks for to satisfy his/her information need, along with many other assumptions made to simplify relevance assessment (Allan et al., 2005). The mismatch between the two evaluation methods thus stems from the disagreement between what expert judges consider relevant documents and what real users need to satisfy their information demand; the user's need is better captured by document utility (Turpin & Hersh, 2001). Evaluating IR relevance by document utility from a semantic and pragmatic viewpoint was argued by Saracevic (1979), building on earlier research (Saracevic, 1975), as follows: “it is fine for IR systems to provide relevant information, but the true role is to provide information that has utility-information that helps to directly resolve given problems, that directly bears on given actions, and/or that directly fits into given concerns and interests. Thus, it was argued that relevance is not a proper measure for a true evaluation of IR systems. A true measure should be utilitarian.” Following that, Yilmaz et al. (2014) stated that relevance concerns how useful the documents found by the retrieval system are.