Identifying Alternative Options for Chatbots With Multi-Criteria Decision-Making: A Comparative Study

Praveen Ranjan Srivastava, Harshit Kumar Singh, Surabhi Sakshi, Justin Zuopeng Zhang, Qiuzheng Li
Copyright © 2024 | Pages: 25
DOI: 10.4018/JDM.345917

Abstract

Artificial intelligence-powered chatbot usage continues to grow worldwide, and ongoing research seeks to identify the features that maximize chatbot utility. This study uses multi-criteria decision-making (MCDM) methods to find the best available chatbot alternative for task completion. We identify chatbot evaluation criteria from the literature, refined with expert input through the Delphi method. We then apply CRITIC to evaluate the relative importance of the specified criteria. Finally, we list popular chatbot alternatives and the features they offer, and apply the WASPAS and EDAS techniques to rank them. The alternatives explored in this study include YOU, ChatGPT, PerplexityAI, ChatSonic, and CharacterAI. Both methods yield identical rankings, with ChatGPT emerging as the most preferred alternative based on the identified criteria.
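For illustration, the following is a minimal Python sketch of the CRITIC weighting step, assuming a small, hypothetical decision matrix of chatbot ratings; the criteria, scores, and resulting weights are placeholders, not the study's data.

```python
import numpy as np

def critic_weights(matrix, benefit):
    """CRITIC weights from a decision matrix (alternatives x criteria).

    matrix  : 2-D array of raw scores.
    benefit : boolean list, True where higher values are better.
    """
    X = np.asarray(matrix, dtype=float)
    b = np.asarray(benefit)
    # Min-max normalisation; cost criteria are reversed so that 1 is best.
    norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    norm[:, ~b] = 1.0 - norm[:, ~b]
    # Contrast intensity: standard deviation of each criterion column.
    sigma = norm.std(axis=0, ddof=1)
    # Conflict with the other criteria, via the correlation matrix.
    corr = np.corrcoef(norm, rowvar=False)
    info = sigma * (1.0 - corr).sum(axis=0)
    return info / info.sum()

# Hypothetical 1-5 ratings of five chatbots on four illustrative criteria
# (e.g., accuracy, responsiveness, conversational quality, cost).
scores = np.array([
    [4.5, 4.0, 4.6, 3.0],
    [3.8, 4.2, 3.9, 2.5],
    [3.6, 3.9, 3.7, 2.8],
    [3.4, 3.5, 3.8, 2.6],
    [3.2, 3.3, 4.1, 2.4],
])
weights = critic_weights(scores, benefit=[True, True, True, False])
print(np.round(weights, 3))
```

CRITIC assigns larger weights to criteria that vary strongly across alternatives and are weakly correlated with the other criteria, so no subjective weighting is required.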

Literature Review

Evaluation of chatbots can be approached from various perspectives, including user satisfaction, task completion rate, conversational quality, and system performance (Cai et al., 2022). Several studies have investigated chatbot evaluation from these perspectives. For instance, Mokmin and Ibrahim (2021) evaluated the performance of a chatbot designed to provide health-support education to college students and found that the chatbot helped 73.3% of respondents grasp health concerns and hold pleasant conversations. Similarly, Orden-Mejía and Huertas (2022) evaluated the effectiveness of a chatbot designed to assist tourists and found that user satisfaction depends on the chatbot's informativeness, empathy, and interactivity.

Other studies have focused on the conversational quality of chatbots. For example, Barletta et al. (2023) evaluated a chatbot's ability to converse with users informally. Another study assessed the conversational quality of chatbots using a human evaluation metric and found that the chatbot's responses were rated similarly to human-generated ones, indicating that it could maintain natural and engaging conversations with users.

Despite these promising results, the evaluation of chatbots is not without challenges. A significant one is the lack of standardized evaluation metrics: there is currently no universally accepted set of metrics for evaluating chatbots, which makes it difficult to compare the performance of different chatbots. Liang and Li (2021) addressed this by proposing standard criteria and definitions for chatbot evaluation. Another challenge is the lack of diverse data sets for training and testing chatbots. Most existing data sets focus on specific domains, such as customer service or restaurant booking, which limits the generalizability of chatbots to other domains (Narducci et al., 2020).

In conclusion, evaluating chatbots is a complex and challenging task that requires a multifaceted approach. While several studies have demonstrated the effectiveness of chatbots in various domains, standardized evaluation metrics and more diverse data sets are still needed. Moreover, systems that performed well on the measures used to evaluate natural language processing AI do not necessarily meet user expectations, which points to a gap in the comprehensiveness of existing evaluation metrics. Few studies have examined how various factors affect user experience on chatbot platforms; Table 1 presents a list of such studies. This research aims to quantify the factors associated with chatbot utility that matter most to users.

Table 1.
Existing Literature on Chatbot's Utility Criteria
Reference | Method of evaluation | Domain | Objective
Orden-Mejía & Huertas, 2022 | Exploratory factor analysis; hierarchical regression | Tourism | Investigation of factors contributing to user satisfaction with chatbots
Barletta et al., 2023 | Multi-criteria decision-making (MCDM) (AHP) | Healthcare | Assess the quality of a medical chatbot and compare two different iterations of the chatbot
Cai et al., 2022 | Task-oriented user studies | Music | Investigate the effectiveness of dialogue-based conversational recommender systems
Mokmin and Ibrahim, 2021 | Mixed-method study | Education | Analysis of the efficacy, performance, and technological adoption of a chatbot created to educate users and provide health literacy
Narducci et al., 2020 | Experimental evaluation | Music | Analysis of how various interaction styles affect recommendation precision and user input costs
Sugisaki and Bleiker, 2020 | Research-based method | Language interaction | Synthesis of the linguistic concepts necessary for a discussion in a natural language
Huang et al., 2018 | Survey | General knowledge, reasoning, memory, and personality | Assess the capabilities of Tarie, a conversational AI
AbuShawar and Atwell, 2016 | Quantitative and qualitative evaluation | Language interaction | Discussion of black-box, glass-box, comparative, quantitative, and qualitative assessment methods for natural language conversation systems
Liang and Li, 2021 | Review | Criteria and definitions | Provide standard criteria and definitions for chatbot evaluation

As Table 1 shows, the assessment of platform utility in the context of chatbots has not yet been carried out. Previous research has examined how chatbots influence users, but not how the various aspects of a platform's experience compare against one another. Most investigations rely on experimentation and survey-based empirical analysis. Figure 1 shows that research on how to evaluate platforms based on the utility of chatbots is still needed.

Figure 1. Research Gaps

Multicriteria decision-making (MCDM) is a robust method for evaluating chatbots that fills this void by enabling decision-makers to consider multiple criteria simultaneously (Yalcin et al., 2022). This is especially crucial when judging chatbots, which are typically built to accommodate various users and accomplish numerous objectives.
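To make the ranking step concrete, the sketch below applies WASPAS to the same kind of hypothetical decision matrix; the alternatives, scores, weights, and the lambda = 0.5 blend are illustrative assumptions rather than values from the study. EDAS would be applied analogously, scoring each alternative by its weighted distance from the average solution.

```python
import numpy as np

def waspas_rank(matrix, weights, benefit, lam=0.5):
    """Rank alternatives with WASPAS (weighted sum + weighted product).

    matrix  : alternatives x criteria array of raw scores.
    weights : criteria weights (e.g., from CRITIC), summing to 1.
    benefit : boolean list, True where higher values are better.
    lam     : blend between the sum (lam) and product (1 - lam) models.
    """
    X = np.asarray(matrix, dtype=float)
    w = np.asarray(weights, dtype=float)
    b = np.asarray(benefit)
    # Linear normalisation: value / column max for benefit criteria,
    # column min / value for cost criteria.
    norm = np.where(b, X / X.max(axis=0), X.min(axis=0) / X)
    wsm = (norm * w).sum(axis=1)        # weighted sum model score
    wpm = np.prod(norm ** w, axis=1)    # weighted product model score
    q = lam * wsm + (1.0 - lam) * wpm   # joint WASPAS score
    return q, np.argsort(-q)            # scores and ranking (best first)

# Illustrative only: hypothetical ratings and placeholder weights.
scores = np.array([
    [4.5, 4.0, 4.6, 3.0],
    [3.8, 4.2, 3.9, 2.5],
    [3.6, 3.9, 3.7, 2.8],
    [3.4, 3.5, 3.8, 2.6],
    [3.2, 3.3, 4.1, 2.4],
])
weights = np.array([0.28, 0.24, 0.22, 0.26])
q, order = waspas_rank(scores, weights, benefit=[True, True, True, False])
print(np.round(q, 3), order)
```

In the study, both WASPAS and EDAS produced the same ordering of the five chatbot alternatives, with ChatGPT ranked first.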

Figure 2. Framework of the Paper
