Identifying Alternative Options for Chatbots With Multi-Criteria Decision-Making: A Comparative Study

Praveen Ranjan Srivastava, Harshit Kumar Singh, Surabhi Sakshi, Justin Zuopeng Zhang, Qiuzheng Li
Copyright © 2024 | Pages: 25
DOI: 10.4018/JDM.345917

Abstract

Artificial intelligence-powered chatbot usage continues to grow worldwide, and ongoing research seeks to identify the features that maximize chatbot utility. This study uses multi-criteria decision-making (MCDM) methods to find the best available chatbot alternative for task completion. We identify chatbot evaluation criteria from the literature, refined with expert input through the Delphi method. We then apply CRITIC to evaluate the relative importance of the specified criteria. Finally, we list popular chatbot alternatives and the features they offer, and apply the WASPAS and EDAS techniques to rank them. The alternatives explored in this study include YOU, ChatGPT, PerplexityAI, ChatSonic, and CharacterAI. Both methods yield identical rankings, with ChatGPT emerging as the most preferred alternative based on the identified criteria.
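For illustration, the following is a minimal Python sketch of the CRITIC weighting step, assuming a small, hypothetical decision matrix of chatbot ratings; the criteria, scores, and resulting weights are placeholders, not the study's data.

```python
import numpy as np

def critic_weights(matrix, benefit):
    """CRITIC weights from a decision matrix (alternatives x criteria).

    matrix  : 2-D array of raw scores.
    benefit : boolean list, True where higher values are better.
    """
    X = np.asarray(matrix, dtype=float)
    b = np.asarray(benefit)
    # Min-max normalisation; cost criteria are reversed so that 1 is best.
    norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    norm[:, ~b] = 1.0 - norm[:, ~b]
    # Contrast intensity: standard deviation of each criterion column.
    sigma = norm.std(axis=0, ddof=1)
    # Conflict with the other criteria, via the correlation matrix.
    corr = np.corrcoef(norm, rowvar=False)
    info = sigma * (1.0 - corr).sum(axis=0)
    return info / info.sum()

# Hypothetical 1-5 ratings of five chatbots on four illustrative criteria
# (e.g., accuracy, responsiveness, conversational quality, cost).
scores = np.array([
    [4.5, 4.0, 4.6, 3.0],
    [3.8, 4.2, 3.9, 2.5],
    [3.6, 3.9, 3.7, 2.8],
    [3.4, 3.5, 3.8, 2.6],
    [3.2, 3.3, 4.1, 2.4],
])
weights = critic_weights(scores, benefit=[True, True, True, False])
print(np.round(weights, 3))
```

CRITIC assigns larger weights to criteria that vary strongly across alternatives and are weakly correlated with the other criteria, so no subjective weighting is required.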

Literature Review

Evaluation of chatbots can be approached from various perspectives, including user satisfaction, task completion rate, conversational quality, and system performance (Cai et al., 2022). Several studies have investigated chatbot evaluation from these perspectives. For instance, Mokmin and Ibrahim (2021) evaluated the performance of a chatbot designed to provide health-support education to college students and found that the chatbot helped 73.3% of respondents grasp health concerns and hold pleasant conversations. Similarly, Orden-Mejía and Huertas (2022) evaluated the effectiveness of a chatbot designed to assist tourists and found that user satisfaction depends on the chatbot's informativeness, empathy, and interactivity.

Other studies have focused on the conversational quality of chatbots. For example, Barletta et al. (2023) evaluated a chatbot's ability to converse with users informally. Another study assessed the conversational quality of chatbots using a human evaluation metric and found that the chatbot's responses were rated similarly to human-generated ones, indicating that it could maintain natural and engaging conversations with users.

Despite these promising results, the evaluation of chatbots is not without challenges. A significant one is the lack of standardized evaluation metrics: there is currently no universally accepted set of metrics for evaluating chatbots, which makes it difficult to compare the performance of different chatbots. Liang and Li (2021) addressed this by proposing standard criteria and definitions for chatbot evaluation. Another challenge is the lack of diverse data sets for training and testing chatbots. Most existing data sets focus on specific domains, such as customer service or restaurant booking, which limits the generalizability of chatbots to other domains (Narducci et al., 2020).

In conclusion, evaluating chatbots is a complex and challenging task that requires a multifaceted approach. While several studies have demonstrated the effectiveness of chatbots in various domains, standardized evaluation metrics and more diverse data sets are still needed. Moreover, systems that performed well on the measures used to evaluate natural language processing AI do not necessarily meet user expectations, which points to a gap in the comprehensiveness of existing evaluation metrics. Few studies have examined how various factors affect user experience on chatbot platforms; Table 1 presents a list of such studies. This research aims to quantify the factors associated with chatbot utility that matter most to users.

Table 1.
Existing Literature on Chatbot's Utility Criteria
Reference | Method of evaluation | Domain | Objective
Orden-Mejía & Huertas, 2022 | Exploratory factor analysis; hierarchical regression | Tourism | Investigation of factors contributing to user satisfaction with chatbots
Barletta et al., 2023 | Multi-criteria decision-making (MCDM) (AHP) | Healthcare | Assess the quality of a medical chatbot and compare two different iterations of the chatbot
Cai et al., 2022 | Task-oriented user studies | Music | Investigate the effectiveness of dialogue-based conversational recommender systems
Mokmin and Ibrahim, 2021 | Mixed-method study | Education | Analysis of the efficacy, performance, and technological adoption of a chatbot created to educate users and provide health literacy
Narducci et al., 2020 | Experimental evaluation | Music | Analysis of how various interaction styles affect recommendation precision and user input costs
Sugisaki and Bleiker, 2020 | Research-based method | Language interaction | Synthesis of the linguistic concepts necessary for a discussion in a natural language
Huang et al., 2018 | Survey | General knowledge, reasoning, memory, and personality | Assess the capabilities of Tarie, a conversational AI
AbuShawar and Atwell, 2016 | Quantitative and qualitative evaluation | Language interaction | Discussion of black-box, glass-box, comparative, quantitative, and qualitative assessment methods for natural language conversation systems
Liang and Li, 2021 | Review | Criteria and definitions | Provide standard criteria and definitions for chatbot evaluation

As Table 1 shows, the assessment of platform utility in the context of chatbots has not yet been carried out. Previous research has examined how chatbots influence users, but not how the various aspects of a platform's experience compare against one another. Most investigations rely on experimentation and survey-based empirical analysis. Figure 1 shows that research on how to evaluate platforms based on the utility of chatbots is still needed.

Figure 1. Research Gaps

Multicriteria decision-making (MCDM) is a robust method for evaluating chatbots that fills this void by enabling decision-makers to consider multiple criteria simultaneously (Yalcin et al., 2022). This is especially crucial when judging chatbots, which are typically built to accommodate various users and accomplish numerous objectives.
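To make the ranking step concrete, the sketch below applies WASPAS to the same kind of hypothetical decision matrix; the alternatives, scores, weights, and the lambda = 0.5 blend are illustrative assumptions rather than values from the study. EDAS would be applied analogously, scoring each alternative by its weighted distance from the average solution.

```python
import numpy as np

def waspas_rank(matrix, weights, benefit, lam=0.5):
    """Rank alternatives with WASPAS (weighted sum + weighted product).

    matrix  : alternatives x criteria array of raw scores.
    weights : criteria weights (e.g., from CRITIC), summing to 1.
    benefit : boolean list, True where higher values are better.
    lam     : blend between the sum (lam) and product (1 - lam) models.
    """
    X = np.asarray(matrix, dtype=float)
    w = np.asarray(weights, dtype=float)
    b = np.asarray(benefit)
    # Linear normalisation: value / column max for benefit criteria,
    # column min / value for cost criteria.
    norm = np.where(b, X / X.max(axis=0), X.min(axis=0) / X)
    wsm = (norm * w).sum(axis=1)        # weighted sum model score
    wpm = np.prod(norm ** w, axis=1)    # weighted product model score
    q = lam * wsm + (1.0 - lam) * wpm   # joint WASPAS score
    return q, np.argsort(-q)            # scores and ranking (best first)

# Illustrative only: hypothetical ratings and placeholder weights.
scores = np.array([
    [4.5, 4.0, 4.6, 3.0],
    [3.8, 4.2, 3.9, 2.5],
    [3.6, 3.9, 3.7, 2.8],
    [3.4, 3.5, 3.8, 2.6],
    [3.2, 3.3, 4.1, 2.4],
])
weights = np.array([0.28, 0.24, 0.22, 0.26])
q, order = waspas_rank(scores, weights, benefit=[True, True, True, False])
print(np.round(q, 3), order)
```

In the study, both WASPAS and EDAS produced the same ordering of the five chatbot alternatives, with ChatGPT ranked first.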

Figure 2. Framework of the Paper
