Semi-Supervised Sentiment Classification on E-Commerce Reviews Using Tripartite Graph and Clustering

Xin Lu, Donghong Gu, Haolan Zhang, Zhengxin Song, Qianhua Cai, Hongya Zhao, Haiming Wu
Copyright: © 2022 | Pages: 20
DOI: 10.4018/IJDWM.307904

Abstract

Sentiment classification is an important topic in Natural Language Processing; its main purpose is to extract sentiment polarity from unstructured text. The label propagation algorithm, a semi-supervised learning method, has been widely used in sentiment classification because it describes the relations among samples as a graph. However, current graph construction strategies fail to use the global distribution and cannot properly handle polysemy and synonymy. In this paper, a semi-supervised learning methodology that integrates a tripartite graph with clustering is proposed for graph construction. Experiments on e-commerce reviews demonstrate that the proposed method outperforms baseline methods overall and enables precise sentiment classification with few labeled samples.

Introduction

The past two decades have witnessed the flourishing of electronic commerce (e-commerce) in a variety of fields (Huang et al., 2018). The volume of e-commerce is already sizable and continues to grow at a rapid, steady pace (Yu et al., 2013). E-commerce provides people with daily opportunities to purchase products and services in online marketplaces (Hajli et al., 2017). Along with these shopping activities, consumer reviews reflect users’ experiences and feelings (Zhang & Zhong, 2019). Because consumer engagement conveys specific sentiments, these reviews inform the purchase decisions of other customers and benefit business sales. As such, a deep understanding of sentiment information serves as the foundation of opinion mining and processing, which aims to outline individuals’ true intentions through their words (Bhargava et al., 2016).

In the field of natural language processing, sentiment analysis refers to the identification of language that carries an evaluative or affective attitude (Esuli & Sebastiani, 2005). Opinions are first retrieved from unstructured texts, and the sentiment is then classified into positive, negative, and neutral categories (Fu et al., 2018).

More recently, both supervised and unsupervised machine learning models have been applied to sentiment analysis tasks. The former require costly, time-consuming labeling to generate training samples. The latter lack accuracy and processing reliability (Gao et al., 2013).

Semi-supervised sentiment classification has proven to be a flexible and efficient alternative (Chapelle et al., 2006). Semi-supervised learning falls between unsupervised learning and supervised learning: it trains on a small amount of labeled data together with a large amount of unlabeled data (Li & Ye, 2018). Whereas supervised learning relies on labeled samples and unsupervised learning suffers from low accuracy, semi-supervised learning aims to reach classification accuracy close to that of supervised learning at as little labeling cost as possible. This trade-off is acceptable in most practical scenarios.

Among these methods, the label propagation algorithm, a graph-based semi-supervised learning approach, holds great promise in sentiment classification (Li et al., 2016). In general, the label propagation algorithm is favored because its processing is intuitive and interpretable and because it is easy to solve (Yang & Shafiq, 2018). Notably, label propagation is carried out on a graph. Once the graph is built, every instance is mapped to a node, and the edge weight between two nodes represents the similarity of the two instances (Krishnakumari & Akshaya, 2019). Thus, the problem is formulated as propagation on a graph, where a node’s label propagates to neighboring nodes according to their proximity (Zhu et al., 2005). The labeled data act like sources that push labels through the unlabeled data (Xiaojin & Zoubin, 2002). The construction of the label propagation graph is therefore of great significance, as it defines the relations among samples. Before a semi-supervised learning model is deployed, the graph must be established to reflect prior knowledge of the domain.
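The propagation described above can be sketched in a few lines. The following is a minimal illustration, assuming the common iterative formulation in which edge weights are row-normalized into transition probabilities and labeled nodes are clamped to their known labels after every step. The function name `propagate_labels`, its parameters, and the toy graph are hypothetical and are not taken from the article; this is not the tripartite-graph method the article proposes.

```python
def propagate_labels(weights, labels, num_classes, iterations=100):
    """Iterative label propagation with clamping (illustrative sketch).

    weights: n x n list of lists of non-negative edge weights (similarities).
    labels:  list of length n; a class index for labeled nodes, None otherwise.
    Returns one predicted class index per node.
    """
    n = len(weights)

    # Row-normalize edge weights into transition probabilities.
    # A node with no edges keeps an all-zero row.
    transition = []
    for row in weights:
        total = sum(row)
        transition.append([w / total if total > 0 else 0.0 for w in row])

    def initial(label):
        # One-hot distribution for a labeled node, uniform for an unlabeled one.
        if label is None:
            return [1.0 / num_classes] * num_classes
        return [1.0 if c == label else 0.0 for c in range(num_classes)]

    scores = [initial(label) for label in labels]

    for _ in range(iterations):
        updated = []
        for i in range(n):
            if labels[i] is not None:
                # Clamp: labeled nodes act as fixed sources.
                updated.append(initial(labels[i]))
            else:
                # Unlabeled nodes take the weighted average of their neighbors.
                updated.append([
                    sum(transition[i][j] * scores[j][c] for j in range(n))
                    for c in range(num_classes)
                ])
        scores = updated

    # Assign each node the class with the highest score.
    return [max(range(num_classes), key=lambda c: s[c]) for s in scores]


# Toy chain 0 - 1 - 2 - 3 - 4 with a weak edge between nodes 2 and 3.
toy_weights = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
toy_labels = [0, None, None, None, 1]  # only the two end nodes are labeled
predicted = propagate_labels(toy_weights, toy_labels, num_classes=2)
```

In this toy graph, nodes 1 and 2 inherit the label of node 0 and node 3 inherits the label of node 4, because the weak edge between nodes 2 and 3 limits how far each label spreads. This shows why the choice of edge weights, and hence the graph construction, largely determines the result.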

In line with this graph construction principle, traditional strategies such as the word-document bipartite graph, the K-nearest neighbor (KNN) graph, and the Exp-weighted graph are applied to convey the relations within the texts (Rossi et al., 2016). Nevertheless, graph construction for label propagation remains limited, primarily because the colloquial use of words in documents usually gives rise to polysemy and synonymy. With polysemy, the same sentiment word may express different degrees of sentiment, or even completely opposite sentiment tendencies, in different contexts. With synonymy, the same sentiment may be expressed by different sentiment words (Potts, 2016). In addition, traditional graph-based methods pay more attention to the local distribution of the samples than to the global information within the dataset (Yao et al., 2019). For this reason, traditional graph-based methods are taken as a secondary choice unless a specific word with clear information can be recognized.
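To make the locality of these traditional strategies concrete, the sketch below builds a KNN graph whose edges carry exponential weights. It is an illustration under stated assumptions: each review is assumed to be already represented as a numeric feature vector, distances are Euclidean, and the weight is taken to be the Gaussian kernel exp(-d²/σ²), which is one common exponential weighting and not necessarily the exact definition used by the cited work. The names `knn_graph`, `k`, and `sigma` are hypothetical.

```python
import math


def knn_graph(points, k, sigma=1.0):
    """Build a symmetric KNN graph with exponential edge weights (sketch).

    points: list of equal-length numeric feature vectors, one per document.
    k:      number of nearest neighbors linked from each node.
    sigma:  bandwidth of the kernel exp(-d^2 / sigma^2) (assumed form).
    Returns an n x n list of lists of edge weights.
    """
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Sort all other nodes by Euclidean distance to node i.
        neighbors = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        # Link node i to its k closest nodes only; every other pair stays at 0.
        for distance, j in neighbors[:k]:
            w = math.exp(-(distance * distance) / (sigma * sigma))
            weights[i][j] = w
            weights[j][i] = w  # keep the graph symmetric
    return weights


# Two well-separated pairs of toy documents in a 2-D feature space.
toy_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
toy_graph = knn_graph(toy_points, k=1)
```

With `k=1`, each toy document is linked only to its single closest neighbor, so the two pairs form disconnected components. Each edge depends solely on a local neighborhood, and nothing in the construction reflects how the dataset is distributed as a whole.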
