Product Review-Based Customer Sentiment Analysis Using an Ensemble of mRMR and Forest Optimization Algorithm (FOA)

Parag Verma, Ankur Dumka, Anuj Bhardwaj, Alaknanda Ashok
Copyright: © 2022 | Pages: 21
DOI: 10.4018/IJAMC.2022010107

Abstract

This research addresses the feature selection problem for sentiment classification with an ensemble-based classifier. It proposes a hybrid feature selection approach, mRMR-FOA, that combines the minimum redundancy and maximum relevance (mRMR) technique with the Forest Optimization Algorithm (FOA). Before being applied to sentiment analysis, FOA was used as a feature selection technique on 10 different classification datasets publicly available in the UCI machine learning repository. Classifiers such as k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), and Naïve Bayes (NB) were used in the ensemble-based algorithm on the available datasets. mRMR-FOA is then applied to Blitzer's dataset (customer reviews of electronic products) to select the significant features. Sentiment classification was observed to improve by 12 to 18%. The results are further enhanced by the ensemble of k-NN, NB, and SVM, which reaches an accuracy of 88.47% on the sentiment classification task.

1. Introduction

Over the past few years, dataset dimensionality has increased in various domains such as text-based sentiment analysis and bioinformatics (Zhai et al., 2014). This has brought an intriguing challenge to the research field, as many Artificial Intelligence (AI) and Machine Learning (ML) methods are unable to manage high-dimensional input data. Indeed, if we examine the dimensionality of the data posted in the well-known UCI repository and the libSVM database (Chang, 2001), we can see that the largest dimensionality of a dataset has grown to approximately 30 million. Moreover, some of these methods are also affected when they face larger instance sizes. In this new situation, it is common to handle data collections that are very large in both the number of features and the number of samples, so current learning techniques must be adapted.

To address this issue, dimensionality reduction methods can be applied to reduce the number of features and to enhance the performance of the subsequent learning process. One of the most frequently used dimensionality reduction processes is feature selection (FS), which accomplishes dimensionality reduction by removing irrelevant and redundant features (Liu & Motoda, 1998). Since FS keeps the original features, it is particularly valuable for applications where model interpretation and knowledge extraction are important. However, existing FS techniques are not expected to scale well when handling a large-scale problem (large in both the number of features and the number of instances), so that their effectiveness may degrade significantly or they may even become inapplicable.
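
As an illustration of feature selection that removes irrelevant and redundant features, the following is a minimal sketch of a greedy mRMR filter on discrete (for example, term-presence) features. It uses the common difference form of the criterion (relevance to the label minus mean redundancy with the features already chosen) and is not the authors' implementation; the FOA stage of mRMR-FOA is not shown. The function names `mutual_information` and `mrmr_select` and the toy data are assumptions made for this example.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(columns, labels, k):
    """Greedily pick k feature indices by relevance minus mean redundancy."""
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    remaining = set(range(len(columns)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(
            mutual_information(columns[j], columns[s]) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 8 reviews, 4 binary term-presence features.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = [
    [1, 1, 1, 0, 0, 0, 0, 0],  # f0: informative
    [1, 1, 1, 0, 0, 0, 0, 0],  # f1: exact duplicate of f0 (redundant)
    [0, 1, 1, 1, 0, 0, 0, 1],  # f2: weaker, but adds new information
    [1, 0, 1, 0, 1, 0, 1, 0],  # f3: unrelated to the label
]
top_two = mrmr_select(columns, labels, 2)
```

In this toy case the duplicate feature f1 is as relevant as f0, yet it is skipped in the second round because its redundancy with f0 outweighs its relevance, which is the behaviour the "minimum redundancy" term is meant to produce.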

Sentiment analysis is a way of identifying and classifying the emotions or opinions stated in a piece of text, specifically in order to determine polarity: whether the writer's disposition towards a particular topic or artefact is positive, negative, or neutral. For this purpose, sentiment analysis and classification combine machine learning (ML) systems with natural language processing (NLP). The rapid growth of online social media and network-based communities gives customers every opportunity to express their perceptions and exchange their ideas about virtually anything, for example social or political issues, articles, books, and films, through web-based networked media. These are usually in the form of survey material such as Likert-type scale data or text. Nowadays organizations move quickly to evaluate public perception of their customers or their products in Internet-based social content (Parvathy & Bindhu, 2016). Online service providers in particular are engaged in evaluating social media data from blogs, online forums, tweets, comments, and product feedback surveys. Publicly shared reviews of sites or products are used to track customers' ongoing perception of a product or service, in order to support decision making about its marketing or about the quality of the service or product (Stylios et al., 2014). The critical problem that arises when collecting information from a social media environment is that the reviews mostly contain a large amount of unwanted data, including HTML tags and linguistic and spelling errors, and the data is usually so bulky that removing those errors by hand is a tedious and time-consuming task. An effective approach to this problem is to select the most relevant and significant features from the dataset and discard redundant or irrelevant features.
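
The kind of unwanted data mentioned above, such as HTML tags, can be stripped before any feature selection takes place. The snippet below is a minimal sketch of such a cleaning step using only the Python standard library; the function name `clean_review` and the specific rules (remove tags, decode entities, lowercase, drop punctuation) are assumptions made for illustration and are not taken from the article. Spelling correction is not attempted.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")             # anything that looks like an HTML tag
NON_WORD_RE = re.compile(r"[^a-z0-9'\s]")   # keep letters, digits, apostrophes


def clean_review(raw):
    """Return a list of lowercase tokens from a raw review string."""
    text = TAG_RE.sub(" ", raw)      # drop HTML tags
    text = html.unescape(text)       # decode entities such as &amp;
    text = text.lower()
    text = NON_WORD_RE.sub(" ", text)  # drop punctuation and stray symbols
    return text.split()


tokens = clean_review("<p>Great battery &amp; <b>LOVELY</b> screen!!</p>")
```

The resulting tokens could then be turned into term-presence or term-frequency features, which is the representation a filter such as mRMR would operate on.
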
Some pre-processing and data cleaning techniques rely on feature selection. In data mining on high-dimensional datasets, feature selection works as a highly effective pre-processing strategy. A taxonomy of feature selection methods is presented in Figure 1.

Figure 1. Feature selection methods taxonomy
