Bidirectional Complementary Correlation-Based Multimodal Aspect-Level Sentiment Analysis

Jing Yang, Yujie Xiong
Copyright © 2024 | Pages: 16
DOI: 10.4018/IJSWIS.337598

Abstract

Aspect-based sentiment analysis is a key task in natural language processing that focuses on the sentiment polarity associated with specific aspects of a text. Traditional models that combine text and visual data tend to ignore the deeper interconnections between modalities. To address this problem, the authors propose a bidirectional complementary correlation-based multimodal aspect-level sentiment analysis (BiCCM-ABSA) model. The model exploits text-image synergy through a novel cross-modal attention mechanism that aligns textual with visual features. Built on the transformer architecture, it goes beyond simple fusion, using gating mechanisms to ensure fine-grained alignment of multimodal features. Experiments on the Twitter-15 and Twitter-17 datasets achieve 69.28% accuracy and a 67.54% F1 score. The results demonstrate the advantages of BiCCM-ABSA: its bidirectional approach and effective cross-modal correlation set a new benchmark in multimodal emotion recognition, providing insights beyond traditional single-modal analysis.
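The abstract describes cross-modal attention combined with gated fusion of text and image features. As an illustration only (not the authors' implementation; the module names, dimensions, and the single text-to-image direction shown here are assumptions), a minimal PyTorch sketch of that general pattern might look like the following; in the bidirectional setting described by the paper, a mirrored image-to-text block would complement it.

```python
import torch
import torch.nn as nn

class GatedCrossModalAttention(nn.Module):
    """Illustrative sketch (not the paper's architecture): text tokens attend
    to image region features, and a learned gate controls how much visual
    evidence is mixed into each token representation."""

    def __init__(self, dim=768, num_heads=8):
        super().__init__()
        self.text_to_image = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, text_feats, image_feats):
        # text_feats: (batch, text_len, dim); image_feats: (batch, num_regions, dim)
        attended, _ = self.text_to_image(query=text_feats,
                                         key=image_feats,
                                         value=image_feats)
        # Sigmoid gate decides, per dimension, how much attended visual
        # information to blend into each text token.
        g = torch.sigmoid(self.gate(torch.cat([text_feats, attended], dim=-1)))
        return g * attended + (1 - g) * text_feats

# Toy usage with random tensors standing in for text/image encoder outputs.
text = torch.randn(2, 20, 768)    # 2 sentences, 20 tokens each
image = torch.randn(2, 49, 768)   # 2 images, 7x7 region grid
fused = GatedCrossModalAttention()(text, image)
print(fused.shape)  # torch.Size([2, 20, 768])
```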

Introduction

In recent years, sentiment analysis has emerged as one of the most vibrant research areas in natural language processing. Its primary focus lies in analyzing people's emotional tendencies toward specific topics and events (Su et al., 2023; Yen et al., 2021). Aspect-level sentiment classification, a fundamental task in sentiment analysis, aims to discern the emotional polarity of different aspects within a text (Singh & Sachan, 2021). For example, in the sentence “Congratulations to Sean Harris, who wins the leading actor award,” two aspects are mentioned: Sean Harris and the leading actor award. Based on the context, it can be inferred that the sentiment toward Sean Harris is positive, while it remains neutral toward the leading actor award.
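Concretely, the task can be framed as mapping each (sentence, aspect) pair to a polarity label. A minimal sketch of that framing, using the example above (the data layout and label names are illustrative assumptions, not a dataset schema):

```python
# Aspect-level sentiment classification: each aspect mentioned in the same
# sentence receives its own polarity label.
sentence = "Congratulations to Sean Harris, who wins the leading actor award"

examples = [
    {"sentence": sentence, "aspect": "Sean Harris", "label": "positive"},
    {"sentence": sentence, "aspect": "leading actor award", "label": "neutral"},
]

for ex in examples:
    print(f"{ex['aspect']!r} -> {ex['label']}")
```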

However, texts on social media, which often contain opinions on multiple subjects, make it challenging to determine the sentiment polarity of several aspects within a single sentence (Tobaili et al., 2019; Sahoo & Gupta, 2021). Zhou et al. (2019) noted that many sentiment classification errors arise from failing to consider the aspect words in a sentence. To address this issue, Tang et al. (2016) introduced an attention mechanism to capture the semantic relationship between aspect words and their context, which aligns with the findings of Ismail et al. (2022). More recently, aspect-level sentiment analysis has been studied with pretrained language models (Song et al., 2019; Mohammed et al., 2022; Zhang et al., 2023). However, these approaches tend to overlook the integration of textual data with data from other modalities, which is increasingly relevant in today's social media landscape (Al-Qerem et al., 2020).
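Attention mechanisms of this kind typically score each context word against the aspect representation and pool the context with those weights. The sketch below illustrates that general idea only; it is not Tang et al.'s exact model, and the layer names and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class AspectAttention(nn.Module):
    """Illustrative aspect-conditioned attention: each context word is scored
    against the aspect vector, and the context is pooled with those weights."""

    def __init__(self, dim=300):
        super().__init__()
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, context, aspect):
        # context: (batch, seq_len, dim); aspect: (batch, dim)
        aspect_exp = aspect.unsqueeze(1).expand(-1, context.size(1), -1)
        weights = torch.softmax(
            self.score(torch.cat([context, aspect_exp], dim=-1)).squeeze(-1), dim=-1
        )
        # Weighted sum of context words, emphasizing those relevant to the aspect.
        return torch.bmm(weights.unsqueeze(1), context).squeeze(1)

# Toy usage: 2 sentences of 10 words, one aspect vector per sentence.
ctx = torch.randn(2, 10, 300)
asp = torch.randn(2, 300)
print(AspectAttention()(ctx, asp).shape)  # torch.Size([2, 300])
```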

As the combination of textual descriptions with corresponding images has become the predominant way for users to express their views on social media platforms, multimodal aspect-based sentiment analysis (MABSA) has emerged as a new trend (Ren et al., 2021; Al-Ayyoub et al., 2018). In the literature (Ling et al., 2022), MABSA is also referred to as target-oriented multimodal sentiment analysis or entity-based multimodal sentiment analysis. The task encompasses three subtasks: multimodal aspect term extraction, multimodal aspect sentiment classification, and multimodal aspect sentiment joint extraction. Specifically, multimodal aspect term extraction identifies aspect terms in a text that are also linked to the visual content; multimodal aspect sentiment classification assigns a sentiment polarity to each identified aspect by considering both the textual and visual contexts; and multimodal aspect sentiment joint extraction performs aspect term extraction and sentiment classification simultaneously in a unified framework (Barbosa et al., 2022; Salhi et al., 2021). MABSA, however, struggles with the complexity of effectively integrating and interpreting textual and visual data, and with accurately capturing the nuanced semantic relationships between the two modalities. Xu et al. (2019) introduced a multi-interaction memory network for multimodal target sentiment classification. Similarly, TomBERT (J. Yu & Jiang, 2019), building upon the bidirectional encoder representations from transformers (BERT) model (Devlin et al., 2018), incorporated a target-sensitive visual attention mechanism. However, these methods perform only a simple fusion of data from the different modalities and do not adequately explore the deep, intrinsic correlations between textual and visual content. This superficial integration limits the depth and accuracy of sentiment analysis, failing to fully leverage the rich, nuanced interplay between text and image that characterizes modern social media communication.
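To make the three subtasks concrete, the following sketch shows their inputs and outputs on a single text-image post; the data layout and file name are illustrative assumptions, not a dataset schema.

```python
# One social media post: a sentence paired with an attached image.
post = {
    "text": "Congratulations to Sean Harris, who wins the leading actor award",
    "image": "post_image.jpg",  # placeholder path for the attached photo
}

# 1) Multimodal aspect term extraction: find aspect terms in the text that
#    are also grounded in the visual content.
aspects = ["Sean Harris", "leading actor award"]

# 2) Multimodal aspect sentiment classification: one polarity per aspect,
#    judged from both the text and the image.
polarities = {"Sean Harris": "positive", "leading actor award": "neutral"}

# 3) Multimodal aspect sentiment joint extraction: both steps at once,
#    producing (aspect, polarity) pairs in a single pass.
joint_output = [(a, polarities[a]) for a in aspects]
print(joint_output)
```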
