Paper
7 August 2024
Cross modal sentiment analysis model based on modal representation learning
Jianguo Bai, Hai Yang, Cheng Feng, Shuxian Wang, Xue Li
Proceedings Volume 13229, Seventh International Conference on Advanced Electronic Materials, Computers, and Software Engineering (AEMCSE 2024); 132292O (2024) https://doi.org/10.1117/12.3038252
Event: Seventh International Conference on Advanced Electronic Materials, Computers, and Software Engineering (AEMCSE 2024), 2024, Nanchang, China
Abstract
With the rapid development of the Internet and multimedia technology, people increasingly express their feelings and views through video and other media. The key to sentiment analysis of user videos on social media is to fully exploit the embedded multimodal features, such as text, audio, and facial expressions, and to build efficient deep learning models on them. Traditional approaches that simply fuse feature vectors or combine the predictions of multiple models cannot effectively extract the intra-modal characteristics and inter-modal commonalities of multimodal data, resulting in unsatisfactory sentiment analysis accuracy. To address these issues, this article takes monologue videos posted by users on social media as its research object and proposes CMRL, a cross-modal sentiment analysis model based on modal representation learning. By imposing constraints on both the independent modal module and the fused modal module, the fused modal module can fully account for the intrinsic characteristics of each modality. To enable the model to fully learn intra-modal characteristics, a loss function based on the Pearson correlation coefficient is established between the independent modal module's sentiment analysis results for the speech, text, and facial expression modalities and the sentiment analysis results of the fused modal module. To prevent intra-modal features from being lost or confused after feature fusion, the speech, text, and facial expression features extracted by the Transformer in the independent modal module are fused, and a loss function based on the Spearman correlation coefficient is established between these fused features and the fused features of the fused modal module.
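The abstract gives no implementation details, so the sketch below is only a rough illustration of the two consistency constraints it describes, written in PyTorch with hypothetical function names (pearson_loss, spearman_loss) and tensor shapes. Note that Spearman's rho requires a rank transform, which is not differentiable; the paper presumably trains with some surrogate, and the hard argsort-based ranking here is an assumption used purely to show the quantity being constrained.

```python
import torch

def pearson_loss(pred_uni: torch.Tensor, pred_fused: torch.Tensor) -> torch.Tensor:
    """1 - Pearson correlation between unimodal and fused sentiment scores.

    Both inputs are 1-D tensors of per-sample predictions. Minimizing this
    loss pushes the fused module to agree with the unimodal module's
    intra-modal view, as the abstract's first constraint describes.
    """
    x = pred_uni - pred_uni.mean()
    y = pred_fused - pred_fused.mean()
    r = (x * y).sum() / (x.norm() * y.norm() + 1e-8)
    return 1.0 - r

def spearman_loss(feat_uni: torch.Tensor, feat_fused: torch.Tensor) -> torch.Tensor:
    """1 - Spearman correlation between two flattened feature vectors.

    Spearman's rho is Pearson correlation computed on ranks. The double
    argsort used for ranking is not differentiable, so a real training
    setup would substitute a soft-ranking approximation (an assumption,
    not something the abstract specifies).
    """
    def ranks(v: torch.Tensor) -> torch.Tensor:
        return v.argsort().argsort().float()  # rank of each element
    return pearson_loss(ranks(feat_uni.flatten()), ranks(feat_fused.flatten()))

# Hypothetical usage: a batch of 8 sentiment scores and 128-d features.
uni_scores, fused_scores = torch.randn(8), torch.randn(8)
uni_feats, fused_feats = torch.randn(128), torch.randn(128)
print(pearson_loss(uni_scores, fused_scores))
print(spearman_loss(uni_feats, fused_feats))
```

In this reading, the Pearson term constrains the two modules' outputs while the Spearman term constrains their internal fused representations; how the two losses are weighted against the main sentiment objective is not stated in the abstract.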
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Jianguo Bai, Hai Yang, Cheng Feng, Shuxian Wang, and Xue Li "Cross modal sentiment analysis model based on modal representation learning", Proc. SPIE 13229, Seventh International Conference on Advanced Electronic Materials, Computers, and Software Engineering (AEMCSE 2024), 132292O (7 August 2024); https://doi.org/10.1117/12.3038252
KEYWORDS
Data modeling, Feature extraction, Transformers, Feature fusion, Performance modeling, Image fusion, Correlation coefficients