1. Introduction
By analyzing and modeling information such as the content of real-time photography scenes and users' emotional preferences, and by drawing on deep learning techniques such as image understanding and text generation, both the emotional state of users and the content of photography scenes can be analyzed accurately (Wei et al., 2022). Existing sentiment analysis, however, mostly starts from the text that users produce on the Internet and relies on natural language processing and related technologies (Yu et al., 2022; Chatterjee, 2019). As convolutional neural networks have become much better at processing image information, more research has analyzed users' emotions through photographic works and has achieved good emotion classification results (Rao et al., 2016; Meng et al., 2021). Unlike object recognition and scene recognition, visual emotion analysis involves more complicated factors. Beyond the image itself, individual factors (including growth environment, cultural background, and social background) lead different people to understand the same image emotionally in different ways (Burkitt, 2002). Visual emotion analysis should therefore take as rich a set of elements into account as possible.
A deep learning model can take the texture, color, and other information of an image as input, automatically extract the image's emotional features, and use the dependencies between features at different levels to learn a representation of them. This approach has achieved good results in classifying photographic emotions (Li, 2019; Zhu et al., 2019). However, when photographic features are learned only from a global perspective, it is difficult for a convolutional neural network to determine which regions or patterns in a photographic work carry the emotion expressed during shooting, and to determine how regional information influences the overall emotion of the work. Users' emotions toward photography are also subjective and depend on many factors, such as the environment the person was in at the time and the photographed content. Classifying emotion at the image level alone ignores the contextual information behind the image, even though the shooting scene holds abundant emotional information. As a result, it is difficult to capture users' fine-grained emotions accurately (Bhunia et al., 2022; Li et al., 2020).
A traditional CNN analyzes only a single feature domain and ignores the contextual information behind the photograph. Deep learning methods such as convolutional neural networks, feature embedding, and multi-feature fusion can be used to improve emotion recognition. To address the problems above, this study makes the following main contributions:
- (1) This paper builds on a CNN by optimizing the basic structure of VGG19, and constructs a photographic scene recognition model and a visual emotion analysis model, which together establish the mapping relationship between scene and emotion;
- (2) Information about the user's photography situation is extracted from the corresponding image metadata, and the mapping relationship between situation and emotion is established. A low-dimensional dense vector representation of the situation features is obtained through embedding;
- (3) The features of photographic works and the contextual features behind them are fully integrated, which expands the feature domain of the data.
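To make the idea of embedding situation features and fusing them with image features more concrete, the following is a minimal sketch in plain Python and is not the model used in this paper. It assumes that fusion is done by simple concatenation followed by a linear softmax classifier. The context fields (`time_of_day`, `season`), the vector sizes, the emotion labels, and the random weights are all hypothetical placeholders; in practice the image vector would come from a CNN such as VGG19 and all weights would be learned.

```python
import math
import random

random.seed(0)

# Hypothetical categorical context fields that might be read from image metadata.
CONTEXT_VOCAB = {
    "time_of_day": ["morning", "afternoon", "evening", "night"],
    "season": ["spring", "summer", "autumn", "winter"],
}
EMBED_DIM = 3   # assumed size of each low-dimensional context embedding
IMAGE_DIM = 8   # stand-in for a CNN feature vector (real VGG19 features are far larger)
EMOTIONS = ["positive", "neutral", "negative"]  # illustrative label set


def random_matrix(rows, cols):
    """Small random weights; in a real model these would be learned."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# One embedding table per context field: row i is the dense vector for category i.
embedding_tables = {
    field: random_matrix(len(values), EMBED_DIM)
    for field, values in CONTEXT_VOCAB.items()
}


def embed_context(context):
    """Look up and concatenate the dense vector of each context field."""
    vector = []
    for field, values in CONTEXT_VOCAB.items():
        index = values.index(context[field])
        vector.extend(embedding_tables[field][index])
    return vector


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


FUSED_DIM = IMAGE_DIM + EMBED_DIM * len(CONTEXT_VOCAB)
classifier_weights = random_matrix(len(EMOTIONS), FUSED_DIM)


def predict_emotion(image_features, context):
    """Fuse image and context features by concatenation, then classify."""
    fused = list(image_features) + embed_context(context)
    scores = [sum(w * x for w, x in zip(row, fused)) for row in classifier_weights]
    return fused, softmax(scores)


# Placeholder image features; a real pipeline would take these from the CNN.
image_features = [random.random() for _ in range(IMAGE_DIM)]
fused, probabilities = predict_emotion(
    image_features, {"time_of_day": "evening", "season": "autumn"}
)
```

Concatenation is only one possible fusion strategy. It is chosen here because it makes the expansion of the feature domain explicit: the fused vector holds the image dimensions followed by the embedding dimensions of every context field.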