Introduction
With the arrival of the data age, the volume of textual information on the Web grows every day. Natural language processing (NLP) technology, which automates text processing, helps us acquire and process massive amounts of textual data and analyze the business and social value that texts contain. Textual emotion analysis is one of the thriving research areas in NLP. Its purpose is to use data mining algorithms to automatically analyze the emotional tendency of comments and opinionated texts on social platforms, and to determine whether the speaker's attitude in context is favorable or unfavorable. Textual emotion analysis has research value both in theory, such as the optimization of learning algorithms, and in everyday applications, such as e-commerce platforms and public opinion monitoring. In e-commerce, consumers obtain product information from social media and decide whether to buy based on others' experiences; merchants improve product quality or strategy by understanding consumers' feedback; and enterprises can conduct user experience surveys by applying emotion analysis to consumers' textual feedback on the platform. On social media, people express their opinions on social events, and their interactions shape and guide public opinion. By analyzing this public opinion, the government can better understand people's intentions and take appropriate measures.
As an important branch of NLP, emotion analysis mainly focuses on semantic feature extraction, feature retrieval, and semantic classification. Traditional machine learning methods for emotion analysis depend on classifier performance and require manual annotation, which makes a qualitative breakthrough difficult in complex contexts and multi-semantic environments. Deep learning algorithms have made great progress in emotion classification, but most current research ignores the impact (or weight) of the different semantics within a sentence. Because language is ambiguous, the specific meaning of a word often depends on its context. When a sentence carries multiple semantics, confusion arises easily, which in turn degrades the model's overall evaluation.
Consider, for example, the comment: “The scenery at the top of the mountain was really beautiful, but the management is poor, and the garbage appeared everywhere, which affected our mood.” The semantic structure of this sentence is complex and contains different emotional factors. Read normally, the first part of the sentence expresses positive emotion, while the middle and final parts express negative emotion, so the sentence as a whole expresses a negative opinion. The key is to assign each word a reasonable weight that represents its effect on the whole sentence. The greater the weight, the more strongly the word contributes to the real emotion behind the sentence.
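The weighting idea above can be sketched as a weighted average of per-word polarities. This is only a minimal illustration, not the model proposed in this article: the word list, the polarity values, the importance scores, and the helper names `softmax` and `sentence_polarity` are all hypothetical and chosen by hand for the example comment.

```python
import math


def softmax(scores):
    """Turn raw importance scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sentence_polarity(polarities, importances):
    """Weighted average of per-word polarities (negative < 0 < positive)."""
    weights = softmax(importances)
    return sum(w * p for w, p in zip(weights, polarities))


# Hypothetical, hand-picked values for key words of the example comment.
words = ["beautiful", "poor", "garbage", "affected"]
polarities = [1.0, -1.0, -1.0, -0.5]   # assumed per-word emotion
importances = [1.0, 1.5, 1.0, 2.0]     # assumed raw importance scores

weights = softmax(importances)
score = sentence_polarity(polarities, importances)

for word, weight in zip(words, weights):
    print(f"{word:>10s}  weight={weight:.3f}")
print("sentence score:", round(score, 3))  # a negative score means a negative opinion
```

With these assumed values the negative words receive most of the weight, so the sentence score is negative even though "beautiful" is strongly positive. In a trained model the weights would be learned from data instead of set by hand.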
Among the classical deep learning frameworks, convolutional neural networks (CNNs) hold a pivotal position and are broadly applied in feature extraction and abstraction. They can construct multi-channel feature vectors to enrich the feature dimensions and can extract local features at different levels between words and sentences. Kadri et al. built a web service framework for recognizing ancient Tifinagh text based on a CNN; it outperformed SVM and extreme learning machine-based approaches, showing the CNN's ability to extract features from rare data (Kadri et al., 2022). Anil et al. proposed a cloud-based solution for liver tumor detection based on GoogLeNet, demonstrating the deployment capability of CNNs (Anil et al., 2022). Mandle et al. proposed a brain tumor classification model for MRI based on VGG19 that achieved a high accuracy of 99.83% and promising experimental results for different types of tumors, demonstrating the generalization ability of CNNs in automatic feature extraction and data processing (Mandle et al., 2022). Borgalli and Surve obtained good experimental results in static facial expression recognition with their own custom CNN framework, demonstrating the superior capability of CNNs in feature processing (Borgalli & Surve, 2022). Hsia et al. proposed a Mask R-CNN with data augmentation and discrete wavelet transform, built on Faster R-CNN; the model has stronger noise immunity and higher detection accuracy in object detection, demonstrating the superior capability of CNNs in feature extraction (Hsia et al., 2022). Sushma et al. proposed a multi-modal emotion recognition model based on VGG that extracts facial features from expressions and spectral features from speech, demonstrating the tolerance of CNNs for feature fusion (Sushma et al., 2021).
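To make concrete how a CNN extracts local features at different levels from a word sequence, the following sketch slides kernels of two different widths over toy word embeddings and keeps the strongest response of each. It is a hedged illustration under stated assumptions, not any of the cited models: the two-dimensional embeddings, the kernel values, and the helper name `conv1d_max` are hypothetical, and real systems learn the kernels and use a deep learning library.

```python
def conv1d_max(embeddings, kernel):
    """Slide a (width x dim) kernel over the word sequence, apply ReLU,
    and max-pool over time to obtain one feature value."""
    width = len(kernel)
    responses = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        activation = sum(
            k * x
            for k_row, x_row in zip(kernel, window)
            for k, x in zip(k_row, x_row)
        )
        responses.append(max(0.0, activation))  # ReLU
    return max(responses)  # max-over-time pooling


# Toy 2-dimensional embeddings for a 5-word sentence (assumed values).
sentence = [[0.1, 0.3], [0.9, -0.2], [-0.5, 0.4], [0.7, 0.7], [-0.1, -0.8]]

# One hypothetical kernel per window width; each acts as a separate channel.
kernels = {
    2: [[1.0, 0.0], [0.0, 1.0]],                # looks at word pairs
    3: [[0.5, 0.5], [-0.5, 0.5], [0.5, -0.5]],  # looks at word triples
}

# One pooled feature per kernel; together they form the sentence feature vector.
features = [conv1d_max(sentence, kernel) for kernel in kernels.values()]
print(features)
```

Each kernel width captures patterns at a different granularity (pairs versus triples of words), and concatenating the pooled outputs gives a fixed-length feature vector regardless of sentence length.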