Introduction
The past two decades have witnessed the flourishing of electronic commerce (e-commerce) in a variety of fields (Huang et al., 2018). The volume of e-commerce is sizable and continues to grow at a rapid, steady pace (Yu et al., 2013). E-commerce provides people with daily opportunities to purchase products and services in online marketplaces (Hajli et al., 2017). Along with these shopping activities, consumer reviews reflect users’ experiences and feelings (Zhang & Zhong, 2019). Because these reviews carry specific sentiments, they inform the purchase decisions of other customers and benefit business sales. As such, a deep understanding of sentiment information serves as the foundation of opinion mining and processing, which aims to identify individuals’ true intentions from their words (Bhargava et al., 2016).
In the field of natural language processing, sentiment analysis refers to the identification of language that carries an evaluative or affective attitude (Esuli & Sebastiani, 2005). Opinions are first retrieved from unstructured texts; the sentiment is then classified into positive, negative, and neutral categories (Fu et al., 2018).
More recently, both supervised and unsupervised machine learning models have been applied to sentiment analysis tasks. The former requires considerable cost and time to produce labeled training samples, while the latter lacks accuracy and processing reliability (Gao et al., 2013).
Semi-supervised sentiment classification has proven to be a flexible and efficient alternative (Chapelle et al., 2006). Semi-supervised learning falls between unsupervised and supervised learning: it trains on a small amount of labeled data together with a large amount of unlabeled data (Li & Ye, 2018). Whereas supervised learning relies on labeled samples and unsupervised learning suffers from low accuracy, semi-supervised learning aims to reach a classification accuracy close to that of supervised learning at the lowest possible labeling cost. This trade-off is acceptable in most practical scenarios.
Among these methods, the label propagation algorithm, a graph-based semi-supervised learning approach, holds great promise for sentiment classification (Li et al., 2016). In general, the label propagation algorithm is favored for its intuitive, interpretable processing and its ease of solution (Yang & Shafiq, 2018). Notably, label propagation is carried out on a graph. When the graph is built, every instance is mapped to a node, and the edge weight between two nodes represents the similarity of the two instances (Krishnakumari & Akshaya, 2019). Thus, the problem is formulated as propagation on a graph, where a node’s label spreads to neighboring nodes according to their proximity (Zhu et al., 2005). The labeled data act like sources that push labels out through the unlabeled data (Xiaojin & Zoubin, 2002). The construction of the label propagation graph is therefore of great significance, as it determines the relations among samples. Before a semi-supervised learning model is deployed, the graph must be established to reflect prior knowledge of the domain.
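The propagation idea described above can be illustrated with a minimal sketch. It is not the implementation used in any of the cited works: it assumes a simple iterative variant in which each unlabeled node repeatedly takes the weighted average of its neighbors' class scores while the labeled (seed) nodes stay clamped. The function name `propagate_labels`, the toy weight matrix, and the class indices are hypothetical and chosen only for illustration.

```python
def propagate_labels(weights, seed_labels, num_classes, iterations=100):
    """Iterative label propagation with clamped seed nodes (illustrative sketch).

    weights: symmetric n x n list of lists of non-negative edge weights.
    seed_labels: dict mapping node index -> class index for labeled nodes.
    Returns one list of class scores per node.
    """
    n = len(weights)
    # Unlabeled nodes start with uniform scores; seeds get a one-hot vector.
    scores = [[1.0 / num_classes] * num_classes for _ in range(n)]
    for node, cls in seed_labels.items():
        scores[node] = [1.0 if c == cls else 0.0 for c in range(num_classes)]

    for _ in range(iterations):
        updated = []
        for i in range(n):
            total = sum(weights[i])
            if i in seed_labels or total == 0:
                # Seeds are clamped; isolated nodes keep their current scores.
                updated.append(scores[i])
                continue
            # Weighted average of the neighbors' current class scores.
            updated.append([
                sum(weights[i][j] * scores[j][c] for j in range(n)) / total
                for c in range(num_classes)
            ])
        scores = updated
    return scores


# Toy chain 0 - 1 - 2 - 3 with a weak link between nodes 1 and 2.
weights = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.2, 0.0],
    [0.0, 0.2, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
seeds = {0: 0, 3: 1}  # assumed labels: class 0 = positive, class 1 = negative
scores = propagate_labels(weights, seeds, num_classes=2)
predicted = [max(range(2), key=lambda c: row[c]) for row in scores]
print(predicted)  # [0, 0, 1, 1]
```

In this toy graph, node 1 inherits the label of seed 0 and node 2 the label of seed 3, because the strong edges dominate the weak link between the two halves of the chain. This is the sense in which the edge weights, and hence the graph construction, decide the outcome.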
In line with this graph-construction principle, traditional strategies such as the word-document bipartite graph, the K-nearest neighbor (KNN) graph, and the Exp-weighted graph are applied to convey the relations within the texts (Rossi et al., 2016). Nevertheless, graph construction for label propagation remains limited, primarily because colloquial word usage in documents often gives rise to polysemy and synonymy. With polysemy, the same sentiment word may express different degrees of sentiment, or even completely opposite sentiment tendencies, in different contexts. With synonymy, the same sentiment may be expressed by different sentiment words (Potts, 2016). In addition, traditional graph-based methods pay more attention to the local distribution of the samples than to the global information within the dataset (Yao et al., 2019). For this reason, traditional graph-based methods are treated as a secondary choice unless a specific word with clear information can be recognized.
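To make two of the graph types named above concrete, the following sketch builds a KNN graph and an exponentially weighted graph from a handful of feature vectors. It rests on several assumptions that are not taken from the cited works: Euclidean distance between vectors, unit weights and symmetrization for the KNN graph, and a Gaussian-style weight exp(-d²/σ²) for the Exp-weighted graph. The function names, the parameter `sigma`, and the two-dimensional "document" vectors are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_graph(vectors, k):
    """Symmetric graph linking each node to its k nearest neighbors (unit weights)."""
    n = len(vectors)
    graph = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: euclidean(vectors[i], vectors[j]),
        )
        for j in others[:k]:
            graph[i][j] = graph[j][i] = 1.0  # keep the graph undirected
    return graph


def exp_weighted_graph(vectors, sigma=1.0):
    """Fully connected graph with weights exp(-d^2 / sigma^2) and no self-loops."""
    n = len(vectors)
    return [
        [
            0.0 if i == j
            else math.exp(-euclidean(vectors[i], vectors[j]) ** 2 / sigma ** 2)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical 2-D document features forming two tight clusters.
docs = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0)]
knn = knn_graph(docs, k=1)
dense = exp_weighted_graph(docs, sigma=0.5)
```

With `k=1`, the KNN graph links only nodes 0 and 1 and nodes 2 and 3, so no label can pass between the two clusters. The exponentially weighted graph keeps every pair connected, but the cross-cluster weights are close to zero. Both constructions therefore encode mainly local neighborhood structure, which matches the limitation noted above.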