Introduction
Nowadays, people are very expressive on the web. With the exponential growth in user feedback data, it has become necessary for every product and service provider to mine this feedback. People regularly share their views on current events on Twitter and similar platforms. A fine-grained analysis of these tweets or reviews can give a clear picture of what people think about a particular topic. That is why aspect-based sentiment analysis has gained popularity, and a lot of work has been done in this area in the last decade. Still, it remains an active research area, and unsupervised approaches in particular require improvement (Yue et al., 2019; Do et al., 2019).
The primary difference between sentiment analysis and aspect-specific sentiment analysis is that the former only detects the sentiment of the overall text. The latter examines each sentence of the text to find the various aspects and then determines the sentiment associated with each of them. In other words, instead of evaluating the overall sentiment of a text, an aspect-based approach allows us to associate specific opinions with individual aspects or features of a product or service. Aspect-based analysis looks more closely at the information behind a text, which is why its results are more detailed and accurate.
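The difference can be illustrated with a minimal sketch. The lexicon, the aspect keywords, and the function names below are hypothetical and chosen only for illustration; they are not the resources or the method used in this paper. The sketch shows how a review with mixed opinions can receive a neutral overall score while its aspects carry clearly opposite sentiments.

```python
# Hypothetical toy lexicon; the scores are illustrative, not taken from SentiWordNet.
LEXICON = {"great": 1.0, "sharp": 0.5, "poor": -1.0, "slow": -0.5}

# Hypothetical aspect keywords for a phone review.
ASPECTS = {"camera": {"camera", "photos"}, "battery": {"battery", "charging"}}


def tokenize(sentence):
    """Lowercase the sentence and strip basic punctuation from each word."""
    return [w.strip(".,!?").lower() for w in sentence.split()]


def score(tokens):
    """Sum the lexicon scores of the tokens (unknown words count as 0)."""
    return sum(LEXICON.get(t, 0.0) for t in tokens)


def document_sentiment(sentences):
    """Overall sentiment: one score for the whole text."""
    return sum(score(tokenize(s)) for s in sentences)


def aspect_sentiment(sentences):
    """Aspect-level sentiment: one score per aspect mentioned in a sentence."""
    result = {}
    for s in sentences:
        tokens = tokenize(s)
        for aspect, keywords in ASPECTS.items():
            if keywords & set(tokens):
                result[aspect] = result.get(aspect, 0.0) + score(tokens)
    return result


review = ["The camera takes great photos.", "The battery is poor."]
print(document_sentiment(review))  # 0.0
print(aspect_sentiment(review))    # {'camera': 1.0, 'battery': -1.0}
```

The overall score of 0.0 hides the fact that the reviewer likes the camera and dislikes the battery, which is exactly the information the aspect-level view recovers.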
Consider the example of analyzing public sentiment about COVID from social network data. We need to analyze the various issues or aspects related to COVID and the public sentiment about each of them. Here, overall polarity may not be a good indicator; we need the sentiment about each particular issue. The same analysis is required for feedback and opinions about any business or service. Because massive amounts of feedback data are generated continuously and labeling them manually is impractical, unsupervised and semi-supervised approaches are gaining popularity.
Topic modeling is an unsupervised NLP technique that represents a collection of text documents by a set of topics that best explain the underlying information in each document. It resembles clustering, with one difference: instead of numerical features, it works on a collection of words. These words need to be grouped so that each group represents a topic in a document. Latent Dirichlet Allocation (LDA) is the most well-known method for modeling thematic information, i.e., topics, in a document collection. It is an unsupervised learning approach that views documents as bags of words. LDA is used on large collections of documents to identify topics (Blei et al., 2003). It is helpful for search engine optimization, automation of customer service, and any other setting where knowing the theme of documents is essential. Its task is to describe the topics that best represent a collection of documents. These topics are not given in advance; they emerge during the topic modeling process and are therefore called latent.
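To make the idea of grouping words into latent topics concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, a common inference scheme for LDA and not necessarily the one used in this paper. The function name `lda_gibbs`, the hyperparameter values, and the toy documents are assumptions made for illustration only.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents (illustrative sketch)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d assigned to topic k
    topic_word = [Counter() for _ in range(num_topics)]  # times word w is assigned to topic k
    topic_total = [0] * num_topics                       # total tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample: weight is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


# Hypothetical toy corpus: each document is a bag of words (word order is ignored).
docs = [
    "battery life battery charge".split(),
    "screen display screen bright".split(),
    "battery charge fast".split(),
    "display bright screen".split(),
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    print(k, [w for w, c in counts.most_common(3) if c > 0])
```

Each topic is reported as its most frequently assigned words. The topics carry no labels of their own, which is why a later categorization step is needed to map them to named aspects.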
The main contributions of this work include the following:
1. An unsupervised aspect extraction approach using an optimized LDA configuration and a Parts of Speech (POS) rule for unlabeled reviews.
2. Categorization of aspects using very few domain words.
3. Aspect-specific sentiment analysis using SentiWordNet (SWN).
The remainder of the paper is organized as follows. Section 2 sheds light on recent work in the field. The background and intuition of LDA and SWN are described in Section 3. The methodology and proposed algorithms are explained in Section 4. Section 5 presents the experimental details and results. The paper concludes with a summary and future directions in Section 6. In this paper, the words sentiment and opinion are used interchangeably, as are the words aspect and feature.
For this study, approaches based on topic modeling are mainly considered for sentiment analysis. Some hybrid models based on deep neural networks and LSTM are also discussed. We focus on very recent work in this field, from the last 3-4 years.
Various surveys describe the present state of the art in sentiment analysis research, mostly on online reviews and social media data (Yue et al., 2019). Different deep learning based approaches have been analyzed in detail, along with their performance issues (Do et al., 2019). LDA was presented by Blei et al. (2003), and even after almost two decades, its popularity for unsupervised topic extraction is still growing.