Introduction
Sentiment analysis is the study of people's attitudes towards entities and their attributes as expressed in unstructured written text. With the rapid growth of the Internet, people share their opinions and views about products, services, themes and so on through social media. Decision makers find it difficult to analyze sentiment across a large volume of social media reviews using traditional tools and techniques. Many methods have been applied to sentiment analysis (Baumgarten M. et al., 2013), including K-Nearest Neighbor, Support Vector Machine, Naïve Bayes, Neural Network and other techniques. Methods extended from neural networks have achieved better performance than conventional machine learning methods. Deep learning approaches can extract features from large raw data without any prior knowledge of the predictors.
Generic Word Embedding (Qin et al., 2017) learns embeddings from a massive unlabeled corpus and converts each word in a review into a real-valued vector for further processing by the neural network. Word2Vec learns its embedding by relating each target word to its context words. However, it ignores how often a context word co-occurs with other words across the corpus. GloVe (Pennington et al., 2014) is a recent global log-bilinear regression model that unites the merits of the two principal model families: local context window methods and global matrix factorization. It constructs a co-occurrence matrix from the corpus and so considers global count statistics rather than only local information. In this paper, the advantage of GloVe is exploited to perform sentiment analysis on movie reviews using the IMDB dataset.
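For reference, the weighted least-squares objective that GloVe minimizes can be written as below. The notation follows Pennington et al. (2014) rather than the present article: X_{ij} is the number of times word j appears in the context of word i, V is the vocabulary size, w and \tilde{w} are the word and context vectors, and b and \tilde{b} are their biases. The values x_max = 100 and α = 3/4 are the settings reported in that paper and may differ from those used here.

```latex
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
\qquad
f(x) =
\begin{cases}
  (x / x_{\max})^{\alpha} & \text{if } x < x_{\max}, \\
  1 & \text{otherwise.}
\end{cases}
```

The weighting function f limits the influence of very frequent pairs and gives no weight to pairs that never co-occur, since f(0) = 0.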
Consider a positive review from the IMDB dataset:
Review 1: The emotions in the film given by the actress when compared to other for the hospital scene is more realistic. This is the turning point of the film.
In this review, “film” and “actress” are words that co-occur in most of the sentences in the IMDB dataset. The GloVe word embedding uses global count statistics, which help capture this repetition and the large-scale patterns of the input.
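The global counts mentioned above can be illustrated with a minimal sketch that tallies how often two words fall within the same context window. The function name `cooccurrence_counts`, the window size and the toy sentences are assumptions made for illustration; they are not taken from the IMDB dataset, and the reference GloVe implementation additionally down-weights distant pairs, which this sketch omits.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=5):
    """Count how often each unordered word pair occurs within `window` tokens."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            # Look only to the right so each co-occurrence is counted once.
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    counts[tuple(sorted((word, other)))] += 1
    return counts


# Toy corpus, made up for illustration (not taken from IMDB).
toy_reviews = [
    "the actress carries the film",
    "the film gives the actress a realistic role",
    "a dull film",
]

counts = cooccurrence_counts(toy_reviews, window=5)
print(counts[("actress", "film")])
```

Pairs are stored with their words in alphabetical order, so "film" and "actress" share a single entry regardless of which word comes first in a sentence.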
A plain neural network treats every part of the input sentence uniformly. It does not distinguish the parts that are relevant to sentiment classification from those that are irrelevant. Attention mechanisms make a prediction by weighing the inputs from several time steps. An attention method that relates different positions of a single sequence in order to compute a representation of that sequence is known as self-attention or intra-attention (Ashish Vaswani et al., 2017). This mechanism focuses on the pertinent parts of the input rather than the extraneous parts when making a prediction. In Review 1, the word “realistic” is an adjective, while the term “actress” is a noun. The word “more” is an adverb that adds emphasis to the sentence. In the second sentence, the phrase “turning point” carries particular weight and marks the review as positive. The attention layer assigns higher scores to adjectives and nouns such as “realistic” and “actress”. Without the attention layer, the classification done by the Long Short Term Memory (LSTM) is based on individual words of a sentence. In the proposed methodology, considering phrase-level features and applying attention to the output of the LSTM significantly improves classification performance.
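The idea of scoring each time step and pooling the LSTM outputs accordingly can be sketched as follows. This is a simple dot-product attention pooling under stated assumptions, not necessarily the formulation used in the proposed model: the hidden states and the `query` vector are hand-picked stand-ins for values that a trained network would produce, and `attention_pool` is a hypothetical helper name.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Return the attention-weighted sum of hidden states and the weights."""
    # Score each time step by its dot product with the query vector.
    scores = [sum(h_k * q_k for h_k, q_k in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[k] for w, h in zip(weights, hidden_states)) for k in range(dim)
    ]
    return context, weights


# Stand-in values: in a real model these come from the LSTM and from training.
tokens = ["more", "realistic", "scene"]
hidden_states = [[0.1, 0.0], [1.0, 0.5], [0.0, 0.2]]
query = [2.0, 1.0]

context, weights = attention_pool(hidden_states, query)
for token, weight in zip(tokens, weights):
    print(f"{token}: {weight:.3f}")
```

With these stand-in values the token "realistic" receives the largest weight, so the pooled vector passed to the classifier is dominated by its hidden state.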
The contributions of this research work are summarized below:
• We propose, for the first time in the sentiment analysis field, a Word Embedding-Self Attention Long Short Term Memory model for the extraction of opinionated words.
• We use the global count statistics of the GloVe word embedding as an effective method for finding the co-occurrence of related words.
• We highlight the important aspect terms chosen by the attention layer from the given sentiment.
• We analyse and validate the performance of the proposed architecture using the Friedman test.