Abstract
Opinion mining, also known as sentiment analysis, is the analysis of sentiment (emotion, affection, experience) towards a target object. In the present era, people want to know the opinions of others before making a decision or performing a task. Hence, it is necessary to collect information (features) from relatives, friends, or the web. These opinions and feedback help them decide on a course of action. With the advent of social media and the use of digital technologies, the web has become a huge source of data. However, it is time-consuming to read the data collected from the web and analyze it to arrive at informed decisions. This chapter provides a complete overview of tools that simplify opinion mining operations such as data collection, data cleaning, and visualization of predicted sentiment.
Introduction
In recent years, sentiment analysis has become a very popular field in the age of Web 2.0, driven by the evolution of social media networks and e-commerce (Song et al., 2011). This has a wide-ranging effect on the daily life of ordinary people. Users are keen to know the feedback about products, personalities, movies, ongoing events, etc., which is available on social networks. According to Bill Gates, knowledge management plays an important role in searching, organizing, analyzing, and optimizing information (Wenyun & Lingyun, 2010). Sentiment analysis is a part of data mining.
Merriam-Webster Online Dictionary (n.d.) gives the following definition: “Opinion is a view, judgment, or appraisal formed in the mind about a particular matter. It is a belief stronger than impression and less strong than positive knowledge”. Sentiment analysis tools help to find out people’s feelings about an event or issue as expressed in text on social networks or e-commerce websites. Many people also look to reviews by previous customers before purchasing products on e-commerce websites such as Amazon, Flipkart, eBay, and Infibeam. On the web, information is widely scattered, and it is very difficult to read all relevant sources to learn the feedback about the target object. This chapter explains how to gather the data, the tools and techniques used to decide the sentiment, and how the results are presented using visualization techniques. Figure 1 represents the process of sentiment analysis.
- Data Collection: Data collection is the main component of sentiment analysis, and it is a very challenging task because of privacy concerns such as the fear of sharing personal data. Data can be collected from many sources, for example through web crawling or by sharing questionnaires on the web. Popular tools for creating online surveys include Google Forms, SurveyMonkey, and Poll Everywhere; data can also be collected from social media networks such as Facebook and Twitter.
- Preprocessing: The collected data is generally in structured, semi-structured, or unstructured formats. According to Oracle Corporation, approximately 90% of data is unstructured. The data may therefore be incomplete (missing attributes or records), noisy (for example, duplicate records), or inconsistent, and preprocessing is necessary to convert it into a usable format. Tools used for preprocessing include R, WEKA, RapidMiner, Trifacta Wrangler, Python, and Data Preparator.
- Feature Extraction: Identifying and extracting the feelings or sentiment in a text is called feature extraction. It can be done with MATLAB, WEKA, scikit-learn, R, Python with NLTK, Orange, and KNIME.
- Visualization: Visualization is the technique of representing the results of the sentiment analysis process in graphical or pictorial form. Tools used for visualization include SneseNet, Micro_WNop, WEKA, MATLAB, Cognos, and Thinkmap.
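The four stages listed above can be illustrated with a small, self-contained Python sketch. It is not the method of this chapter or of any tool named above: the word lists `POSITIVE` and `NEGATIVE` are a hypothetical toy lexicon, the function names are illustrative, the collected data is a hard-coded list standing in for crawled or surveyed text, and the "visualization" is a plain-text bar chart. Under those assumptions, it shows preprocessing (dropping missing and duplicate records), a simple lexicon-based sentiment decision, and a summary of the results.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (an assumption for illustration only);
# real systems rely on much larger lexical resources or trained models.
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}


def preprocess(reviews):
    """Drop missing/empty records, normalize case and spacing, remove duplicates."""
    seen = set()
    cleaned = []
    for text in reviews:
        if text is None:
            continue  # incomplete record
        normalized = " ".join(text.lower().split())
        if not normalized or normalized in seen:
            continue  # empty or duplicate record
        seen.add(normalized)
        cleaned.append(normalized)
    return cleaned


def classify(text):
    """Label a text by counting lexicon hits: positive, negative, or neutral."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize(reviews):
    """Run preprocessing and classification, returning label counts."""
    return Counter(classify(text) for text in preprocess(reviews))


def text_bar_chart(counts):
    """Render label counts as a plain-text bar chart, one label per line."""
    lines = []
    for label in ("positive", "negative", "neutral"):
        n = counts.get(label, 0)
        lines.append(f"{label:<8} {'#' * n} ({n})")
    return "\n".join(lines)


if __name__ == "__main__":
    # Stand-in for data gathered by crawling or questionnaires.
    collected = [
        "Great phone, I love the camera",
        "great phone,  i love the camera",  # duplicate after normalization
        None,                               # missing record
        "Battery life is terrible and charging is slow",
        "It arrived on Tuesday",
    ]
    print(text_bar_chart(summarize(collected)))
```

In practice, each stage would be handled by one of the dedicated tools listed above; the sketch only makes the flow of data between the stages concrete.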
Background
All the information generated by users online can be very helpful for individuals and organizations in making decisions. Examples include positive or negative opinions expressed by users about candidates contesting elections, views on policy decisions taken by the government, and user reviews of a product released by a manufacturing company, all of which can help the individual or organization to strategize. Making use of this information involves determining people’s attitudes based on a large amount of natural language documents (Mohandas et al., 2012).