1. Introduction
In recent decades, Consumer Generated Media (CGM), such as microblogs, customer reviews, and Q&A forums on the World Wide Web (WWW), have gained much popularity and are increasingly used all over the world. The information posted on these media is valuable and often has a strong influence on our daily decisions (e.g., choosing products to buy or places to visit). To make the most of this rich information, much research has been devoted to mining textual data on the vast WWW (Imran et al., 2015).
One of the main themes in text/web mining is sentiment analysis (Liu, 2015), which generally estimates the sentiment of an input text. In analyzing sentiment, sentiment lexicons are often used as an essential linguistic resource. The simplest form of a sentiment lexicon is two lists of words or expressions: one containing “positive” ones, such as “good” and “excellent”, and the other containing “negative” ones, such as “bad” and “terrible”. In addition, a sentiment lexicon may associate a polarity score with each expression, indicating how strong the sentiment of the expression is. As an example, Table 1 presents a fragment of one such lexicon, SentiWordNet (Baccianella et al., 2010), showing some positive and negative words with their polarity scores. Note that sentiment types are not necessarily limited to the positive/negative dichotomy; more fine-grained sentiment lexicons may cover other types of sentiment (e.g., ashamed, scared, excited, and relieved) (Takamura et al., 2005).
Table 1. A fragment of SentiWordNet (Baccianella et al., 2010)
Positive term | Score | Negative term | Score |
Splendid | 1.000 | Scrimy | 1.000 |
Superb | 0.875 | Unfortunate | 0.889 |
Valued | 0.750 | Upset | 0.875 |
Quintessential | 0.625 | Villainous | 0.778 |
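To make the role of a scored lexicon concrete, the following is a minimal sketch of lexicon-based sentiment estimation in Python. It is not the method of any cited work: the summing rule, the sign convention (negative terms stored with a minus sign), and the names `LEXICON`, `sentiment_score`, and `classify` are assumptions introduced here for illustration. The scores themselves are copied from the fragment in Table 1.

```python
import re

# Polarity scores taken from the Table 1 fragment. Storing negative terms
# with a minus sign is an assumption made for this sketch.
LEXICON = {
    "splendid": 1.000,
    "superb": 0.875,
    "valued": 0.750,
    "quintessential": 0.625,
    "scrimy": -1.000,
    "unfortunate": -0.889,
    "upset": -0.875,
    "villainous": -0.778,
}


def sentiment_score(text, lexicon=LEXICON):
    """Sum the polarity scores of all lexicon terms that occur in the text."""
    # Lowercase the text and keep alphabetic tokens only (a crude tokenizer).
    tokens = re.findall(r"[a-z]+", text.lower())
    # Words missing from the lexicon contribute nothing to the score.
    return sum(lexicon.get(token, 0.0) for token in tokens)


def classify(text, lexicon=LEXICON):
    """Map the summed score to a coarse positive/negative/neutral label."""
    score = sentiment_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sentence in ("A superb and splendid hotel",
                     "An unfortunate, villainous plot",
                     "The room had a window"):
        print(sentence, "->", classify(sentence))
```

Simple summation ignores negation, intensifiers, and word sense, so it should be read only as an illustration of how polarity scores can feed a sentiment estimate.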