Using Enhanced Lexicon-Based Approaches for the Determination of Aspect Categories and Their Polarities in Arabic Reviews

Mohammad Al Smadi (Jordan University of Science and Technology, Irbid, Jordan), Islam Obaidat (Jordan University of Science and Technology, Irbid, Jordan), Mahmoud Al-Ayyoub (Computer Science Department, Jordan University of Science and Technology, Irbid, Jordan), Rami Mohawesh (Jordan University of Science and Technology, Irbid, Jordan) and Yaser Jararweh (Department of Computer Science, Jordan University of Science and Technology, Irbid, Jordan)
DOI: 10.4018/IJITWE.2016070102

Abstract

Sentiment Analysis (SA) is the process of classifying the sentiment of a text written in a natural language as positive, negative, or neutral. It is one of the most interesting subfields of natural language processing (NLP) and Web mining due to its diverse applications and the challenges associated with applying it to the massive amounts of textual data available online (especially on social networks). Most of the current work on SA focuses on the English language and operates at the sentence level or the document level. This work focuses on a less studied version of SA, namely aspect-based SA (ABSA), for the Arabic language. Specifically, this work considers two ABSA tasks: aspect category determination and aspect category polarity determination. It makes use of the publicly available human annotated Arabic dataset (HAAD) along with the baseline experiments conducted by the HAAD providers. Several lexicon-based approaches are presented for the two tasks at hand, and some of the presented approaches are shown to significantly outperform the best-known result on the given dataset. Enhancements of 9% and 46% were achieved in the aspect category determination and aspect category polarity determination tasks, respectively.

Introduction

The field of Arabic Natural Language Processing (NLP) is a growing field with many interesting and challenging problems. Two types of Arabic are usually considered in Arabic NLP papers: Modern Standard Arabic (MSA) and dialects (vernaculars). MSA is derived from Classical Arabic. It is the official Arabic language used in media, education, culture, literature, official documents, old books and most of the new books throughout the Arab world, which spans regions of the Middle East and North Africa (MENA) in addition to parts of East Africa (Horn of Africa). It is one of the six official languages of the United Nations. It is the native language of 420 million people (Hmeidi et al., 2015b).

Classical Arabic and MSA remained the only documented versions of Arabic until the mid-1990s, when the dawn of Internet services and mobile communication pushed for the documentation of different Arabic dialects (vernaculars). The widespread use of emails, SMS, blogs and, later, social media helped in documenting these Arabic vernaculars, in addition to giving birth to a new version of Arabic called Arabizi, in which Arabic words are transliterated using the Roman alphabet (Habash, 2010).

A number of specialists, such as Habash (2010), consider the Arabic vernaculars to be the true native language forms, since they are used in daily informal communication between people who live in the Arab world. Although these Arabic vernaculars lack standardization, are not generally found in written form and are not officially taught, they can be found in TV shows, movies, songs, theaters, etc. Arabic vernaculars are classified by linguists into seven main regional language groups: Maghrebi, Egyptian, Mesopotamian, Arabian Peninsula, Sudanese, Levantine, and Andalusian (now extinct) (Ta’amneh et al., 2014; Faqeeh et al., 2014).

Sentiment analysis (SA), or opinion mining (OM), is a growing field of study concerned with automatically determining people's opinions, sentiments, attitudes, and emotions from written text or speech excerpts (Liu, 2012). It is the focus of a large number of research projects for several reasons: the availability of a number of good machine learning methods, the availability of huge corpora and, most importantly, the realization of the intellectual challenges and commercial applications of SA (Pang & Lee, 2008). This field of study is active in many research areas such as NLP, data mining, Web mining, and text mining (Liu, 2012). Due to its vast range of applications, SA has spread from computer science to the management sciences, political science, economics, and social sciences (Liu, 2012).

Most works on SA focus on sentence-based or document-based SA. A very interesting version of SA known as aspect-based SA (ABSA) is less studied in the literature despite its great importance, possibly because of the several challenges it poses. This is the case even for the well-studied English language. The situation is even worse for other languages such as Arabic, where dozens of papers have been published on SA in the past few years but, as far as we know, only two papers (Al-Smadi et al., 2015a, 2015b) have been published on ABSA.

Researchers in the field of SA usually depend on lexicons as essential resources in their studies to identify the polarity of different sentiments. Lexicons used in SA consist of a list of sentiment words (also called opinion words, polar words, or opinion-bearing words) and sentiment phrases that are used to express positive or negative sentiments. Liu presents in his book the major challenges facing the use of such lexicons (Liu, 2012).
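
To make the idea of lexicon-based polarity identification concrete, the following is a minimal sketch in Python. It is an illustration only and not one of the approaches evaluated in this work. The lexicon entries, the names TOY_LEXICON, tokenize and polarity, and the whitespace tokenizer are all assumptions introduced for the example; a realistic Arabic system would also need normalization, stemming and negation handling.

```python
from typing import Dict, List

# Hypothetical toy lexicon (an assumption for illustration only):
# each sentiment word maps to +1 (positive) or -1 (negative).
TOY_LEXICON: Dict[str, int] = {
    "ممتاز": 1,   # "excellent"
    "جيد": 1,     # "good"
    "ممل": -1,    # "boring"
}


def tokenize(text: str) -> List[str]:
    """Split on whitespace; a deliberate simplification for this sketch."""
    return text.split()


def polarity(text: str, lexicon: Dict[str, int] = TOY_LEXICON) -> str:
    """Sum the lexicon scores of all tokens and map the total to a label."""
    score = sum(lexicon.get(token, 0) for token in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for review in ["الكتاب ممتاز", "الفيلم ممل", "قرأت الكتاب"]:
        print(review, "->", polarity(review))
```

Words that are missing from the lexicon contribute a score of zero, so a review containing no known sentiment words is labeled neutral. This is also why the size and coverage of the lexicon matter for this family of methods.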

Many studies have presented different algorithms to compile lexicons of sentiment-bearing words for the English language. In contrast, few studies have presented algorithms to construct lexicons for Arabic words; examples include (Abbasi, Chen, & Salem, 2008; Abdul-Mageed, Diab, & Korayem, 2011; Abdul-Mageed & Diab, 2012). A number of SA studies on Arabic are based on manually constructed lexicons, such as (Abdulla et al., 2013). Manually constructed lexicons are characterized by their quality, but they are limited in size.
