Sentiment Analysis of Arabic Documents: Main Challenges and Recent Advances

Hichem Rahab (ICISI Laboratory, University of Khenchela, Algeria), Mahieddine Djoudi (TechNE Laboratory, University of Poitiers, France) and Abdelhafid Zitouni (LIRE Laboratory, University of Constantine 2, Algeria)
Copyright: © 2021 | Pages: 25
DOI: 10.4018/978-1-7998-4240-8.ch013


Today, consumers commonly look for other people's opinions about their purchasing experiences on the web before making even a simple decision to buy a product or a service. Sentiment analysis aims to help people benefit from the opinionated texts available on the web in their decision making, and business is one of its challenging application areas. Considerable work on sentiment analysis has been carried out for English and other Indo-European languages. Despite the large number of Arabic speakers and Internet users, studies in Arabic sentiment analysis remain insufficient. This chapter presents the main challenges of Arabic sentiment analysis together with the solutions recently proposed in the literature. The chapter is organized in a novel manner: it first derives the main challenges from the surveyed literature, then gives the proposed solutions for each challenge. The chapter concludes that the future trend will be toward rule-based techniques and deep learning, which allow the inherent characteristics of the Arabic language to be handled better.
Chapter Preview


The growth of Internet use in today’s world is coupled with important advances in the services offered to users. A tremendous amount of information and data is generated, and more needs emerge to benefit from it (Liu, 2012). The importance of taking other people's opinions and advice into account in the decision-making process (sales, voting, etc.) results from the neutrality of this information and its independence from any conflict of interest (Liu, 2015). When someone wants to buy a new product, request a service such as a hotel booking, go to a new restaurant, or even make a decision in an election, they are no longer limited to the advice of family members and close friends. On the Internet, many websites, discussion forums and social networks allow their visitors to open debates and give their comments on subjects, products and services of interest to them (Guellil et al., 2019).

Sentiment analysis seeks to discover positive and negative sentiments about objects (e.g., cellular phones) and their attributes (image quality, weight, etc.) through natural language processing (NLP), text mining and data mining techniques (Aggarwal, 2018). It aims to classify the opinions it discovers into well-defined categories to make hidden phenomena easier to understand. Sentiment analysis can be seen as an automatic summarization of subjective documents, which allows positive or negative polarity to be extracted from textual documents (Pang & Lee, 2004). The use of opinion mining is not limited to product reviews; it extends to user attitudes, political attitudes, etc. (Aggarwal, 2018).
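As a rough illustration of classifying an opinion into a positive or negative category, the following minimal sketch counts words from a tiny hand-made lexicon. The word lists, the `polarity` function and the "objective" label for texts with no net sentiment are assumptions made for this example, not a method taken from the chapter or the cited works; practical systems rely on much larger lexicons or on trained classifiers.

```python
import re

# Hypothetical toy lexicons, chosen only for illustration.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}


def polarity(text: str) -> str:
    """Label a text by counting positive and negative lexicon words."""
    tokens = re.findall(r"\w+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    # No lexicon word found, or positive and negative words cancel out.
    return "objective"
```

For example, `polarity("The image quality is excellent")` returns `"positive"`, while a purely factual sentence such as `"It weighs 150 grams"` contains no lexicon word and is labelled `"objective"`.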

Sentiment analysis techniques are applied across a wide range of domains, such as business, politics, security and healthcare, to cite a few. In the healthcare domain, sentiment analysis can draw on the opinionated data available in social media and web forums to help improve healthcare systems by controlling epidemics and guaranteeing better care for patients (Ramírez-Tinoco et al., 2019). In the security domain, opinion mining can be used to monitor discussions exchanged in e-mails (Danowski, 2012), in social networks or even in phone conversations (Iskra et al., 2004). The data available at low cost in social media can be very beneficial in preventing possible disturbances at different events. It can provide useful information to authorities and organizations, allowing them to take suitable decisions by understanding the mood of their people (Subramaniyaswamy et al., 2017).

A large body of work in data mining and sentiment analysis has been carried out for European languages, especially English. Resources in these languages are available and sufficient in terms of quantity and quality. However, for low-resourced languages such as Arabic, the number of dedicated resources is very limited. Arabic is a Semitic language with more than 400 million speakers in 22 countries, and it is the fourth most used language on the Internet, with 226 million users (Internet World Stats, n.d.). Arabic letters are also used to write other languages, such as Persian and Urdu. The Arabic language is also important as the language of the Holy Quran, the book of 1.5 billion Muslims around the world.

Key Terms in this Chapter

Transcription: Relies on writing speech in a script as it is pronounced, with the goal of guiding the pronunciation of beginners in a language or converting audio speech to text.

Diglossia: Is a phenomenon appearing within some populations with a rich cultural heritage; in this situation people use more than one language at the same time.

Opinion Spam: Opinion that intends to influence the behaviour of Internet users by spreading commercial, political, or social reviews with the goal of promoting or discrediting something. Spammers present themselves as independent reviewers without declaring their identity. They may intend to promote their own products or viewpoints or to discredit those of their competitors.

Agglutination: Is the representation of different part-of-speech (POS) elements in the same word.

Arabizi: Is a writing style that uses Latin script to write Arabic text without any fixed rules, which leads to large variations in the spelling of almost all Arabic words. The phenomenon emerged on the Internet, driven especially by the widespread use of smartphones without sophisticated Arabic keyboards.

Transliteration: Is the process of converting a text from one script to another, with the goal of allowing foreign readers of a language to read texts in that language. Word pronunciation is not considered here.

Opinion Holder: Is the person or organization expressing an opinion in a document, explicitly or implicitly.

Opinion: Is someone’s viewpoint toward an entity based on their cultural, social and religious background. This point of view may be expressed in a review, an article, a TV show or other media they have access to.

Diacritic: Diacritics are signs that play the role of short vowels in the Arabic language. Although they are omitted in most Arabic texts today, their role is essential in guiding pronunciation and removing the ambiguity of a large number of Arabic letters.

Sentiment: Is the positive or negative feeling of a person in response to an instantaneous event, without any need to give a justification. Thus, there is no neutral sentiment, but objectivity can be considered as an absence of sentiment.
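
The Diacritic entry above can be illustrated with a short sketch showing how omitting the short-vowel signs makes distinct words identical in writing. The function name `strip_diacritics` and the choice of the Unicode range U+064B to U+0652 (fathatan through sukun) are assumptions made for this example; a fuller normalizer might also handle other marks.

```python
import re

# Arabic short-vowel and related marks: U+064B (fathatan) to U+0652 (sukun).
# The range is an illustrative choice, not an exhaustive list of Arabic marks.
DIACRITICS = re.compile("[\u064B-\u0652]")


def strip_diacritics(text: str) -> str:
    """Remove Arabic diacritic signs, as most present-day texts omit them."""
    return DIACRITICS.sub("", text)


kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # "kataba" (he wrote), fully vowelled
kutub = "\u0643\u064F\u062A\u064F\u0628"         # "kutub" (books), vowelled

# Without diacritics both words collapse to the same three letters (kaf, ta, ba).
same_after_stripping = strip_diacritics(kataba) == strip_diacritics(kutub)
```

Here `same_after_stripping` is `True`: once the diacritics are removed, the two words can no longer be told apart from their written form alone, which is one source of the ambiguity that Arabic sentiment analysis has to deal with.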
