An Arabic Dialects Dictionary Using Word Embeddings

Azroumahli Chaimae (National School of Applied Sciences, Abdel Malek Essaâdi University, Morocco), Yacine El Younoussi (National School of Applied Sciences, Abdel Malek Essaâdi University, Morocco), Otman Moussaoui (National School of Applied Sciences, Abdel Malek Essaâdi University, Morocco) and Youssra Zahidi (National School of Applied Sciences, Abdel Malek Essaâdi University, Morocco)
Copyright: © 2019 | Pages: 14
DOI: 10.4018/IJRSDA.2019070102

Abstract

Dialectal Arabic and Modern Standard Arabic lack sufficient standardized language resources to support Arabic language processing tasks, despite this being an active research area. This work addresses the issue by first highlighting the steps and the issues involved in building a multi-dialect Arabic corpus from web data collected from blogs and social media platforms (e.g., Facebook and Twitter). The crawled data is then used to create a vectorized dictionary using word embeddings. In other words, the goal of this article is to build an updated multi-dialect data set and then to extract an annotated corpus from it.

The Need Of A Multi-Dialect Corpus For Arabic Language

Arabic is considered one of the most widely used Semitic languages, with almost 422 million speakers across 22 countries. In addition, it has a huge sphere of influence in the rest of the world, since it is the language of the Quran, the holy book of Islam, and was the language of science and technology in the Middle Ages (Darwish, 2014; Boudad, Faizi, Oulad Haj Thami, & Chiheb, 2017). Further, Arabic is ranked as the seventh top language and the fastest-growing language on the web, as Table 1 shows, with over 140 million internet users in the Middle East and North African countries (Miniwatts Marketing Group, 2018), which explains the considerable interest that the Arabic language is gaining from the NLP research community. In this section, the authors give a brief description of the Arabic script, the morphological complexities of Arabic, and the different varieties of the language.
