A Novel Coding and Discrimination (CODIS) Algorithm to Extract Features from Arabic Texts to Discriminate Arabic Poems

Nada Ahmed J. (MOP, Baghdad, Iraq), Abdul Monem S. Rahma (Computer Science Department, University of Technology, Baghdad, Iraq) and Maha A. Hmmood Alrawi (Department of Production Engineering and Metallurgy, University of Technology, Baghdad, Iraq)
DOI: 10.4018/IJAPUC.2019010101

Abstract

This article proposes a new algorithm to discriminate Arabic poems. The algorithm takes Arabic poem texts as input, codes the Arabic letters, extracts letter features based on letter shapes to construct a multidimensional contingency table, and statistically analyses the frequencies of letters in the input texts. The proposed coding and discrimination (CODIS) algorithm could be applied to different Arabic texts in any medium. A sample of five poems for six poets was examined to evaluate the CODIS algorithm. A chi-square statistic is used to determine the relation between the features and to discriminate between poems.
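
The statistical step described in the abstract can be illustrated with a minimal Python sketch. The article's actual letter codes and shape features are not given in this preview, so the feature used here (whether a letter is written with dots) and the names `LETTERS`, `DOTTED`, `shape_counts`, and `chi_square` are hypothetical. The sketch also simplifies the multidimensional contingency table to two dimensions (poems x feature) and computes Pearson's chi-square statistic of independence with its degrees of freedom.

```python
# The 28 basic Arabic letters.
LETTERS = set("ابتثجحخدذرزسشصضطظعغفقكلمنهوي")
# Hypothetical shape feature (assumption, not the article's feature set):
# the basic letters that are written with dots.
DOTTED = set("بتثجخذزشضظغفقني")


def shape_counts(text):
    """Return [dotted, undotted] counts over the basic letters in text."""
    letters = [ch for ch in text if ch in LETTERS]
    dotted = sum(1 for ch in letters if ch in DOTTED)
    return [dotted, len(letters) - dotted]


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # avoid division by zero for empty rows/columns
                stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


if __name__ == "__main__":
    table = [[10, 20], [20, 10]]  # toy counts: two poems x (dotted, undotted)
    stat, dof = chi_square(table)
    print(round(stat, 3), dof)  # 6.667 1
```

In use, each row of the table would be `shape_counts(poem)` for one poem. If a p-value is needed, `scipy.stats.chi2_contingency` returns the same statistic when called with `correction=False`; by default it applies Yates' continuity correction when there is one degree of freedom.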

3. Arabic Letters Classification And Groups

Arabic is one of the oldest Semitic languages and the mother tongue of a vast number of people around the world. The Holy Quran is written in Arabic (Khallati, 2016). Because of the important cultural changes the Arab region has witnessed, Arabic is a gateway to some of the most culturally rich areas of the world. Historians need to learn this language and its arts in order to communicate with Arabs and to study the enormous body of Arabic manuscripts and valuable books (Khallati, 2016).

Arabic can be divided into two varieties: Classical Arabic and Modern Arabic. Classical Arabic is the official language and the language of the Quran; it is also used in newspapers, books, and academic research. Whereas Classical Arabic is used mostly in written contexts, Modern Arabic is used in everyday spoken contexts (Eidus, 2007; Mrayati, 2004).

The Arabic alphabet consists of 28 letters. These letters have traditionally been listed both in the abjad order (أ ب ج د هـ و ز) and in the alphabetical order (أ ب ت). However, sources differ on how to treat symbols and non-Arabic characters, as well as additional letters such as (ة ى ء); there is no agreement on their number or on their positions.
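
Because sources disagree on whether characters such as ة, ى, and ء count as letters, any letter-frequency count has to make that choice explicit. The following Python sketch, using the hypothetical names `BASIC_LETTERS`, `EXTRA_LETTERS`, and `letter_frequencies`, counts only the 28 basic letters by default and includes the three additional characters on request. Hamza-bearing forms (أ إ آ ؤ ئ) are simply skipped here; whether to normalize or count them is left open as an assumption.

```python
from collections import Counter

# The 28 basic Arabic letters in alphabetical order.
BASIC_LETTERS = "ابتثجحخدذرزسشصضطظعغفقكلمنهوي"
# Additional characters whose status varies between classifications.
EXTRA_LETTERS = "ةىء"


def letter_frequencies(text, include_extras=False):
    """Count letter occurrences in text.

    Only the 28 basic letters are counted unless include_extras is True;
    every other character (spaces, diacritics, punctuation, hamza-bearing
    alef/waw/yeh forms) is skipped.
    """
    allowed = set(BASIC_LETTERS)
    if include_extras:
        allowed |= set(EXTRA_LETTERS)
    return Counter(ch for ch in text if ch in allowed)


if __name__ == "__main__":
    word = "مدرسة"  # "school"
    print(sum(letter_frequencies(word).values()))                       # 4
    print(sum(letter_frequencies(word, include_extras=True).values()))  # 5
```

Keeping the extra characters behind a flag lets the same counting code follow either convention, so results from different classifications can be compared directly.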

Letter coding is an important field in computer programming. Many studies have addressed it in different systems in order to develop efficient and secure systems (Salomon, 2007). Arabic letters have been classified into groups according to different criteria, as follows:

  • A. Classification of Letters Depending on Letter Location in the Word.
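
This preview breaks off before listing the location-based groups, so the following Python sketch is only an assumption about what such a classification involves: it labels each letter of a word as isolated, initial, medial, or final according to standard Arabic joining behaviour, in which letters such as ا د ذ ر ز و never connect to the letter that follows them. The names `RIGHT_JOINING`, `NON_JOINING`, `positional_forms`, and `form_counts` are hypothetical, and diacritics and tatweel are skipped.

```python
from collections import Counter

# Hypothetical grouping based on standard joining behaviour (assumption):
# letters that connect only to the preceding letter, never to the following one.
RIGHT_JOINING = set("اأإآدذرزوؤة")
# Letters that connect to neither neighbour.
NON_JOINING = set("ء")
TATWEEL = "\u0640"


def is_letter(ch):
    # Arabic letters hamza..yeh occupy U+0621..U+064A; tatweel (U+0640) lies in
    # that range but is a stretching stroke, not a letter. Diacritics (U+064B
    # and above) fall outside the range and are skipped.
    return "\u0621" <= ch <= "\u064A" and ch != TATWEEL


def joins_next(ch):
    return ch not in RIGHT_JOINING and ch not in NON_JOINING


def joins_prev(ch):
    return ch not in NON_JOINING


def positional_forms(word):
    """Label each letter of word as isolated, initial, medial or final."""
    letters = [ch for ch in word if is_letter(ch)]
    forms = []
    for i, ch in enumerate(letters):
        linked_prev = i > 0 and joins_next(letters[i - 1]) and joins_prev(ch)
        linked_next = (i + 1 < len(letters) and joins_next(ch)
                       and joins_prev(letters[i + 1]))
        if linked_prev and linked_next:
            form = "medial"
        elif linked_next:
            form = "initial"
        elif linked_prev:
            form = "final"
        else:
            form = "isolated"
        forms.append((ch, form))
    return forms


def form_counts(text):
    """Count positional forms over all whitespace-separated words in text."""
    return Counter(form for word in text.split()
                   for _, form in positional_forms(word))


if __name__ == "__main__":
    print(positional_forms("كتاب"))
    # [('ك', 'initial'), ('ت', 'medial'), ('ا', 'final'), ('ب', 'isolated')]
```

The final ب in كتاب comes out isolated because the preceding ا never joins forward, which is why position in the word alone does not determine a letter's written shape.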
