Hybrid Segmentation Prototype for Arabic Text-Based Documents: Towards Plagiarism Detection


Sonia Alouane-Ksouri (Université de Tunis El Manar, Tunisia) and Minyar Sassi Hidri (Université de Tunis El Manar, Tunisia)
DOI: 10.4018/978-1-5225-8057-7.ch022

Abstract

The contribution of this work lies in the field of Arabic text-based document analysis for plagiarism detection. The analysis follows the triadic computation model of document similarity. The authors propose a hybrid segmentation prototype for Arabic text-based documents that chains several processing steps to compute the similarity rate between the documents of an Arabic corpus. It combines two segmentation systems with a morphological analysis to obtain a matrix representation suited to triadic similarity computation at three levels of abstraction: documents, sentences, and words.
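To make the three abstraction levels concrete, here is a minimal sketch of how documents, sentences, and words can feed one another through a sentence-by-sentence similarity matrix. It is not the authors' prototype or their triadic model: the punctuation-based sentence splitter, whitespace tokenization, Jaccard word overlap, and the "average best match" document score are all illustrative assumptions, and every function name is hypothetical.

```python
import re

# Assumed sentence delimiters: Latin punctuation, the Arabic question mark (U+061F), newlines.
SENTENCE_SPLIT = re.compile(r"[.!?\u061F\n]+")


def segment_sentences(text: str) -> list[str]:
    """Sentence level: split on the assumed delimiters and drop empty fragments."""
    return [s.strip() for s in SENTENCE_SPLIT.split(text) if s.strip()]


def tokenize(sentence: str) -> set[str]:
    """Word level: naive whitespace tokenization into a set of word types."""
    return set(sentence.split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Word-overlap similarity between two sentences (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def sentence_matrix(doc_a: str, doc_b: str) -> list[list[float]]:
    """Matrix M where M[i][j] is the similarity of sentence i of A and sentence j of B."""
    sents_a = [tokenize(s) for s in segment_sentences(doc_a)]
    sents_b = [tokenize(s) for s in segment_sentences(doc_b)]
    return [[jaccard(sa, sb) for sb in sents_b] for sa in sents_a]


def document_similarity(doc_a: str, doc_b: str) -> float:
    """Document level: mean over A's sentences of their best match in B."""
    matrix = sentence_matrix(doc_a, doc_b)
    if not matrix or not matrix[0]:
        return 0.0
    return sum(max(row) for row in matrix) / len(matrix)


a = "ذهب الولد إلى المدرسة. قرأ الكتاب."
b = "ذهب الولد إلى المدرسة. كتب الدرس."
print(sentence_matrix(a, b))      # [[1.0, 0.0], [0.0, 0.0]]
print(document_similarity(a, b))  # 0.5
```

Here one of the two sentences of `a` is copied verbatim into `b`, so the sketch reports a similarity of 0.5. Because Arabic text often has sparse punctuation and agglutinated words, a realistic pipeline would replace the naive splitter and tokenizer with proper segmentation and morphological analysis, which is the role the abstract assigns to the two segmentation systems and the morphological analyzer.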

Particularities of Arabic Text

To delimit this field of application, we briefly outline the particularities of Arabic text: it is read and written from right to left; short vowels (diacritics) are usually omitted and punctuation is sparse; words are agglutinative, with clitics attached to the stem; and word order within the sentence is irregular.
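Two of these particularities, omitted or optional vowels and agglutination, usually call for a normalization step before words can be compared. The sketch below illustrates that idea under explicit assumptions. It removes the Unicode diacritic marks U+064B to U+0652 and U+0670, along with the tatweel (U+0640). It then strips a leading proclitic from a small, hypothetical prefix list. This is a naive light-stemming heuristic, not the morphological analysis used in the chapter. The names `normalize`, `strip_proclitics`, `PREFIXES`, and `MIN_STEM` are illustrative.

```python
import re

# Arabic diacritics (tashkeel): fathatan..sukun (U+064B..U+0652) plus superscript alef (U+0670).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"  # kashida: elongation character with no linguistic content

# Hypothetical, deliberately small proclitic list (conjunction/preposition + article), longest first.
PREFIXES = ("وال", "بال", "كال", "فال", "لل", "ال")
MIN_STEM = 3  # assumed minimum number of letters that must remain after stripping


def normalize(word: str) -> str:
    """Remove diacritics and tatweel so vocalized and unvocalized spellings match."""
    return DIACRITICS.sub("", word).replace(TATWEEL, "")


def strip_proclitics(word: str) -> str:
    """Strip the first matching prefix if a long-enough stem remains; otherwise keep the word."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            return word[len(prefix):]
    return word


vocalized = "وَالْكِتَابُ"  # "and the book", fully vocalized
print(normalize(vocalized))                    # والكتاب
print(strip_proclitics(normalize(vocalized)))  # كتاب
print(strip_proclitics("بالقلم"))              # قلم
```

The minimum-stem check is a cheap guard against cutting into short words. A fixed prefix list will still mis-segment some words that happen to begin with these letters, which is one reason a dedicated morphological analyzer is preferable for plagiarism detection.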
