Unsupervised Model for Detecting Plagiarism in Internet-based Handwritten Arabic Documents

Mahmoud Zaher, Abdulaziz Shehab, Mohamed Elhoseny, Farahat Farag Farahat

Source Title: Journal of Organizational and End User Computing (JOEUC) 32(2)

DOI: 10.4018/JOEUC.2020040103

Abstract

Due to the rapid increase in internet-based data, there is an urgent need for a robust, intelligent document security mechanism. Although there have been many attempts to build plagiarism detection systems for natural language documents, the unlimited variation and differing writing styles of each character in Arabic documents make building such systems challenging. Depending on its position in a word, the same Arabic letter can be written three different ways, which makes handwritten character recognition a cumbersome process. This article proposes an intelligent unsupervised model, called ASTAP, to detect plagiarism in these documents. First, a handwritten Arabic character recognition system is proposed using the Grey Wolf Optimization (GWO) algorithm. Then, a modified Abstract Syntax Tree (AST) is used to match the contents of the Arabic documents and detect any similarity. Compared to state-of-the-art methods, ASTAP improves the effectiveness of plagiarism detection in terms of the matched similarity ratio, the precision ratio, and the processing time.
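
The abstract names Grey Wolf Optimization but this preview does not show how ASTAP applies it to character recognition. As a rough illustration only, the following is a minimal sketch of the generic GWO search loop on a toy objective, assuming the commonly described update rule in which each wolf moves toward the average of positions suggested by the three best solutions (alpha, beta, delta). The function name `grey_wolf_optimize`, its parameters, the bound clamping, and the sphere objective are all assumptions made for this sketch and are not taken from the article.

```python
import random


def grey_wolf_optimize(objective, dim, bounds, n_wolves=20, n_iters=200, seed=0):
    """Minimise `objective` over a box using a generic GWO loop (illustrative only)."""
    if n_wolves < 3:
        raise ValueError("GWO needs at least three wolves (alpha, beta, delta)")
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial pack inside the search box.
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    leaders = []
    for t in range(n_iters):
        # Keep the three best positions seen so far: alpha, beta, delta.
        leaders = sorted(leaders + wolves, key=objective)[:3]
        # Control parameter a decays linearly from 2 toward 0.
        a = 2.0 * (1.0 - t / n_iters)
        new_pack = []
        for wolf in wolves:
            moved = []
            for j in range(dim):
                estimate = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a      # step scale in [-a, a]
                    C = 2.0 * rng.random()              # leader weight in [0, 2]
                    D = abs(C * leader[j] - wolf[j])    # distance to the leader
                    estimate += leader[j] - A * D
                # Average the three suggestions; clamping is a simplification.
                moved.append(min(hi, max(lo, estimate / 3.0)))
            new_pack.append(moved)
        wolves = new_pack
    best = sorted(leaders + wolves, key=objective)[0]
    return best, objective(best)


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    position, score = grey_wolf_optimize(sphere, dim=3, bounds=(-10.0, 10.0))
    print(position, score)
```

In a recognition setting the objective would be replaced by something task-specific, such as a classification error over selected features, but that choice is not specified in the text available here.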

An Introduction

The ever-increasing smart information processing services and applications offered by the Internet have greatly widened the span of the global inter-network. Recent advances in the design of low-cost, small-scale devices have brought a surge in the number of Internet-enabled devices, which generate large amounts of data. Accordingly, Internet data management for discovering plagiarized documents plays a vital role in many applications such as file management, copyright protection, and electronic theft prevention (Lam et al., 2016; Abdi et al., 2015). Plagiarism depends not only on the proportion of content that is copied but also on the use of others' work, i.e., their ideas, without proper citation (Kahloula & Berri, 2016; Abdelrahman & Khalid, 2014).

In Internet-based document processing applications (Chen & Zhao, 2017), the Arabic language is considered one of the most complicated languages, especially if the document contains handwritten words. Arabic letters take various written shapes depending on their position in a word and can be extended by drawing a dash between letters. In electronic or printed Arabic, pronunciation marks are usually omitted, which makes misreading some words inevitable. These challenges make plagiarism detection in Arabic documents an arduous task. Accordingly, many machine learning and artificial intelligence based methods have been developed (Hussein, 2016; Wise, 2012). For example, an online Arabic plagiarism detection tool called APD (Alzahrani & Salim, 2015) was proposed to detect plagiarism on Arabic web pages. However, this tool does not handle synonym substitution or rewording. To address that, another system called Plaggie (Ahtiainen et al., 2011) was proposed. Besides its inability to handle handwritten documents, Plaggie needs a long processing time to handle a computerized Arabic document.
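
The position-dependent letter shapes described above can be seen in how Unicode encodes Arabic. The sketch below is an illustration only and is not part of ASTAP, which works on handwritten input: it lists the compatibility presentation forms that Unicode defines for the letter BEH (isolated, final, initial, medial) and shows how they, together with the elongation character tatweel (U+0640), can be folded back to the base letter before text comparison. Treating the "dash between letters" as tatweel is an assumption, and the helper names `describe_forms` and `normalize` are hypothetical.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH (base letter)
TATWEEL = "\u0640"  # ARABIC TATWEEL, the elongation stroke between letters

# Compatibility presentation forms of BEH from the Arabic Presentation Forms-B block.
BEH_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]


def describe_forms(forms):
    """Return (character name, positional tag, base letter) for each form."""
    rows = []
    for ch in forms:
        # Decomposition looks like "<initial> 0628": a tag plus the base code point.
        tag, base_hex = unicodedata.decomposition(ch).split()
        rows.append((unicodedata.name(ch), tag, chr(int(base_hex, 16))))
    return rows


def normalize(text):
    """Fold presentation forms to base letters (NFKC) and drop tatweel."""
    return unicodedata.normalize("NFKC", text).replace(TATWEEL, "")


if __name__ == "__main__":
    for name, tag, base in describe_forms(BEH_FORMS):
        print(name, tag, f"U+{ord(base):04X}")
    # Initial form + tatweel + final form collapses to two base letters.
    print(normalize("\uFE91" + TATWEEL + "\uFE90") == BEH + BEH)
```

A normalization step of this kind only applies once text is already in digital form; recognizing the shapes in handwriting is the separate problem the article addresses.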

Because of the sheer volume of information and interconnected networks, detecting electronic theft is a difficult task, and detecting it in Arabic text is no doubt the most difficult. In light of the growth of e-learning systems in Arab countries, special techniques are required to detect electronic theft of text written in Arabic. Although search engines such as Google could be used, copying and pasting sentences into a search engine to find such theft is impractical. For this reason, a good tool for detecting electronic theft of Arabic text must be developed to protect e-learning systems and to facilitate and accelerate the learning process by detecting such theft automatically.

This paper presents ASTAP, an Internet-based system that enables specialists to detect theft of electronic texts in Arabic, so that it can be integrated with e-learning systems to protect student work, research papers, and scientific theses from electronic theft.

The paper also describes the major components of this system, including the preparation stage. Finally, we build an experimental system, run it on a set of Arabic documents and texts, and compare the results with those of some existing systems, particularly TurnItIn.
