Study on the Different Forms of Plagiarism in Textual Data and Image: Internal and External Detection

Frederic Jack (University of Grenoble, France)
Copyright: © 2019 | Pages: 16
DOI: 10.4018/978-1-5225-7338-8.ch004

Abstract

We live in a world of information. It is everywhere, yet it is sometimes difficult to find and to trace to its original source. In today's digital society, it is easy to find texts to plagiarize. These texts may come from the internet, publishers, or other content providers. Plagiarism is considered a serious offense. Throughout the world, universities are making significant efforts to educate students and teachers, offering guides and tutorials that explain the types of plagiarism and how to avoid them. The internet offers easily accessible texts that people can reuse in their own writing simply by copying and pasting. This chapter presents the various types of plagiarism, the different techniques for automatic plagiarism detection, and related work on the topic.

Introduction

Information filtering (IF), also called selective dissemination of information (SDI), is one of the tasks in the information retrieval (IR) field that deals with large amounts of dynamically generated information. The aim is to deliver items of information to the user in an intelligent manner, using a predefined user profile derived from the user's interests. Unlike ad hoc retrieval, an information filtering system models the long-term preferences of a user whose requests (the information needs expressed by the user) remain relatively static. It is applied to an incoming data stream (messages, images, text, etc.) that changes over time; it simply flags the information that could be of interest to the user and removes the irrelevant items from the incoming data. In this setting, documents are delivered to the system one by one; the system computes each document's similarity to the user profile and decides which documents are relevant (and will be presented to the user) and which are not. The filtered documents are not ranked, since no algorithm can be applied to judge which document is more relevant than another (McGregor, 2005).
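The one-by-one decision described above can be sketched as follows. This is only an illustration under stated assumptions: the chapter does not name a similarity measure, so cosine similarity over bag-of-words term counts is assumed here, and the function names (`term_vector`, `cosine_similarity`, `filter_stream`) and the `threshold` value are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts (lowercased, split on whitespace)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_stream(documents, profile, threshold=0.2):
    """Process documents one by one; keep those similar enough to the profile.

    The decision is binary (relevant or not); kept documents are not ranked.
    The threshold of 0.2 is an arbitrary illustrative value.
    """
    for doc in documents:
        if cosine_similarity(term_vector(doc), profile) >= threshold:
            yield doc


# Hypothetical long-term profile and incoming stream.
profile = term_vector("plagiarism detection text similarity")
stream = [
    "new methods for plagiarism detection in text",
    "football results from the weekend",
]
relevant = list(filter_stream(stream, profile))
```

With this profile, only the first document shares terms with the user's interests, so it is the only one passed on to the user.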

Building an information filtering system is more complex than building an ad hoc retrieval model, because an information filtering system is built on a huge database of profiles rather than on a single query, as in ad hoc retrieval. A generic information filtering system includes four basic components (see Figure 1): (a) a data analyzer; (b) a filtering component; (c) a user model component; and (d) a learning component (Si, 1997).

  • The data analyzer (a): this component takes as input data elements (e.g., a message, an image, a video, etc.) from an information provider. The data elements are analyzed and represented in a suitable format (e.g., a vector of terms). This representation is an input to the filtering component (b).

  • The user model component (c): this component collects information about users and their information needs, explicitly and/or implicitly, and builds a user model (user profile). The constructed model is an input to the filtering component (b).

  • The filtering component (b): this is the heart of an information filtering system. It compares the user profile with the data elements represented by component (a) and decides whether a data item is relevant to the user or not (e.g., whether a message is spam or ham). The filtering process is applied to a single data item at a time (for example, an incoming e-mail message). The user who receives the data item is the final and ultimate judge of its relevance; his or her assessment is returned to the learning component.

  • The learning component (d): this component is necessary to improve the filtering. Because user profiles are difficult to model and information needs change, filtering systems must include a learning process that detects changes in users' interests, updates the user model, and keeps that model effective.
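The four components listed above can be sketched as a small feedback loop. This is a minimal sketch, not the model from the cited work: the names `analyze`, `UserModel`, `is_relevant`, and `learn` are hypothetical, the scoring is a plain weighted term sum, and the additive weight update is an assumption chosen for illustration, since the text does not specify a learning rule.

```python
from collections import Counter


def analyze(item):
    """(a) Data analyzer: represent a raw text item as a vector of terms."""
    return Counter(item.lower().split())


class UserModel:
    """(c) User model: the user profile, stored as term weights."""

    def __init__(self, interests):
        self.weights = Counter()
        for term in interests:
            self.weights[term.lower()] = 1.0


def is_relevant(vector, model, threshold=1.0):
    """(b) Filtering component: binary decision for a single data item."""
    score = sum(model.weights[term] * count for term, count in vector.items())
    return score >= threshold


def learn(model, vector, liked, rate=0.5):
    """(d) Learning component: shift term weights using the user's feedback.

    Assumed rule: add (or subtract) rate * term count for every term in the
    item, depending on whether the user judged the item relevant.
    """
    sign = 1.0 if liked else -1.0
    for term, count in vector.items():
        model.weights[term] += sign * rate * count


# Hypothetical usage: one item, one round of negative feedback.
model = UserModel(["plagiarism"])
vector = analyze("detecting plagiarism in student essays")
first = is_relevant(vector, model)
learn(model, vector, liked=False)
second = is_relevant(vector, model)
```

In this run the item is first delivered because it matches the profile; after the user rejects it, the updated weights cause the same item to be filtered out, which mirrors the feedback path from the user back to the learning component.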

Figure 1. Generic model of an information filtering system

Many IR applications on the Web relate to the information filtering task. In our thesis, we discuss two of the most popular information filtering problems: spam filtering and plagiarism detection.
