Correlation and Performance Estimation of Clone Detection Tools

Pratiksha Gautam (Jaypee University of Information Technology, Solan, India) and Hemraj Saini (Jaypee University of Information Technology, Solan, India)
Copyright: © 2018 | Pages: 17
DOI: 10.4018/IJOSSP.2018040104

Abstract

Over the past few decades, researchers have proposed many tools and methods for automatically detecting clones in programs and software. Nevertheless, it is not yet clear how to evaluate these tools in terms of accuracy, scalability, and portability. Each tool has its own merits and limitations, and the choice of tool depends on the user's requirements, so users need to be aware of the available tools and their distinguishing aspects. This article presents the performance of six clone detection tools in terms of accuracy, scalability, and portability. The aim of this study is to make it easier to select a tool for detecting copied code.

1. Introduction

Reusing a code segment by copying it from one section of a program and pasting it elsewhere, with or without minor or extensive edits, is a common practice in software development. The pasted fragment is known as a code (or software) clone, and the act of copying is known as code cloning. Duplicate code is regarded as a potentially serious obstacle: it has a negative impact on the maintainability, comprehensibility, and evolution of a software system, because a correction to one fragment must be repeated in every replicated fragment. Clone analysis and detection is therefore an emerging issue, both because of the high maintenance cost (Sharma, 2011) and because it helps improve the quality, structure, and design of the software system. The literature reports that 66% of source code is cloned (Marcus & Maletic, 2001; Tairas & Jeff, 2006; Puri & Kumar, 2012), so finding duplicated code is potentially useful for enhancing the quality of software systems. Classifying clones is advantageous for optimizing detection and re-engineering approaches. Duplicated code in program text may also introduce further obstacles, including copyright infringement and the continued growth of repeated code. In recent years, software clone detection has been recognized as a vital area of software analysis. The adverse effects of software clones can be addressed by developing software clone detection techniques and tools. Evaluating and assessing these tools is difficult, however, because the underlying detection techniques are diverse and there is no standard similarity measure.

In this paper, we compute the accuracy, portability, and scalability of code clone detection tools. We begin with a basic introduction to software clones and then evaluate software clone detection tools and techniques in two ways: first, we introduce software clones and classify clone types and detection techniques; second, we compare and evaluate clone detection tools. The technical contributions of this manuscript are as follows:

  • Evaluate the performance of clone detection tools on the basis of precision and recall.

  • Compare software clone detection tools in terms of portability, scalability, and robustness.

  • Compare software clone detection techniques in terms of precision, recall, portability, scalability, and robustness.

  • Compare clone detection techniques on the basis of clone properties.

  • Use the proposed mutation operators to generate variant test cases (software clones).

  • Evaluate clone detection tools by using generated test cases with the existing evaluation methodology.
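
Precision and recall, named in the first contribution, can be illustrated with a minimal sketch. This is not the authors' evaluation code: it assumes that a tool's output and a reference set are both given as unordered pairs of fragment identifiers, and the function name `precision_recall` and the identifier strings are hypothetical.

```python
def precision_recall(reported, reference):
    """Return (precision, recall) of reported clone pairs against a reference set.

    Both arguments are iterables of 2-tuples of fragment identifiers.
    Pairs are treated as unordered, so (a, b) and (b, a) count as the same clone.
    """
    reported_set = {frozenset(pair) for pair in reported}
    reference_set = {frozenset(pair) for pair in reference}

    # A true positive is a reported pair that also appears in the reference set.
    true_positives = len(reported_set & reference_set)

    # Precision: fraction of reported pairs that are real clones.
    precision = true_positives / len(reported_set) if reported_set else 0.0
    # Recall: fraction of real clones that the tool reported.
    recall = true_positives / len(reference_set) if reference_set else 0.0
    return precision, recall


# Made-up data for illustration only (not results from the article).
REPORTED = [("a.c:10", "b.c:40"), ("a.c:90", "c.c:5"), ("d.c:1", "e.c:1")]
REFERENCE = [("b.c:40", "a.c:10"), ("c.c:5", "a.c:90"),
             ("f.c:7", "g.c:3"), ("h.c:2", "i.c:8")]

if __name__ == "__main__":
    p, r = precision_recall(REPORTED, REFERENCE)
    print(f"precision={p:.2f} recall={r:.2f}")
```

In this made-up example, two of the three reported pairs appear in the reference set of four pairs, so precision is 2/3 and recall is 1/2. Returning 0.0 for an empty input is a design choice of the sketch, not a convention taken from the article.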

The remainder of the paper is structured as follows. Section 1 introduces the notion of code clones. Section 2 classifies code clones. Section 3 presents a taxonomy of software clone detection techniques. Section 4 covers the evaluation metrics for detection techniques. Section 5 explores related work. Results are presented in Section 6, and the conclusion and future directions in Section 7.

2. Classification of Code Clones

Figure 1 presents the taxonomy of software clones. Clone classifications are used in expansion, re-engineering, and detection methods. On the basis of clone categorization, we revisit the most prominent types of code clones that arise during re-engineering. Code clones are classified according to three aspects: 1) similarities between two code segments; 2) the position of the clone instance in the program; and 3) refactoring opportunities for the replicated code fragment.

Figure 1. Classification of code clones
