Introduction
Code snippets that are copied and pasted into software code, with or without changes, result in code clones. Different authors provide different definitions of code clones. Roy and Cordy (2007) and Koschke (2007) present comprehensive reviews of the clones found in software. Various studies report the percentage of duplication in source code. Commonly, 10-20 percent of software code is cloned (Baker, 1995; Baxter et al., 1998; Mayrand et al., 1996), and in rare cases the share reaches 25-60 percent (Ducasse et al., 1999). Due to advancements in technology, companies spend more on software maintenance.
During software maintenance, code clones should be detected and then removed or refactored according to the requirements, because if a code fragment that contains a bug is copied, each of its clones also contains that bug. This makes it harder for developers and testers to find the bug in large software containing thousands or millions of lines of code (LOC). Four primary types of clones are found in software, as shown in Figure 1.
Type 1: A code snippet that is identical to another snippet except for changes in whitespace and comments. Type 1 clones can be detected easily with text-based and token-based techniques. The Type 1 clone is also called the exact clone.
Type 2: A Type 2 clone results from changes to the names of identifiers, literals, variables, and keywords. Such clones can be detected with token-based and metric-based techniques. The Type 2 clone is also called the parameterized clone.
Type 3: A Type 3 clone results from the addition or deletion of lines of code. Such clones can be detected with tree-based and graph-based approaches, and they can be further categorized into strong and moderately strong Type 3 clones. The Type 3 clone is also referred to as the near-miss clone.
Type 4: A Type 4 clone occurs when two code snippets have similar functionality but differ in structure. Graph-based and hybrid approaches can be used to detect it. The Type 4 clone is also called the semantic clone or function clone.
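To make the difference between the first two types concrete, the following is a minimal sketch of token-based comparison, assuming the snippets are Python source and using the standard `tokenize` module. The function names `token_stream` and `classify` and the placeholders `ID` and `LIT` are hypothetical and chosen only for illustration; they are not taken from any of the cited tools. The sketch does not check that identifiers are renamed consistently, which a stricter Type 2 detector might require.

```python
import io
import keyword
import tokenize

# Comments, blank lines, and the end marker carry no meaning for comparison.
_SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER}
# Layout tokens are kept by type name so that indentation width is ignored
# while block structure is preserved.
_LAYOUT = {tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT}


def token_stream(source, parameterize=False):
    """Return a normalized token list for a Python snippet.

    With parameterize=True, identifiers and literals are replaced by
    placeholders, which is the normalization assumed here for Type 2.
    """
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type in _LAYOUT:
            tokens.append(tokenize.tok_name[tok.type])
        elif parameterize and tok.type == tokenize.NAME \
                and not keyword.iskeyword(tok.string):
            tokens.append("ID")
        elif parameterize and tok.type in (tokenize.NUMBER, tokenize.STRING):
            tokens.append("LIT")
        else:
            tokens.append(tok.string)
    return tokens


def classify(a, b):
    """Label a pair of snippets as Type 1, Type 2, or neither."""
    if token_stream(a) == token_stream(b):
        return "Type 1"
    if token_stream(a, True) == token_stream(b, True):
        return "Type 2"
    return "neither"


ORIGINAL = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
# Same code with a comment, a blank line, and different indentation width.
LAYOUT_VARIANT = ("def total(xs):  # sum\n  s = 0\n\n  for x in xs:\n"
                  "      s += x\n  return s\n")
# Same structure with renamed identifiers and a changed literal.
RENAMED_VARIANT = ("def add_all(values):\n    acc = 10\n    for v in values:\n"
                   "        acc += v\n    return acc\n")

print(classify(ORIGINAL, LAYOUT_VARIANT))
print(classify(ORIGINAL, RENAMED_VARIANT))
```

Under these assumptions, the layout variant compares equal on the raw token stream, while the renamed variant matches only after identifiers and literals are replaced by placeholders.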
A large number of techniques have been developed for clone detection using traditional approaches, which broadly include text-based, token-based, metric-based, tree-based, and graph-based methods. These techniques primarily detect Type 1, Type 2, and Type 3 clones; very few approaches detect Type 4 clones.
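As an illustration of the simplest of these families, the sketch below shows one possible text-based check for near-miss (Type 3) clones: it compares snippets line by line with `difflib.SequenceMatcher` from the Python standard library. The helper names and the threshold of 0.7 are assumptions made for this example, not values reported in the cited studies.

```python
import difflib


def normalized_lines(source):
    """Strip surrounding whitespace from each line and drop blank lines."""
    return [line.strip() for line in source.splitlines() if line.strip()]


def line_similarity(a, b):
    """Return a ratio in [0, 1] based on the lines shared by two snippets."""
    matcher = difflib.SequenceMatcher(
        None, normalized_lines(a), normalized_lines(b), autojunk=False
    )
    return matcher.ratio()


def is_near_miss(a, b, threshold=0.7):
    """Treat a pair as near-miss if it is similar but not line-identical.

    The threshold is a hypothetical value for illustration only.
    """
    return threshold <= line_similarity(a, b) < 1.0


BASE = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
# One line added inside the loop.
WITH_ADDED_LINE = ("def total(xs):\n    s = 0\n    for x in xs:\n"
                   "        if x > 0:\n            s += x\n    return s\n")
UNRELATED = "def greet(name):\n    print(name)\n"

print(is_near_miss(BASE, WITH_ADDED_LINE))
print(is_near_miss(BASE, UNRELATED))
```

A purely line-based comparison like this is sensitive to renaming and reformatting within a line, which is one reason tree-based and graph-based approaches are used for Type 3 clones.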