Article Preview
TopIntroduction
Clustering is a widely used data mining technology. A cluster generated by a clustering algorithm is a set of data objects, which are similar to the objects in the same cluster, but different from the objects in other clusters. Many clustering methods have been proposed, such as hierarchical clustering, fuzzy clustering, K-Means clustering etc.
Program clustering refers to the clustering technology applied to the software field. It identifies and groups programs with similar attributes. It is widely used in program analysis and software maintenance, such as comprehension complex software systems (Mahmoud & Niu, 2013; Timothy & Nicolas, 2002), optimizing software architecture (Lung et al. 2006; Demme & Sethumadhavan, 2012), predicting software defects (Seliya & Khoshgoftaar, 2007), software evolution (Scanniello & Marcus, 2011), to improve the quality of software. Because software exists in various control systems, such as fuzzy systems, security-critical systems and so on, program clustering algorithm has received extensive attention.
We tried to apply program clustering in automatic repair of student programs. With the development of MOOCs and online educations, how to provide students with automatic feedback and assistance has become an urgent problem to be solved. Especially for the practical programming courses, the problem is more urgent. In the process of programming practice, students often encounter various errors that make their programs unable to run correctly. However, the current on-line programming system cannot prompt students the specific location of errors or the fixing scheme. Therefore, it is of great value to analyze student programs automatically and provide suggestions for fixing bugs.
Correct template programs written by teachers can provide useful guidance for bug fixing. However, writing templates increases the workload of teachers, and the templates provided may not be comprehensive. In the process of programming exercises or examination, a large number of student programs can be obtained. Many of these programs are correct and can be used as potential template programs. But student programs that implement the same programming task may have similar implementation forms or may be implemented in different ways. This requires analysis of these potential templates, removing redundant similar programs, and retaining multiple different implementations as templates. In addition, in the process of fixing bugs, the template program similar to that of the buggy program, can provide better reference for the revision of the buggy program, and avoid unnecessary modifications. The template programs identification problem can be solved by program clustering.
Most of the existing program clustering methods only analyze the static properties of programs but lack the consideration of execution information. This limits these methods in identifying the functional equivalent programs with code variations. Therefore, the purpose of this paper is to solve this problem and apply program clustering in automatic repairing of programs.
The contributions are as follows. A program clustering method considering both the structural metrics and execution value sequences is proposed. Not only the programs with similar structure can be identified, but also the programs with similar execution behavior can be recognized. By doing this, whether two functional equivalent programs adopt the same algorithm can be judged by the similarity of execution sequences.