
MLA

Liu, Xin, et al. "Plagiarism Detection Algorithm for Source Code in Computer Science Education." IJDET, vol. 13, no. 4, 2015, pp. 29-39. https://doi.org/10.4018/IJDET.2015100102

APA

Liu, X., Xu, C., & Ouyang, B. (2015). Plagiarism Detection Algorithm for Source Code in Computer Science Education. International Journal of Distance Education Technologies (IJDET), 13(4), 29-39. https://doi.org/10.4018/IJDET.2015100102

Chicago

Liu, Xin, Chan Xu, and Boyu Ouyang. "Plagiarism Detection Algorithm for Source Code in Computer Science Education." International Journal of Distance Education Technologies (IJDET) 13, no. 4 (2015): 29-39. https://doi.org/10.4018/IJDET.2015100102


Plagiarism Detection Algorithm for Source Code in Computer Science Education

Xin Liu (College of Information Engineering, Xiangtan University, Xiangtan, China), Chan Xu (College of Information Engineering, Xiangtan University, Xiangtan, China), and Boyu Ouyang (College of Information Engineering, Xiangtan University, Xiangtan, China)

Source Title: International Journal of Distance Education Technologies (IJDET) 13(4)

DOI: 10.4018/IJDET.2015100102

Abstract

Programming exercises are becoming an increasingly essential part of program design courses in college education. However, some students hand in homework that has been plagiarized and then only slightly modified, and it is difficult for teachers to judge whether a given piece of source code has been plagiarized. Traditional detection algorithms are poorly suited to this situation. The authors designed an effective and complete method for detecting source code plagiarism, based on the ways students most commonly plagiarize. The algorithm rests on two basic ideas. The first is to normalize the source code through filtering, removing most of the noise that plagiarists intentionally introduce. The second is an improved Longest Common Subsequence (LCS) algorithm for text matching that uses the statement as the unit of matching. The authors also designed a suitable hash function to make matching more efficient. A system built on this algorithm proved practical and adequate: it runs well and meets the requirements of real-world use.
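The two ideas summarized in the abstract (filtering to normalize code, then statement-level LCS over hashed statements) can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' algorithm: normalization here only removes C-style comments and collapses whitespace (it does not handle comment markers inside string literals or renamed identifiers), "statements" are approximated by splitting on `;`, `{` and `}`, and each statement is hashed with a truncated SHA-1 from Python's `hashlib` rather than the authors' hash function. The names `normalize`, `statements`, `stmt_hash`, `lcs_length` and `similarity` are hypothetical.

```python
import hashlib
import re


def normalize(source: str) -> str:
    """Simplified filtering step (assumption): drop comments, collapse whitespace."""
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)  # block comments
    source = re.sub(r"//[^\n]*", " ", source)                     # line comments
    return re.sub(r"\s+", " ", source).strip()


def statements(source: str) -> list[str]:
    """Split normalized code into statement-like units on ';', '{' and '}'."""
    parts = re.split(r"[;{}]", normalize(source))
    return [p.strip() for p in parts if p.strip()]


def stmt_hash(stmt: str) -> int:
    """Map a statement to a 64-bit integer so LCS compares integers, not strings."""
    return int.from_bytes(hashlib.sha1(stmt.encode()).digest()[:8], "big")


def lcs_length(a: list[int], b: list[int]) -> int:
    """Classic dynamic-programming LCS, keeping only one previous row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(src_a: str, src_b: str) -> float:
    """LCS of statement hashes divided by the longer statement count."""
    ha = [stmt_hash(s) for s in statements(src_a)]
    hb = [stmt_hash(s) for s in statements(src_b)]
    if not ha or not hb:
        return 0.0
    return lcs_length(ha, hb) / max(len(ha), len(hb))


original = """
int main() {
    int sum = 0;
    for (int i = 0; i < 10; i++) { sum += i; }
    return sum;
}
"""
# Same code with an added comment, a moved brace and extra blank lines.
copied = """
/* my own work */
int main()
{
    int sum = 0;   // accumulator


    for (int i = 0; i < 10; i++) { sum += i; }
    return sum;
}
"""
different = "int twice(int x) { return x * 2; }"

print(similarity(original, copied))     # 1.0
print(similarity(original, different))  # 0.0
```

Hashing each statement once means the quadratic LCS table compares fixed-size integers instead of strings, and the normalization step is what lets the re-commented, reformatted copy score as an exact match here.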


2. Existing Methods And Shortcomings

Research on similarity detection for source code began in the 1970s. Halstead (1975) proposed the first such algorithm, the property counting method. It counted the operators and operands appearing in a source program and used these statistics as the main basis for detection. Ottenstein (1976) used the property counting method to implement the first near-duplicate detection system for Fortran source code. Because attribute counts do not retain any information about program structure, the method has a high false alarm rate (defined in Section 4) on short programs and therefore cannot meet practical requirements.
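The structural blind spot of property counting can be illustrated with a toy version of it. This sketch makes its own assumptions: a small, hypothetical operator set (a real Halstead tool defines operators for the specific language), a regex tokenizer, and identifiers and integer literals treated as operands. It computes the four basic counts (total operators N1, total operands N2, distinct operators n1, distinct operands n2) for two short programs that compute different things.

```python
import re
from collections import Counter

# Hypothetical, deliberately small operator set (assumption, not Halstead's list).
OPERATORS = {"=", "+", "-", "*", "/", "<", ">", "==", "+=",
             "if", "else", "for", "while", "return"}
# Two-character operators first, then identifiers, integers, any other symbol.
TOKEN = re.compile(r"==|\+=|[A-Za-z_]\w*|\d+|\S")
OPERAND = re.compile(r"[A-Za-z_]\w*|\d+")


def halstead_counts(source: str) -> tuple[int, int, int, int]:
    """Return (N1, N2, n1, n2); token order is discarded, so structure is lost."""
    operators, operands = Counter(), Counter()
    for tok in TOKEN.findall(source):
        if tok in OPERATORS:
            operators[tok] += 1
        elif OPERAND.fullmatch(tok):
            operands[tok] += 1
        # punctuation such as ';', '(' and ')' is ignored in this sketch
    return (sum(operators.values()), sum(operands.values()),
            len(operators), len(operands))


prog_a = "x = a + b; y = x - c;"  # x = a + b, y = a + b - c
prog_b = "y = a - c; x = y + b;"  # y = a - c, x = a - c + b
print(halstead_counts(prog_a))  # (4, 6, 3, 5)
print(halstead_counts(prog_b))  # (4, 6, 3, 5)
```

Both programs receive identical counts even though they assign different values to `x` and `y`. With only a handful of tokens, such collisions are easy to produce, which is one way to see why this family of methods raises false alarms on short programs.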

In the mid-1990s, Verco and Wise (1996) extended the property counting method with vector-dimension techniques, but the results were still unsatisfactory. Damashek (1995) proposed a structure-metric approach that uses program control flow as its metric; such methods are usually applied together with attribute counting. They work well on large programs: when solving complex problems, different programmers tend to take different approaches, so the probability of identical control flow is extremely low and the false alarm rate is correspondingly low. Experiments have shown, however, that when such algorithms are applied to programming assignments, the false alarm rate is relatively high. Because typical assignments are simple and draw on the same fundamental knowledge, students tend to approach them in similar ways, so the control flow of their programs is largely alike.
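A rough illustration of why control-flow measures struggle on small assignments: the sketch below reduces a C-like program to the sequence of its control-flow keywords. This is a hypothetical simplification chosen for illustration, not Damashek's measure or any published structure metric; the keyword list and the function name `control_flow_skeleton` are our own assumptions. Two plausible, independently written solutions to the same simple exercise (summing the even numbers up to a limit) are compared.

```python
import re

# Keywords treated as control-flow "structure" in this simplified sketch.
CONTROL = ("if", "else", "for", "while", "do", "switch",
           "case", "break", "continue", "return")
KEYWORD = re.compile(r"\b(" + "|".join(CONTROL) + r")\b")


def control_flow_skeleton(source: str) -> list[str]:
    """Return the control-flow keywords of a program in source order."""
    return KEYWORD.findall(source)


student_a = """
int sum_even(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++) {
        if (i % 2 == 0) total += i;
    }
    return total;
}
"""
student_b = """
int even_sum(int limit) {
    int s = 0;
    for (int k = 1; k <= limit; ++k)
        if (k % 2 == 0)
            s = s + k;
    return s;
}
"""

print(control_flow_skeleton(student_a))  # ['for', 'if', 'return']
print(control_flow_skeleton(student_b))  # ['for', 'if', 'return']
```

Despite different names, formatting and statement forms, both solutions collapse to the same skeleton, so a detector relying mainly on control flow would likely flag them as similar. On a large program the skeleton would be long and far less likely to coincide by chance.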
