Archive - Central European Conference on Information and Intelligent Systems, CECIIS - 2011

Evaluation of similarity metrics for programming code plagiarism detection method
Vedran Juricic

Last modified: 2011-10-03

Abstract


This paper briefly presents a source code plagiarism detection method based on a low-level language. The similarity or distance metric used to calculate the similarity coefficient between two source files has a great impact on the method's performance and results. This paper analyzes the precision and recall of four of the most commonly used metrics: Levenshtein distance, cosine similarity, N-gram similarity and Greedy String Tiling. Testing is based on various test cases that represent the most frequent code modification techniques.
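As a rough illustration of how three of these metrics can turn two instruction sequences into a similarity coefficient, the following is a minimal sketch and not the paper's implementation. It assumes each source file has already been reduced to a list of low-level instruction tokens. The token names, the normalization of Levenshtein distance by the longer sequence, and the use of a Jaccard ratio over n-grams are all assumptions made for illustration. Greedy String Tiling is omitted for brevity.

```python
from collections import Counter
import math


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def levenshtein_similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def cosine_similarity(a, b):
    """Cosine of the angle between token-frequency vectors (order is ignored)."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def ngram_similarity(a, b, n=3):
    """Assumed variant: Jaccard ratio over the sets of token n-grams."""
    grams_a = {tuple(a[i:i + n]) for i in range(len(a) - n + 1)}
    grams_b = {tuple(b[i:i + n]) for i in range(len(b) - n + 1)}
    if not grams_a and not grams_b:
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Hypothetical instruction tokens; the second sequence swaps two instructions,
# imitating a statement-reordering modification.
original = ["load", "load", "add", "store", "load", "ret"]
modified = ["load", "load", "add", "load", "store", "ret"]

for metric in (levenshtein_similarity, cosine_similarity, ngram_similarity):
    print(metric.__name__, round(metric(original, modified), 3))
```

In this sketch the cosine measure ignores token order, so a pure reordering leaves its score at the maximum, while the edit-distance and n-gram measures drop. Differences of this kind are what a precision and recall comparison across modification techniques would expose.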
