This presentation evaluates the performance of a plagiarism detection method based on the intermediate language for .Net languages such as C#, Visual Basic.Net, and C++. The method identifies similar code fragments and measures the similarity between source files to detect plagiarism.
Performance evaluation of plagiarism detection method based on the intermediate language
Vedran Juričić, Tereza Jurić, Marija Tkalec
Plagiarism detection method
• Method for detecting plagiarism in source code for .Net languages
  • C#
  • Visual Basic.Net
  • C++
  • …
• Identify similar code fragments
• Determine similarity between source files
• Based on intermediate language
Plagiarism detection

First:
1. using System.Text;
2. namespace Test {
3. class Math {
4. public double GetMaximum(double[] Input) {
5. double result = Input[0];
6. foreach (double temp in Input) {
7. if (temp>result)
8. result = temp; }
9. return result; } } }

Second:
1. using System.Text;
2. namespace Test {
3. class Math {
4. public double GetMaximum(double[] Input) {
5. double result = Input[0];
6. for (int i=0;i<Input.Length;i++) {
7. if (Input[i]>result)
8. result = Input[i]; }
9. return result; } } }

Similarity = Number of overlapping lines / Total number of lines = 6 / 9 = 66.67%
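The line-based measure on this slide can be sketched in a few lines of Python. This is only an illustration of the naive formula, not the method evaluated in the presentation. It assumes that "overlapping" means the text at the same line position is identical after trimming whitespace; the function name `line_similarity` is hypothetical.

```python
def line_similarity(first: str, second: str) -> float:
    """Fraction of line positions whose trimmed text is identical in both files."""
    a = [line.strip() for line in first.strip().splitlines()]
    b = [line.strip() for line in second.strip().splitlines()]
    total = max(len(a), len(b))
    if total == 0:
        return 0.0
    # Assumption: lines are compared position by position, not as a set.
    overlapping = sum(1 for x, y in zip(a, b) if x == y)
    return overlapping / total


# The two listings from the slide, one numbered slide line per text line.
FIRST = """\
using System.Text;
namespace Test {
class Math {
public double GetMaximum(double[] Input) {
double result = Input[0];
foreach (double temp in Input) {
if (temp>result)
result = temp; }
return result; } } }
"""

SECOND = """\
using System.Text;
namespace Test {
class Math {
public double GetMaximum(double[] Input) {
double result = Input[0];
for (int i=0;i<Input.Length;i++) {
if (Input[i]>result)
result = Input[i]; }
return result; } } }
"""

print(f"{line_similarity(FIRST, SECOND):.2%}")  # prints 66.67%
```

Lines 6, 7 and 8 differ, so 6 of the 9 positions match.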
But…

First:
1. using System.Text;
2. namespace Test {
3. class Math {
4. public double GetMaximum(double[] Input) {
5. double result = Input[0];
6. foreach (double temp in Input) {
7. if (temp>result)
8. result = temp; }
9. return result; } } }

Second:
1. using System;
2. namespace OtherTest {
3. class MyClass {
4. public double ReturnMaximum(double[] Array) {
5. double current = Array[0];
6. for (int j=0;j<Array.Length;j++) {
7. if (Array[j]>current)
8. current = Array[j]; }
9. return current; } } }

Similarity = Number of overlapping lines / Total number of lines = 0 / 9 = 0.00%
Problems
• Modification of variable names, types, constants
• Modification of class member definitions
• Line and command reordering
• …
• Solution
  • Detailed analysis
  • Complex preprocessing
  • For each supported language
Our solution
• Convert from source language to low-level language (Common Intermediate Language)
• By using existing tools
  • Compiler
  • Disassembler
• Tools exist for all .Net languages
Our solution

C# language:
using System.Text;
namespace Test {
  class Math {
    public double GetMaximum(double[] Input) {
      double result = Input[0];
      foreach (double temp in Input) {
        if (temp>result)
          result = temp;
      }
      return result;
    }
  }
}

C# compiler

Common Intermediate Language:
.method public hidebysig instance float64 GetMaximum(float64[] Input) cil managed
{
  // Code size 61 (0x3d)
  .maxstack 2
  .locals init (float64 V_0, float64 V_1, float64 V_2, float64[] V_3, int32 V_4, bool V_5)
  IL_0000: nop
  IL_0001: ldarg.1
  IL_0002: ldc.i4.0
  IL_0003: ldelem.r8
  IL_0004: stloc.0
  IL_0005: nop
  IL_0006: ldarg.1
  IL_0007: stloc.3
  …..
  IL_0037: ldloc.0
  IL_0038: stloc.2
  IL_0039: br.s IL_003b
  IL_003b: ldloc.2
  IL_003c: ret
} // end of method C::GetMaximum

Common Intermediate Language, instructions only:
nop
ldarg.1
ldc.i4.0
ldelem.r8
stloc.0
nop
ldarg.1
stloc.3
…
ldloc.0
stloc.2
br.s
ldloc.2
ret
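The last step on this slide, reducing the disassembly to a bare instruction sequence, might look like the Python sketch below. It is an assumption about how such a reduction could be done, not the ILMatch implementation: the regular expression and the function name `opcode_sequence` are hypothetical, and `SAMPLE` contains only some of the lines shown on the slide (the elided part is skipped, not reconstructed).

```python
import re
from typing import List

# Matches an instruction line such as "IL_0039: br.s IL_003b" and captures
# only the opcode; the label and any operand are discarded.
IL_LINE = re.compile(r"^\s*IL_[0-9a-fA-F]{4}:\s+(\S+)")


def opcode_sequence(il_text: str) -> List[str]:
    """Return the opcodes of a CIL method body, in order of appearance."""
    opcodes = []
    for line in il_text.splitlines():
        match = IL_LINE.match(line)
        if match:  # directives, braces and comments do not match
            opcodes.append(match.group(1))
    return opcodes


# Excerpt of the disassembly from the slide.
SAMPLE = """\
.method public hidebysig instance float64 GetMaximum(float64[] Input) cil managed
{
  // Code size 61 (0x3d)
  .maxstack 2
  IL_0000: nop
  IL_0001: ldarg.1
  IL_0002: ldc.i4.0
  IL_0003: ldelem.r8
  IL_0004: stloc.0
  IL_0039: br.s IL_003b
  IL_003b: ldloc.2
  IL_003c: ret
}
"""

print(opcode_sequence(SAMPLE))
# ['nop', 'ldarg.1', 'ldc.i4.0', 'ldelem.r8', 'stloc.0', 'br.s', 'ldloc.2', 'ret']
```

Dropping labels and operands removes variable names and jump addresses from the comparison. Slot numbers embedded in opcodes, such as `stloc.0`, are kept here.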
Plagiarism detection system
• Evaluate the performance
• Analyze and compare behavior with the most commonly used plagiarism detection systems:
  • MOSS
  • JPlag
  • CodeMatch
Tested systems
• MOSS
  • Developed in 1994
  • Commonly used in computer science faculties
  • Supports 26 programming languages
• JPlag
  • Developed in 1996
  • Commonly used in education
  • Supports C, C++, C# and Java
Tested systems
• CodeMatch
  • Developed in 2003
  • Commercial software
  • Supports 26 languages
• ILMatch (our system)
  • Developed in 2010
  • Supports all .Net languages (currently 59 languages)
Testing
• 6 test categories
• 50 test cases covering common code modification techniques
• Evaluation methods
  • Precision, recall
  • F-measure
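For reference, the three evaluation measures can be computed as in the Python sketch below. The slides do not say which F-measure variant was used, so the balanced form (F1, the harmonic mean of precision and recall) is an assumption. The file pairs are invented for illustration and are not taken from the 50 test cases.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def precision_recall_f(reported: Set[Pair], actual: Set[Pair]) -> Tuple[float, float, float]:
    """Precision, recall and balanced F-measure for a set of reported pairs."""
    true_positives = len(reported & actual)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Assumption: balanced F-measure (F1); a weighted variant is equally possible.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical data: pairs a detector flagged vs. pairs that really are plagiarized.
reported = {("a.cs", "b.cs"), ("a.cs", "c.cs"), ("d.cs", "e.cs")}
actual = {("a.cs", "b.cs"), ("d.cs", "e.cs"), ("f.cs", "g.cs"), ("h.cs", "i.cs")}

p, r, f = precision_recall_f(reported, actual)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
# precision=0.667 recall=0.500 F=0.571
```

Two of the three reported pairs are correct (precision 2/3), and two of the four real cases are found (recall 1/2).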
Results
[Charts: highest F-measures for MOSS, JPlag, CodeMatch and ILMatch]
Positive
• No impact
  • User comments
  • Code formatting
  • Modification of variable and class names
  • Modification of class members
  • Changing data types
• Some impact
  • Replacing expressions and loops
  • Rewriting code in a different language
Further work
• Significant impact
  • Reordering operands
  • Reordering class members
  • Adding redundant statements and variables
• Improvements in comparison algorithm