
Performance evaluation of plagiarism detection method based on the intermediate language

This presentation evaluates the performance of a plagiarism detection method that uses an intermediate language to compare source code written in .Net languages such as C#, Visual Basic.Net, and C++. The method identifies similar code fragments and determines the similarity between source files.



Presentation Transcript


  1. Performance evaluation of plagiarism detection method based on the intermediate language
     Vedran Juričić, Tereza Jurić, Marija Tkalec

  2. Plagiarism detection method
     • Method for detecting plagiarism in source code for .Net languages
       • C#
       • Visual Basic.Net
       • C++
       • …
     • Identify similar code fragments
     • Determine similarity between source files
     • Based on intermediate language

  3. Plagiarism detection

     First:
       1. using System.Text;
       2. namespace Test {
       3. class Math {
       4. public double GetMaximum(double[] Input) {
       5. double result = Input[0];
       6. foreach (double temp in Input) {
       7. if (temp>result)
       8. result = temp; }
       9. return result; } } }

     Second:
       1. using System.Text;
       2. namespace Test {
       3. class Math {
       4. public double GetMaximum(double[] Input) {
       5. double result = Input[0];
       6. for (int i=0;i<Input.Length;i++) {
       7. if (Input[i]>result)
       8. result = Input[i]; }
       9. return result; } } }

     Similarity = Number of overlapping lines / Total number of lines = 6 / 9 = 66.67%
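
The line-overlap measure on this slide can be sketched in a few lines of Python. This is an illustration, not the authors' implementation: the function name `line_similarity` is hypothetical, and the sketch assumes that "overlapping" means identical text at the same line number after trimming whitespace, which the slide does not define precisely.

```python
def line_similarity(first: str, second: str) -> float:
    """Fraction of line positions whose trimmed text is identical in both files.

    Assumption: two lines "overlap" when they sit at the same line number
    and have the same text once leading/trailing whitespace is removed.
    """
    a = [line.strip() for line in first.strip().splitlines()]
    b = [line.strip() for line in second.strip().splitlines()]
    total = max(len(a), len(b))
    if total == 0:
        return 0.0
    overlapping = sum(1 for x, y in zip(a, b) if x == y)
    return overlapping / total


# The two listings from the slide, one source line per text line.
FIRST = """
using System.Text;
namespace Test {
class Math {
public double GetMaximum(double[] Input) {
double result = Input[0];
foreach (double temp in Input) {
if (temp>result)
result = temp; }
return result; } } }
"""

SECOND = """
using System.Text;
namespace Test {
class Math {
public double GetMaximum(double[] Input) {
double result = Input[0];
for (int i=0;i<Input.Length;i++) {
if (Input[i]>result)
result = Input[i]; }
return result; } } }
"""

print(f"{line_similarity(FIRST, SECOND):.2%}")  # prints 66.67%
```

Lines 1 to 5 and line 9 match, so 6 of the 9 positions overlap. Renaming a single identifier on every line would drive this measure to zero without changing what the program does.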

  4. But…

     First:
       1. using System.Text;
       2. namespace Test {
       3. class Math {
       4. public double GetMaximum(double[] Input) {
       5. double result = Input[0];
       6. foreach (double temp in Input) {
       7. if (temp>result)
       8. result = temp; }
       9. return result; } } }

     Second:
       1. using System;
       2. namespace OtherTest {
       3. class MyClass {
       4. public double ReturnMaximum(double[] Array) {
       5. double current = Array[0];
       6. for (int j=0;j<Array.Length;j++) {
       7. if (Array[j]>current)
       8. current = Array[j]; }
       9. return current; } } }

     Similarity = Number of overlapping lines / Total number of lines = 0 / 9 = 0.00%

  5. Problems
     • Modification of variable names, types, constants
     • Modification of class member definitions
     • Line and command reordering
     • …
     • Solution
       • Detailed analysis
       • Complex preprocessing
       • For each supported language

  6. Our solution
     • Convert from source language to low-level language (Common Intermediate Language)
     • By using existing tools
       • Compiler
       • Disassembler
     • Tools exist for all .Net languages

  7. Our solution

     C# language → C# compiler → Common Intermediate Language

     C# language:
       using System.Text;
       namespace Test {
         class Math {
           public double GetMaximum(double[] Input) {
             double result = Input[0];
             foreach (double temp in Input) {
               if (temp>result)
                 result = temp;
             }
             return result;
           }
         }
       }

     Common Intermediate Language:
       .method public hidebysig instance float64 GetMaximum(float64[] Input) cil managed
       {
         // Code size 61 (0x3d)
         .maxstack 2
         .locals init (float64 V_0, float64 V_1, float64 V_2, float64[] V_3, int32 V_4, bool V_5)
         IL_0000: nop
         IL_0001: ldarg.1
         IL_0002: ldc.i4.0
         IL_0003: ldelem.r8
         IL_0004: stloc.0
         IL_0005: nop
         IL_0006: ldarg.1
         IL_0007: stloc.3
         …..
         IL_0037: ldloc.0
         IL_0038: stloc.2
         IL_0039: br.s IL_003b
         IL_003b: ldloc.2
         IL_003c: ret
       } // end of method C::GetMaximum

     Extracted instruction sequence:
       nop ldarg.1 ldc.i4.0 ldelem.r8 stloc.0 nop ldarg.1 stloc.3 … ldloc.0 stloc.2 br.s ldloc.2 ret
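
The last step on this slide, reducing a disassembled method to a bare sequence of instruction names, can be sketched as follows. The slide shows only the result (labels such as `IL_0000:` and operands such as the branch target are dropped), so the regular expression and the function name `extract_opcodes` are assumptions for illustration, not a description of how ILMatch is implemented. The sample input is a shortened excerpt of the listing above.

```python
import re
from typing import List

# Assumption: an instruction line looks like "IL_<hex offset>: <mnemonic> [operands]".
_INSTRUCTION = re.compile(r"^\s*IL_[0-9a-fA-F]+:\s+(\S+)")


def extract_opcodes(il_text: str) -> List[str]:
    """Return the instruction mnemonics in order, ignoring labels, operands,
    directives (.maxstack, .locals) and comments."""
    opcodes = []
    for line in il_text.splitlines():
        match = _INSTRUCTION.match(line)
        if match:
            opcodes.append(match.group(1))
    return opcodes


# Shortened excerpt of the disassembly shown on the slide.
SAMPLE = """\
.method public hidebysig instance float64 GetMaximum(float64[] Input) cil managed
{
  // Code size 61 (0x3d)
  .maxstack 2
  IL_0000: nop
  IL_0001: ldarg.1
  IL_0002: ldc.i4.0
  IL_0003: ldelem.r8
  IL_0004: stloc.0
  IL_0039: br.s IL_003b
  IL_003b: ldloc.2
  IL_003c: ret
}
"""

print(" ".join(extract_opcodes(SAMPLE)))
# prints: nop ldarg.1 ldc.i4.0 ldelem.r8 stloc.0 br.s ldloc.2 ret
```

Because variable, class, and namespace names do not appear in this sequence, renaming them in the source leaves it unchanged.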

  8. Plagiarism detection system
     • Evaluate the performance
     • Analyze and compare behavior to the most commonly used plagiarism detection systems:
       • MOSS
       • JPlag
       • CodeMatch

  9. Tested systems
     • MOSS
       • Developed in 1994.
       • Commonly used in computer science faculties
       • Supports 26 programming languages
     • JPlag
       • Developed in 1996.
       • Commonly used in education
       • Supports C, C++, C# and Java

  10. Tested systems
      • CodeMatch
        • Developed in 2003.
        • Commercial software
        • Supports 26 languages
      • ILMatch (our system)
        • Developed in 2010.
        • Supports all .Net languages (currently 59 languages)

  11. Testing
      • 6 test categories
      • 50 test cases covering common code modification techniques
      • Evaluation methods
        • Precision, recall
        • F-measure
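
The evaluation measures named on this slide have standard definitions, sketched below. The slide does not say which F-measure variant was used, so the balanced form (F1, the harmonic mean of precision and recall) is an assumption; the function name and the file pairs in the example are invented for illustration and are not data from the presentation.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def precision_recall_f(reported: Set[Pair], actual: Set[Pair]) -> Tuple[float, float, float]:
    """Precision, recall and balanced F-measure for a set of reported
    plagiarism pairs, given the set of pairs that really are plagiarised."""
    true_positives = len(reported & actual)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example: the detector flags 3 pairs, 2 of them correctly,
# while 4 pairs are actually plagiarised.
reported = {("a.cs", "b.cs"), ("a.cs", "c.cs"), ("b.cs", "d.cs")}
actual = {("a.cs", "b.cs"), ("b.cs", "d.cs"), ("c.cs", "d.cs"), ("a.cs", "d.cs")}

p, r, f = precision_recall_f(reported, actual)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
# prints: precision=0.667 recall=0.500 F=0.571
```

Precision penalises false alarms and recall penalises missed cases; the F-measure combines the two into one number, which makes it convenient for ranking systems against each other.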

  12. Results
      [Figure: result charts for MOSS, JPlag, CodeMatch and ILMatch; label: "Highest F-measures"]

  13. Positive
      • No impact
        • User comments
        • Code formatting
        • Modification of variable and class names
        • Modification of class members
        • Changing data types
      • Some impact
        • Replacing expressions and loops
        • Rewriting code in different language

  14. Further work
      • Significant impact
        • Reordering operands
        • Reordering class members
        • Adding redundant statements and variables
      • Improvements in comparison algorithm
