Who Copied Who? Gordon Lingard School of Software University of Technology, Sydney glingard@it.uts.edu.au
The Problem • Students copying computer code off other students within a subject is a significant problem. • This differs from the problem of students copying from an external source. • Programs exist for determining whether code is a copy. • They don’t answer the question of who created the code and who copied it. • This presentation outlines a solution to this problem.
Presentation Outline • What is Computer Programming • Detection System • Assignment Submission System • Combining the Systems • Results and Conclusions • Questions
What Is Computer Programming? Computer Code • Computer programs are written in a formal programming language that looks like a cross between mathematics and natural language. • They have a very strict syntax structure. • The language is used to construct a large set of carefully orchestrated instructions that become the program. • Student programs are typically less than a thousand instructions. Commercial programs can be tens of thousands to millions of instructions. • Larger programs are of staggering complexity.
What Is Computer Programming? Why Learning to Program is Hard • Learning issues students face • Learning the language. • Learning how to use the language to create a program to do a specified task. • Managing the complexity as programs grow in size. • In the face of these issues, many students are overwhelmed and resort to copying.
Detection System: Problems of Detection • Disguise • Simple transformations that change the look of the code without changing what it does. • Combinatorics • n assignments create p = n(n-1)/2 pairs. • 100 assignments = 4950 pairs. • Code Overlap • Two pieces of code designed to do the same thing – about 50% of the code will be common. • Boilerplate code creates many false positives.
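The pair count on this slide can be checked with a short Python sketch. The function name `pair_count` is hypothetical and is not part of the detection system described in the presentation.

```python
from itertools import combinations


def pair_count(n):
    """Number of distinct pairs among n assignments: n(n-1)/2."""
    return n * (n - 1) // 2


# Cross-check the closed form against an explicit enumeration of pairs.
assignments = range(100)
enumerated = sum(1 for _ in combinations(assignments, 2))
print(pair_count(100), enumerated)
```

The count grows quadratically, so doubling the class size roughly quadruples the number of comparisons.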
Detection System: Complexity Numbers • Tokenise code. • Generate complexity numbers.
Figure: each program instruction is tokenised, and each tokenised instruction yields a complexity number.
Program instruction | Tokenised instruction | Complexity number
if (x > y) { | if(>) { | 98592
a[x] = b[1][y]; | [] = [][]; | 112142
foo(&x, *y); | (&, *); | 147716
... | ... | ...
instr n-1 | tokenised n-1 | complex n-1
instr n | tokenised n | complex n
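The tokenising step can be sketched in Python, judging only from the three example instructions on this slide. The sketch assumes that identifiers and numeric literals are dropped while keywords, operators and punctuation are kept. The keyword list, the function name `tokenise` and the removal of all whitespace are assumptions, not the presenter's implementation. How a complexity number is derived from a tokenised instruction is not stated in the presentation, so it is not attempted here.

```python
import re

# Assumed, incomplete keyword list for a C-like language.
KEYWORDS = {"if", "else", "while", "for", "do", "return", "switch", "case"}


def tokenise(instruction):
    """Strip identifiers and numeric literals, keeping keywords and symbols."""
    def keep_keyword(match):
        word = match.group(0)
        return word if word in KEYWORDS else ""

    # Identifiers and integer literals are replaced; keywords survive.
    stripped = re.sub(r"[A-Za-z_]\w*|\d+", keep_keyword, instruction)
    # Assumption: whitespace is discarded so that re-spacing the code
    # does not change the result.
    return re.sub(r"\s+", "", stripped)


for line in ["if (x > y) {", "a[x] = b[1][y];", "foo(&x, *y);"]:
    print(tokenise(line))
```

Renaming variables, one of the simple disguises mentioned earlier, leaves the output of this sketch unchanged.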
Detection System: Comparing Complexity Numbers • Determine the percentage of numbers common between two programs.
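A minimal Python sketch of this comparison follows. The presentation does not say how duplicates are treated or what the percentage is measured against, so two assumptions are made here: repeated numbers are matched one-for-one, and the shared count is divided by the size of the smaller program. The function name `percent_common` is hypothetical.

```python
from collections import Counter


def percent_common(numbers_a, numbers_b):
    """Percentage of complexity numbers shared by two programs.

    Assumptions: duplicates are matched one-for-one (multiset
    intersection) and the result is relative to the smaller program.
    """
    if not numbers_a or not numbers_b:
        return 0.0
    shared = sum((Counter(numbers_a) & Counter(numbers_b)).values())
    return 100.0 * shared / min(len(numbers_a), len(numbers_b))


# Illustrative values only; 5000, 7000 and 8000 are made up.
program_a = [98592, 112142, 147716, 5000]
program_b = [98592, 147716, 7000, 8000]
print(percent_common(program_a, program_b))
```

Because two independent solutions to the same task are said to share about half their code, any threshold applied to this percentage would need to sit well above 50%.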
Submission System • Used for a number of years in parallel with the detection system. • A formative assessment tool. • Runs students’ programs with a suite of tests. • Analyses their code for poor programming practices. • The students can use the results from the tests to refine their assignments and re-submit as often as they like. • The submission system becomes a development environment.
Combining the Systems: Overview • Extract information from the detection system to create a digital fingerprint of an assignment. • The fingerprint helps to uniquely identify a piece of code while being unaffected by minor changes to the code. • Append the fingerprint, along with time and date, to a log of submissions for each student. • Analyse logs to see if the same fingerprints appear for different students, and use the date/time to determine the order of development.
Combining the Systems: Digital Fingerprints • A fingerprint is created by extracting the 6 largest unique complexity numbers from all the numbers a piece of code generates. • These represent the 6 most complicated pieces of the code.
Figure: assignment code, the complexity numbers it generates, and the resulting digital fingerprint (the 6 largest unique complexity numbers in sorted order), which is appended with the date/time to the log.
Assignment code: if (x > y) x = x * 6; else y = x + y; ... *z = a->b[x];
Complexity numbers: 62145, 87219, 14067, 57063, ..., 112103
Digital fingerprint: 68018, 68682, 72172, 87219, 97843, 112103
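The fingerprint rule stated on this slide (the 6 largest unique complexity numbers, in sorted order) can be sketched in a few lines of Python. The function name `fingerprint` and the sample input list are assumptions. The list is made up for illustration from values visible in the slide's figure and is not the full set of numbers the original code generated.

```python
def fingerprint(complexity_numbers, size=6):
    """Return the `size` largest unique complexity numbers, ascending."""
    return tuple(sorted(set(complexity_numbers))[-size:])


# Illustrative input assembled from numbers shown on the slide.
sample = [62145, 87219, 14067, 57063, 68018, 68682,
          72172, 97843, 112103, 87219]
print(fingerprint(sample))

# A small edit that only adds a low-complexity instruction
# (hypothetical value 20011) leaves the fingerprint unchanged.
print(fingerprint(sample + [20011]) == fingerprint(sample))
```

Keeping only the largest values is what makes the fingerprint insensitive to minor edits: a change has to touch one of the most complex instructions before the fingerprint moves.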
Combining the Systems: Submission Logs [Figure: example submission logs]
Combining the Systems: Comparing Logs • Compare summaries of the logs. • The time frames in the comparison make it clear who originated the code, who copied it, and when.
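The log comparison can be sketched in Python as follows. The log layout (a dictionary mapping each student to a list of timestamp and fingerprint pairs), the function name `shared_fingerprints`, and the student names and dates are all hypothetical, since the presentation does not show the actual log format.

```python
from datetime import datetime


def shared_fingerprints(logs):
    """Find fingerprints logged by more than one student.

    Returns {fingerprint: [(student, first_seen), ...]} with each list
    ordered by the time the student first submitted that fingerprint.
    """
    first_seen = {}  # fingerprint -> {student: earliest timestamp}
    for student, entries in logs.items():
        for timestamp, fp in entries:
            per_student = first_seen.setdefault(fp, {})
            if student not in per_student or timestamp < per_student[student]:
                per_student[student] = timestamp
    return {
        fp: sorted(per_student.items(), key=lambda item: item[1])
        for fp, per_student in first_seen.items()
        if len(per_student) > 1
    }


# Hypothetical logs: student_a develops over two days, student_b
# submits student_a's second fingerprint a day later.
fp_v1 = (1, 2, 3, 4, 5, 6)
fp_v2 = (1, 2, 3, 4, 5, 7)
logs = {
    "student_a": [(datetime(2024, 3, 1, 10, 0), fp_v1),
                  (datetime(2024, 3, 2, 9, 0), fp_v2)],
    "student_b": [(datetime(2024, 3, 3, 22, 0), fp_v2)],
}
report = shared_fingerprints(logs)
for fp, sightings in report.items():
    print(fp, [(name, when.isoformat()) for name, when in sightings])
```

The first entry for each shared fingerprint is the earliest submitter, which points to a likely originator; the result is evidence for a person to review against the earlier submissions, not a verdict.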
Who Copied Who? Results • Rarely is there collaboration. It is students copying other students. • In cases of copying, the logs almost always make a very clear statement of what has happened and when. • The copying usually involves one student copying off another, sometimes two, but rarely more. • Frequently, it is not the final submission that gives away the copying, but earlier submissions. This can be seen in the logs and confirmed by examining the earlier submissions.
Who Copied Who? Conclusions • The system has proved extremely successful in presenting misconduct cases to the Faculty. • The sheer weight of evidence the logs produce often saves time, as students don’t try to bluff their way through the allegation. • This allows the Faculty to shift the focus away from penalty and towards remedial action.