Graphical Information on Plagiarism Activates

Poon Yan Horn Jonathan

Table of Contents
• Background
• Motivation
• System structure
• Pair-wise detection
• Clustering
• Demo
• Q & A

Background
• In spite of years of effort, plagiarism in student assignment submissions still causes considerable difficulties for course designers.
• CAI (June 2005) – 40% of students admitted to engaging in plagiarism.
• NUS FASS (AY 2008 – 2009) – 70 students were found guilty of committing plagiarism.

Motivation
• Many detection systems can detect similarities between submissions for a single assignment.
• The results, however, do not provide sufficient information on how program code is exchanged among a group of students.
• Most importantly, how does plagiarism work within a group of students across all assignments?

System Structure
• Pair-wise plagiarism detection engine
• Clustering engine (DBSCAN)
• HTML / Graph generator
• Database

Pair-wise Detection
• Tokenize each submission.
• Construct an N-Gram representation for each submission.
• Determine the sub-sequence pairs of N-Grams between each pair of submissions.
• Compute asymmetric similarities between each pair of submissions.

Pair-wise Detection
• Tokenize each submission
  • Remove whitespace
  • Convert:
    • Keywords => ‘K’
    • Identifiers => ‘V’
    • Strings => ‘S’
    • Constants => ‘C’
Example: int main() { int a = 1; String b = "sb";} becomes KV(){KV=C;KV=S;}
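The tokenization step above can be sketched in a few lines of Python. This is only an illustration: the keyword set `KEYWORDS` and the regular expression are assumptions (the slides do not list the actual keyword table or lexer), and `String` is treated as a keyword because the slide's example maps it to ‘K’.

```python
import re

# Assumed keyword set for illustration; the real system's list is not given.
KEYWORDS = {"int", "String", "void", "return", "if", "else", "for", "while"}

# One alternative per token class; whitespace matches nothing and is skipped.
TOKEN_RE = re.compile(r'''
    (?P<S>"[^"\\]*(?:\\.[^"\\]*)*")   # string literal
  | (?P<C>\d+(?:\.\d+)?)              # numeric constant
  | (?P<W>[A-Za-z_]\w*)               # keyword or identifier
  | (?P<P>\S)                         # any other non-space character
''', re.VERBOSE)

def tokenize(source):
    """Map source text to the compact K/V/S/C token string."""
    out = []
    for m in TOKEN_RE.finditer(source):
        kind = m.lastgroup
        if kind == "S":
            out.append("S")
        elif kind == "C":
            out.append("C")
        elif kind == "W":
            out.append("K" if m.group() in KEYWORDS else "V")
        else:
            out.append(m.group())  # punctuation is kept as-is
    return "".join(out)

print(tokenize('int main() { int a = 1; String b = "sb";}'))
```

Because identifiers and literals collapse to single symbols, renaming variables or changing constant values leaves the token string unchanged.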

Pair-wise Detection
• N-Gram construction
  • Compose a sequence of 4-gram tokens
Example: KV(){KV=C;KV=S;} becomes
KV() V(){ (){K ){KV {KV= KV=C V=C; =C;K C;KV ;KV= KV=S V=S; =S;}
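The 4-gram construction amounts to a sliding window over the token string. A minimal sketch, where the function name `ngrams` is chosen here for illustration:

```python
def ngrams(tokens, n=4):
    """Return every length-n window of the token string, in order."""
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]

print(" ".join(ngrams("KV(){KV=C;KV=S;}")))
```

A token string of length L yields L - n + 1 grams, so the 16-symbol example produces 13 4-grams.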

Pair-wise Detection
• Determine the sub-sequence pairs between 2 sequences of N-Grams, A and B:
  • Check whether each N-Gram in A can be found in B.
  • If a matched sub-sequence is longer than the minimum matching requirement, report it as a match.
  • The minimum matching requirement is 2 statements.
Example N-Gram sequences:
KV() V(){ (){K ){KV {KV= KV=C V=C; =C;K C;KV ;KV= KV=S V=S; =S;K S;KV ;KV= KV=C V=C; =C;}
KV() V(){ (){K ){KV {KV= KV=C V=C; =C;K C;KV ;KV= KV=C V=C; =C;}
KV() V(){ (){K ){KV {KV= KV=C V=C; =C;K C;KV ;KV= KV=C V=C; =C;K C;KV ;KV= KV=C V=C; =C;}
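One way to sketch the matching step is a greedy scan that, for each position in A, looks for the longest run of consecutive N-Grams that also appears consecutively in B. This is an illustrative approach, not necessarily the one the system uses. In particular, `min_len` is expressed here in N-Grams, which is an assumption: the slides state the threshold in statements and do not say how that maps to N-Grams.

```python
def ngrams(tokens, n=4):
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]

def find_matches(a, b, min_len):
    """Return (start_in_a, start_in_b, length) for each matched run.

    Greedy and quadratic: fine for a sketch, not tuned for large inputs.
    """
    matches = []
    i = 0
    while i < len(a):
        best_len, best_j = 0, -1
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k > best_len:
                best_len, best_j = k, j
        if best_len >= min_len:
            matches.append((i, best_j, best_len))
            i += best_len          # continue after the matched run
        else:
            i += 1
    return matches

a = ngrams("KV(){KV=C;KV=S;KV=C;}")
b = ngrams("KV(){KV=C;KV=C;}")
print(find_matches(a, b, min_len=3))
```

With these two sequences the scan reports a long run at the start of both files and a shorter run at the end, skipping the part of A that contains the string assignment.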

Pair-wise Detection
• Compute the asymmetric similarity from file f1 to file f2
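The formula for this step does not survive in the transcript, so the following is only a sketch of one plausible asymmetric measure: the fraction of f1's N-Grams that also occur in f2. The definition and the name `asymmetric_similarity` are assumptions made for illustration, not the system's actual formula.

```python
def ngrams(tokens, n=4):
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]

def asymmetric_similarity(f1_grams, f2_grams):
    """Assumed definition: share of f1's N-Grams that appear in f2."""
    if not f1_grams:
        return 0.0
    f2_set = set(f2_grams)
    covered = sum(1 for g in f1_grams if g in f2_set)
    return covered / len(f1_grams)

small = ngrams("KV=C;")
large = ngrams("KV(){KV=C;KV=C;}")
print(asymmetric_similarity(small, large))  # small is fully contained in large
print(asymmetric_similarity(large, small))  # large is only partly covered
```

A measure of this kind is asymmetric because a short file copied into a longer one is fully covered by it, while the longer file is only partly covered by the short one.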

Clustering
• DBSCAN
  • Advantages
    • Fast algorithm (O(n log n) when a spatial index is available)
    • The number of clusters is determined automatically
    • A node (submitter) in a low-density region (not very similar to other submitters) is classified as noise and omitted
  • Two parameters
    • Eps – user-defined neighbourhood threshold, used as the grouping criterion
    • MinPts – predefined by the system as 2
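A compact, unindexed DBSCAN can illustrate how Eps and MinPts drive the grouping. This sketch is quadratic rather than O(n log n), and it assumes a symmetric distance between submitters (for example, one minus a similarity score). How the system turns its asymmetric similarities into distances is not stated in the slides, so the `dist` function and the sample values below are hypothetical.

```python
NOISE = -1

def dbscan(points, eps, min_pts, dist):
    """Label each point with a cluster id, or NOISE for low-density points."""
    labels = {}
    cluster = 0

    def neighbours(p):
        # The point itself counts as part of its own neighbourhood.
        return [q for q in points if dist(p, q) <= eps]

    for p in points:
        if p in labels:
            continue
        nbrs = neighbours(p)
        if len(nbrs) < min_pts:
            labels[p] = NOISE
            continue
        labels[p] = cluster
        queue = [q for q in nbrs if q != p]
        while queue:
            q = queue.pop()
            if labels.get(q) == NOISE:
                labels[q] = cluster      # former noise becomes a border point
            if q in labels:
                continue
            labels[q] = cluster
            q_nbrs = neighbours(q)
            if len(q_nbrs) >= min_pts:   # q is a core point: keep expanding
                queue.extend(q_nbrs)
        cluster += 1
    return labels

# Hypothetical pairwise distances between six submitters.
PAIRS = {("a", "b"): 0.1, ("b", "c"): 0.2, ("a", "c"): 0.6, ("d", "e"): 0.15}

def dist(p, q):
    if p == q:
        return 0.0
    return PAIRS.get((p, q), PAIRS.get((q, p), 1.0))

print(dbscan(["a", "b", "c", "d", "e", "f"], eps=0.3, min_pts=2, dist=dist))
```

In this example a and c end up in the same cluster even though they are not within Eps of each other, because both are close to b; f has no close neighbour and is labelled as noise.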

Demo

Q & A

Thank you
