Plagiarism Monitoring and Detection -- Towards an Open Discussion

  1. Plagiarism Monitoring and Detection -- Towards an Open Discussion Edward L. Jones Computer Information Sciences Florida A & M University Tallahassee, Florida

  2. Outline • What is Plagiarism, and Why Address It • Plagiarism Detection & Countermeasures • A Metrics-Based Detection Approach • Extending the Approach • Conclusions & Future Work

  3. Why Tackle Plagiarism? • Plagiarism undermines educational objectives • Failure to address sends wrong message • A non-contrived ethical issue in computing • Plagiarism is hard to define • Plagiarism is costly to pursue/prosecute • An interesting problem for tinkering

  4. What is Plagiarism? • “use of another’s ideas, writings or inventions as one’s own” (Oxford American Dictionary, 1980) • Shades of Gray • Theft of work • Gift of work • Collusion • Collaboration • Coincidence • Intent to Deceive

  5. How is it Detected? • By chance • Anomalies • Temporal proximity when grading • Automation methods • Direct text comparison (Unix diff) • Lexical pattern recognition • Structural pattern recognition • Numeric profiling
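As a rough illustration of the direct-text-comparison method above, here is a minimal sketch using Python's standard difflib in place of Unix diff; the two source snippets are invented for the example.

```python
# Direct text comparison, sketched with Python's difflib rather than
# Unix diff. The two source snippets are invented for illustration.
import difflib

prog1 = "int main() { int sum = 0; return sum; }"
prog2 = "int main() { int total = 0; return total; }"

# ratio() returns a similarity in [0, 1]; 1.0 means identical text.
similarity = difflib.SequenceMatcher(None, prog1, prog2).ratio()
print(f"similarity: {similarity:.3f}")
```

Renaming identifiers alone lowers this score only modestly, which is why the concealment tactics on the next slide matter.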

  6. Plagiarism Concealment Tactics • None • Change comments • Change formatting • Rename identifiers • Change data types • Reorder blocks • Reorder statements • Reorder expressions • Superfluous code • Alternative control structures

  7. Prosecution -- DA in the House? • Course syllabus broaches the subject • Concrete definition generally lacking • Sense of “we’ll know it when we see it” • No-Tolerance Policy • Investigation Stage • Prosecution Stage • Missed opportunity to teach?

  8. An Awareness Approach • Monitor closeness of student programs • Objective measures • Automated • Post anonymous closeness results in public • Nonconfrontational awareness • “A word to the wise … “ • Benchmark student behavior • Establishing thresholds • Effects of course, language

  9. Closeness Measures -- Physical • Program 1 profile: (lines1, words1, characters1) • Program 2 profile: (lines2, words2, characters2) • Closeness = Euclidean distance between the two profiles
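A minimal sketch of the physical profile and its closeness computation, assuming the profile is exactly the counts Unix wc reports; the function names are illustrative.

```python
# Physical profile: (lines, words, characters), the same counts the
# Unix `wc` command reports, plus Euclidean distance between profiles.
import math

def physical_profile(source: str) -> tuple[int, int, int]:
    return (source.count("\n"), len(source.split()), len(source))

def euclidean_distance(p, q) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

p1 = physical_profile("int main() {\n  return 0;\n}\n")
p2 = physical_profile("int main()\n{\n  return 0;\n}\n")
print(euclidean_distance(p1, p2))   # small: only formatting differs
```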

  10. Closeness Measures -- Halstead • Program 1 profile: (length1, vocabulary1, volume1) • Program 2 profile: (length2, vocabulary2, volume2) • Closeness = Euclidean distance between the two profiles
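A sketch of the Halstead profile under the standard Halstead definitions (length N = N1 + N2, vocabulary n = n1 + n2, volume V = N log2 n); tokenizing the source into operators and operands is assumed done elsewhere.

```python
# Halstead profile from pre-tokenized operator/operand lists, using
# the standard definitions: length N = N1 + N2, vocabulary n = n1 + n2,
# volume V = N * log2(n). Tokenization itself is assumed.
import math

def halstead_profile(operators: list[str], operands: list[str]):
    length = len(operators) + len(operands)                  # N
    vocabulary = len(set(operators)) + len(set(operands))    # n
    volume = length * math.log2(vocabulary) if vocabulary > 1 else 0.0
    return (length, vocabulary, volume)

# Tokens for a toy statement such as `sum = a + b;`
print(halstead_profile(["=", "+", ";"], ["sum", "a", "b"]))
```

Because comments and whitespace never reach the token lists, this profile ignores the cheapest concealment tactics by construction.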

  11. Comparison of Measures • Physical profile ==> weight test • Simple/cheap to compute (Unix wc command) • Sensitive to character variations • Halstead profile ==> content test • More complex/expensive to compute • Ignores comments and white space • Sensitive only to changes in program content • Detection effectiveness vs. plagiarism tactic

  12. Closeness Computation • Normalization • Establish upper bound for comparison (1.414) • Distance computed on normalized (unit) vectors • Normalization I -- Self normalization • p = (a, b, c) ==> (a/L, b/L, c/L) • Largest component dominates • Normalization II -- Global scaling • p = (a, b, c) ==> q = (a/aMAX, b/bMAX, c/cMAX) • Self normalization applied to q
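The two normalizations sketched below follow the slide's formulas; with nonnegative counts, the distance between two unit vectors cannot exceed sqrt(2) ≈ 1.414, which is the upper bound quoted above.

```python
# Normalization I (self normalization): divide a profile by its own
# Euclidean length L, yielding a unit vector; for nonnegative counts
# the distance between two unit vectors is at most sqrt(2) = 1.414.
# Normalization II (global scaling): divide each component by that
# component's maximum over all profiles, then self-normalize.
import math

def self_normalize(p):
    length = math.sqrt(sum(x * x for x in p))
    return tuple(x / length for x in p)

def global_scale(profiles):
    maxima = [max(col) for col in zip(*profiles)]
    return [self_normalize(tuple(x / m for x, m in zip(p, maxima)))
            for p in profiles]

profiles = [(30, 120, 900), (28, 115, 880), (50, 300, 2400)]
print(global_scale(profiles))
```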

  13. Distribution of Closeness Values

  14. Comparison of Profiles

  15. Closeness Distribution • Closeness values vary by assignment • Programming language may lead to clustering at the lower end of the spectrum • Reuse of modules leads to clustering at the lower end of the spectrum • No a priori threshold pin-points plagiarism • All measures exhibit these behaviors

  16. Suspect Identification -- Collaboration Suspects (5th Percentile)

      Rank  Closeness   Student 1  Student 2
      ----  ----------  ---------  ---------
         1  0.00000000  alpha      alpha
         2  0.00000652  alpha      beta
         3  0.00026963  beta       gamma
         4  0.00026981  alpha      gamma
         5  0.00031262  gamma      epsilon
         6  0.00048815  sigma      delta
         7  0.00049825  alpha      epsilon
         8  0.00050169  beta       epsilon
         9  0.00066481  gamma      theta
        10  0.00073158  beta       theta
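A ranking like the one above can be reproduced by sorting all pairwise distances ascending and flagging the lowest few percent; the sketch below uses invented profiles and a 5% cutoff.

```python
# Suspect identification: sort all pairwise closeness values ascending
# and keep the closest pairs (here the lowest 5%). The profiles and
# student names are invented for illustration.
from itertools import combinations
import math

def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

profiles = {"alpha": (0.57, 0.57, 0.59), "beta": (0.57, 0.57, 0.60),
            "gamma": (0.30, 0.80, 0.52), "delta": (0.40, 0.70, 0.59)}

ranked = sorted((distance(p, q), s1, s2)
                for (s1, p), (s2, q) in combinations(profiles.items(), 2))
cutoff = max(1, len(ranked) // 20)      # 5th percentile of pairs
for rank, (d, s1, s2) in enumerate(ranked[:cutoff], start=1):
    print(f"{rank}  {d:.8f}  {s1}  {s2}")
```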

  17. Independence Index -- Student Independence Indices

      Index  Student
      -----  -------
          1  alpha
          2  beta
          3  gamma
          5  epsilon
          6  sigma
          6  delta
          9  theta

      Index = position at which the student debuts on the closeness list
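The independence index falls straight out of the ranked pair list: a student's index is the first rank at which their name appears. A sketch, with an invented ranked list shaped like the table above:

```python
# Independence index: the rank at which each student first appears in
# the ascending closeness list (a low index means low independence).
def independence_indices(ranked):
    """ranked: list of (closeness, student1, student2), sorted ascending."""
    indices = {}
    for rank, (_, s1, s2) in enumerate(ranked, start=1):
        for student in (s1, s2):
            indices.setdefault(student, rank)
    return indices

# Invented ranked list echoing the shape of the table above.
ranked = [(0.00000652, "alpha", "beta"), (0.00026963, "beta", "gamma"),
          (0.00031262, "gamma", "epsilon")]
print(independence_indices(ranked))
# {'alpha': 1, 'beta': 1, 'gamma': 2, 'epsilon': 3}
```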

  18. Preponderance of Evidence • Historical Record of Student Behavior • Collaboration/partnering • Independence indices • Profile and analyze other artifacts • Compilation logs • Execution logs

  19. Another Approach • Make student demonstrate familiarity with submitted program • Seed errors into program • Time limit for removing error and resubmitting • Holistic approach • Intentional, not accidental

  20. Conclusions • We can do something about plagiarism -- the first step is to develop eyes and ears • Simple metrics appear to be adequate • Tools are essential • Sophistication is not as necessary as automation • Students are curious to know how they compare with other students

  21. On-Going & Future Work • Complete the toolset • Student Independence Index • Incorporate other Artifacts • Compilation logs • Execution logs • Integrate into Automated Grading • Disseminate Results • Package tool as shareware

  22. Questions? Questions? Questions?

  23. Thank You

  24. Flow Chart: Student Programs ==> Profile ==> Compute Closeness ==> Suspicious Programs
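Read left to right, the flow chart is a small pipeline. A skeleton, reusing the profile, normalization, and distance sketches from the earlier slides (the threshold would come from the benchmarking discussed above):

```python
# Skeleton of the flow chart: profile every program, compute pairwise
# closeness, report suspiciously close pairs. `physical_profile`,
# `self_normalize`, and `distance` are the earlier sketches.
from itertools import combinations

def find_suspects(sources: dict, threshold: float):
    """sources maps student name -> program text; returns close pairs."""
    profiles = {s: self_normalize(physical_profile(src))
                for s, src in sources.items()}
    return sorted((d, s1, s2)
                  for (s1, p), (s2, q) in combinations(profiles.items(), 2)
                  if (d := distance(p, q)) <= threshold)
```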
