Laurence Hellyer and Lawrence Beadle Detecting Plagiarism in Microsoft Excel Assignments
HEA ICS 10th Annual Conference 2009
Typical Excel Assignments
• Loan Repayment
• Pension Calculator
• Annuity Calculator
A Familiar Problem
Plagiarising an Excel Assignment
• Plagiarism: The action of taking someone else's work
• Text Cells
• Formula Cells
• Charts
• Numeric Cells: these are often specified by the assignment (e.g. assume an interest rate of 18%)
Objective
• Develop and use an automated tool to assist markers in detecting intra-group and inter-group plagiarism within Microsoft Excel assignments.
Case Study Suspected Plagiarism Detected by Human Markers
Existing Solutions?
• Similar tools exist for different contexts
  • TurnItIn
  • Moss
Human Markers Detecting Plagiarism (Sometimes)
• Microsoft Excel files can save meta-data about the file:
  • Author
  • Last saved by
  • Creation time
  • Last modification time
  • Registered Company
Presenting ExcelSmash…
• ExcelSmash is our software tool to highlight submissions requiring further scrutiny
• It conducts almost all the tests that human markers can conduct
Usage
• Analyses the submissions of 400 students in under 2 minutes
• Output rapidly identifies submissions with similar content
Data Used by ExcelSmash…
• Submission server: student username
• Author, Last saved by, Creation and modification time, Company Name
• Strings found in Text cells
• Strings representing formulas found in Formula cells
• Excel 97-2003, 2007
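
The metadata fields listed above can be read straight out of an Excel 2007 (.xlsx) file, because that format is a zip archive containing XML property files. The following is a minimal sketch of one way to do it using only the Python standard library; it is not ExcelSmash's own code, the function name `read_xlsx_metadata` is hypothetical, and it does not cover the older binary Excel 97-2003 (.xls) format.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the document property parts of an .xlsx package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}


def read_xlsx_metadata(source):
    """Return the metadata fields of an .xlsx file (path or file-like object).

    Hypothetical helper for illustration; missing fields are returned as None.
    """
    def text_of(root, path):
        node = root.find(path, NS)
        return node.text if node is not None else None

    meta = {"author": None, "last_saved_by": None,
            "created": None, "modified": None, "company": None}
    with zipfile.ZipFile(source) as archive:
        names = set(archive.namelist())
        if "docProps/core.xml" in names:
            core = ET.fromstring(archive.read("docProps/core.xml"))
            meta["author"] = text_of(core, "dc:creator")
            meta["last_saved_by"] = text_of(core, "cp:lastModifiedBy")
            meta["created"] = text_of(core, "dcterms:created")
            meta["modified"] = text_of(core, "dcterms:modified")
        if "docProps/app.xml" in names:
            app = ET.fromstring(archive.read("docProps/app.xml"))
            meta["company"] = text_of(app, "ep:Company")
    return meta
```

A field that comes back as None means the property part or element was absent from the file, which is what stripped metadata would look like.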
Analysing Submissions
• Pairwise comparisons of submissions
• About 80,000 comparisons for 400 submissions (400 × 399 / 2 = 79,800)
• Individual tests on each submission
• If a submission fails a test we add a “red flag” to the submission
• Each test has an associated severity score
• Only report submissions whose total severity exceeds a threshold specified at run time
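
The red-flag scheme described above can be sketched as follows. This is an illustrative outline rather than ExcelSmash's implementation: the function names, the dictionary layout of a submission, and the two example tests are assumptions, and the severity values 5 and 2 are borrowed from the example output in these slides.

```python
from collections import defaultdict
from itertools import combinations
from math import comb

# Assumed severity scores, taken from the example output in the slides.
AUTHOR_MATCH_SEVERITY = 5
AUTHOR_MISMATCH_SEVERITY = 2


def author_mismatch(sub):
    """Individual test: flag a file whose author differs from 'last saved by'."""
    if sub["author"] != sub["last_saved_by"]:
        message = (f'Author "{sub["author"]}" and last saved by '
                   f'"{sub["last_saved_by"]}" mis-match')
        return message, AUTHOR_MISMATCH_SEVERITY
    return None


def author_match(a, b):
    """Pairwise test: flag two files that share the same author name."""
    if a["author"] == b["author"]:
        return f'Author match "{a["author"]}" with:', AUTHOR_MATCH_SEVERITY
    return None


def analyse(submissions, threshold):
    """Run all tests and report logins whose summed severity exceeds threshold."""
    flags = defaultdict(list)  # login -> list of (message, severity)
    for login, sub in submissions.items():
        result = author_mismatch(sub)
        if result:
            flags[login].append(result)
    # Every unordered pair of submissions is compared exactly once.
    for (login_a, a), (login_b, b) in combinations(submissions.items(), 2):
        result = author_match(a, b)
        if result:
            message, severity = result
            flags[login_a].append((f"{message} {login_b}", severity))
            flags[login_b].append((f"{message} {login_a}", severity))
    report = {}
    for login, items in flags.items():
        total = sum(severity for _, severity in items)
        if total > threshold:
            report[login] = (total, items)
    return report


# Number of pairwise comparisons for 400 submissions.
PAIRS_FOR_400 = comb(400, 2)  # 79800
```

With two submissions that both name "Andrew" as author, where only the first was last saved under a different name, the first accumulates a severity of 7 and the second a severity of 5, so a threshold of 6 reports only the first.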
Pairwise Comparisons
Individual File Tests
Example Output
Login: aaaa --- Severity: 7
    Author match “Andrew” with: bbbb --- Severity: 5
    Author “Andrew” and last saved by “aaaa” mis-match --- Severity: 2
Login: cccc --- Severity: 23
    Similar creation time to dddd --- Severity: 1
    Similar creation time to eeee --- Severity: 1
    Similar creation time to ffff --- Severity: 1
    100% similar text to ffff --- Severity: 10
    100% similar formula to ffff --- Severity: 10
Example Plagiarism Detected by ExcelSmash
Text Matching
• Case insensitive string equality
• “Please Enter Your Annual Salary” and “Annual Salary”: no match
• “Please Enter Your Annual Salary” and “Please enter your annual salary”: match
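
Case-insensitive string equality is simple to express in code. The sketch below shows the comparison itself, plus one possible way to turn it into a percentage. The slides do not define how the percentage is calculated, so `percent_similar` is an assumed definition (the share of one submission's strings that also occur in the other), and both function names are hypothetical.

```python
def same_string(a, b):
    """Case-insensitive string equality, as used for text and formula cells."""
    return a.casefold() == b.casefold()


def percent_similar(strings_a, strings_b):
    """Percentage of strings in A that also occur in B, ignoring case.

    Assumed definition for illustration; the slides do not specify one.
    """
    if not strings_a:
        return 0.0
    seen = {s.casefold() for s in strings_b}
    matched = sum(1 for s in strings_a if s.casefold() in seen)
    return 100.0 * matched / len(strings_a)
```

Because the test is plain equality, a label that is shortened or reworded no longer matches, while a change of capitalisation alone still does.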
Percentage Similar Content
Formula Matching
• Case insensitive string equality
• =AVERAGE(H1:H10)*100 and =100*AVERAGE(H1:H10): no match, although the formulas are equivalent
• =SUM(A1:D4) and =SUM(A2:D5): no match, although only the cell references differ
Percentage Similar Content
Case Study
Case Study Suspected Plagiarism Detected in 2007-08 Cohort (382 students)
ExcelSmash Conclusions
• New class of tool aimed at detecting possible plagiarism within Microsoft Excel assignments
• Quickly identifies submissions requiring further scrutiny
• Improved detection of intra-group and especially inter-group plagiarism compared to human markers
Further Work
• Make code available to academics
• Current formula comparison algorithm is easy to circumvent
• Tokenise formulas before comparisons to remove dependence on absolute cell references
• Avoid warnings for common author names
• Add warning if metadata is stripped
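
The tokenising idea listed above might look something like the following sketch. It is one possible approach, not the authors' planned design: the token grammar is a simplification (no quoted strings, sheet names or named ranges), and the names `tokenise`, `same_shape` and `same_token_bag` are hypothetical.

```python
import re
from collections import Counter

# Simplified token grammar for Excel formulas (an assumption, not a full parser).
TOKEN = re.compile(r"""
    (?P<ref>\$?[A-Za-z]{1,3}\$?\d+(?![A-Za-z0-9_(]))  # cell reference, e.g. A1
  | (?P<num>\d+(?:\.\d+)?)                            # numeric literal
  | (?P<name>[A-Za-z_][A-Za-z0-9_.]*)                 # function name
  | (?P<op>[-+*/^&=<>(),:%])                          # operator or punctuation
""", re.VERBOSE)


def tokenise(formula):
    """Split a formula into tokens, replacing every cell reference with REF."""
    tokens = []
    for match in TOKEN.finditer(formula):
        if match.lastgroup == "ref":
            tokens.append("REF")
        elif match.lastgroup == "name":
            tokens.append(match.group().upper())
        else:
            tokens.append(match.group())
    return tokens


def same_shape(a, b):
    """True if two formulas are identical apart from which cells they reference."""
    return tokenise(a) == tokenise(b)


def same_token_bag(a, b):
    """Looser test: same tokens in any order (may produce false positives)."""
    return Counter(tokenise(a)) == Counter(tokenise(b))
```

Under this sketch, =SUM(A1:D4) and =SUM(A2:D5) both become the token list =, SUM, (, REF, :, REF, ), so they compare as equal. Reordered operands such as =AVERAGE(H1:H10)*100 and =100*AVERAGE(H1:H10) are only caught by the looser order-insensitive test.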
Thank you
Questions?
www.cs.kent.ac.uk/~lh243