1 / 32

Assessment and Checking of SQL: Detecting and Preventing Plagiarism

Assessment and Checking of SQL: Detecting and Preventing Plagiarism. Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005, Sunderland, UK. Introduction. We have an ILE for teaching Database modules. Part of this system teaches SQL using an interactive system.

veda-briggs
Download Presentation

Assessment and Checking of SQL: Detecting and Preventing Plagiarism

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Assessment and Checking of SQL: Detecting and Preventing Plagiarism Dr. Gordon Russell, Andrew Cumming. Napier University, Edinburgh, Scotland. TLAD 2005, Sunderland, UK.

  2. Introduction • We have an ILE for teaching Database modules. • Part of this system teaches SQL using an interactive system. • It allows users to: • Follow about 70 problem-solving exercises. • Each exercise needs SQL statements to solve them. • The student can enter and run SQL statements. • The SQL is automatically assessed and feedback given. • Students use this feedback to learn • The site also offers incremental assessments to students, with automatic feeback.

  3. Tutorial Index

  4. Accuracy • The accuracy measure used is, put simply, how similar is the output of executing the student’s SQL to the SQL I wrote as the sample solution. • The basic algorithm takes the table produced by the student’s SQL statement, and then compares each cell of that table against the sample solution. • If the cell is in both tables then score+1 • If the cell is not in the sample solution score-1 • Divide the final score by the number of cells in the biggest of the sample solution table and the student’s SQL table.

  5. Example: Accuracy 4/9 or 45% Student’s Attempt Correct Answer SELECT * FROM people SELECT ID, Lastname FROM people WHERE ID IN (1,7)

  6. Assessments • When students complete a “tutorial” they can take an assessment. • There are 4 assessments to complete. • Each assessment has 2 questions. • The assessments can be taken at any time, and thus are unsupervised. • It is important that we are confident that assessment results are measuring that student’s SQL ability…

  7. System Evolution • Many changes have been made to the system to • Improve student results • Manage a lack of student attendance at tutorials • Keep good student interested • Give an impression of one-to-one attention • Removing the need to plagiarise

  8. Why Plagiarise • An online environment suffers from a number of issues: • Effort: “I am just cheating a machine” • Instant feedback: “It doesn’t say 100% yet” • Peers: “I got 100%, what did you get?” • Time: “Not enough time left to do it myself”

  9. What is plagiarism? • Here is simply refers to two or more students who have submitted SQL which has a common electronic source. • An emailed solution • Shared answer then distributed • Some plagiarists have verbatim copies, while others spend time disguising the copies.

  10. Detecting Plagiarism • We have written a number of algorithms to detect similarity between SQL statements. • Keeping algorithms separate aids understanding. • Combining the results of 2 algorithms is problematic. • So detection of likely plagiarists is done in first phase, result in an XML report. • XML is processed and analysis made in phase 2. • Phase 2 produces “high confidence” plagiarist identification.

  11. Equality Testing • EQUALITY was our first detection algorithm • It looks at the SQL as a single string, and uses string equality as the test. • It returns a number, with 10 being “they are identical”, down to (currently) 4 “identical ignoring spaces, brackets, and case”. • This test usually is all that’s needed. • It really detects “copy and paste” electronic copying.

  12. Shuffle Match • SHUFFLE is our “intelligent” matcher. • It ignores all student-generated formatting. • Reformats SQL using a standard format. • It tolerates switching lines around. • It tolerates moving operands around an operator. WHERE A = B WHERE B = A

  13. Histogram Analysis • There are only so many ways of writing SQL. • Students argue that copies are just co-incidence. • HISTOGRAM produces evidence to help stop this argument. • It detects things irrelevant to the SQL standard. • Trailing spaces at the end of a line • Use of spaces around commas and operators • Different cases of operands or clauses

  14. Histogram SELECT a,b FROM c where a=(select b from hia); WHERE a= b AND c= d CREATE View unique123

  15. <equality quality='9' /> select distinct(d.d_name)from dressmaker d, construction c, material m, \_ dress_order o, order_line olwhere d.d_no = c.maker and c.order_ref = ol.order_ref and c.line_ref = ol.line_no and ol.order_ref = o.order_no and ol.ol_material = m.material_no and c.finish_date is not null and m.fabric = 'Silk' and o.completed = 'Y' select distinct(d.d_name)from dressmaker d, construction c, material m, \_ dress_order o, order_line olwhere d.d_no = c.maker and c.order_ref = ol.order_ref and c.line_ref = ol.line_no and ol.order_ref = o.order_no and ol.ol_material = m.material_no and c.finish_date is not null and m.fabric = 'Silk' and o.completed = 'Y'

  16. <equality quality='8' /> <signature op='0' comma='1' trail='0' /> <histogram mode='all'> <word str='grm' left='3' right='3' /> <word str='mtr' left='2' right='2' /> <word str='cost' left='3' right='3' /> <word str='Cost' left='1' right='1' /> </histogram> SELECT distinct g.description, (g.labour_cost + q.quantity * m.cost) "Total \_ Cost" FROM garment g, material m, quantities q WHERE q.size_q = 8 AND m.fabric = 'Cotton' AND g.style_no = q.style_q AND (g.labour_cost + q.quantity * m.cost) > (\_ SELECT AVG(grm.labour_cost + (qty.quantity * mtr.cost)) FROM material mtr, quantities qty, garment grm WHERE qty.size_q = 8 AND grm.style_no = qty.style_q) SELECT distinct g.description, (g.labour_cost + q.quantity * m.cost) "Total \_ Cost" FROM garment g, material m, quantities q WHERE q.size_q = 8 AND m.fabric = 'Cotton' AND g.style_no = q.style_q AND (g.labour_cost + q.quantity * m.cost) > (\_ SELECT AVG(grm.labour_cost + (qty.quantity * mtr.cost)) FROM material mtr, quantities qty, garment grm WHERE qty.size_q = 8 AND grm.style_no = qty.style_q)

  17. <shuffle score='0.888888888888889' restarts="0" /> <signature op='0' comma='1' trail='1' /> <histogram mode='all'> <word str='labour_cost' left='2' right='2' /> <word str='TRAIL:1' left='2' right='2' /> <word str='LABOUR_COST' left='1' right='1' /> </histogram> select a.description, b.fabric, b.colour,\_ b.pattern from garment a, material b,quantities c, \_order_line d where a.style_no = c.style_qand c.style_q = d.ol_styleand d.ol_material = b.material_noand (a.labour_cost+c.quantity*b.cost) > (select g.labour_cost from garment g where g.LABOUR_COST > 80); select a.description, b.fabric, b.colour,\_ b.pattern from garment a, material b,quantities c, \_order_line d where a.style_no = c.style_qand c.style_q = d.ol_styleand d.ol_material = b.material_noand (a.labour_cost+c.quantity*b.cost) > (select g.labour_cost from garment g where g.LABOUR_COST >= 80);

  18. <shuffle score='0.863636363636364' restarts="0" /> <signature op='0' comma='0' trail='1' /><histogram mode='all'> <word str='TRAIL:1' left='1' right='1' /></histogram> select fabric,round(sum(quantity*cost*1.15),2) from order_line join quantities on(\_ ol_style=style_q and ol_size=size_q) join \_ material on (material_no=ol_material),\_ constructionwhere line_no=line_refand order_line.order_ref=construction.order_ref and FINISH_DATE between '01 jan 2002' and '31 dec 2002' group by fabric; select fabric,round(sum(quantity*cost*1.15),2)as constructionfrom order_linejoin quantities on(ol_style=style_q and ol_size=size_q) join material on (material_no=ol_material),\_ constructionwhere line_no=line_refand order_line.order_ref=construction.order_ref and FINISH_DATE between '01 jan 2002' and '31 dec 2002' group by fabric;

  19. Understanding Plagiarism • With random question banks it is hard to understand plagiarism. • Everyone doing the same questions it is easy! • Can they copy off of their lab buddy? • What about a friend of a friend? • Do students really have friends-of-friends-of-friends. • Increasing question bank size should help, but this is expensive, and maintaining equal question difficulty is hard. • Definitely we only want the minimum bank size which prevents plagiarism oppertunities.

  20. System in 2002 • Students did the assessments when they liked. • They had 15 questions, and did any 5 they wanted to. • Q1-5 12% each, Q6-10 16% each, Q11-15 20%. • Argument was that this would allow weak students to do minimum, and good students to shine, without anyone feeling they failed something.

  21. Collusion in 2002

  22. Moving to 2003 • Target: Eliminate big groups. • Switch to random banks. • Force student to do tutorials before assessments. • Have 4 assessments (1 per tutorial). • Each assessment has 2 questions. • Each student will use use 40% of the bank contents.

  23. Collusion in 2003

  24. Into 2004 • Target: Improve student moral • Students hated failing to get Ass 3 and 4. • Change difficulty level of questions slightly. • Now students get 1 question from previous difficulty level in every assessment. • Result: Students do more assessments!

  25. Collusion in 2004

  26. Summary • Plagiarism groups are back! • However, the nodes are quite far apart. • Big blob is an off-campus “support group”. • 2nd biggest is a local support network. • Mostly each student copied only from 1 other. • A slightly bigger bank size may fix this. • We are going to use a completely new bank in 2005/6 to analyse the impact of answer sharing between year groups.

  27. Statistics

  28. Conclusion • From the table plagiarism is actually holding steady. • Student marks have increased. • Question attempts have increased too. • This data covers semester 1. In semester 2, plagiarism has decreased as a result of students realising the effectiveness of the new plagiarism detector. • 2005/6 will be the big test.

More Related