A Human-Computer Collaboration Approach to Improve Accuracy of an Automated English Scoring System

NAACL-HLT 2010, June 5, 2010. Jee Eun Kim (HUFS) & Kong Joo Lee (CNU).


Presentation Transcript


  1. A Human-Computer Collaboration Approach to Improve Accuracy of an Automated English Scoring System
     NAACL-HLT 2010, June 5, 2010
     Jee Eun Kim (HUFS) & Kong Joo Lee (CNU)

  2. Outline
     • Overview of the system
     • Issue
       • Redundant errors
     • Solution
       • Introducing a method to determine redundant errors
     • Evaluation
     • Conclusion

  3. Procedure of Automated Scoring System
     [Figure: a teacher, a question database, a student, and the automated scoring system, which returns a scoring result and feedback.]
     • Question: 그녀는 방과 후에 축구를 했다. (English: "She played soccer after school.")
     • Correct answers: She played soccer after school. / She played soccer after school is over.
     • Input: She play footboll.
     • Scoring result: 3 points out of 6
     • Feedback
       (1) error in number agreement (play → plays|played)
       (2) misspelling (footboll → football)
       (3) tense mismatching (play → played)
       (4) missing elements: "after school"

  4. Automated English Scoring System
     • Scoring a single sentence, not an essay
     • Target users
       • Junior high school students learning English as a second language
     • Calculating a score based on
       • the number of errors
       • the types of errors

  5. System Overview
     [Figure: system architecture. Labels recoverable from the diagram, from input to output:]
     • Input: a student's answer; a set of correct answers
     • Intra-sentential error detection module
       • morphological analyzer → word errors
       • syntactic analyzer → syntactic errors
       • resources: lexicon; lexical information & syntactic rules & synonyms
     • Dependency structures
     • Inter-sentential error detection module
       • comparing sentences & calculating similarity → mapping errors
     • Output: a scoring result & diagnostic feedback

  6. Errors
     • 76 error types to be detected by the system
       • 16 word errors → morphological analyzer
       • 46 syntactic errors → syntactic analyzer
       • 14 mapping errors → comparing sentences
     • Error reporting
       • Format: ERROR_ID | ERROR_POSITION | ERROR_CORRECTION_INFO
       • Example sentence: She is too week to carry the bag.
       • Example report: CONFUSABLE_WORD_ERROR | 4 | weak
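The three-field report format on this slide can be read mechanically. The following is only a minimal sketch, not the authors' implementation: the class name `ErrorReport`, the function `parse_error_report`, and the treatment of a position such as `7-9` as an inclusive span of 1-indexed word positions are all assumptions, although they are consistent with the slide's examples ("week" is the 4th word of the example sentence).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ErrorReport:
    """Hypothetical container for one reported error."""
    error_id: str
    start: int                 # first word position covered (1-indexed, assumed)
    end: int                   # last word position covered (assumed inclusive)
    correction: Optional[str]  # None when the third field is empty


def parse_error_report(line: str) -> ErrorReport:
    """Parse 'ERROR_ID | ERROR_POSITION | ERROR_CORRECTION_INFO'."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 '|'-separated fields, got {len(fields)}")
    error_id, position, correction = fields
    # A position is either a single index ("4") or a span ("7-9").
    if "-" in position:
        first, last = position.split("-", 1)
        start, end = int(first), int(last)
    else:
        start = end = int(position)
    return ErrorReport(error_id, start, end, correction or None)


report = parse_error_report("CONFUSABLE_WORD_ERROR | 4 | weak")
print(report.error_id, report.start, report.end, report.correction)
```

A report with an empty third field, such as `EXTRA_DET_ERROR | 7-9 |`, parses to a span from 7 to 9 with no correction.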

  7. Issue
     • Correct answer: She is too weak to carry the bag.
     • Student answer: She is too weak to carry the her bag.
     • Teacher's assessment: 'her' has to be omitted
       • A single error has been detected
     • Error detection result produced by the system
       • EXTRA_DET_ERROR | 7-9 | (from the syntactic processing phase)
       • UNNECESSARY_NODE_ERROR | 8 | (her) (from the mapping processing phase)
     • System's assessment: treated them as two distinct errors

  8. Error Example
     • Correct answer: She is a teacher who came to our school last week.
     • Student answer: She is a teacher who come school last weak.
     • One of the errors has to be removed!

  9. Redundant Errors
     • A pair of errors is judged redundant if it satisfies all 3 of the following conditions
       • COND1: Sharing an error position
       • COND2: Detected from different process phases
       • COND3: Dealing with the same linguistic phenomenon
     • Objectives
       • To remove one of the redundant errors
       • To improve the accuracy of the system
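COND1 and COND2 are mechanical checks, while COND3 needs linguistic judgment. A minimal sketch of the first two follows, applied to the "the her bag" example from slide 7. The names `DetectedError`, `shares_position`, and `is_candidate_pair`, the phase labels, and the reading of "sharing an error position" as overlapping inclusive spans are assumptions, not details given in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedError:
    """Hypothetical record for one detected error."""
    error_id: str
    start: int   # first word position covered (assumed inclusive)
    end: int     # last word position covered (assumed inclusive)
    phase: str   # assumed labels: "word", "syntactic", "mapping"


def shares_position(a: DetectedError, b: DetectedError) -> bool:
    """COND1: the two position spans overlap in at least one word."""
    return a.start <= b.end and b.start <= a.end


def is_candidate_pair(a: DetectedError, b: DetectedError) -> bool:
    """COND1 and COND2 together. COND3 is not decided here."""
    return shares_position(a, b) and a.phase != b.phase


extra_det = DetectedError("EXTRA_DET_ERROR", 7, 9, "syntactic")
unnecessary = DetectedError("UNNECESSARY_NODE_ERROR", 8, 8, "mapping")
print(is_candidate_pair(extra_det, unnecessary))  # True
```

A pair that passes this check is only a candidate; whether both errors describe the same linguistic phenomenon is what the later filtering steps address.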

  10. Deciding Redundant Errors
     [Figure: filtering pipeline. Stages recoverable from the flowchart:]
     • 14,892 sentences with errors detected by the system
     • Filtering by COND #1 & #2 → 150,419 pairs of errors (657 pairs of error ID)
     • Filtering by PMI & RFC → 29,588 pairs of errors (111 pairs of error ID)
     • Filtering by human experts → three groups of 20, 47, and 44 pairs of error ID
     • Outcome labels: redundant; non-redundant; redundant or non-redundant (deciding by Decision Tree)

  11. Deciding Redundant Errors (1)
     • Filtering by COND #1 & #2
       • COND1: Sharing an error position
       • COND2: Detected from different process phases
       • Error report format: ERROR_ID | ERROR_POSITION | ERROR_CORRECTION
     • Input
       • 14,892 test-takers' sentences scored by the system
       • All the possible pairs of errors which could occur in a sentence
     • Output
       • 150,419 pairs of errors passed the filter
       • 657 pairs of error ID
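This step can be pictured as enumerating every pair of errors within a sentence, keeping the pairs that satisfy COND1 and COND2, and tallying them both as individual pairs and as distinct pairs of error ID. The sketch below rests on assumptions: errors are plain `(error_id, start, end, phase)` tuples, an ID pair is treated as unordered, and the second sentence's data (including `HYPOTHETICAL_WORD_ERROR`) is invented for illustration.

```python
from collections import Counter
from itertools import combinations


def candidate_pairs(errors):
    """Yield pairs of errors in one sentence that satisfy COND1 and COND2."""
    for a, b in combinations(errors, 2):
        overlaps = a[1] <= b[2] and b[1] <= a[2]   # COND1: spans overlap
        different_phase = a[3] != b[3]             # COND2: different phases
        if overlaps and different_phase:
            yield a, b


def count_id_pairs(sentences):
    """Count how often each unordered pair of error IDs passes the filter."""
    counts = Counter()
    for errors in sentences:
        for a, b in candidate_pairs(errors):
            counts[tuple(sorted((a[0], b[0])))] += 1
    return counts


# Each error is (error_id, start, end, phase); the data is illustrative only.
sentences = [
    [("EXTRA_DET_ERROR", 7, 9, "syntactic"),
     ("UNNECESSARY_NODE_ERROR", 8, 8, "mapping")],
    [("EXTRA_DET_ERROR", 2, 4, "syntactic"),
     ("UNNECESSARY_NODE_ERROR", 3, 3, "mapping"),
     ("HYPOTHETICAL_WORD_ERROR", 6, 6, "word")],
]

counts = count_id_pairs(sentences)
total_pairs = sum(counts.values())   # analogous to the "pairs of errors" figure
distinct_id_pairs = len(counts)      # analogous to the "pairs of error ID" figure
print(total_pairs, distinct_id_pairs)  # 2 1
```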

  12. Deciding Redundant Errors (2)
     • Filtering using thresholds on PMI & RFC [Su et al., 1994]
     • Input
       • 657 pairs of error ID from the previous step
     • Filtering
       • Pointwise Mutual Information (PMI)
       • Relative Frequency Count (RFC)
     • Output
       • 111 pairs of error ID
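The slide names the two measures but gives neither formulas nor threshold values. As a rough sketch under stated assumptions: PMI is taken in its standard form, log2 of P(e1, e2) / (P(e1) P(e2)), with probabilities estimated per sentence; RFC is taken as a pair's count divided by the average count over all pairs; and a pair is kept only if it clears both thresholds. The function names, the threshold values, the counts, and the `HYPOTHETICAL_*` IDs are all invented for illustration.

```python
import math


def pmi_and_rfc(pair_counts, id_counts, n_sentences):
    """Return {pair: (pmi, rfc)} for each unordered pair of error IDs."""
    average_count = sum(pair_counts.values()) / len(pair_counts)
    scores = {}
    for (e1, e2), count in pair_counts.items():
        p_joint = count / n_sentences
        p_e1 = id_counts[e1] / n_sentences
        p_e2 = id_counts[e2] / n_sentences
        pmi = math.log2(p_joint / (p_e1 * p_e2))
        rfc = count / average_count   # assumed definition of RFC
        scores[(e1, e2)] = (pmi, rfc)
    return scores


def filter_pairs(scores, pmi_min, rfc_min):
    """Keep pairs that reach both thresholds (combining with AND is an assumption)."""
    return [pair for pair, (pmi, rfc) in scores.items()
            if pmi >= pmi_min and rfc >= rfc_min]


# Toy counts, invented for illustration only.
n_sentences = 100
id_counts = {
    "EXTRA_DET_ERROR": 10,
    "UNNECESSARY_NODE_ERROR": 10,
    "HYPOTHETICAL_ERROR_C": 50,
    "HYPOTHETICAL_ERROR_D": 40,
}
pair_counts = {
    ("EXTRA_DET_ERROR", "UNNECESSARY_NODE_ERROR"): 8,
    ("HYPOTHETICAL_ERROR_C", "HYPOTHETICAL_ERROR_D"): 20,
}

scores = pmi_and_rfc(pair_counts, id_counts, n_sentences)
kept = filter_pairs(scores, pmi_min=1.0, rfc_min=0.5)
print(kept)
```

In this toy data the first pair co-occurs far more often than chance would predict, so it survives; the second pair is frequent but statistically independent (PMI of 0), so it is dropped.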

  13. Deciding Redundant Errors (3)
     • Filtering by human experts
     • Background of the experts
       • Junior high school English teachers
       • With knowledge of linguistics
       • With 10 or more years of teaching experience
     • Input
       • 111 pairs of error ID
     • Output
       • Errors categorized into 3 classes

  14. Deciding Redundant Errors (4)
     • 3 error classes

  15. Deciding Redundant Errors (5)
     • For the 44 "yet to be decided" pairs
       • Need additional information to determine whether they are redundant or not
     • Using a decision tree
       • Extracting decision rules

  16. Deciding Redundant Errors (6)
     • Features for decision tree learning
       • For a pair of errors (E1, E2)

  17. Examples of Decision Rules

  18. Evaluation
     • Scoring 200 unseen student sentences by the system
     • Overall system performance
       • Improved by 2.6%…
     • Reducing the gap between human scoring and machine scoring

  19. Conclusion
     • Improvement was achieved by collaborating with human experts
     • Overall accuracy of the system has been improved

  20. Thank you!

  21. Cannot be decided yet

  22. Cannot be decided yet (cont'd)