Correcting Misuse of Verb Forms John Lee, Stephanie Seneff Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge ACL 2008
Outline • Introduction • Background • System • Baselines • Data • Evaluation • Conclusions
Background The goal is to correct confusions among the five verb forms, as well as the infinitive, caused by semantic and syntactic errors. Semantic Errors: Suppose one wants to say “I am prepared for the exam”, but writes “I am preparing for the exam”. Both sentences are grammatical, but the second does not express the intended meaning.
Background Syntactic Errors • Subject-Verb Agreement: He *have been living there since June. • Auxiliary Agreement: He has been *live there since June. • Complementation: He wants *live there.
System Step 1: Automatic parsing. Example input: “My father is *work in the laboratory.”
System Step 2: Replacing the verb forms
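The slide does not spell out how the replacement is carried out, so the following is only a minimal sketch of the idea of generating alternative verb forms for a flagged token. The inflection table `VERB_FORMS` and the function `candidate_sentences` are hypothetical names introduced for illustration, not part of the authors' system.

```python
from typing import Dict, List

# Hypothetical, hand-written inflection table for illustration only.
# A real system would rely on a full morphological lexicon.
VERB_FORMS: Dict[str, Dict[str, str]] = {
    "work": {"base": "work", "3sg": "works", "past": "worked",
             "ing": "working", "ed": "worked"},
    "live": {"base": "live", "3sg": "lives", "past": "lived",
             "ing": "living", "ed": "lived"},
}


def candidate_sentences(tokens: List[str], index: int, lemma: str) -> List[str]:
    """Return one sentence per distinct form of `lemma`, placed at `index`.

    The five inflected forms plus the to-infinitive are tried; duplicates
    (e.g. identical past and -ed forms) are collapsed.
    """
    forms = VERB_FORMS[lemma]
    variants = sorted(set(forms.values()) | {"to " + forms["base"]})
    sentences = []
    for form in variants:
        # "to work" expands to two tokens, hence the split().
        new_tokens = tokens[:index] + form.split() + tokens[index + 1:]
        sentences.append(" ".join(new_tokens))
    return sentences


if __name__ == "__main__":
    words = "My father is work in the laboratory".split()
    for sentence in candidate_sentences(words, 3, "work"):
        print(sentence)
```

The candidate list deliberately includes the original form, so a later filtering step can decide that no change is needed.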
System Step 3: N-gram counts as a filter, using the Web 1T n-gram corpus prepared by Google Inc.
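One way to picture n-gram filtering is sketched below. The exact filtering rule is not given on the slide, so this assumes a simple criterion: a candidate survives only if the trigram around the verb is more frequent than the trigram in the original sentence. The counts in `TOY_COUNTS` are invented for the example and merely stand in for lookups into a large n-gram corpus; `trigram_at` and `filter_by_ngram` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

Trigram = Tuple[str, ...]

# Invented toy counts standing in for lookups into a web-scale n-gram corpus.
TOY_COUNTS: Dict[Trigram, int] = {
    ("is", "work", "in"): 3,
    ("is", "working", "in"): 120,
    ("is", "worked", "in"): 8,
}


def trigram_at(tokens: List[str], index: int) -> Trigram:
    """Trigram centred on the token at `index`."""
    return tuple(tokens[index - 1:index + 2])


def filter_by_ngram(original: List[str], candidates: List[List[str]],
                    index: int, counts: Dict[Trigram, int]) -> List[List[str]]:
    """Keep candidates whose trigram count beats the original's, best first.

    Assumed rule for illustration; unseen trigrams count as zero.
    """
    def score(tokens: List[str]) -> int:
        return counts.get(trigram_at(tokens, index), 0)

    baseline = score(original)
    kept = [c for c in candidates if score(c) > baseline]
    return sorted(kept, key=score, reverse=True)


if __name__ == "__main__":
    original = "My father is work in the laboratory".split()
    candidates = [original[:3] + [form] + original[4:]
                  for form in ("working", "worked", "works")]
    for tokens in filter_by_ngram(original, candidates, 3, TOY_COUNTS):
        print(" ".join(tokens))
```

Requiring a candidate to beat the original keeps the filter conservative: if no alternative is clearly more frequent, the sentence is left unchanged.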
Baselines • Majority baseline: proposes no correction. • Verb-only baseline (used only for Auxiliary Agreement & Complementation): attempts corrections only when the word in question is actually tagged as a verb.
Data • Development Data: AQUAINT Corpus (English News Text) • Evaluation Data: JLE (Japanese Learners of English corpus), covering 167 of the transcribed interviews, totalling 15,637 sentences. • Test Set: 477 sentences (3.1%) contain subject-verb agreement errors, and 238 (1.5%) contain auxiliary agreement and complementation errors.
Data • Evaluation Data: HKUST (Hong Kong University of Science and Technology) corpus, containing a total of 2,556 sentences.
Data Evaluation Metrics • Accuracy = (true neg + true pos) / total number of sentences • Recall = true pos / (true pos + false neg + inv pos) • Detection Precision = (true pos + inv pos) / (true pos + inv pos + false pos) • Correction Precision = true pos / (true pos + false pos + inv pos)
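The four metrics can be computed directly from the outcome counts. The sketch below follows the formulas on the slide; it assumes each sentence falls into exactly one of the five outcomes, so that the total number of sentences is their sum. The function name `evaluate` and the counts in the usage example are invented for illustration and are not results from the paper.

```python
from typing import Dict


def evaluate(true_pos: int, true_neg: int, false_pos: int,
             false_neg: int, inv_pos: int) -> Dict[str, float]:
    """Compute the slide's four metrics from outcome counts.

    Assumes one outcome per sentence, so the total is the sum of all five.
    No guard against empty denominators is included in this sketch.
    """
    total = true_pos + true_neg + false_pos + false_neg + inv_pos
    return {
        "accuracy": (true_neg + true_pos) / total,
        "recall": true_pos / (true_pos + false_neg + inv_pos),
        "detection_precision": (true_pos + inv_pos)
                               / (true_pos + inv_pos + false_pos),
        "correction_precision": true_pos
                                / (true_pos + false_pos + inv_pos),
    }


if __name__ == "__main__":
    # Invented counts, chosen only to make the arithmetic easy to follow.
    scores = evaluate(true_pos=6, true_neg=80, false_pos=4,
                      false_neg=8, inv_pos=2)
    for name, value in scores.items():
        print(f"{name}: {value:.3f}")
```

Detection precision credits the system for flagging an erroneous verb even when its proposed fix is wrong, whereas correction precision and recall count such inv pos cases against it.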
Evaluation JLE • Results for Subject-Verb Agreement • Results for Auxiliary Agreement & Complementation
Evaluation HKUST • Results for Auxiliary Agreement & Complementation • Two native speakers of English were given the edited sentences, as well as the original input. For each pair, they were asked to select one of four statements: the edited sentence is better, the original is better, both are equally correct, or both are equally incorrect. • Agreement between the two judges (kappa): 0.76
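The slide reports only the kappa value. Assuming it refers to Cohen's kappa for two raters, the statistic can be sketched as follows. The label names and judgments in the usage example are invented and do not come from the HKUST evaluation.

```python
from collections import Counter
from typing import List


def cohens_kappa(rater_a: List[str], rater_b: List[str]) -> float:
    """Cohen's kappa for two raters labelling the same items.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same label
    # if each labels independently according to their own label frequencies.
    expected = sum(counts_a[label] * counts_b[label]
                   for label in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Invented judgments over eight sentence pairs, using four statements.
    judge_1 = ["edited", "edited", "original", "both_ok",
               "both_bad", "edited", "both_ok", "edited"]
    judge_2 = ["edited", "edited", "original", "both_ok",
               "edited", "edited", "both_bad", "edited"]
    print(round(cohens_kappa(judge_1, judge_2), 2))
```

Kappa discounts the agreement two judges would reach by chance, which matters here because one of the four statements may be chosen far more often than the others.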
Conclusions • This paper proposes a method to correct English verb form errors made by non-native speakers. • It investigates the ways in which verb form errors affect parse trees.