10 likes | 193 Views
Generating Confusion Sets for Context-Sensitive Error Correction Alla Rozovskaya and Dan Roth {rozovska, danr}@illinois.edu. Problem: Multi-class Classification with a Very Large Number of Classes. Our Approach – Narrow down the Candidates. Preposition Errors and ESL.
E N D
Generating Confusion Sets for Context-Sensitive Error Correction Alla Rozovskaya and Dan Roth {rozovska, danr}@illinois.edu Problem: Multi-class Classification with a Very Large Number of Classes Our Approach – Narrow down the Candidates Preposition Errors and ESL • To narrow down the candidates, we need knowledge about which prepositions can serve as valid corrections. • Narrow down candidates by L1 (writer’s first language). • “They ateby* their hands .” • (1) Standard Conf. sets – 10 correction candidates for by: {about, on, p3, …p10}. • (2) L1-dependent Conf. sets – exclude candidates not seen as corrections for by in the ESL texts: Russian {by, with, of}; Chinese {by, with, in}. • (3) L1-dependent Weighted Conf. sets – enhanced with probability for each cand. • Preposition errors are very common with ESL learners. • Preposition errors are influenced by L1. • Not all preposition confusions are equally likely (Han et al. ‘10, Rozovskaya & Roth ’10a). Preposition Usage Errors by English as a Second Language (ESL) learners: “They ate by* their hands.” The writer used by instead of with. The Annotated ESL Corpus Preposition Error Correction as a Multi-class Classification Problem • 63000 words of ESL writing, annotated • for article and preposition errors, other grammar and lexical errors (Rozovskaya & Roth ’10a). • Data from speakers of 9 first languages. • 4185 prepositions, 352 (8.4%) erroneous. • A multi-class classifier is trained; each class corresponds to a distinct preposition. • Usually, one considers 9-34 top English prepositions. Experiments • Models are trained on top 10 prepositions on native English data using the Averaged Perceptron Algorithm with LBJ (Rizzolo & Roth ’07). • (1) Standard confusion sets. • (2) L1-dependent confusion sets; Bad candidates excluded at decision time. • (3) L1-dependent Weighted confusion sets; Bad candidates excluded in training. • Artificial preposition errors are added in training, using error distributions of the speakers of L1 (Rozovskaya & Roth, ’10b). Confusion Sets for Preposition Error Correction • AConfusion Set (candidate set) for preposition pi – prepositions considered as corrections for pi. • Standard confusion sets -- every participating preposition is viewed as a valid correction for pi. (De Felice & Pulman ’08, Tetreault & Chodorow ’08, Gamon et al., ’08). Experimental Results • L1-dependent confusion sets are superior to the standard confusion sets. • On the same recall points, the models with restricted confusion sets obtain a consistently better precision. • Using knowledge about the likelihood of each preposition confusion (weighted confusion sets) is even more effective. (stat. signif. at p<0.001, using McNemar’s test). Preposition errors in the ESL data. Contributions • We propose to narrow down candidates instead of considering all possible classes. • We propose methods to narrow down candidates at decision time and in training. • We narrow down preposition correction candidatesusing knowledge about typical errors observed with writers whose first language is L1. Selected References M. Gamon, J. Gao, C. Brockett, A. Klementiev, W. Dolan, D. Belenko, and L. Vanderwende. 2008. Using contextual speller techniques and language modeling for ESL error correction. IJCNLP. N. Han, J. Tetreault, S. Lee, and J. Ha. 2010. Using an error annotated learner corpus to develop and ESL/EFL error correction System. LREC. A. Rozovskaya and D. Roth. 2010a. Annotating ESL errors: Challenges and rewards. NAACL-BEA workshop. A. Rozovskaya and D. Roth. 2010b. Training paradigms for correcting errors in grammar and usage. NAACL. This work is supported by a grant from the US department of Education.