First and Second Language Models to Correct Preposition Errors. Matthieu Hermet, Alain Désilets National Research Council of Canada. Preposition Errors. A good case study : High error rate More than 17% of errors in our dataset
First and Second Language Models to Correct Preposition Errors
Matthieu Hermet, Alain Désilets
National Research Council of Canada
Preposition Errors
• A good case study:
  • High error rate
  • More than 17% of errors in our dataset
  • Instance of function-word errors, correctable using corpus-based methods
  • Instance of interference errors
Preposition Errors
• 2 major causes:
  • Confusion with a preposition of the same semantic class
    …à la conférence NAACL
    …at the NAACL conference
    …in the NAACL conference
  • Interference with L1
    Écouter les intervenants
    Listen to the speakers
    Listen the speakers
Approaches
• Rule-based:
  • Mal-rules: cost of manual creation
  • Syntactic constraint relaxation: parser-dependent
• Corpus-based:
  • Language models: low coverage
  • Web as a corpus: better coverage
  • Still not enough: less than 40% of our data set
Approach
• Interference errors may be hard to address properly through corpus-based methods
  • Corpus-based methods represent a model of L2 correctness
• To deal with interference errors, it may be advantageous to use a model which takes L1 into account
Roundtrip MT
• Carry out a single round-trip translation at the level of a clause or sentence
• Use a phrase-based translation system
  • Google Translate
Roundtrip MT
• L1 (en): "Police arrived at the scene of the crime"
• L2 (fr): "Les policiers sont arrivés à la scène de la crime."
• Send to phrase-based translation system
  • To L1: Policemen arrived at the crime scene
  • Back to L2: Les policiers sont arrivés sur les lieux du crime
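The round trip on this slide can be sketched as two calls to a translation function. This is a minimal sketch, not the authors' implementation: `round_trip`, the `translate` callback and `canned_translate` are hypothetical names, and the canned table stands in for a real phrase-based MT system using only the sentences shown on the slide.

```python
from typing import Callable, Tuple

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]


def round_trip(sentence: str, translate: Translator,
               l2: str = "fr", l1: str = "en") -> Tuple[str, str]:
    """Translate an L2 sentence to L1 and back; return (pivot, round-trip)."""
    pivot = translate(sentence, l2, l1)   # L2 -> L1
    back = translate(pivot, l1, l2)       # L1 -> L2
    return pivot, back


# Canned stand-in for an MT system, filled with the slide's example only.
CANNED = {
    ("fr", "en", "Les policiers sont arrivés à la scène de la crime."):
        "Policemen arrived at the crime scene",
    ("en", "fr", "Policemen arrived at the crime scene"):
        "Les policiers sont arrivés sur les lieux du crime",
}


def canned_translate(text: str, src: str, tgt: str) -> str:
    return CANNED[(src, tgt, text)]


pivot, corrected = round_trip(
    "Les policiers sont arrivés à la scène de la crime.", canned_translate)
```

Passing the translator in as a callback keeps the sketch independent of any particular MT service.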
Theory
• Les policiers sont arrivés à la scène du crime
Drawback
• The round-trip translated sentence can show:
  • A wrong translation
    N’hésitez pas de me contacter → s’il vous plait contactez moi
  • A correct translation that uses the wrong preposition
    J’ai de la difficulté de formuler des phrases → je trouve difficile de formuler des phrases
  • A wrong translation that uses the correct preposition
    […] demandé à mon amie pour le corriger → […] demandé à mon amie de le fixer
Assessment
• Correctness can take at least two forms:
  • Correct translation
  • Wrong translation but correct preposition
• Two strategies for evaluation:
  • Clause: the roundtrip translation is a good correction, including the preposition
  • Prep: only the preposition is correct in the roundtrip translation
Assessment
• In the Clause strategy, the RT translation is sent back as the correction
• In the Prep strategy, we need a procedure to retrieve the preposition from the incorrect translation
  • Only the preposition is sent back as the correction
Prep
• Greedy mining method to retrieve the preposition from the translation
  • Être proche à lui → être près de lui
  • The sequences <prep à> lui == <prep de> lui validate the preposition de as a correction
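The slide does not spell out the mining procedure, so the following is only one plausible reading: take the word that follows the preposition in the learner's sentence, find it in the round-trip translation, and return the preposition that precedes it there. The function name `mine_preposition` and the `PREPOSITIONS` list are assumptions made for illustration.

```python
from typing import List, Optional

# Assumed inventory, taken from prepositions mentioned elsewhere in the slides.
PREPOSITIONS = {"à", "de", "sur", "avec", "par", "pour", "dans", "en",
                "chez", "sous", "vers", "après", "avant"}


def mine_preposition(original: List[str], prep_index: int,
                     translated: List[str]) -> Optional[str]:
    """Return the preposition that precedes, in the translation, the word
    that followed the erroneous preposition in the original sentence."""
    if prep_index + 1 >= len(original):
        return None  # no right context to anchor on
    right_context = original[prep_index + 1].lower()
    # Greedy: stop at the first position where the right context matches
    # and is immediately preceded by a known preposition.
    for i in range(1, len(translated)):
        if (translated[i].lower() == right_context
                and translated[i - 1].lower() in PREPOSITIONS):
            return translated[i - 1].lower()
    return None


found = mine_preposition("être proche à lui".split(), 2,
                         "être près de lui".split())
```

On the slide's example the right context is `lui`, which is preceded by `de` in the translation, so `de` is returned. When the translation rephrases the right context, this sketch returns `None`.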
Unilingual
• An instance of a corpus-based approach
• Web as a probabilistic language model
  • Strength of an utterance measured in number of search hits
• In practice, the Web’s coverage is incomplete
  • Impossible to discriminate when zero hits are returned for all alternatives
• Syntactic pruning to maximize chances of hits
Pruning 1
• Sentence is parsed and reduced to a phrasal minimum around the preposition
  • S → VP or NP (or AP)
    I have lived in a small town all my life → lived in a small town
    I’ll get a chance to meet people → a chance to meet
• Words are lemmatized
  • Verbs to infinitive
  • Nouns to singular
Pruning 2
• Suppress unnecessary words
  • Adj, when attributive:
    To live in a small town → To live in a town
    This is easy to understand → easy to understand
  • Adv, in all cases:
    Call immediately for help → call for help
  • NP or PP:
    Une fenêtre qui permet au soleil d’entrer → … qui permet d’entrer
    → … au soleil d’entrer
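The word-suppression step can be illustrated on tokens that already carry part-of-speech tags. This is a toy sketch under stated assumptions: the tag names `ADV` and `ADJ_ATTR` are hypothetical, the tagging is supplied by hand rather than by a parser, and NP/PP removal and lemmatization are not covered.

```python
from typing import List, Tuple

# Hypothetical tags: ADV = adverb, ADJ_ATTR = attributive adjective.
DROPPED_TAGS = {"ADV", "ADJ_ATTR"}


def prune(tagged_tokens: List[Tuple[str, str]]) -> List[str]:
    """Drop adverbs and attributive adjectives, keep everything else."""
    return [word for word, tag in tagged_tokens if tag not in DROPPED_TAGS]


adverb_example = prune([("call", "VERB"), ("immediately", "ADV"),
                        ("for", "PREP"), ("help", "NOUN")])
adjective_example = prune([("live", "VERB"), ("in", "PREP"), ("a", "DET"),
                           ("small", "ADJ_ATTR"), ("town", "NOUN")])
```

A predicative adjective such as "easy" in "easy to understand" would carry a different tag and is therefore kept, as on the slide.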
Alternate prepositions
• Once pruned, replace the erroneous preposition by alternates
  • Most common prepositions
    De, sur, avec, par, pour, à
  • Prepositions of the same semantic class
    Localization, temporal, cause, goal, manner, material, possession
• 1 input sentence = as many sentences as there are alternate prepositions
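Generating one candidate phrase per alternate preposition could look like the sketch below. The function names are hypothetical, and the elision rule (`de` becomes `d'` before a vowel) is a deliberate simplification of French orthography added so that the output resembles natural phrases.

```python
from typing import Dict, List

VOWELS = "aeiouyàâéèêëîïôûù"


def substitute(tokens: List[str], prep_index: int, prep: str) -> str:
    """Rebuild the phrase with `prep` in place of the original preposition."""
    before = tokens[:prep_index]
    after = tokens[prep_index + 1:]
    # Simplified elision: "de" + vowel-initial word -> "d'" + word.
    if prep == "de" and after and after[0][0].lower() in VOWELS:
        return " ".join(before + ["d'" + after[0]] + after[1:])
    return " ".join(before + [prep] + after)


def generate_alternatives(tokens: List[str], prep_index: int,
                          candidates: List[str]) -> Dict[str, str]:
    """Map each distinct candidate preposition to its candidate phrase."""
    phrases: Dict[str, str] = {}
    for prep in candidates:
        if prep not in phrases:  # skip duplicates in the candidate list
            phrases[prep] = substitute(tokens, prep_index, prep)
    return phrases


alternatives = generate_alternatives(["permettre", "à", "entrer"], 1,
                                     ["de", "avant", "à", "en", "en"])
```

The original preposition is kept among the candidates so that its phrase can be compared with the alternatives.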
Unilingual
• Input sentence
  Il y a une grande fenêtre qui permet au soleil <à> entrer
  (there is a large window which lets the sun come in)
• Syntactic pruning and lemmatization
  permettre <à> entrer (let come in) + au soleil <à> entrer (the sun come in)
• Generation of alternate prepositions
  • semantically related: dans, en, chez, sur, sous, au, dans, après, avant, en, vers
  • most common: de, avec, par, pour
• Query and sort alternative phrases
  permettre d'entrer: 119 000 hits
  au soleil d’entrer: 397 hits
  permettre avant entrer: 12 hits
  au soleil avant entrer: 0 hits
  permettre à entrer: 4 hits
  …
  permettre en entrer: 2 hits
  ...
• → preposition <d'> is returned as correction
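The query-and-sort step can be sketched with the hit counts listed on this slide. How the counts of the two pruned phrases are combined is not stated in the slides; summing them per preposition is an assumption of this sketch, and `best_preposition` is a hypothetical name. Counts elided on the slide are left out.

```python
from typing import Dict, Optional, Tuple

# (preposition, phrase) -> hit count, copied from the slide.
HITS: Dict[Tuple[str, str], int] = {
    ("d'", "permettre d'entrer"): 119000,
    ("d'", "au soleil d'entrer"): 397,
    ("avant", "permettre avant entrer"): 12,
    ("avant", "au soleil avant entrer"): 0,
    ("à", "permettre à entrer"): 4,
    ("en", "permettre en entrer"): 2,
}


def best_preposition(hits: Dict[Tuple[str, str], int]) -> Optional[str]:
    """Sum hits per preposition (assumed combination rule) and return the
    top one, or None when every alternative has zero hits."""
    totals: Dict[str, int] = {}
    for (prep, _phrase), count in hits.items():
        totals[prep] = totals.get(prep, 0) + count
    if not totals or max(totals.values()) == 0:
        return None  # cannot discriminate between alternatives
    return max(totals, key=lambda p: totals[p])


correction = best_preposition(HITS)
```

Returning `None` for the all-zero case makes explicit the situation in which the Web model cannot discriminate between alternatives.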
Results
• Dataset: 133 sentences extracted from intermediate-advanced FSL productions
• Unilingual returns hits in only ~85% of cases
  • Impact of L1 on L2 inputs
  • Incompleteness of the Web as a language model
Hybrid
• Agreement between the two strategies is only 65.4%
• A third strategy to combine the two models
  • MT as a model of controlled incorrectness (here, anglicisms)
  • Web as a model of correctness
Hybrid
• Triggered when the unilingual approach does not give any hits
  • Then send to roundtrip MT (Prep strategy)
• Yields results of 82%
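The fallback logic of the hybrid strategy can be sketched as follows. Both strategies are passed in as callbacks; `hybrid_correct` and the convention that the unilingual strategy returns `None` when it gets no hits are assumptions made for this illustration.

```python
from typing import Callable, Optional

# Hypothetical strategy signature: sentence -> corrected preposition or None.
Strategy = Callable[[str], Optional[str]]


def hybrid_correct(sentence: str, unilingual: Strategy,
                   roundtrip_prep: Strategy) -> Optional[str]:
    """Use the unilingual (Web) strategy first; fall back to roundtrip MT
    with the Prep strategy only when the Web returns no hits."""
    correction = unilingual(sentence)
    if correction is not None:
        return correction
    return roundtrip_prep(sentence)


# Toy strategies standing in for the real components.
web_with_hits = lambda s: "de"
web_without_hits = lambda s: None
mt_prep = lambda s: "sur"

from_web = hybrid_correct("phrase", web_with_hits, mt_prep)
from_mt = hybrid_correct("phrase", web_without_hits, mt_prep)
```

With this ordering the MT model is consulted only for the cases the Web model cannot cover.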
Conclusion and Future Work
• Unilingual and roundtrip MT equivalent
• Hybrid approach seems relevant due to the different paradigms of the two approaches
• More data
• Enhance pruning
• Study in the context of error detection
• Extend MT approach to other error classes