  1. First and Second Language Models to Correct Preposition Errors • Matthieu Hermet, Alain Désilets • National Research Council of Canada

  2. Preposition Errors • A good case study: • High error rate • More than 17% of errors in our dataset • Instance of function-word errors, correctable using corpus-based methods • Instance of interference errors

  3. Preposition Errors • 2 major causes: • Confusion with a preposition of the same semantic class: …à la conférence NAACL / …at the NAACL conference / …in the NAACL conference • Interference with L1: Écouter les intervenants / Listen to the speakers / Listen the speakers

  4. Approaches • Rule-based: • Mal-rules: cost of manual creation • Syntactic constraint relaxation: parser-dependent • Corpus-based: • Language models: low coverage • Web as a corpus: better coverage • Still not enough: less than 40% of our data set

  5. Approach • Interference errors may be hard to address properly through corpus-based methods • They represent a model of L2 correctness → To deal with interference errors, it may be advantageous to use a model which takes L1 into account

  6. Roundtrip MT • Carry out a single round-trip translation at the level of a clause or sentence • Use a phrase-based translation system → Google Translate

  7. Roundtrip MT • L1 (en): “Police arrived at the scene of the crime” • L2 (fr): “Les policiers sont arrivés à la scène de la crime.” • Send to phrase-based translation system • To L1: Policemen arrived at the crime scene • Back to L2: Les policiers sont arrivés sur les lieux du crime
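
The round trip on this slide can be sketched as two chained translation calls. The `translate` function below is a hypothetical stand-in, backed by a lookup table that holds only the slide's own example; a real setup would call a phrase-based MT service instead.

```python
from typing import Callable

# Hypothetical stand-in for a phrase-based MT system (assumption): a lookup
# table containing only the example sentence from the slide.
_TABLE = {
    ("fr", "en", "Les policiers sont arrivés à la scène de la crime."):
        "Policemen arrived at the crime scene",
    ("en", "fr", "Policemen arrived at the crime scene"):
        "Les policiers sont arrivés sur les lieux du crime",
}


def translate(text: str, src: str, tgt: str) -> str:
    """Toy translator; raises KeyError for any input not in the table."""
    return _TABLE[(src, tgt, text)]


def round_trip(sentence: str, l2: str = "fr", l1: str = "en",
               mt: Callable[[str, str, str], str] = translate) -> str:
    """Translate an L2 sentence into L1, then back into L2."""
    pivot = mt(sentence, l2, l1)
    return mt(pivot, l1, l2)


learner = "Les policiers sont arrivés à la scène de la crime."
print(round_trip(learner))
```

Passing the translator as a parameter keeps the round-trip logic independent of whichever MT system is actually used.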

  8. Theory Les policiers sont arrivés à la scène du crime

  9. Drawback • The round-trip translated sentence can show: • A wrong translation: N’hésitez pas de me contacter → s’il vous plait contactez moi • A correct translation that uses the wrong preposition: J’ai de la difficulté de formuler des phrases → je trouve difficile de formuler des phrases • A wrong translation that uses the correct preposition: […] demandé à mon amie pour le corriger → […] demandé à mon amie de le fixer

  10. Assessment • Correctness can take at least two forms: • Correct translation • Wrong translation but correct preposition • Two strategies for evaluation: • Clause: the roundtrip translation is a good correction, including the preposition • Prep: only the preposition is correct in the roundtrip translation

  11. Assessment • In the Clause strategy, the RT translation is sent back as the correction • In the Prep strategy, we need a procedure to retrieve the preposition from the incorrect translation → Only the preposition is sent back as the correction

  12. Prep • Greedy mining method to retrieve the preposition from the translation • Être proche à lui → être près de lui • The sequence match <prep à> lui == <prep de> lui validates the preposition de as a correction
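
One way to read the greedy mining step is as a search for a preposition in the round-trip translation that is followed by the same words as the erroneous preposition in the input. The sketch below is an assumption about how such a match could work, not the authors' implementation; the preposition inventory is pieced together from prepositions named on other slides, and tokens are assumed to be lowercased and whitespace-separated.

```python
from typing import List, Optional

# Assumed preposition inventory, collected from prepositions named in the slides.
PREPOSITIONS = {"à", "de", "sur", "avec", "par", "pour", "dans", "en", "chez"}


def mine_preposition(source: List[str], prep_index: int,
                     translation: List[str]) -> Optional[str]:
    """Return the preposition in `translation` whose right context matches
    the right context of the erroneous preposition in `source`."""
    right = source[prep_index + 1:]
    # Greedy: try the longest right context first, then shorten it.
    for size in range(len(right), 0, -1):
        context = right[:size]
        for i, token in enumerate(translation):
            if token in PREPOSITIONS and translation[i + 1:i + 1 + size] == context:
                return token
    return None


source = "être proche à lui".split()
translation = "être près de lui".split()
print(mine_preposition(source, 2, translation))  # de
```

The same function also recovers "de" from the "demandé à mon amie pour le corriger" example, where only the first word of the right context ("le") survives the round trip.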

  13. Unilingual • An instance of a corpus-based approach • Web as a probabilistic language model • Strength of an utterance measured in number of search hits • In practice, the Web’s coverage is incomplete • Impossible to discriminate when zero hits are returned for all alternatives → Syntactic pruning to maximize chances of hits

  14. Pruning 1 • Sentence is parsed and reduced to a phrasal minimum around the preposition • S → VP or NP (or AP): I have lived in a small town all my life → lived in a small town; I’ll get a chance to meet people → a chance to meet • Words are lemmatized • Verbs to infinitive • Nouns to singular

  15. Pruning 2 • Suppress unnecessary words • Adj, when attributive: To live in a small town → To live in a town; This is easy to understand → easy to understand • Adv, in all cases: Call immediately for help → call for help • NP or PP: Une fenêtre qui permet au soleil d’entrer → … qui permet d’entrer; … au soleil d’entrer
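
The adjective and adverb rules can be illustrated on hand-tagged tokens. This is a minimal sketch under two assumptions: the part-of-speech tags are supplied by hand (a real system would take them from the parser), and an adjective counts as attributive when it is immediately followed by a noun, which fits the English examples but would need adapting for French word order.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # (word, part-of-speech tag)


def prune(tagged: Tagged) -> str:
    """Drop all adverbs and any adjective directly followed by a noun."""
    kept = []
    for i, (word, tag) in enumerate(tagged):
        if tag == "ADV":
            continue  # adverbs are suppressed in all cases
        next_tag = tagged[i + 1][1] if i + 1 < len(tagged) else None
        if tag == "ADJ" and next_tag == "NOUN":
            continue  # attributive adjective (heuristic, see assumptions)
        kept.append(word)
    return " ".join(kept)


print(prune([("to", "PART"), ("live", "VERB"), ("in", "ADP"),
             ("a", "DET"), ("small", "ADJ"), ("town", "NOUN")]))
print(prune([("call", "VERB"), ("immediately", "ADV"),
             ("for", "ADP"), ("help", "NOUN")]))
print(prune([("easy", "ADJ"), ("to", "PART"), ("understand", "VERB")]))
```

The third call keeps "easy" because it is not followed by a noun, mirroring the "easy to understand" example.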

  16. Alternate prepositions • Once pruned, replace the erroneous preposition by alternates • Most common prepositions: de, sur, avec, par, pour, à • Prepositions of the same semantic class: localization, temporal, cause, goal, manner, material, possession • 1 input sentence = as many sentences as there are alternate prepositions
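
Generating the alternates amounts to filling one slot in the pruned phrase with each candidate preposition. In the sketch below, the most common prepositions come from this slide, and the semantically related set for "à" is taken from the worked example on slide 18; the dictionary layout, function names, and template syntax are assumptions, and French elision (de → d') is deliberately left out.

```python
from typing import Dict, List

MOST_COMMON = ["de", "sur", "avec", "par", "pour", "à"]

# Only the entry for "à" is grounded in the slides (slide 18); a full table
# would cover every preposition and semantic class.
SEMANTICALLY_RELATED: Dict[str, List[str]] = {
    "à": ["dans", "en", "chez", "sur", "sous", "au", "après", "avant", "vers"],
}


def alternates(prep: str) -> List[str]:
    """Related prepositions first, then the most common ones, without
    duplicates and without the erroneous preposition itself."""
    seen = {prep}
    result = []
    for candidate in SEMANTICALLY_RELATED.get(prep, []) + MOST_COMMON:
        if candidate not in seen:
            seen.add(candidate)
            result.append(candidate)
    return result


def expand(template: str, prep: str) -> Dict[str, str]:
    """Build one candidate phrase per alternate preposition."""
    return {alt: template.format(prep=alt) for alt in alternates(prep)}


for alt, phrase in expand("permettre {prep} entrer", "à").items():
    print(alt, "->", phrase)
```

After deduplication, the most common prepositions that remain for "à" are de, avec, par, and pour, which matches the list shown in the slide 18 example.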

  17. Preposition Categories

  18. Unilingual • Input sentence: Il y a une grande fenêtre qui permet au soleil <à> entrer (there is a large window which lets the sun come in) • Syntactic pruning and lemmatization: permettre <à> entrer (let come in) + au soleil <à> entrer (the sun come in) • Generation of alternate prepositions • Semantically related: dans, en, chez, sur, sous, au, dans, après, avant, en, vers • Most common: de, avec, par, pour • Query and sort alternative phrases: permettre d'entrer: 119 000 hits; au soleil d’entrer: 397 hits; permettre avant entrer: 12 hits; au soleil avant entrer: 0 hits; permettre à entrer: 4 hits; …; permettre en entrer: 2 hits; ... • → preposition <d'> is returned as correction
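
The final ranking step can be sketched with the hit counts listed on this slide. The slide does not say how counts from the two pruned phrases are combined, so summing them per preposition is an assumption here, as is the hard-coded `HITS` table standing in for live search queries.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

# (preposition, queried phrase) -> hit count, copied from the slide.
HITS: Dict[Tuple[str, str], int] = {
    ("d'", "permettre d'entrer"): 119_000,
    ("d'", "au soleil d'entrer"): 397,
    ("avant", "permettre avant entrer"): 12,
    ("avant", "au soleil avant entrer"): 0,
    ("à", "permettre à entrer"): 4,
    ("en", "permettre en entrer"): 2,
}


def best_preposition(hits: Dict[Tuple[str, str], int]) -> Optional[str]:
    """Return the preposition with the most total hits, or None when no
    alternative returns any hit (no way to discriminate)."""
    totals: Dict[str, int] = defaultdict(int)
    for (prep, _phrase), count in hits.items():
        totals[prep] += count
    if not totals:
        return None
    prep, total = max(totals.items(), key=lambda item: item[1])
    return prep if total > 0 else None


print(best_preposition(HITS))  # d'
```

Returning `None` when every count is zero makes the "zero hits for all alternatives" case explicit, so a caller can detect it and react.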

  19. Results • Dataset: 133 sentences extracted from intermediate-advanced FSL productions • Unilingual returns hits in only ~85% of cases • Impact of L1 on L2 inputs • Incompleteness of the Web as a language model

  20. Hybrid • Agreement between the two strategies is only 65.4% • A third strategy to combine the two models • MT as a model of controlled incorrectness (here, anglicisms) • Web as a model of correctness

  21. Hybrid • Triggered when the unilingual approach does not give any hits → Then send to roundtrip MT (Prep strategy) • Yields results of 82%
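
The hybrid strategy described here is a simple fallback. In this sketch both correctors are passed in as functions; their names and the convention that the unilingual corrector returns `None` when no alternative gets a hit are assumptions made for illustration.

```python
from typing import Callable, Optional

Corrector = Callable[[str], Optional[str]]


def hybrid_correct(sentence: str, unilingual: Corrector,
                   roundtrip_prep: Corrector) -> Optional[str]:
    """Use the Web-based suggestion when there is one; otherwise fall back
    to the preposition mined from the round-trip translation."""
    suggestion = unilingual(sentence)
    if suggestion is not None:
        return suggestion
    return roundtrip_prep(sentence)


# Toy correctors standing in for the two real components (hypothetical).
def no_hits(_sentence: str) -> Optional[str]:
    return None


def always_de(_sentence: str) -> Optional[str]:
    return "de"


print(hybrid_correct("être proche à lui", no_hits, always_de))  # de
```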

  22. Conclusion and Future Work • Unilingual and roundtrip MT are equivalent • Hybrid approach seems relevant due to the different paradigms of the two approaches • More data • Enhance pruning • Study in the context of error detection • Extend MT approach to other error classes

