Correcting errors produced by French speakers writing in English: An illustration with misplaced adverbs
Workshop LORIA, Nancy, 17-18 June 2010
• Marie Garnier, Cultures Anglo-Saxonnes, Université Toulouse 2, France
• P. Saint-Dizier, IRIT, CNRS, France
Introduction
• CorrecTools project
  • Objective: develop correction rules for grammatical errors produced by French speakers writing in a foreign language, applied here to English (errors that grammar checkers neither detect nor correct)
  • Didactic perspective: inclusion of dynamically generated explanations (grammar, several possible corrections, etc.) and possibly argumentation
  • Possible extension to style
• First experiment: errors linked to misplaced adverbs (adjuncts)
  • Motivations for the correction of such errors
  • Their automatic correction
Project Overview
• Target: French speakers
• Audience: the general public as well as professionals
• Exploratory corpus:
  • a variety of document types, domains and authors
  • around 100,000 words (errors are manually detected and annotated)
• Classification of errors:
  • a priori choice: a system of categories based on linguistic criteria (NP, PP, VP, Clause and Sentence) (Albert et al., 2009)
Parameters of the construction of a corpus
General methodology
• Construction of the corpus: the first step of an error analysis methodology
• Designed in accordance with our objective (representativeness of errors and types of situations)
• Parameters taken into consideration:
  • level of control
  • type of document
  • authors and target audience
  • fields or domains of document production
Description of parameters
• Type of documents and level of control:
  • from short spontaneous productions (e.g. emails, posts) to longer professional productions
  • a quasi-continuum from a low to a high level of control:
    • emails, blogs = low level of control
    • web pages = average level of control
    • professional productions = high level of control
  • variations exist within groups
• Around 200 pages (90 pages of Internet productions, 110 pages of professional productions; 100,000 words), 79 authors
Constraints on the classification of errors
Methods
• Two main methods (Ellis, 2008):
  • errors categorized according to linguistic criteria (e.g. syntax/morphology/lexicon, parts of speech, linguistic systems such as determination, expression of the future, etc.)
  • errors categorized according to the observation of surface phenomena (e.g. omission, addition, wrong use, etc.)
• Possibility of ad hoc categories (study of a limited number of error types concerning a specific group of learners)
Constraints on our classification system
• Categories should describe most types of errors (not ad hoc)
• Categories should be designed according to linguistic criteria (descriptions used to analyze the source of errors)
• Categories should be understood by most annotators and users
• The classification system should show internal coherence (linguistic, cognitive)
• Categories could be portable to other languages
Presentation of our error categorization system
• Main categories: the syntactic phrases that contain the errors (NP, VP, PP, Sentence and Clause)
• Internal categories: finer distinctions designed after observing the nature of the errors, leading to about 40 subclasses
• Analyze the reasons for (sources of) errors to improve correction
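To make the two-level scheme concrete, here is a minimal Python sketch of how a main category (the phrase containing the error) and a finer subclass might be stored and tallied. The class names and the subclass labels ("adverb placement", "determiner") are hypothetical placeholders, not the project's actual ~40 subclasses, which are given in Table 4.

```python
from dataclasses import dataclass
from enum import Enum


class MainCategory(Enum):
    """Top-level categories: the syntactic unit that contains the error."""
    NP = "NP"
    VP = "VP"
    PP = "PP"
    CLAUSE = "Clause"
    SENTENCE = "Sentence"


@dataclass(frozen=True)
class ErrorCategory:
    main: MainCategory
    subclass: str  # hypothetical free-text label standing in for the real subclasses

    def label(self) -> str:
        """Render the category as 'Main/subclass', e.g. 'VP/adverb placement'."""
        return f"{self.main.value}/{self.subclass}"


def count_by_main(categories):
    """Tally annotated errors per main category (e.g. to build a distribution table)."""
    counts = {}
    for cat in categories:
        counts[cat.main] = counts.get(cat.main, 0) + 1
    return counts


sample = [
    ErrorCategory(MainCategory.VP, "adverb placement"),
    ErrorCategory(MainCategory.VP, "adverb placement"),
    ErrorCategory(MainCategory.NP, "determiner"),
]
print(sample[0].label())
print(count_by_main(sample))
```

Keeping the main category as an enum enforces the closed set of five phrase types, while the subclass stays open so that new distinctions can be added as errors are observed.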
Error categories: a few examples
Table 4.
Distribution of errors: a sample
Table 5. Main types of errors in the corpus
About other language pairs
The same remarks apply, but with quite different error categories:
• French speakers writing in Spanish (Mathilde Janier)
  • temporal agreement: “éramos en los tiempos, nos vamos con destino a Lyón” → éramos en los tiempos, nos fuimos con destino a Lyón
  • future with cuando: “cuando seré más vieja” → cuando sea más vieja
• Spanish speakers writing in English (Astrid Rojas)
  • Realmente espero ir el próximo año → *Really I hope I can go there next year (I really hope…)
  • Tengo 20 años → *I have 20 years (I am 20 years old)
  • The grammar of pronouns and reflexives is quite different in Spanish, leading to forms such as *David is me, a calque of David soy yo.
• French speakers writing in German (Camille Albert)
  • *Ich habe gern die Suppe → Ich habe die Suppe gern
Proposal for an annotation schema
• An attempt to reflect the parameters involved in error detection and correction by human correctors
• Annotations are in XML format
• The aim is to derive correction rules from the annotations, possibly through machine-learning techniques
Error annotations: a preliminary proposal
Table 1. Delimitation and characterization of an error
Table 2. Delimitation and characterization of correction(s)
NB: Our schema is more complex than those used in other projects (ICLE and FreeText, NICT Japanese Learner English, Cambridge Learner Corpus), but its purposes are very different.
Example of an annotated error with multiple corrections: *We need to index efficiently the soundtrack of multimedia documents
Table 3. Example of an annotated error
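An XML annotation for this error could be produced along the lines of the following sketch, which uses Python's standard xml.etree.ElementTree module. Since Tables 1-3 are not reproduced here, the element and attribute names (error, sentence, span, correction, preference) are hypothetical, and the two corrections and their ranking are plausible illustrations rather than the contents of Table 3.

```python
import xml.etree.ElementTree as ET


def annotate_error(sentence, error_span, category, corrections):
    """Build a hypothetical XML annotation for one error.

    corrections is an ordered list; its order becomes the 'preference'
    attribute (1 = preferred correction).
    """
    root = ET.Element("error", {"category": category})
    ET.SubElement(root, "sentence").text = sentence
    ET.SubElement(root, "span").text = error_span
    for rank, text in enumerate(corrections, start=1):
        correction = ET.SubElement(root, "correction", {"preference": str(rank)})
        correction.text = text
    return root


example = annotate_error(
    "We need to index efficiently the soundtrack of multimedia documents",
    "index efficiently the soundtrack",
    "VP/adverb placement",  # hypothetical category label
    [
        "We need to efficiently index the soundtrack of multimedia documents",
        "We need to index the soundtrack of multimedia documents efficiently",
    ],
)
print(ET.tostring(example, encoding="unicode"))
```

Storing several ranked corrections per error, rather than a single target string, is what would later allow rules with preferences to be derived from the annotations.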
The case of misplaced adverbs
• Distribution and type of errors in the corpus
• Responses offered by grammar checkers
• A correction strategy
Type of errors
Table 6. Errors linked to adverbs
Morphology: mostly prototypical -ly adverbs, plus other simple or complex adverbs (well, nevertheless...)
Grammar checkers
• Systems considered: from payware to freeware, from professional websites to research projects
• With these systems, on error samples from the corpus: best result = 19.3%
• Misplaced adverbs in the VP are generally neither detected nor corrected...
Error sources
• Syntactic transfer (Ellis, 2008):
  • adverb placed between the main verb and its complement (L1 influence)
  • Ex: *It won't change completely the life of its citizens
• Generalization from exceptional cases:
  • in English, adverbs can be found after the verb when the complement is long (Huddleston and Pullum, 2002) or when there is no complement (intransitive VP)
  • Ex: She ate slowly.
  • Ex: She waited anxiously for the results of the exam she had had such a hard time preparing for.
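The syntactic-transfer pattern and its two exceptions can be approximated over part-of-speech-tagged text, as in the sketch below. It assumes Penn Treebank tags from some upstream tagger, restricts itself to -ly adverbs, excludes focusing adverbs such as only, and uses a hypothetical 8-token threshold for a "long" complement; none of these settings come from the project.

```python
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}
NP_START_TAGS = {"DT", "PRP$", "PRP", "JJ", "CD", "NN", "NNS", "NNP", "NNPS"}
BOUNDARY_TAGS = {".", ",", ":"}
FOCUSING_ADVERBS = {"only", "merely"}  # hypothetical list: not manner adjuncts
HEAVY_COMPLEMENT = 8  # hypothetical threshold (in tokens) for a "long" complement


def find_verb_adverb_object(tagged):
    """Return indices of -ly adverbs placed between a verb and a short NP object.

    tagged: list of (word, Penn Treebank tag) pairs for one sentence.
    """
    hits = []
    for i in range(len(tagged) - 2):
        _, verb_tag = tagged[i]
        adverb, adv_tag = tagged[i + 1]
        _, next_tag = tagged[i + 2]
        if verb_tag not in VERB_TAGS or adv_tag != "RB":
            continue
        if not adverb.lower().endswith("ly") or adverb.lower() in FOCUSING_ADVERBS:
            continue
        if next_tag not in NP_START_TAGS:
            continue  # no NP object (intransitive VP or PP complement): acceptable
        complement_length = 0
        for _, tag in tagged[i + 2:]:
            if tag in BOUNDARY_TAGS:
                break
            complement_length += 1
        if complement_length >= HEAVY_COMPLEMENT:
            continue  # long complement: a post-verbal adverb is acceptable
        hits.append(i + 1)
    return hits


transfer = [("It", "PRP"), ("wo", "MD"), ("n't", "RB"), ("change", "VB"),
            ("completely", "RB"), ("the", "DT"), ("life", "NN"), ("of", "IN"),
            ("its", "PRP$"), ("citizens", "NNS")]
print(find_verb_adverb_object(transfer))  # [4]: "completely" is flagged
```

A real detector would also need the NP boundaries and the adjunct type discussed in the following slides; this sketch only checks the immediate left and right context of the adverb.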
Towards automatic correction
• Grammatical and linguistic framework:
  • Descriptive grammar: The Cambridge Grammar of the English Language, R. Huddleston & G. K. Pullum (2002)
  • Prescriptive grammar: Grammaire Explicative de l'Anglais, P. Larreya & C. Rivière (2005)
• Overview of grammatical rules and tendencies governing adverb placement
Parameters involved in correction rules
• Weight
  • Length of the AdvP (long head adverb and/or modification): ? She would very erratically tell her story.
  • Presence/absence of complements after the verb, and their length: ? She was slowly eating. She has slowly opened the door to the second guestroom. vs She has opened it slowly.
• Semantics
  • Adjunct type (manner, degree, act-related...): They deliberately had stopped the train.
  • Scope of the adverb (VP-oriented, clause-oriented): Sadly they were arguing about the children. ? They were arguing sadly about the children.
• Syntax
  • "Simple" verbs vs prepositional verbs vs phrasal verbs: She has opened the door slowly. She has slowly given up cigarettes.
• Prosody
  • Prosodically integrated vs prosodically detached: *Anxiously she waited for the results. Anxiously, she waited for the results.
(Other works on parameters of adverb placement include Kampers-Manhe, 1994; Engels, 2004.)
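Some of these parameters (weight, prosodic detachment) can be computed from the token sequence, while the semantic and syntactic ones would have to come from a lexicon. The sketch below gathers them into one record; the field names, the default labels and the split between computed and lexicon-supplied fields are assumptions made for illustration, and prosodic detachment is approximated by the presence of a comma.

```python
from dataclasses import dataclass

PUNCT = {".", ",", ";", ":"}


@dataclass
class AdverbPlacementFeatures:
    advp_length: int        # Weight: tokens in the AdvP ("very erratically" -> 2)
    complement_length: int  # Weight: tokens after the verb, adverb excluded (0 = none)
    detached: bool          # Prosody, approximated: AdvP directly followed by a comma
    # Semantics and syntax: supplied by a (hypothetical) lexicon, not computed here
    adjunct_type: str = "manner"  # "manner", "degree", "act-related", ...
    scope: str = "VP"             # "VP" or "clause"
    verb_type: str = "simple"     # "simple", "prepositional", "phrasal"


def extract_features(tokens, adv_span, verb_index, **lexical):
    """Compute weight and prosody features for one adverb occurrence.

    tokens: list of words; adv_span: (start, end) of the AdvP, end exclusive;
    verb_index: position of the lexical verb. Keyword arguments fill the
    semantic/syntactic fields.
    """
    start, end = adv_span
    complement_length = 0
    for i in range(verb_index + 1, len(tokens)):
        if tokens[i] in PUNCT:
            break
        if start <= i < end:
            continue  # the adverb itself is not part of the complement
        complement_length += 1
    detached = end < len(tokens) and tokens[end] == ","
    return AdverbPlacementFeatures(end - start, complement_length, detached, **lexical)


print(extract_features("She would very erratically tell her story .".split(), (2, 4), 4))
print(extract_features("Anxiously , she waited for the results .".split(), (0, 1), 3,
                       verb_type="prepositional"))
```

How these features are weighed against one another is left open here; the slides list them as parameters, not as a scoring function.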
Tests with native speakers
Table 7. Sample from native-speaker (NS) tests
(1) Grammatical but unnatural and/or changes the original meaning
(2) Ungrammatical
Error patterns and correction rules
• Manner adverbs used as adjuncts (VP-oriented)
• Ex: *Slowly she has opened the door.
  (1) Slowly, she has opened the door.
  (2) She has opened the door slowly.
  (3) She has slowly opened the door.
• Correction: pattern for detection + rewriting under conditions, with preferences:
  Adverb(+manner), NP1, {Auxiliary}, Verb, NP2
  → Adverb(+manner), [,], NP1, {Auxiliary}, Verb, NP2 {preference: 1}
  → NP1, {Auxiliary}, Verb, NP2, Adverb(+manner) {preference: 2}
  → NP1, {Auxiliary}, Adverb(+manner), Verb, NP2 {preference: 3}
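A minimal Python rendering of this rewriting rule follows. It assumes the clause has already been chunked into NP1, an optional auxiliary, the verb and NP2 (NP recognition is itself a difficulty), that the missing comma has already been detected upstream, and it uses a small hypothetical manner-adverb lexicon in place of a real +manner feature.

```python
from typing import List, NamedTuple, Optional, Tuple

# Hypothetical stand-in for a lexicon carrying the +manner feature.
MANNER_ADVERBS = {"slowly", "anxiously", "carefully", "quickly"}


class Clause(NamedTuple):
    """Clause already chunked into NP1, {Auxiliary}, Verb, NP2."""
    np1: str            # lower-cased unless it is a proper noun, e.g. "she"
    aux: Optional[str]  # None when there is no auxiliary
    verb: str
    np2: str


def _render(words: List[str]) -> str:
    """Join words, capitalize the first letter and add a final period."""
    text = " ".join(words)
    return text[0].upper() + text[1:] + "."


def correct_initial_manner_adverb(adverb: str, clause: Clause) -> List[Tuple[int, str]]:
    """Rewrite 'Adverb(+manner) NP1 {Aux} Verb NP2' (no comma after the adverb).

    Returns (preference, corrected sentence) pairs, best first; an empty
    list means the rule does not apply.
    """
    adv = adverb.lower()
    if adv not in MANNER_ADVERBS:
        return []
    aux = [clause.aux] if clause.aux else []
    candidates = [
        [adv + ",", clause.np1, *aux, clause.verb, clause.np2],  # 1: detach with a comma
        [clause.np1, *aux, clause.verb, clause.np2, adv],        # 2: end of the VP
        [clause.np1, *aux, adv, clause.verb, clause.np2],        # 3: before the lexical verb
    ]
    return [(rank, _render(words)) for rank, words in enumerate(candidates, start=1)]


for pref, sentence in correct_initial_manner_adverb(
        "Slowly", Clause("she", "has", "opened", "the door")):
    print(pref, sentence)
# 1 Slowly, she has opened the door.
# 2 She has opened the door slowly.
# 3 She has slowly opened the door.
```

Returning all ranked candidates instead of a single rewrite matches the didactic goal of offering several corrections, each of which can later be paired with an explanation.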
• Ex: *She anxiously was waiting for the results.
  (1) She was anxiously waiting for the results.
  (2) She was waiting for the results anxiously.
• Rewriting rule:
  NP1, Adverb(+manner), {Auxiliary}, Verb, NP2
  → NP1, {Auxiliary}, Verb, NP2, Adverb(+manner) {preference: 1}
  → NP1, {Auxiliary}, Adverb(+manner), Verb, NP2 {preference: 2}
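The second rule can be sketched in the same way. Here the input is a hypothetical list of labelled chunks in the erroneous order, the sketch only handles the case with an auxiliary (as in the example), and the candidates are ranked by the preference values of the rewriting rule as written. In the example, the NP2 slot is filled by the complement "for the results", as in the rule's own illustration; the manner-adverb lexicon is again a hypothetical stand-in.

```python
from typing import List, NamedTuple, Tuple

# Hypothetical stand-in for a lexicon carrying the +manner feature.
MANNER_ADVERBS = {"anxiously", "slowly", "carefully"}


class Chunk(NamedTuple):
    label: str  # "NP", "ADV", "AUX" or "VERB"
    text: str


def correct_preauxiliary_adverb(chunks: List[Chunk]) -> List[Tuple[int, str]]:
    """Rewrite 'NP1 Adverb(+manner) Auxiliary Verb NP2' into ranked candidates."""
    if [c.label for c in chunks] != ["NP", "ADV", "AUX", "VERB", "NP"]:
        return []  # pattern not matched
    np1, adv, aux, verb, np2 = (c.text for c in chunks)
    if adv.lower() not in MANNER_ADVERBS:
        return []
    candidates = [
        [np1, aux, verb, np2, adv],  # preference 1: adverb at the end of the VP
        [np1, aux, adv, verb, np2],  # preference 2: adverb between auxiliary and verb
    ]
    return [(rank, " ".join(words) + ".") for rank, words in enumerate(candidates, start=1)]


erroneous = [Chunk("NP", "She"), Chunk("ADV", "anxiously"), Chunk("AUX", "was"),
             Chunk("VERB", "waiting"), Chunk("NP", "for the results")]
print(correct_preauxiliary_adverb(erroneous))
```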
Difficulties:
• Dealing with the recognition of NPs
• Possible interactions with other functions of adverbs (e.g. He loves only his work, where only is a focusing modifier)
• Testing and implementation still to come, using <TextCoop> (a software platform for the identification of textual semantic structures), with evaluation of both the error annotation and the correction proposals
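The first difficulty, recognizing NPs, can be illustrated with a greedy base-NP chunker over Penn Treebank tags: an optional determiner or possessive, any adjectives, then one or more nouns, or else a lone personal pronoun. This is a deliberately simple sketch, not based on TextCoop; it leaves the focusing adverb in He loves only his work outside the NP, so an adverb-placement pattern still has to decide what only attaches to.

```python
DETERMINERS = {"DT", "PRP$"}
ADJECTIVES = {"JJ", "JJR", "JJS"}
NOUNS = {"NN", "NNS", "NNP", "NNPS"}


def chunk_nps(tagged):
    """Return (start, end) spans, end exclusive, of base NPs in a tagged sentence.

    tagged: list of (word, Penn Treebank tag) pairs.
    """
    spans = []
    i, n = 0, len(tagged)
    while i < n:
        if tagged[i][1] == "PRP":  # a personal pronoun is an NP on its own
            spans.append((i, i + 1))
            i += 1
            continue
        j = i
        if tagged[j][1] in DETERMINERS:
            j += 1
        while j < n and tagged[j][1] in ADJECTIVES:
            j += 1
        k = j
        while k < n and tagged[k][1] in NOUNS:
            k += 1
        if k > j:  # at least one noun head: accept the span as an NP
            spans.append((i, k))
            i = k
        else:
            i += 1  # no noun found: move on by one token
    return spans


sentence = [("She", "PRP"), ("has", "VBZ"), ("opened", "VBN"), ("the", "DT"),
            ("door", "NN"), ("to", "TO"), ("the", "DT"), ("second", "JJ"),
            ("guestroom", "NN")]
print(chunk_nps(sentence))
```

Such a chunker does not attach PPs ("the door to the second guestroom" comes out as two NPs), which matters for the weight parameter: the length of the complement depends on where the NP is taken to end.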
Perspectives
• Further research on adverbs:
  • other functions, e.g. modifiers of adjectives and adverbs, focusing modifiers (these might interact with existing error patterns)
  • internal syntax of AdvPs
• Develop the explanation aspects of the project:
  • generate argumentation to deal with multiple correction proposals (Garnier et al., 2009)
  • design dynamically generated explanations for errors linked to adverbs
  • investigate cognitive aspects of error correction
• Correction of NØN errors (e.g. the meaning utterance) and other types of errors
  • requires knowledge from different areas (lexical, ontological, domain knowledge, etc.)
More information at: http://www.irit.fr/recherches/ILPL/webct/ct.html