LanguageTool is a highly customizable, rule-based Java program for grammar checking. It contains XML and Java rules, as well as a POS dictionary. It can be used online or locally.
LanguageTool 3 07-11-2017 David Ling
Contents
• LanguageTool
  • Overview of rules
  • Web demonstration
  • Performance on students’ scripts
• Rule XML syntax and customization
  • Token, exception, POSTag, skip
  • Example: Third person singular
  • Making a custom example: Math lessons use English
• Java rules: neural network
  • Resolving a custom confusion pair: causal, casual
LanguageTool
• Open-source grammar-checking Java program
• Rule-based, highly customizable
• Input features for the rules: POSTag, word pattern, chunking tag
• To use your own LanguageTool, you can:
  • Double-click ‘languagetool.jar’
  • Run it from the Windows command prompt (cmd)
  • Run a local HTTP server and connect via a browser
• Online demo available: https://languagetool.org/
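A local server can also be queried from a script instead of a browser. The sketch below assumes a LanguageTool HTTP server is already running at `http://localhost:8081` and accepts `language` and `text` form parameters on `/v2/check`. The port and the helper names `build_check_request` and `check_text` are assumptions for illustration, not taken from the slides.

```python
import json
import urllib.parse
import urllib.request

# Assumed address of a locally running LanguageTool HTTP server.
BASE_URL = "http://localhost:8081"


def build_check_request(text, language="en-US", base_url=BASE_URL):
    """Build a POST request for the /v2/check endpoint (no network access here)."""
    body = urllib.parse.urlencode({"language": language, "text": text})
    return urllib.request.Request(base_url + "/v2/check", data=body.encode("utf-8"))


def check_text(text, language="en-US", base_url=BASE_URL):
    """Send the text to the server and return (message, offset, length) per match."""
    with urllib.request.urlopen(build_check_request(text, language, base_url)) as resp:
        result = json.load(resp)
    return [(m["message"], m["offset"], m["length"]) for m in result.get("matches", [])]
```

With a server running, `check_text("I goes to school by bus.")` would return one tuple per problem the server reports.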
LanguageTool rules
• LanguageTool contains a POS dictionary (example tags: modal verb, noun, verb (base form), verb (3rd person singular), adjective)
• Two main kinds of rules:
  • XML rules
  • Java rules
• XML rules are customizable. Two corresponding files:
  • disambiguation.xml for reducing multiple POSTags of a token, 346 rules
  • grammar.xml for grammar rules, ~1700 rules
Grammar rules in grammar.xml Total: 1704
Rule examples (name and outcome)
Grammar
• all/most/some (of) + noun
  • Example: All of students like mathematics. correction="All students|All of the students"
• both... as well as (and)
  • Example: He is both very rich as well as handsome. correction="and"
• Use of past form with 'going to ...'
  • Example: I'm going to wrote him. correction="write"
• inspired with (by)
  • Example: The artist was inspired with the beauty of the mountains. correction="inspired by"
• beware PREPOSITION
  • Example: Beware about malware. correction="Beware of"
• objective case after with(out)/at/to/...
  • Example: Give it to I. correction="to me|to her|to him|to us|to them"
Rule examples (name and outcome)
Redundant phrases
• absolutely essential/necessary (essential/necessary)
  • Example: This is absolutely essential. correction="essential"
• established fact (fact)
  • Example: This is an established fact. correction="a fact"
• there are also other (also)
  • Example: However, there are also other marbles in the jar. correction="there are other|there are also"
Punctuation
• extraneous apostrophes before ‘are’
  • Example: The car's are cheap. correction="cars"
• Comma after a month
  • Example: The store closed its doors for good in October, 1958. correction="October 1958"
• Missing comma between day of month and year
  • Example: My birthday is October 18 1983. correction="October 18,"
Students’ scripts 3
• Fails to check:
  • Misuse of prepositions: for (1st line)
  • Missing prepositions: to (4th line)
  • Incorrect word: force (4th line)
• Able to check:
  • Misuse of ‘much’ and ‘many’ (7th line)
Examples by teachers
• Syntax/Discourse
  • Unable to check: Since… therefore, although… but
• Semantics (use of the wrong word)
  • Example for the neural network in a later part
LanguageTool
• Able to check:
  • Spelling
  • 1st/2nd/3rd person singular
  • Adverb + noun (e.g. simply question)
  • Some common phrases: concerned about, regarding to
• Example limitations of the current rules:
  • Unable to tackle long and complex phrases (e.g. why these video can became)
  • False alarms (e.g. unseen named entities)
  • Limited in resolving confusing words (e.g. casual, causal)
  • Prepositions (e.g. for his talk)
  • Other grammar rules not yet implemented (e.g. although… but)
  • Uncountable nouns
LanguageTool
• To improve:
  • Add to and modify the current grammar rules in LanguageTool
  • Combine the rules with deep learning so that each complements the other
Rules in grammar.xml
• Steps:
  • Split a sentence into a sequence of tokens
  • Check whether it matches the token pattern of an XML rule
  • Return a message if the token pattern matches
• Example: Third person singular with “I”
  • Input: I goes to school by bus.
  • XML rule: Agreement error - Third person verb with I
    • Token 1: I
    • Token 2: VBZ (verb, 3rd person singular present: eats, jumps, believes, is, has)
  • Return: The pronoun ‘I’ must be used with a non-third-person form of a verb: go
• The POS tags come from LanguageTool's POS dictionary
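The three steps above can be sketched in a few lines of Python. This is a toy illustration of the idea, not LanguageTool's implementation: the tiny `POS_DICT`, the `PATTERN` encoding, and the function names are all assumptions made for the example.

```python
# Toy POS dictionary; a real one covers the whole vocabulary (entries are illustrative).
POS_DICT = {
    "i": {"PRP"},
    "goes": {"VBZ"},
    "go": {"VB", "VBP"},
    "to": {"TO", "IN"},
    "school": {"NN"},
    "by": {"IN"},
    "bus": {"NN"},
}

# Token pattern of the rule "Agreement error - Third person verb with I".
PATTERN = [{"word": "i"}, {"postag": "VBZ"}]
MESSAGE = "The pronoun 'I' must be used with a non-third-person form of a verb."


def tokenize(sentence):
    """Step 1: split a sentence into word tokens (punctuation is dropped for simplicity)."""
    return [w.strip(".,!?") for w in sentence.split() if w.strip(".,!?")]


def token_matches(token, condition):
    """A token matches if its word form or one of its POS tags fits the condition."""
    if "word" in condition:
        return token.lower() == condition["word"]
    return condition["postag"] in POS_DICT.get(token.lower(), set())


def check(sentence):
    """Steps 2 and 3: slide the pattern over the tokens and report each match."""
    tokens = tokenize(sentence)
    hits = []
    for start in range(len(tokens) - len(PATTERN) + 1):
        window = tokens[start:start + len(PATTERN)]
        if all(token_matches(t, c) for t, c in zip(window, PATTERN)):
            hits.append((start, " ".join(window), MESSAGE))
    return hits
```

Calling `check("I goes to school by bus.")` reports the pair "I goes" at token position 0, while the corrected sentence with "go" produces no hits.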
The rule pattern in XML
• In real situations, however, many exceptions have to be added
• Examples:
  • Extra adverb token: I recently goes to… (fails to include)
  • ‘I’ as a number: Phase I corresponds to… (fails to exclude)
  • ‘I’ as a letter: I is the ninth letter of the alphabet. (fails to exclude)
• These can be handled using the <exception> element and the “skip” attribute of <token>
The actual rule pattern in grammar.xml
• postag, exception, skip, and scope are common conditions used in grammar.xml
• skip=“1”: allows one optional arbitrary word to follow the token. Includes: I recently goes to…
• <exception> with scope=“previous”: filters out cases with the word “phase” before “I”. Excludes: Phase I corresponds to…
• <exception> at the second token. Excludes: I is the letter…
• Current limitations: still fails to exclude ‘Paper I’ and ‘article I’, and fails to catch ‘I also recently goes to…’, etc.
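To make the effect of skip and the two exceptions concrete, here is a small Python sketch of the matching logic. It is an approximation under stated assumptions: the dictionary keys (`skip`, `exception`, `previous_exception`) are invented for this example, and treating the second-token exception as the word "is" is a guess, since the slide does not spell it out.

```python
# Toy POS dictionary (illustrative entries only).
POS_DICT = {"goes": {"VBZ"}, "corresponds": {"VBZ"}, "is": {"VBZ"}, "recently": {"RB"}}

# Token 1: the word "I", not preceded by "phase", followed by at most one skipped token.
# Token 2: a VBZ verb, but not the word "is" (assumed form of the second exception).
PATTERN = [
    {"word": "i", "skip": 1, "previous_exception": {"phase"}},
    {"postag": "VBZ", "exception": {"is"}},
]


def fits(tokens, i, cond):
    """Check one token against a condition, including its exceptions."""
    word = tokens[i].lower()
    if "word" in cond and word != cond["word"]:
        return False
    if "postag" in cond and cond["postag"] not in POS_DICT.get(word, set()):
        return False
    if word in cond.get("exception", set()):
        return False
    previous = tokens[i - 1].lower() if i > 0 else None
    return previous not in cond.get("previous_exception", set())


def match_at(tokens, i, conds):
    """Match conds starting exactly at position i; 'skip' allows gaps after a token."""
    if not conds:
        return True
    if i >= len(tokens) or not fits(tokens, i, conds[0]):
        return False
    if len(conds) == 1:
        return True
    # Try the next condition directly after this token, then after each allowed gap.
    return any(match_at(tokens, i + 1 + gap, conds[1:])
               for gap in range(conds[0].get("skip", 0) + 1))


def has_error(sentence):
    """Return True if the pattern matches anywhere in the sentence."""
    tokens = sentence.rstrip(".").split()
    return any(match_at(tokens, i, PATTERN) for i in range(len(tokens)))
```

With these settings the sketch flags "I recently goes to school." but not "Phase I corresponds to planning." or "I is the ninth letter of the alphabet.". It also misses "I also recently goes to school.", because only one token may be skipped.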
Another example: Third person singular with “you”
• Rule pattern:
  • Token 1: you
  • Token 2: VBZ
  • Includes: You goes to school. You is a boy.
• Exception: the token before ‘you’ is tagged IN (preposition/subordinating conjunction: except, inside, across, on, through, beyond, with, without, …)
  • Excludes: One of you goes to school. The man nearest you is awake.
• Exception with negate (a double negation, i.e. a requirement): the token before the verb must be tagged RB/PRP/DT (adverb or negation, personal pronoun, determiner)
  • Excludes: Do I have to tell you he isn't here?
• Anti-rule pattern:
  • Excludes: What I have told you is true.
Making a custom rule
• Problem: Math lessons used English.
• Generalize: (Noun/adjective) + lesson + use + (Noun/adjective)
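The generalized pattern can be read as four conditions, two on POS tags and two on base forms, so that "lessons" and "used" are also covered. The following Python sketch illustrates that reading only: the `LEXICON` entries and function names are hypothetical, and the real rule would be written in XML.

```python
# Toy lexicon: word -> (lemma, POS tag). Entries are illustrative only.
LEXICON = {
    "math": ("math", "NN"),
    "lessons": ("lesson", "NNS"),
    "lesson": ("lesson", "NN"),
    "used": ("use", "VBD"),
    "use": ("use", "VBP"),
    "uses": ("use", "VBZ"),
    "english": ("English", "NNP"),
    "teachers": ("teacher", "NNS"),
}

# Tag prefixes for nouns (NN, NNS, NNP, ...) and adjectives (JJ, JJR, ...).
NOUN_OR_ADJ = ("NN", "JJ")


def is_noun_or_adj(word):
    """True if the word's POS tag starts with a noun or adjective prefix."""
    return LEXICON.get(word.lower(), ("", ""))[1].startswith(NOUN_OR_ADJ)


def has_lemma(word, lemma):
    """True if the word is an inflected form of the given base form."""
    return LEXICON.get(word.lower(), ("", ""))[0] == lemma


def matches_lesson_rule(sentence):
    """Look for (noun/adjective) + lesson + use + (noun/adjective) in the sentence."""
    tokens = sentence.rstrip(".").split()
    for i in range(len(tokens) - 3):
        first, second, third, fourth = tokens[i:i + 4]
        if (is_noun_or_adj(first) and has_lemma(second, "lesson")
                and has_lemma(third, "use") and is_noun_or_adj(fourth)):
            return True
    return False
```

Matching on base forms rather than exact words keeps the rule short while still covering "lesson uses", "lessons use", and "lessons used".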
Making a custom rule • Outcome
Neural network rule
• One of the few Java rules
• Will be a new feature in the coming release of LanguageTool
• Resolves confusing words using a neural network
  • E.g. causal, casual
  • Context: “well as causal/casual wears .”
• Architecture (from the diagram):
  • Each context word (well, as, wears, .) is mapped to a 64x1 vector
  • The four vectors are concatenated into a 256x1 input x
  • Output y = softmax(Wx + b), a 2x1 vector with one score each for causal and casual
  • W: weight matrix, updated during training
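The forward pass y = softmax(Wx + b) can be traced with plain Python lists. This is a shape-level sketch only: the embeddings and weights are random, untrained values, and the variable names are assumptions rather than anything from the LanguageTool code.

```python
import math
import random

EMBEDDING_SIZE = 64                       # each context word -> 64x1 vector
CONTEXT = ["well", "as", "wears", "."]    # two words before and two after the gap
CANDIDATES = ["causal", "casual"]

random.seed(0)  # untrained random values, used only to show the shapes

# Hypothetical embedding table; in a trained model these values are learned.
embeddings = {w: [random.uniform(-1, 1) for _ in range(EMBEDDING_SIZE)] for w in CONTEXT}

# x: the four context embeddings concatenated into one 256x1 vector.
x = [value for word in CONTEXT for value in embeddings[word]]

# W (2x256) and b (2x1) are the parameters that training would update.
W = [[random.uniform(-0.1, 0.1) for _ in range(len(x))] for _ in CANDIDATES]
b = [0.0 for _ in CANDIDATES]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# y = softmax(Wx + b): one probability per candidate word.
scores = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i for row, b_i in zip(W, b)]
y = softmax(scores)
prediction = CANDIDATES[y.index(max(y))]
```

Because the weights are random, `prediction` carries no meaning here; after training, the larger entry of `y` would indicate which word fits the context better.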
Neural network rule: training and validation
• Resolving [causal, casual]
• Corpus from Wikipedia articles, ~3 GB
• Number of training samples: [979, 2765]
• Validation samples: [106, 310]
• Results:
  • correct: [48, 243]
  • incorrect: [14, 11]
  • Accuracy: [77%, 96%]
  • unclassified: [44, 56] (min abs score > 0.5)
• Example training samples:
  • is a causal association because
  • or the causal plane or
  • . The causal plane is
  • , the causal plane is
  • friendly , casual script after
  • well as casual wears .
  • popular among casual players .
  • and a causal agent of
  • conclusions about causal links ,
  • may miss causal relationships .
  • and no causal connection has
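The reported accuracy appears to be computed over classified samples only, since 48 / (48 + 14) is about 77% and 243 / (243 + 11) is about 96%. The short Python check below reproduces that arithmetic, assuming the bracketed pairs are ordered as [causal, casual]. The "coverage" figure is an extra quantity introduced here for illustration and does not appear on the slide.

```python
def summarize(total, correct, incorrect, unclassified):
    """Return (accuracy, coverage) for one word of the confusion pair."""
    # Every validation sample is either classified (right or wrong) or left unclassified.
    assert correct + incorrect + unclassified == total
    classified = correct + incorrect
    accuracy = correct / classified   # share of classified samples that are right
    coverage = classified / total     # share of samples the rule commits to
    return accuracy, coverage


# Figures from the slide, assumed to be ordered as [causal, casual].
results = {
    "causal": summarize(106, 48, 14, 44),
    "casual": summarize(310, 243, 11, 56),
}

for word, (accuracy, coverage) in results.items():
    print(f"{word}: accuracy {accuracy:.0%}, coverage {coverage:.0%}")
```

The counts add up to the validation set sizes in both cases, which supports reading "unclassified" as samples on which the rule declined to make a prediction.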
END
• Useful links for LanguageTool
  • Online demo: https://languagetool.org/
  • XML syntax overview: http://wiki.languagetool.org/development-overview
  • Online XML rule editor: https://community.languagetool.org/ruleEditor2/
  • Neural network rule: https://github.com/gulp21/languagetool-neural-network
  • Tagset: https://github.com/languagetool-org/languagetool/blob/master/languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/tagset.txt
• Thank you