
LanguageTool - Open Source Grammar Checking Program

LanguageTool is a highly customizable, rule-based Java program for grammar checking. It contains XML and Java rules, as well as a POS dictionary. It can be used online or locally.



Presentation Transcript


  1. LanguageTool 3 07-11-2017 David Ling

  2. Contents
     • LanguageTool
     • Overview of rules
     • Web demonstration
     • Performance on students’ scripts
     • Rule XML syntax and customization
     • Token, exception, POSTag, skip
     • Example: Third person singular
     • Making a custom example: Math lessons use English
     • Java rules: neural network
     • Resolving a custom confusion pair: causal, casual

  3. LanguageTool
     • Open-source grammar checking program written in Java
     • Rule-based and highly customizable
     • Input features for the rules: POS tags, word patterns, chunking tags
     • To run your own copy of LanguageTool, you can:
       • double-click ‘languagetool.jar’
       • run it from the Windows command prompt (cmd)
       • run a local HTTP server and connect via a browser
     • Online demo available: https://languagetool.org/
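
As a rough illustration of the "local HTTP server" option on slide 3, the sketch below builds a request for the server's `/v2/check` endpoint and summarizes the JSON reply. It assumes a server has been started separately (for example with `java -cp languagetool-server.jar org.languagetool.server.HTTPServer --port 8081`); the port, the helper names `build_check_request`, `summarize_matches` and `check`, and the sample reply used for testing are assumptions made for this example, not part of the slides.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a LanguageTool server is listening locally on this port.
BASE_URL = "http://localhost:8081"


def build_check_request(text, language="en-US", base_url=BASE_URL):
    """Build a POST request for the server's /v2/check endpoint."""
    form = urllib.parse.urlencode({"language": language, "text": text})
    return urllib.request.Request(
        base_url + "/v2/check", data=form.encode("utf-8"), method="POST"
    )


def summarize_matches(text, response):
    """Return (flagged text, message, suggestions) for every reported match."""
    summary = []
    for match in response.get("matches", []):
        start = match["offset"]
        flagged = text[start:start + match["length"]]
        suggestions = [r["value"] for r in match.get("replacements", [])]
        summary.append((flagged, match["message"], suggestions))
    return summary


def check(text, language="en-US"):
    """Send the text to the server; this only works while the server is running."""
    with urllib.request.urlopen(build_check_request(text, language)) as reply:
        return summarize_matches(text, json.load(reply))
```

With a server running, `check("I goes to school by bus.")` would return one tuple per detected problem; the exact messages depend on the installed LanguageTool version.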

  4. LanguageTool rules
     • LanguageTool contains a POS dictionary (figure labels: modal verb, noun, verb (base form), verb (3rd person singular), adjective)
     • Two main kinds of rules:
       • XML rules
       • Java rules
     • XML rules are customizable. Two corresponding files:
       • disambiguation.xml: for reducing multiple POS tags of a token, 346 rules
       • grammar.xml: for grammar rules, ~1700 rules

  5. Grammar rules in grammar.xml Total: 1704

  6. Rule examples (name and outcome)
     Grammar
     • all/most/some (of) + noun
       Example: “All of students like mathematics.” Correction: “All students” or “All of the students”
     • both ... as well as (and)
       Example: “He is both very rich as well as handsome.” Correction: “and”
     • Use of past form with ‘going to ...’
       Example: “I'm going to wrote him.” Correction: “write”
     • inspired with (by)
       Example: “The artist was inspired with the beauty of the mountains.” Correction: “inspired by”
     • beware PREPOSITION
       Example: “Beware about malware.” Correction: “Beware of”
     • objective case after with(out)/at/to/...
       Example: “Give it to I.” Correction: “to me”, “to her”, “to him”, “to us”, or “to them”

  7. Rule examples (name and outcome)
     Redundant phrases
     • absolutely essential/necessary (essential/necessary)
       Example: “This is absolutely essential.” Correction: “essential”
     • established fact (fact)
       Example: “This is an established fact.” Correction: “a fact”
     • there are also other (also)
       Example: “However, there are also other marbles in the jar.” Correction: “there are other” or “there are also”
     Punctuation
     • extraneous apostrophes before ‘are’
       Example: “The car's are cheap.” Correction: “cars”
     • Comma after a month
       Example: “The store closed its doors for good in October, 1958.” Correction: “October 1958”
     • Missing comma between day of month and year
       Example: “My birthday is October 18 1983.” Correction: “October 18,”

  8. Students’ scripts 1

  9. Students’ scripts 2

  10. Students’ scripts 3
      • Failed to detect:
        • Misuse of prepositions: for (1st line)
        • Missing prepositions: to (4th line)
        • Incorrect word: force (4th line)
      • Able to detect:
        • Misuse of ‘much’ and ‘many’ (7th line)

  11. Examples by teachers
      • Syntax/discourse: unable to check “Since … therefore” and “although … but”
      • Semantics (use of the wrong word): an example for the neural network appears in a later part

  12. LanguageTool
      • Able to check:
        • Spelling
        • 1st/2nd/3rd person singular
        • Adverb + noun (e.g. simply question)
        • Some common phrases: concerned about, regarding to
      • Example limitations of the current rules:
        • Unable to tackle long and complex phrases (e.g. why these video can became)
        • False alarms (e.g. unseen named entities)
        • Limited in resolving confusing words (e.g. casual, causal)
        • Prepositions (e.g. for his talk)
        • Other grammar rules not yet implemented (e.g. although … but)
        • Uncountable nouns

  13. LanguageTool
      • To improve:
        • Add to and modify the current grammar rules in LanguageTool
        • Combine it with deep learning as a complement

  14. Rules in grammar.xml
      • Steps:
        • Split a sentence into a sequence of tokens
        • Check whether it matches the token pattern of an XML rule
        • Return a message if the token pattern matches
      • Example: Third person singular with “I”
        • Input: I goes to school by bus.
        • XML rule: Agreement error - Third person verb with I
        • Token 1: I
        • Token 2: VBZ (verb, 3rd person singular present: eats, jumps, believes, is, has)
        • Return: The pronoun ‘I’ must be used with a non-third-person form of a verb: go
      • LanguageTool contains a POS dictionary, which supplies tags such as VBZ
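
The three steps on slide 14 can be sketched in a few lines of Python. This is a toy illustration rather than LanguageTool's implementation: the POS dictionary `POS_DICT` holds only a handful of hand-picked entries, and the function names are made up for this example.

```python
import re

# Toy POS dictionary (hypothetical; LanguageTool's real dictionary is far larger).
POS_DICT = {
    "i": {"PRP"},
    "goes": {"VBZ"},
    "go": {"VB", "VBP"},
    "to": {"TO", "IN"},
    "school": {"NN"},
    "by": {"IN"},
    "bus": {"NN"},
}

MESSAGE = "The pronoun 'I' must be used with a non-third-person form of a verb"


def tokenize(sentence):
    """Step 1: split the sentence into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def tags(token):
    """Look up all possible POS tags of a token (empty set if unknown)."""
    return POS_DICT.get(token.lower(), set())


def check_i_with_vbz(sentence):
    """Steps 2 and 3: match the pattern 'I' + VBZ and return a message per match."""
    tokens = tokenize(sentence)
    messages = []
    for first, second in zip(tokens, tokens[1:]):
        if first == "I" and "VBZ" in tags(second):
            messages.append(f"{MESSAGE} (found: '{second}')")
    return messages
```

Calling `check_i_with_vbz("I goes to school by bus.")` returns one message, while the corrected sentence "I go to school by bus." returns an empty list.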

  15. The rule pattern in XML
      • However, in real situations, many exceptions have to be added
      • Examples:
        • Extra adverb token: I recently goes to… (the simple pattern fails to include this)
        • ‘I’ as a number: Phase I corresponds to… (the simple pattern fails to exclude this)
        • ‘I’ as a letter: I is the ninth letter of the alphabet. (the simple pattern fails to exclude this)
      • These cases can be handled using the <exception> element and the “skip” attribute of <token>

  16. The actual rule pattern in grammar.xml
      • postag, exception, skip, and scope are common conditions used in grammar.xml
      • skip="1": allows one optional arbitrary word to follow the token. Includes: I recently goes to…
      • <exception> with scope="previous": filters out cases with the word “phase” before “I”. Excludes: Phase I corresponds to…
      • <exception> at the second token. Excludes: I is the letter…
      • Current limitations: fails to handle ‘Paper I’, ‘article I’, ‘I also recently goes to …’, etc.
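
To make the effect of `skip` and the two exceptions on slide 16 concrete, here is a minimal Python sketch of the same logic. It is not the XML rule itself. The verb list `VBZ_WORDS` is a toy stand-in for the POS dictionary, and since the transcript does not show what the second-token exception actually tests, the sketch simply assumes it excludes the verb "is"; both are assumptions for illustration only.

```python
import re

# Toy stand-in for the POS dictionary: words treated as VBZ (hypothetical list).
VBZ_WORDS = {"goes", "is", "has", "corresponds"}


def find_i_vbz(sentence, skip=1, previous_exceptions=("phase",),
               verb_exceptions=("is",)):
    """Return (index of 'I', index of verb) pairs matched by the toy rule.

    skip: how many arbitrary tokens may sit between 'I' and the verb.
    previous_exceptions: words that cancel the match when they precede 'I'.
    verb_exceptions: assumed second-token exception (not shown in the slides).
    """
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    hits = []
    for i, token in enumerate(tokens):
        if token != "I":
            continue
        if i > 0 and tokens[i - 1].lower() in previous_exceptions:
            continue  # e.g. "Phase I corresponds to ..."
        # Look at the next token and at most `skip` tokens beyond it.
        for j in range(i + 1, min(i + 2 + skip, len(tokens))):
            word = tokens[j].lower()
            if word in VBZ_WORDS:
                if word not in verb_exceptions:
                    hits.append((i, j))
                break
    return hits
```

The sketch reproduces the limitations listed on the slide: "Paper I corresponds to ..." is still flagged because only "phase" is excluded, and "I also recently goes to ..." is missed because two adverbs exceed `skip=1`.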

  17. Another example: Third person singular with “you”
      • Rule pattern:
        • Token 1: you
        • Token 2: VBZ
        • Includes: You goes to school. You is a boy.
      • Exception: the token before “you” is tagged IN (preposition/subordinating conjunction: except, inside, across, on, through, beyond, with, without, …)
        • Excludes: One of you goes to school. The man nearest you is awake.
      • Exception with negate (a double negation): requires the token before the verb to be RB/PRP/DT (adverb, negation, personal pronoun, determiner)
        • Excludes: Do I have to tell you he isn't here?
      • Anti-rule pattern:
        • Excludes: What I have told you is true.

  18. Making a custom rule
      • Problem: Math lessons used English.
      • Generalize: (noun/adjective) + lesson + use + (noun/adjective)
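
The generalized pattern on slide 18 can be prototyped outside LanguageTool before writing the XML. The sketch below only detects the pattern; it does not propose a correction, because the rule's message is not part of the transcript. The tag table `TOY_TAGS` and the word-form sets are hypothetical stand-ins for the POS dictionary and for lemma matching.

```python
import re

# Hypothetical mini tag table; real tags would come from the POS dictionary.
TOY_TAGS = {"math": "NN", "science": "NN", "english": "NNP",
            "good": "JJ", "the": "DT"}
LESSON_FORMS = {"lesson", "lessons"}
USE_FORMS = {"use", "uses", "used", "using"}


def is_noun_or_adjective(token):
    """True when the toy tag of the token starts with NN (noun) or JJ (adjective)."""
    return TOY_TAGS.get(token.lower(), "").startswith(("NN", "JJ"))


def find_lesson_use(sentence):
    """Return every four-token span matching (noun/adj) + lesson + use + (noun/adj)."""
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    found = []
    for i in range(len(tokens) - 3):
        first, lesson, use, last = tokens[i:i + 4]
        if (is_noun_or_adjective(first)
                and lesson.lower() in LESSON_FORMS
                and use.lower() in USE_FORMS
                and is_noun_or_adjective(last)):
            found.append(" ".join(tokens[i:i + 4]))
    return found
```

For the slide's sentence, `find_lesson_use("Math lessons used English.")` returns the single span "Math lessons used English", while "The lessons used English." is not matched because "The" is a determiner.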

  19. Making a custom rule • Outcome

  20. Neural network rule
      • One of the few Java rules
      • Will be a new feature in the coming release of LanguageTool
      • Resolves confusing words using a neural network, e.g. causal, casual
      • Context: “well as causal/casual wears .”
      • Network diagram: each of the four context words (well, as, wears, .) is a 64x1 vector; the four vectors are concatenated into a 256x1 input x; the 2x1 output is y = softmax(Wx + b), with one score each for causal and casual
      • W: weight matrix, updated during training
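
The forward pass described on slide 20 is small enough to write out with the standard library. The sketch below follows the dimensions on the slide (four 64-value context vectors, a 256-value input, two outputs), but the embeddings and weights are random placeholders rather than trained values, so the resulting probabilities carry no meaning; all names are chosen for this example.

```python
import math
import random

EMBEDDING_SIZE = 64                 # each context word is a 64x1 vector
CONTEXT = ["well", "as", "wears", "."]
CLASSES = ["causal", "casual"]      # the 2x1 output has one score per word


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)               # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(context_vectors, W, b):
    """Compute y = softmax(Wx + b), where x is the concatenated context."""
    x = [value for vector in context_vectors for value in vector]  # 256 values
    scores = [sum(w * xi for w, xi in zip(row, x)) + bias
              for row, bias in zip(W, b)]
    return softmax(scores)


# Placeholder parameters: random numbers standing in for trained values.
rng = random.Random(0)
embeddings = {word: [rng.uniform(-1, 1) for _ in range(EMBEDDING_SIZE)]
              for word in CONTEXT}
W = [[rng.uniform(-0.1, 0.1) for _ in range(EMBEDDING_SIZE * len(CONTEXT))]
     for _ in CLASSES]
b = [0.0, 0.0]

y = forward([embeddings[word] for word in CONTEXT], W, b)
```

During training, W (and b) would be adjusted so that the probability of the word actually seen in each training context increases.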

  21. Neural network rule – training and validation
      • Resolving [causal, casual]
      • Corpus from Wikipedia articles, ~3 GB
      • Number of training samples: [979, 2765]
      • Validation samples: [106, 310]
      • Results:
        • correct: [48, 243]
        • incorrect: [14, 11]
        • accuracy (correct out of classified): [77%, 96%]
        • unclassified: [44, 56] (min abs score > 0.5)
      • Training samples:
        • is a causal association because
        • or the causal plane or
        • . The causal plane is
        • , the causal plane is
        • friendly , casual script after
        • well as casual wears .
        • popular among casual players .
        • and a causal agent of
        • conclusions about causal links ,
        • may miss causal relationships .
        • and no causal connection has
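
The accuracy and unclassified counts on slide 21 are consistent with each other, which the short check below makes explicit: accuracy is computed over the classified samples only, and the unclassified count is whatever remains of the validation set. The variable and function names are chosen for this example.

```python
# Figures copied from slide 21, keyed by word.
validation = {"causal": 106, "casual": 310}
correct = {"causal": 48, "casual": 243}
incorrect = {"causal": 14, "casual": 11}


def summarize(word):
    """Return (classified, unclassified, accuracy) for one word of the pair."""
    classified = correct[word] + incorrect[word]
    unclassified = validation[word] - classified
    accuracy = correct[word] / classified
    return classified, unclassified, accuracy


for word in validation:
    classified, unclassified, accuracy = summarize(word)
    print(f"{word}: classified={classified}, unclassified={unclassified}, "
          f"accuracy={accuracy:.0%}")
```

For causal this gives 62 classified and 44 unclassified samples with 48/62 ≈ 77% accuracy; for casual it gives 254 classified and 56 unclassified samples with 243/254 ≈ 96% accuracy, matching the slide.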

  22. Neural network rule – working in languagetool

  23. END
      • Useful links for LanguageTool:
        • Online demo: https://languagetool.org/
        • XML syntax overview: http://wiki.languagetool.org/development-overview
        • Online XML rule editor: https://community.languagetool.org/ruleEditor2/
        • Neural network rule: https://github.com/gulp21/languagetool-neural-network
        • Tagset: https://github.com/languagetool-org/languagetool/blob/master/languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/tagset.txt
      • Thank you
