
LanguageTool - Part A



Presentation Transcript


  1. LanguageTool - Part A 23-10-2017 David Ling

  2. LanguageTool
     • LanguageTool: open-source Java program
     • language_check: Python wrapper for LanguageTool; supports only up to v3.5 (LanguageTool is currently at v3.9)
     • To use it, you can
       • double-click ‘languagetool.jar’, or
       • run it as a local HTTP server from the command line (cmd)
     • Main papers
       • Daniel Naber, A Rule-Based Style and Grammar Checker, Diploma Thesis, University of Bielefeld, 2003
       • Marcin Miłkowski, Developing an open-source, rule-based proofreading tool, Software – Practice and Experience, 2010, 40 (7), pp. 543-566. DOI: 10.1002/spe.971
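
As a rough sketch of the local-server option: assuming the server is listening on port 8081 and exposes a `/v2/check` endpoint that accepts `language` and `text` form fields (the port, the endpoint and the response layout are assumptions here, not stated on the slide), a check request could be sent from Python with the standard library alone. The helper names `build_check_request` and `check_text` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a LanguageTool server is listening locally on this port.
BASE_URL = "http://localhost:8081"


def build_check_request(text, language="en-US", base_url=BASE_URL):
    """Build a POST request for the (assumed) /v2/check endpoint."""
    data = urllib.parse.urlencode({"language": language, "text": text}).encode("utf-8")
    return urllib.request.Request(base_url + "/v2/check", data=data, method="POST")


def check_text(text, language="en-US", base_url=BASE_URL):
    """Send the text to the server and return the reported matches (assumed JSON layout)."""
    with urllib.request.urlopen(build_check_request(text, language, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8")).get("matches", [])
```

With a server running, `check_text("It can by found.")` would return whatever matches the server reports; the exact fields depend on the server version.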

  3. Rules in LanguageTool
     • XML rules
       • grammar.xml (collaborative)
     • Java rules
       • Rules that XML rules cannot handle (e.g. a missing closing parenthesis, a space after a comma)
       • Spell checking
       • n-gram frequency for potential homophones (like there / their)
     • There are only a few Java rules (according to Marcin Miłkowski’s 2010 paper)
     • XML rules use the following input features:
       • word token
       • part of speech of the token: postag (from dictionary)
       • chunk tag of the token (from OpenNLP)

  4. XML rules
     • Total: 1704

  5. XML rules – possible typo
     Notes: MD: modal verb; JJ.?: adjective; VBN: verb, past participle; DT: determiner (an, all, …)
     • rule name = "'as follow' (as follows)"
       • as
       • follow
       • [\.:,—\-–]
       → suggests “as follows”
     • rule name = "'by' + passive participle (be)"
       • postag = "MD"
       • by
       • postag = "JJ.?|VBN", except postag = "DT"
       → suggests “be”
     Example: This can by consistent with… → This can be consistent with…
     Example: It can by found. → It can be found.
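
To make the token-by-token matching concrete, here is a minimal Python sketch of the "'by' + passive participle (be)" rule. It is not LanguageTool's implementation: the tiny `POS` dictionary and the function names are made up for illustration, and a real run would take tags from LanguageTool's own dictionary.

```python
import re

# Hypothetical mini POS dictionary; LanguageTool looks tags up in its own dictionary.
POS = {
    "this": {"DT"}, "it": {"PRP"}, "can": {"MD"}, "by": {"IN"},
    "consistent": {"JJ"}, "found": {"VBN", "VBD"}, "the": {"DT"},
}


def has_tag(word, pattern):
    """True if any POS tag of the word fully matches the regex."""
    return any(re.fullmatch(pattern, t) for t in POS.get(word.lower(), ()))


def check_by_be(tokens):
    """Return (index, suggestion) pairs where 'by' should probably be 'be'."""
    hits = []
    for i in range(len(tokens) - 2):
        modal, by, nxt = tokens[i], tokens[i + 1], tokens[i + 2]
        if (has_tag(modal, r"MD") and by.lower() == "by"
                and has_tag(nxt, r"JJ.?|VBN") and not has_tag(nxt, r"DT")):
            hits.append((i + 1, "be"))
    return hits


print(check_by_be("It can by found .".split()))  # [(2, 'be')]
```

Each element of the pattern constrains one token, either by its surface form or by a regular expression over its POS tags, which is why a sentence such as "It was found by the police" is not flagged: no modal precedes "by".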

  6. XML rules – possible typo
     Notes: VB[DNPZ]?: verb, including inflected forms (use, uses, used, …)
     • rule name = "miss use (misuse)"
       • miss
       • understand|spell|use|place|lead|…|dial, inflected, postag = "VB[DNPZ]?"
       → suggests “mis” + token
     Example: These words are miss used. → These words are misused.
     • Other randomly selected rules:
       • land lover (landlubber): <correction="landlubber"> The sailors considered John to be a serious land lover.
       • I/you/... thing (think): <correction="think|thinks"> I thing that's a good idea.
       • to get ride (rid) of: <correction="rid"> Let's get ride of that broken chair.
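
The "miss use" rule builds its suggestion from the matched token rather than from a fixed string. A small sketch of that idea, assuming a hypothetical lookup table `BASE_FORM` in place of a real morphological dictionary and only a subset of the verbs on the slide:

```python
# Hypothetical table mapping inflected forms to their base verb; a real checker
# would use a morphological dictionary instead.
BASE_FORM = {
    "use": "use", "uses": "use", "used": "use",
    "spell": "spell", "spelled": "spell",
    "lead": "lead", "leads": "lead", "led": "lead",
    "place": "place", "placed": "place",
}
# Subset of the verbs listed on the slide.
MIS_VERBS = {"understand", "spell", "use", "place", "lead", "dial"}


def suggest_mis(tokens):
    """Return (start_index, suggestion) for each 'miss' + inflected verb pair."""
    out = []
    for i in range(len(tokens) - 1):
        nxt = tokens[i + 1].lower()
        if tokens[i].lower() == "miss" and BASE_FORM.get(nxt) in MIS_VERBS:
            # The suggestion reuses the matched token: "mis" + "used" -> "misused".
            out.append((i, "mis" + nxt))
    return out
```

For the slide's example, `suggest_mis("These words are miss used .".split())` yields a single suggestion, "misused", at the position of "miss".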

  7. XML rules - Grammar
     Notes: WP: wh-pronoun (that, whatever, what, …); WRB: wh-adverb (however, how, …); VB.*: verb; MD: modal verb; inflected: e.g. be, is, am, are
     • rule name = "will follows be ('he is would')"
       • postag = "W(RB|P)"
       • be, inflected
       • will|must, inflected
       message: redundant
     Example: How is would this approach be useful? → How is this … or How would this …
     • rule name = "missing verb after 'if there'"
       • if, <exception scope="previous">as</exception>
       • there
       • <exception postag="VB.*|MD" /> <exception>[´`'’]</exception>
       message: missing verb
     Example: If there one who has … → If there is one who has …
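
The "missing verb after 'if there'" rule shows how exceptions narrow a pattern. The following Python sketch mirrors the three pattern elements and their exceptions; the `POS` dictionary and function names are hypothetical, and this is an illustration of the logic rather than LanguageTool's code.

```python
import re

# Hypothetical mini POS dictionary for the demo sentences.
POS = {"is": {"VBZ"}, "are": {"VBP"}, "were": {"VBD"}, "might": {"MD"},
       "one": {"CD", "NN"}, "who": {"WP"}, "has": {"VBZ"}}


def is_verb_or_modal(word):
    return any(re.fullmatch(r"VB.*|MD", t) for t in POS.get(word.lower(), ()))


def missing_verb_after_if_there(tokens):
    """Return indices of 'there' in 'if there X' where X is not a verb."""
    hits = []
    for i in range(len(tokens) - 2):
        first, second, third = tokens[i], tokens[i + 1], tokens[i + 2]
        if first.lower() != "if" or second.lower() != "there":
            continue
        # Exception with scope "previous": skip when 'if' follows 'as' ("as if there ...").
        if i > 0 and tokens[i - 1].lower() == "as":
            continue
        # Exceptions on the third token: a verb, a modal, or an apostrophe ("there's").
        if is_verb_or_modal(third) or re.fullmatch(r"[´`'’]", third):
            continue
        hits.append(i + 1)
    return hits
```

The third element has no positive condition at all: any token matches unless one of its exceptions applies, which is how the rule expresses "something other than a verb follows".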

  8. Randomly selected XML rules in Grammar
     • some faculty... (some faculty members...): <correction="faculty members"> Three faculty support the change.
     • all/most/some (of) + noun: <correction="All students|All of the students"> All of students like mathematics.
     • both... as well as (and): <correction="and"> He is both very rich as well as handsome.
     • Use of past form with 'going to ...': <correction="write"> I'm going to wrote him.
     • Who + verb (who know's/knows): <correction="Who cares"> Who care's?
     • inspired with (by): <correction="inspired by"> The artist was inspired with the beauty of the mountains.
     • beware PREPOSITION: <correction="Beware of"> Beware about malware.
     • objective case after with(out)/at/to/...: <correction="to me|to her|to him|to us|to them"> Give it to I.

  9. XML rules – commonly confused words
     • rule name = "and than (then)"
       • and|since
       • than
       → suggests “then”
     • rule name = "rather/other/different then (than)"
       • rather|other
       • then
       → suggests “than”
     • Other rule names:
       • turned of (off)
       • 'economical (economic) growth' etc.
       • in the passed (in the past)
       • too go (to go)

  10. XML rules – redundant phrases & punctuation
      Redundant phrases
      • absolutely essential/necessary (essential/necessary): <correction="essential"> This is absolutely essential.
      • established fact (fact): <correction="a fact"> This is an established fact.
      • there are also other (also): <correction="there are other|there are also"> However, there are also other marbles in the jar.
      Punctuation
      • extraneous apostrophes before ‘are’: <correction="cars"> The car's are cheap.
      • Comma after a month: <correction="October 1958"> The store closed its doors for good in October, 1958.
      • Missing comma between day of month and year: <correction="October 18,"> My birthday is October 18 1983.

  11. N-gram data rule
      • Resolves commonly confused word pairs, like their and there
      • Given a confusion list (currently ~600 pairs), e.g. (their, there), (adapting, adopting)
      • Input sentence: This is there last chance to escape.
      • The system compares the 3-gram frequencies for ‘there’ with those for ‘their’:
        • This is there, is there last, there last chance
        • This is their, is their last, their last chance
      • Recommends ‘their’ if the probability ratio is greater than a threshold ratio
      Remarks:
      • n-gram data is from the Google Books Ngram Viewer
      • Someone is developing a word2vec-based approach to calculate the probability instead of using 3-grams (context: {this, is, last, chance}, guessing {there, their})
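
The comparison can be sketched in a few lines of Python. Everything numeric here is invented for illustration: the counts in `COUNTS`, the add-one smoothing, the use of a product of counts as a stand-in for a probability, and the threshold `min_ratio` are all assumptions, not LanguageTool's actual data or formula.

```python
# Hypothetical 3-gram counts, made up for illustration; real counts would come
# from an n-gram corpus.
COUNTS = {
    ("this", "is", "there"): 50, ("is", "there", "last"): 2, ("there", "last", "chance"): 1,
    ("this", "is", "their"): 400, ("is", "their", "last"): 300, ("their", "last", "chance"): 900,
}


def trigrams_around(tokens, i):
    """All 3-grams of the token list that contain position i."""
    starts = range(max(0, i - 2), min(i, len(tokens) - 3) + 1)
    return [tuple(tokens[s:s + 3]) for s in starts]


def score(tokens, i, smoothing=1):
    """Product of smoothed 3-gram counts around position i (a crude stand-in for a probability)."""
    result = 1
    for gram in trigrams_around(tokens, i):
        result *= COUNTS.get(gram, 0) + smoothing
    return result


def suggest(tokens, i, alternative, min_ratio=100):
    """Return the alternative if its score exceeds the original's by more than min_ratio, else None."""
    swapped = tokens[:i] + [alternative] + tokens[i + 1:]
    ratio = score(swapped, i) / score(tokens, i)
    return alternative if ratio > min_ratio else None


sentence = "this is there last chance to escape".split()
print(suggest(sentence, 2, "their"))  # their
```

Requiring the ratio to clear a threshold, rather than merely exceed 1, keeps the rule from flagging sentences where both words are plausible.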

  12. Next time
      • other XML rules
      • spell check
      • chunking by OpenNLP
      • references:
        • http://wiki.languagetool.org
        • https://community.languagetool.org/rule/list
