Tips and Tricks … with INTEX/NOOJ

Tamás Váradi, Institute for Linguistics Research, Hungarian Academy of Sciences, varadi@nytud.hu. Max Silberztein, University of Franche-Comte, max.silberztein@univ-fcomte.fr.

Presentation Transcript


  1. Tips and Tricks … with INTEX/NOOJ • Tamás Váradi, Institute for Linguistics Research, Hungarian Academy of Sciences, varadi@nytud.hu • Max Silberztein, University of Franche-Comte, max.silberztein@univ-fcomte.fr

  2. Outline • Why should INTEX/NOOJ be a tool of choice? • raising language awareness • studying linguistics • lexical analysis • morphology • paradigms • word formation • automatic lexical acquisition • syntax • local grammars • semantic tagging

  3. List of useful features • instant lexical lookup • linguistically sophisticated lexicon • intuitive graphical interface • fast, robust, finite-state technology • corpus, lexicon, grammar handled uniformly • instant confirmation from corpus • can be used at different levels of competence • simple corpus query tool • grammar development environment • research tool for NLP projects

  4. Morphology I - Inflection • paradigms handled in the form of FSTs (finite-state transducers)

  5. Morphology I - Inflection • stem variants processed with operations on strings • L = move left, erasing a character
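
To make the idea of string operations on stems concrete, here is a minimal Python sketch. It is not INTEX/NooJ syntax: only the `L` operator (erase the last character) comes from the slide, while the function name `apply_ops` and the convention that every other character is appended literally are assumptions made for illustration.

```python
def apply_ops(stem: str, ops: str) -> str:
    """Apply a toy inflection command string to a stem.

    'L' erases the last character (the operator named on the slide).
    Any other character is appended as-is (an assumption of this sketch).
    """
    chars = list(stem)
    for op in ops:
        if op == "L":
            if chars:          # erasing from an empty stem is a no-op
                chars.pop()
        else:
            chars.append(op)
    return "".join(chars)


if __name__ == "__main__":
    print(apply_ops("city", "Lies"))  # cities
    print(apply_ops("walk", "ed"))    # walked
```

Under these assumptions, a stem variant such as city/cities needs no separate lexicon entry: the command string `Lies` erases the final "y" and appends "ies".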

  6. Morphology II - Derivation • All the forms derived from the root ‘fran-’ • Ideal to learn and experiment with morphological segmentation

  7. Automatic lexical extraction • Store any sequence of letters that is followed by -ize or -ify in the variable $Root • Produce the lexical entry: wordform: $Root+$Suf, lemma: $Root, part of speech: V, synsem: +V
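
The extraction rule on this slide can be approximated outside INTEX/NooJ with a regular expression. The sketch below is an illustration only: the group names `Root` and `Suf` mirror the slide's variables, but the function `extract_entries`, the dictionary keys, and the restriction to lowercase ASCII letters are assumptions.

```python
import re

# Root: any sequence of letters; Suf: -ize or -ify, at the end of a word.
PATTERN = re.compile(r"\b(?P<Root>[a-z]+?)(?P<Suf>ize|ify)\b")


def extract_entries(text: str) -> list[dict]:
    """Return one candidate verb entry per word ending in -ize or -ify."""
    entries = []
    for m in PATTERN.finditer(text.lower()):
        root, suf = m.group("Root"), m.group("Suf")
        entries.append({
            "wordform": root + suf,  # $Root+$Suf
            "lemma": root,           # $Root
            "pos": "V",
            "synsem": "+V",
        })
    return entries


if __name__ == "__main__":
    for entry in extract_entries("They americanize and simplify it."):
        print(entry)
```

A purely formal rule like this overgenerates: "size" would yield the root "s", and "simplify" the root "simpl". That is why the root usually needs to be checked against a lexicon.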

  8. Lexical constraints • Check if the string stored in $Root is in the lexicon as an A, with feature +Nation • Produce the lexical entry: wordform: $Root+$Suf, lemma: $Root, part of speech: V, synsem: +V
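
A lexical constraint of this kind can be sketched in Python as a lookup that filters candidate roots. Everything here is hypothetical: the three-entry `LEXICON`, its layout, and the function `analyse` are invented for illustration and do not reflect how INTEX/NooJ stores or queries its dictionaries.

```python
import re

# Toy lexicon (hypothetical): lemma -> part of speech and features.
LEXICON = {
    "american": {"pos": "A", "features": {"Nation"}},
    "hungarian": {"pos": "A", "features": {"Nation"}},
    "modern": {"pos": "A", "features": set()},
}

PATTERN = re.compile(r"^(?P<Root>[a-z]+?)(?P<Suf>ize|ify)$")


def analyse(wordform: str):
    """Return a verb entry only if the root is a known A with +Nation."""
    m = PATTERN.match(wordform.lower())
    if m is None:
        return None
    root, suf = m.group("Root"), m.group("Suf")
    entry = LEXICON.get(root)
    # The lexical constraint: root must be an adjective with +Nation.
    if entry is None or entry["pos"] != "A" or "Nation" not in entry["features"]:
        return None
    return {"wordform": root + suf, "lemma": root, "pos": "V", "synsem": "+V"}


if __name__ == "__main__":
    for word in ["americanize", "modernize", "size"]:
        print(word, "->", analyse(word))
```

With this toy lexicon, "americanize" is accepted, "modernize" is rejected because "modern" lacks +Nation, and "size" is rejected because "s" is not in the lexicon at all.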

  9. Syntax • grammars defined in graphs relying on info stored in the lexicon (minimally lemma and POS)

  10. Instant feedback from corpus

  11. Labelled bracketing • hit strings may be tagged (merge mode) • [NP a soft, slow step NP] • or replaced with bracketing • [NP NP]
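
The difference between tagging in merge mode and replacing a hit can be shown with a small Python sketch. The word lists, the noun-phrase pattern, and the function `bracket` are assumptions made for illustration; in INTEX/NooJ the pattern would be a grammar graph consulting the lexicon, not a hard-coded regular expression.

```python
import re

# Tiny hard-coded word classes (hypothetical stand-ins for a lexicon).
DET = r"(?:a|an|the)"
ADJ = r"(?:soft|slow|old|quiet)"
NOUN = r"(?:step|house|voice)"

# Determiner, one or more adjectives (optionally comma-separated), noun.
NP = re.compile(rf"\b{DET} {ADJ}(?:,? {ADJ})* {NOUN}\b")


def bracket(text: str, label: str = "NP", merge: bool = True) -> str:
    """Wrap each hit in labelled brackets (merge) or replace it by them."""
    def repl(m: re.Match) -> str:
        if merge:
            return f"[{label} {m.group(0)} {label}]"
        return f"[{label} {label}]"
    return NP.sub(repl, text)


if __name__ == "__main__":
    sentence = "He heard a soft, slow step behind him."
    print(bracket(sentence))               # He heard [NP a soft, slow step NP] behind him.
    print(bracket(sentence, merge=False))  # He heard [NP NP] behind him.
```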

  12. Disambiguation • "very": adjective or adverb?

  13. Recursion – embedded graphs

  14. An exercise in semantic tagging • Expressions of time

  15. An exercise in semantic tagging • Expressions of time
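
As a rough idea of what such an exercise involves, the Python sketch below tags three simple kinds of time expression. The patterns, the `TIME` label, and the function `tag_time` are assumptions made for illustration; the slides do not show the actual grammar, which would be built as graphs and would cover far more cases.

```python
import re

DAY = r"(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)"
CLOCK = r"(?:[01]?\d|2[0-3]):[0-5]\d"   # e.g. 9:30 or 23:15
YEAR = r"(?:19|20)\d\d"                 # years 1900-2099 only

# "on <day>", "at <clock time>", or "in <year>"
TIME = re.compile(rf"\b(?:on {DAY}|at {CLOCK}|in {YEAR})\b")


def tag_time(text: str) -> str:
    """Wrap every recognised time expression in [TIME ... TIME]."""
    return TIME.sub(lambda m: f"[TIME {m.group(0)} TIME]", text)


if __name__ == "__main__":
    print(tag_time("We met on Monday at 9:30 in 2005."))
    # We met [TIME on Monday TIME] [TIME at 9:30 TIME] [TIME in 2005 TIME].
```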

  16. Finally, not for the faint hearted … • the big picture

  17. Conclusions • Teaching linguistic analysis by doing it • INTEX/NooJ is [det THE] technology to use, honestly… • All welcome to have a go at it • Thank you for your attention!
