Tips and Tricks … with INTEX/NOOJ • Tamás Váradi, Institute for Linguistics Research, Hungarian Academy of Sciences, varadi@nytud.hu • Max Silberztein, University of Franche-Comte, max.silberztein@univ-fcomte.fr
Outline • Why should INTEX/NOOJ be a tool of choice? • raising language awareness • studying linguistics • lexical analysis • morphology • paradigms • word formation • automatic lexical acquisition • syntax • local grammars • semantic tagging
List of useful features • instant lexical lookup • linguistically sophisticated lexicon • intuitive graphical interface • fast, robust, finite-state technology • corpus, lexicon, grammar handled uniformly • instant confirmation from corpus • can be used at different levels of competence • simple corpus query tool • grammar development environment • research tool for NLP projects
Morphology I - Inflection • paradigms handled in the form of FSTs (finite-state transducers)
Morphology I - Inflection • stem variants processed with operations on strings • L = move left, erasing a character
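To make the string-operation idea concrete, here is a minimal Python sketch. It is not INTEX/NooJ code. The only operator taken from the slide is L (move left, erasing a character). Treating every other character in the command as a literal to append is an assumption made for this illustration, and the function name apply_inflection is hypothetical.

```python
def apply_inflection(stem: str, command: str) -> str:
    """Apply a suffix command to a stem, one character at a time.

    Assumed convention for this sketch only: an uppercase 'L' erases the
    last character of the current form; any other character is appended.
    """
    form = stem
    for op in command:
        if op == "L":
            form = form[:-1]   # move left, erasing one character
        else:
            form += op         # literal character: append it
    return form


if __name__ == "__main__":
    print(apply_inflection("city", "Lies"))    # cities
    print(apply_inflection("knife", "LLves"))  # knives
    print(apply_inflection("walk", "ed"))      # walked
```

Encoding a paradigm as a short command such as Lies lets the same command serve every stem that behaves the same way.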
Morphology II - Derivation • All the forms derived from the root ‘fran-’ • Ideal to learn and experiment with morphological segmentation
Automatic lexical extraction • Store any sequence of letters that is followed by -ize or -ify in the variable $Root • Produce the lexical entry: wordform: $Root+$Suf, lemma: $Root, part of speech: V, synsem: +V
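The extraction rule can be approximated outside INTEX/NooJ with a regular expression. The sketch below is only an analogy in Python, not the graph itself. It assumes that $Suf holds the matched suffix, that "letters" means a to z, and that the entry can be represented as a dictionary. The names ROOT_SUFFIX and extract_entry are hypothetical.

```python
import re

# Named groups play the role of the variables $Root and $Suf.
ROOT_SUFFIX = re.compile(r"^(?P<Root>[a-z]+)(?P<Suf>ize|ify)$")


def extract_entry(wordform: str):
    """Return a lexical entry for a word ending in -ize or -ify, else None."""
    match = ROOT_SUFFIX.match(wordform.lower())
    if match is None:
        return None
    root, suf = match.group("Root"), match.group("Suf")
    return {
        "wordform": root + suf,  # $Root+$Suf
        "lemma": root,           # $Root
        "pos": "V",
        "synsem": "+V",
    }


if __name__ == "__main__":
    for word in ["modernize", "simplify", "walked"]:
        print(word, "->", extract_entry(word))
```

Without a lexical check this pattern overgenerates. For example, it accepts "size" with the root "s".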
Lexical constraints • Check whether the string stored in $Root is in the lexicon as an A (adjective) with the feature +Nation • If so, produce the lexical entry: wordform: $Root+$Suf, lemma: $Root, part of speech: V, synsem: +V
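The lexical constraint can be sketched the same way: the entry is produced only if the root passes a lexicon lookup. The toy lexicon, its three entries and the function name below are invented for illustration and do not reflect the contents or format of any INTEX/NooJ dictionary.

```python
import re

# Hypothetical toy lexicon: root -> (part of speech, set of features).
TOY_LEXICON = {
    "american": ("A", {"+Nation"}),
    "french": ("A", {"+Nation"}),
    "modern": ("A", set()),
}

ROOT_SUFFIX = re.compile(r"^([a-z]+)(ize|ify)$")


def extract_nation_verb(wordform: str):
    """Produce an entry only if the root is an adjective with +Nation."""
    match = ROOT_SUFFIX.match(wordform.lower())
    if match is None:
        return None
    root, suf = match.groups()
    entry = TOY_LEXICON.get(root)
    if entry is None:
        return None               # root is not in the lexicon at all
    pos, features = entry
    if pos != "A" or "+Nation" not in features:
        return None               # lexical constraint not satisfied
    return {"wordform": root + suf, "lemma": root, "pos": "V", "synsem": "+V"}


if __name__ == "__main__":
    for word in ["americanize", "modernize", "size"]:
        print(word, "->", extract_nation_verb(word))
```

With the constraint in place, "modernize" and "size" are rejected, while "americanize" still yields an entry.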
Syntax • grammars defined in graphs relying on info stored in the lexicon (minimally lemma and POS)
Labelled bracketing • hit strings may be tagged (merge mode) • [NP a soft, slow step NP] • or replaced with bracketing • [NP NP]
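The difference between the two output modes can be shown with a small Python sketch. It assumes the grammar has already located the hit and only illustrates how the result is written out. The function name bracket, its parameters and the example sentence are hypothetical.

```python
def bracket(text: str, start: int, end: int, label: str, mode: str = "merge") -> str:
    """Write a labelled bracket around (merge) or instead of (replace) a hit."""
    hit = text[start:end]
    if mode == "merge":
        tagged = f"[{label} {hit} {label}]"   # keep the hit inside the brackets
    elif mode == "replace":
        tagged = f"[{label} {label}]"         # drop the hit, keep only the brackets
    else:
        raise ValueError(f"unknown mode: {mode}")
    return text[:start] + tagged + text[end:]


if __name__ == "__main__":
    sentence = "She heard a soft, slow step outside."
    hit = "a soft, slow step"
    start = sentence.index(hit)
    end = start + len(hit)
    print(bracket(sentence, start, end, "NP", "merge"))
    print(bracket(sentence, start, end, "NP", "replace"))
```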
Disambiguation • very: adjective or adverb?
An exercise in semantic tagging • Expressions of time
Finally, not for the faint-hearted … • the big picture
Conclusions • Teaching linguistic analysis by doing it • INTEX/NooJ is [det THE] technology to use, honestly… • All are welcome to have a go at it • Thank you for your attention!