Developing a German grammar for analysis and generation using OpenCCG Ciprian Gerstenberger University of Saarland IGK Colloquium January 13th 2005
Outline • NLP environments: a comparison • The choice: OpenCCG • The formalism: MMCCG • The German grammar • Future work
Dialogue systems • building dialogue systems → requires linguistic resources • linguistic resources → require tools for developing and maintaining them • wide range of different NLP environments ⇒ Which is the most appropriate environment for our purposes?
NLP environments for dialogue systems General requirements • both for analysis and generation • multi-lingual • easy domain reconfigurability Requirements for NLG • realization of contextually sensitive utterances • linguistically motivated control over flexible sentence realization
NLP environments for dialogue systems Technical requirements • freely available • well documented • offering support when needed • freely available resources (for German) • efficient • platform independent
NLP environments • KPML (Lisp): Systemic-Functional Grammar (SFG) • OpenCCG (Java): Multi-Modal Combinatory Categorial Grammar (MMCCG) • Babel (Prolog): Head-Driven Phrase Structure Grammar (HPSG) • LKB (Lisp): Head-Driven Phrase Structure Grammar (HPSG) • XLE (C): Lexical Functional Grammar (LFG) • XTAG (Lisp): Tree Adjoining Grammar (TAG) • XDG (Oz): Topological Dependency Grammar (TDG)
NLP environments: Babel Babel-System (S. Müller) • implementing HPSG • Prolog • only analysis, no generation • multi-lingual (?) • resources for German: grammar with good coverage • freely available • documentation • support (?)
NLP environments: LKB LKB • implementing HPSG • Lisp • multi-lingual • both analysis and generation • but: resources for German not usable for generation • freely available • documentation • support (?)
NLP environments: XTAG XTAG • implementing TAG • Lisp • both analysis and generation • multi-lingual • resources for German (DFKI ?) • freely available • documentation • support (?)
NLP environments: XDG XDG • implementing TDG • Oz • only analysis (generation as dependency parsing using TAGs) • multi-lingual (?) • resources for German (toy grammars) • freely available • documentation (?) • support
NLP environments: KPML KOMET-Penman Multilingual Linguistic resource development • implementing Systemic-Functional Grammar (SFG) • Lisp • multi-lingual • flexible generation • good sentence realization control • only for generation, no parsing • resources for German: grammar with good coverage • freely available • documentation and support
NLP environments: XLE Xerox Linguistic Environment • implementing LFG • C and Tcl/Tk • multi-lingual • both analysis and generation • resources for German (not freely available) • documentation • support • not freely available
NLP environments: OpenCCG OpenCCG • implementing Multi-Modal Combinatory Categorial Grammar (MMCCG) • open source Java-based NLP library • both analysis and generation • multi-lingual • no resources for German, but grammars for English • freely available • documentation • support
NLP environments: The Choice OpenCCG • Java-based NLP library → platform independent • analysis and generation → uniform grammar resources • multi-lingual → extendable • used in several other projects: FLIGHTS, COMIC, COSY • supporting output formats for TTS (e.g. APML) • optimized sentence realization • flexible generation • sentence realization control
Basic formalism: CCG Combinatory Categorial Grammar • lexicalized grammar formalism • lexical items are assigned syntactic categories • combinatory rules
MMCCG Multi-Modal Combinatory Categorial Grammar • refining CCG by introducing means of controlling the application of combinatory rules • specifying modes on category forming operators (slashes) • making application of rules dependent on the slash mode • four basic modes governing different levels of associativity and permutativity
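The idea that a slash mode decides which combinatory rules may fire can be sketched in a few lines of Python. The slide does not list the four modes, so the table below follows the usual presentation of multi-modal CCG (a non-associative, non-permutative mode; an associative one; a permutative one; and one that is both). The ASCII mode names and the function `licensed_rules` are assumptions made for this illustration, not OpenCCG's notation or API.

```python
# Hypothetical mode table: each slash mode is characterised by whether it
# allows associativity and/or permutativity (names are illustrative only).
MODES = {
    "*": {"associative": False, "permutative": False},  # application only
    "^": {"associative": True,  "permutative": False},  # plus harmonic composition
    "x": {"associative": False, "permutative": True},   # plus crossed composition
    ".": {"associative": True,  "permutative": True},   # all rules
}


def licensed_rules(mode):
    """Return the combinatory rules a slash with the given mode may take part in."""
    props = MODES[mode]
    rules = ["application"]          # application is available in every mode
    if props["associative"]:
        rules.append("harmonic composition")
    if props["permutative"]:
        rules.append("crossed composition")
    return rules


for mode in MODES:
    print(mode, licensed_rules(mode))
```

A lexical entry whose slash carries the most restrictive mode can then only combine by application, which is how the formalism keeps word-order freedom under lexical control.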
Example Der Hund sieht die Katze. (The dog sees the cat.)
Example (cont.) Der Hund sieht die Katze.
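The derivation shown on the example slides is not preserved in this transcript. As a rough stand-in, the following Python sketch parses the sentence with plain CCG application rules only. The lexical categories (`np/n` for determiners, `n` for nouns, `(s\np)/np` for the transitive verb) are textbook assumptions, not the categories of the grammar described in the talk, and the chart parser is a toy, not OpenCCG.

```python
from itertools import product

# A category is an atomic string ("s", "np", "n") or a tuple
# (result, slash, argument); "/" seeks its argument to the right,
# "\\" seeks it to the left.

def forward(left, right):
    """Forward application: X/Y  Y  =>  X"""
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]
    return None


def backward(left, right):
    """Backward application: Y  X\\Y  =>  X"""
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]
    return None


# Toy lexicon (assumed): no case, gender, or agreement features.
LEXICON = {
    "der": [("np", "/", "n")],
    "die": [("np", "/", "n")],
    "hund": ["n"],
    "katze": ["n"],
    "sieht": [(("s", "\\", "np"), "/", "np")],
}


def parse(words):
    """CKY-style chart parse; returns the set of categories spanning all words."""
    n = len(words)
    chart = {(i, i + 1): set(LEXICON[w.lower()]) for i, w in enumerate(words)}
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            cell = set()
            for mid in range(start + 1, end):
                for l, r in product(chart[(start, mid)], chart[(mid, end)]):
                    for rule in (forward, backward):
                        result = rule(l, r)
                        if result is not None:
                            cell.add(result)
            chart[(start, end)] = cell
    return chart[(0, n)]


print(parse("Der Hund sieht die Katze".split()))
```

Because the toy lexicon carries no case information, it would also accept the object-initial order with the same categories; distinguishing the two readings is the kind of work that features and slash modes do in a real grammar.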
Developing a German Grammar • joint work with Magdalena Wolska (DIALOG Project) Desiderata • uniform resources for analysis and generation • covering all phenomena in our domains • making the grammar more general than the phenomena encountered in our (relatively small) corpora strictly require
Phenomena Some phenomena in German • agreement • position of the finite verb • Topological Fields: controlling the Vorfeld • complex sentences • ambiguity • controlling sentence realization
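Clause types Verb-initial clauses • yes/no questions: Soll ich den Titel zu der Liste hinzufügen? (Should I add the track to the list?) • alternative questions: Möchtest Du Mozart oder Bach hören? (Would you like to hear Mozart or Bach?) • imperatives: Wähle das Album „Californication“ von den Red Hot Chili Peppers! (Select the album "Californication" by the Red Hot Chili Peppers!)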
Clause types (cont.) Verb-second clauses • main declarative: Der Titel wurde hinzugefügt. (The track was added.) • wh-question: Welcher Künstler spielt „Missunderstood“? (Which artist plays "Missunderstood"?)
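Clause types (cont.) Verb-final clauses • subordinate clause: Wenn Sie möchten, kann ich „We Just Can't Get Enough CCG“ abspielen. (If you like, I can play "We Just Can't Get Enough CCG".) • relative clause: Ich nehme aus den ersten vier Alben, die du hast, jeweils den ersten Song. (From each of the first four albums you have, I'll take the first song.) • complement clause: Ich glaube, daß das Album „Dangerously In Love“ heißt. (I think that the album is called "Dangerously In Love".)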
Topological Fields Controlling the Vorfeld occupation using flags
Topological Fields (cont.) Controlling the Vorfeld occupation using flags
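Analysis: Ambiguities Der Hund von dem traurigen Mann den ich sah rennt. (The dog of the sad man whom I saw runs. The relative clause can attach to either "Hund" or "Mann".)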
Analysis: Ambiguities (cont.) Das Kind rennt wenn der Hund rennt weil die Katze rennt. (The child runs when the dog runs because the cat runs.)
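One way to see where the readings of the second sentence come from is to enumerate the clauses each subordinate clause could modify. The Python sketch below does this under a simplifying assumption that is not taken from the talk: a new clause may only attach to a clause on the right edge of the structure built so far (no crossing attachments). The function name `attachments` is made up for this illustration.

```python
def attachments(n_clauses):
    """Enumerate attachment analyses for a chain of clauses.

    Clause 0 is the main clause. Each result is a tuple whose k-th entry
    is the index of the clause that clause k+1 modifies.
    Assumption: a clause may only attach to a clause on the right frontier.
    """
    results = []

    def extend(heads, frontier):
        nxt = len(heads) + 1              # index of the next clause to attach
        if nxt == n_clauses:
            results.append(tuple(heads))
            return
        for depth, head in enumerate(frontier):
            # Attaching to `head` closes off everything below it on the frontier.
            extend(heads + [head], frontier[:depth + 1] + [nxt])

    extend([], [0])
    return results


# Clause 0: "Das Kind rennt", clause 1: "wenn der Hund rennt",
# clause 2: "weil die Katze rennt"
for analysis in attachments(3):
    print(analysis)
```

For the three clauses above this yields two analyses: the weil-clause modifies either the main clause or the wenn-clause. The count grows quickly as further clauses are appended, which is why a parser needs some way to rank or restrict readings.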
Generation Sentence realization without control
Generation (cont.) Sentence realization with control: fronted subject
Generation (cont.) Sentence realization with control: fronted object
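The realizer output from the generation slides is not preserved here. The contrast between uncontrolled and controlled realization can still be sketched in Python for a simple verb-second clause: without a control flag every Vorfeld option is produced, with a flag only the requested one. The function `realize` and its `front` parameter are hypothetical names for this illustration; they are not OpenCCG's realizer interface or the flags used in the grammar.

```python
def realize(verb, subject, obj, front=None):
    """Return verb-second declaratives for a transitive clause.

    front=None      -> all Vorfeld options (realization without control)
    front="subject" -> only the subject-initial order
    front="object"  -> only the object-initial order
    """
    orders = {
        "subject": [subject, verb, obj],
        "object": [obj, verb, subject],
    }
    chosen = list(orders.values()) if front is None else [orders[front]]
    return [_surface(tokens) for tokens in chosen]


def _surface(tokens):
    """Join constituents, capitalise the first letter, add a full stop."""
    text = " ".join(tokens)
    return text[0].upper() + text[1:] + "."


print(realize("sieht", "der Hund", "die Katze"))
print(realize("sieht", "der Hund", "die Katze", front="object"))
```

The finite verb stays in second position in every variant; only the constituent in the Vorfeld changes, which is the degree of freedom the control flag is meant to pin down.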
Future Work (1) • extending the grammar wrt the two domains currently modelled (MP3 and maths tutorial) • (AP, NP, sentence, etc.) coordination • complex NPs (e.g. postmodifications) • control and raising verbs • particle verbs (Ich spiele den Song ab vs. Ich möchte den Song abspielen) • Topological Fields: scrambling in the Mittelfeld
Future Work (2) • analysis: coping with partial input, ill-formed utterances • generation: realizing elliptical output • using a dynamic morphological module • development of an ontology