Interaction Grammars and their implementation in LEOPAR
Guy Perrier
University Nancy2 - LORIA (Nancy)
1 - Why a new linguistic formalism?
• Some crucial points in the design of a linguistic formalism:
  • the form of the basic bricks,
  • the composition rules,
  • the syntax-semantics interface.
• Among the usual formalisms, none prevails over all the others.
• The originality of Interaction Grammars (CoLing 2000):
  • For the syntax, the basic bricks are underspecified trees represented in the form of tree descriptions (this aspect comes from formalisms stemming from TAG);
  • Underspecified trees are composed into completely specified trees by superposition, under the control of a polarity system. Polarity neutralisation expresses the saturation of syntactic structures (this aspect comes from Categorial Grammars).
2 - The importance of an experimental approach
• The relevance of a linguistic formalism can only be proved by confronting it with real corpora.
• The development of the LEOPAR parser responds to this ambition.
• Scaling up requires two conditions:
  • parsing algorithms efficient enough to cope with the resulting explosion of ambiguity;
  • lexicons and grammars with large coverage.
3 - The formalism of Interaction Grammars
• The basic syntactic objects are tree descriptions: a tree description is a set of relations and properties on tree nodes representing syntactic constituents.
• Relations are dominance relations (immediate and large) or precedence relations (immediate and large).
• Nodes are labelled with feature structures describing properties of syntactic constituents. Feature values are atoms or atom disjunctions, and they can be shared by several features.
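To make the notion of a tree description concrete, here is a minimal Python sketch. It encodes a description and checks whether a concrete tree is a model of it under a given interpretation, which maps description nodes to tree nodes. In this sketch several description nodes may map to the same tree node. The encoding (relation names, `Description`, `Tree`, `satisfies`) is hypothetical and is not LEOPAR's data structure. Two further points are our own assumptions: precedence only relates sister nodes, and "large" is read as "at any distance of at least one step".

```python
from __future__ import annotations
from dataclasses import dataclass

# Relation kinds (hypothetical encoding).
IMM_DOM, LARGE_DOM, IMM_PREC, LARGE_PREC = "idom", "ldom", "iprec", "lprec"


@dataclass
class Description:
    # description node -> feature -> allowed atoms (an atom disjunction)
    features: dict[str, dict[str, frozenset[str]]]
    # (relation kind, node1, node2)
    relations: list[tuple[str, str, str]]


@dataclass
class Tree:
    parent: dict[str, str | None]      # tree node -> mother (None for the root)
    children: dict[str, list[str]]     # tree node -> ordered daughters
    labels: dict[str, dict[str, str]]  # tree node -> feature -> atom


def is_proper_ancestor(tree: Tree, a: str, b: str) -> bool:
    """True if a dominates b at depth >= 1 (our reading of large dominance)."""
    node = tree.parent[b]
    while node is not None:
        if node == a:
            return True
        node = tree.parent[node]
    return False


def sister_positions(tree: Tree, a: str, b: str) -> tuple[int, int] | None:
    """Positions of a and b under their common mother, or None if not sisters."""
    mother = tree.parent[a]
    if mother is None or mother != tree.parent[b]:
        return None
    kids = tree.children[mother]
    return kids.index(a), kids.index(b)


def satisfies(desc: Description, tree: Tree, interp: dict[str, str]) -> bool:
    """Check that `interp` (description node -> tree node) makes `tree` a model."""
    for node, feats in desc.features.items():
        label = tree.labels[interp[node]]
        for feat, allowed in feats.items():
            if label.get(feat) not in allowed:
                return False
    for kind, n1, n2 in desc.relations:
        a, b = interp[n1], interp[n2]
        if kind == IMM_DOM:
            ok = tree.parent[b] == a
        elif kind == LARGE_DOM:
            ok = is_proper_ancestor(tree, a, b)
        else:
            pos = sister_positions(tree, a, b)
            if pos is None:
                ok = False
            elif kind == IMM_PREC:
                ok = pos[1] == pos[0] + 1
            else:  # LARGE_PREC
                ok = pos[0] < pos[1]
        if not ok:
            return False
    return True


# Toy tree: S -> NP VP, VP -> V
tree = Tree(
    parent={"t0": None, "t1": "t0", "t2": "t0", "t3": "t2"},
    children={"t0": ["t1", "t2"], "t1": [], "t2": ["t3"], "t3": []},
    labels={"t0": {"cat": "s"}, "t1": {"cat": "np"},
            "t2": {"cat": "vp"}, "t3": {"cat": "v"}},
)
# Toy description: the subject may be an np or a pp (atom disjunction).
desc = Description(
    features={"s": {"cat": frozenset({"s"})},
              "subj": {"cat": frozenset({"np", "pp"})},
              "vp": {"cat": frozenset({"vp"})},
              "v": {"cat": frozenset({"v"})}},
    relations=[(IMM_DOM, "s", "subj"), (IMM_DOM, "s", "vp"),
               (IMM_PREC, "subj", "vp"), (LARGE_DOM, "s", "v")],
)
print(satisfies(desc, tree, {"s": "t0", "subj": "t1", "vp": "t2", "v": "t3"}))  # True
print(satisfies(desc, tree, {"s": "t0", "subj": "t2", "vp": "t1", "v": "t3"}))  # False
```

The large dominance between `s` and `v` leaves the intermediate VP node unspecified. This is the kind of underspecification that lets one description have several models.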
• Features are polarized:
  • negative features (f ← v) represent expected resources;
  • positive features (f → v) represent available resources;
  • neutral features (f = v) represent properties which do not behave as consumable resources.
• A syntactic description represents an underspecified syntactic tree. In other words, it represents a family of syntactic trees, which are the models of the description.
• Among all models of a description, only neutral and minimal models are linguistically relevant:
  • a neutral model realizes the neutralisation of every negative feature with a positive feature and conversely;
  • a minimal model adds a minimum of information to the description.
• Neutral and minimal models of a description are built by iterating the operation of feature neutralisation: this operation merges two nodes labelled with two dual features (f → v and f ← v).
• The neutralisation of two features entails a partial superposition of trees, obtained by propagating the constraints that define the description.
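The following Python sketch illustrates neutralisation on node labels only. Two nodes are merged feature by feature: dual polarities combine into a neutral feature, and value disjunctions are intersected. The class and function names are hypothetical, and the string markers "->", "<-" and "=" stand for →, ← and =. Rejecting the merge of two features with the same polarity is in line with the resource reading above. Rejecting the merge of a neutral feature with a polarized one is a simplifying assumption of ours. Constraint propagation on the tree structure is not modelled.

```python
from __future__ import annotations
from dataclasses import dataclass

POS, NEG, NEU = "->", "<-", "="  # f -> v, f <- v, f = v


@dataclass(frozen=True)
class PFeature:
    polarity: str           # POS, NEG or NEU
    values: frozenset[str]  # atom disjunction


def merge_features(f1: PFeature, f2: PFeature) -> PFeature | None:
    """Combine two features with the same name, or return None on failure."""
    common = f1.values & f2.values  # the value disjunctions must overlap
    if not common:
        return None
    if {f1.polarity, f2.polarity} == {POS, NEG}:
        return PFeature(NEU, common)  # dual features neutralise each other
    if f1.polarity == NEU and f2.polarity == NEU:
        return PFeature(NEU, common)  # neutral properties simply unify
    return None  # same polarity twice, or neutral with polarized (assumption)


def merge_nodes(n1: dict[str, PFeature],
                n2: dict[str, PFeature]) -> dict[str, PFeature] | None:
    """Merge two node labels feature by feature."""
    merged = dict(n1)
    for name, feat in n2.items():
        if name in merged:
            combined = merge_features(merged[name], feat)
            if combined is None:
                return None
            merged[name] = combined
        else:
            merged[name] = feat  # a feature present on one node only is kept
    return merged


def is_neutral(node: dict[str, PFeature]) -> bool:
    """A label is saturated when no positive or negative feature remains."""
    return all(f.polarity == NEU for f in node.values())


# Hypothetical labels: the noun phrase provided by "Jean", and the subject
# expected by "demande" (1st or 3rd person singular in French).
jean = {"cat": PFeature(POS, frozenset({"np"})), "pers": PFeature(NEU, frozenset({"3"}))}
subject = {"cat": PFeature(NEG, frozenset({"np"})), "pers": PFeature(NEU, frozenset({"1", "3"}))}

merged = merge_nodes(jean, subject)
print(merged)
print(is_neutral(merged))       # True
print(merge_nodes(jean, jean))  # None: two positive cat features
```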
3 - Modelling of syntactic phenomena in French
• Barriers to extraction
  • L’invitation que Jean demande à Marie
  • L’invitation que Jean pense demander à Marie
  • * L’invitation que Marie connaît Jean qui demande
• Pied piping
  • A la femme de qui Jean demande-t-il une invitation ?
  • A la femme de qui Jean pense-t-il demander une invitation ?
• Negation (ne … personne, ne … aucun)
  • Personne ne demande une invitation à Marie.
  • Jean ne demande aucune invitation à Marie.
  • Jean ne demande une invitation à personne.
  • Jean ne demande une invitation à la femme d’aucun ingénieur.
4 - Principle of the LEOPAR parser
• LEOPAR is developed inside the Calligramme team by Guillaume Bonfante, Bruno Guillaume, Sylvain Pogodalla and Guy Perrier.
• This work started in 2003. After a first release of the parser, a second release is now available. It comprises 17,000 lines of OCaml code.
• The parser is freely downloadable under the CeCILL licence at: http://www.loria.fr/equipes/calligramme/leopar/download.html
Parsing of the sentence "Jean a demandé une invitation à Marie" (figure):
• Tokenization: Jean / a / demandé / une / invitation / à / Marie.
• Lexical selection: the candidate descriptions include N0VN1, N0VN1aN2, N0VS1aN2, CommonN, InfCompl, ProperNoun, StandardDet, NaN1deN2, VerbPrep and Avoir, giving 1 × 8 × 20 × 1 × 4 × 4 × 1 = 2560 input choices.
• Input filtering: only 3 input choices remain (descriptions shown: Avoir, ProperNoun, N0VN1aN2, StandardDet, CommonN, VerbPrep, ProperNoun).
• Parsing: the resulting syntactic tree is [S [NP Jean] [V [Aux a] [V demandé]] [NP [Det une] [N invitation]] [PP [Prep à] [NP Marie]]].
5 - Input filtering
• Principle: for every input choice, a parse can exist only if the polarity balance is null for every feature and every feature value.
• This is a global input filtering criterion.
• For every feature value, we build an automaton which counts polarities. A path in the automaton represents an input choice, and we keep it only if the polarity balance along this path is null for the considered feature value.
• Because feature values can take the form of disjunctions, the automaton can be nondeterministic. It is determinised by computing possible polarity intervals instead of precise values.
• Filtering can be improved in different ways: bounding polarity intervals, using specific properties of coordination, adding probabilities.
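Here is a minimal Python sketch of polarity-based input filtering. It assumes that each candidate description is summarised by its net polarity for each feature value, that is, the number of positive occurrences minus the number of negative ones. The slide builds one automaton per feature value; this sketch instead tracks all feature values in a single automaton whose states are balance vectors. That product construction is our own simplification. Paths reaching the same balance share a state, and only the choices ending in the null balance are kept. Disjunctive values and the interval-based determinisation are not handled. The toy lexicon and all function names are hypothetical.

```python
from collections import defaultdict
from math import prod

# Hypothetical lexicon: word -> candidate descriptions, each given as
# (name, {feature value: net polarity}).
LEXICON = {
    "Jean": [("ProperNoun", {"cat=np": +1})],
    "voit": [("N0V", {"cat=np": -1}),
             ("N0VN1", {"cat=np": -2}),
             ("N0VS1", {"cat=np": -1, "cat=s": -1})],
    "Marie": [("ProperNoun", {"cat=np": +1})],
}


def state_key(balance):
    """Canonical automaton state: the non-null balances, sorted."""
    return tuple(sorted((fv, n) for fv, n in balance.items() if n != 0))


def build_automaton(lexicon, sentence):
    """layers[i] maps each state reached after i words to its incoming
    edges (previous state, description name). Equal balances share a state."""
    layers = [{(): []}]
    for word in sentence:
        nxt = defaultdict(list)
        for state in layers[-1]:
            for name, polarities in lexicon[word]:
                balance = dict(state)
                for fv, n in polarities.items():
                    balance[fv] = balance.get(fv, 0) + n
                nxt[state_key(balance)].append((state, name))
        layers.append(dict(nxt))
    return layers


def surviving_choices(layers):
    """Input choices whose path ends in the null balance (state ())."""
    def back(i, state):
        if i == 0:
            yield []
            return
        for prev, name in layers[i][state]:
            for prefix in back(i - 1, prev):
                yield prefix + [name]

    if () not in layers[-1]:
        return []
    return list(back(len(layers) - 1, ()))


sentence = ["Jean", "voit", "Marie"]
print(prod(len(LEXICON[w]) for w in sentence))  # 3 input choices before filtering
print(surviving_choices(build_automaton(LEXICON, sentence)))
# [['ProperNoun', 'N0VN1', 'ProperNoun']]
```

Because states are shared, the automaton's size depends on the number of distinct balances rather than on the number of input choices. This is what makes the criterion usable when the product of lexical ambiguities grows large.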
6 - Parsing
• The principle is to build a neutral and minimal model of the syntactic description corresponding to every path in the automaton.
• The current strategy implemented in LEOPAR is a left-to-right strategy. To reduce the search space, the number of active polarities allowed during parsing is bounded.
• The automaton is visited from left to right. If the number of active polarities in the current description is under the bound, we take a shift step in the automaton, extending the current description. Otherwise, we take a reduce step: we bring the number of active polarities back under the bound by performing neutralisations.
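The control loop of the left-to-right strategy can be sketched in Python on a deliberately crude abstraction. A description is reduced to its multiset of polarized feature values. A reduce step neutralises one positive with one negative of the same feature value, taking the first such pair it finds. Tree constraints and alternative neutralisations, which make up the real search, are ignored. The function name and data format are hypothetical. The example illustrates the incompleteness caused by the bound: with a bound of 2 the loop gets stuck, while a bound of 3 succeeds.

```python
from collections import Counter


def bounded_shift_reduce(descriptions, bound):
    """Toy left-to-right strategy with a bound on active polarities.

    Each description is a list of (feature_value, sign) pairs, sign being
    +1 (positive) or -1 (negative). Returns the steps taken, or None when
    the loop gets stuck or ends with unneutralised polarities."""
    pos, neg = Counter(), Counter()
    steps = []

    def active():
        return sum(pos.values()) + sum(neg.values())

    def reduce_once():
        # Neutralise the first available positive/negative pair.
        for fv in pos:
            if pos[fv] > 0 and neg[fv] > 0:
                pos[fv] -= 1
                neg[fv] -= 1
                steps.append(("reduce", fv))
                return True
        return False

    for i, desc in enumerate(descriptions):
        # Reduce steps: bring the active polarities back under the bound.
        while active() >= bound:
            if not reduce_once():
                return None  # stuck: the bound may make us miss a parse
        # Shift step: extend the current description with the next one.
        for fv, sign in desc:
            (pos if sign > 0 else neg)[fv] += 1
        steps.append(("shift", i))
    # End of input: neutralise whatever can still be neutralised.
    while reduce_once():
        pass
    return steps if active() == 0 else None


# Hypothetical input: two available noun phrases, then a description
# expecting two of them.
EXAMPLE = [[("cat=np", +1)], [("cat=np", +1)], [("cat=np", -1), ("cat=np", -1)]]
print(bounded_shift_reduce(EXAMPLE, 2))  # None
print(bounded_shift_reduce(EXAMPLE, 3))
# [('shift', 0), ('shift', 1), ('shift', 2), ('reduce', 'cat=np'), ('reduce', 'cat=np')]
```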
• The strategy has two drawbacks: because of the bound on the number of active polarities, it is not complete; and, to avoid producing the same solution several times, the sequence of neutralisations must respect a fixed order.
• Parsing efficiency can be improved by using a top-down strategy.
• Robustness can be taken into account by using a bottom-up strategy.
7 - Lexical and grammatical resources with large coverage
• Building and maintaining large lexicons and grammars requires reconciling the size of such resources with linguistic (readability) and computational (efficiency) constraints.
• These resources should be reusable as much as possible for other formalisms.
• All the resources which we produce are freely available.
8 - A lexicon independent of the formalism
• The lexicon used by the parser is not built directly: it results from combining a morpho-syntactic lexicon, independent of the formalism, with a grammar written in the formalism of Interaction Grammars.
• The morpho-syntactic lexicon itself results from combining a morphological lexicon with a syntactic lexicon.
• We have built a syntactic lexicon with 400 entries in order to test LEOPAR on the French sentences of the TSNLP (Test Suite for Natural Language Processing).
• In joint work with Claire Gardent, Bruno Guillaume and Ingrid Falk, we have designed a method to extract a lexicon from the LADL tables. With this method, we have produced a lexicon from 11 tables and 2000 verbs.
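The combination can be pictured as a chain of joins. In this Python sketch, a morphological lexicon maps word forms to lemmas, a syntactic lexicon maps lemmas to frames, and the grammar maps frames to its descriptions. Only the last table depends on the formalism. All entries are hypothetical illustrations and not the actual LEOPAR resources. This includes the frame-to-description mapping, the "1|3" notation for a disjunction and description names such as `N0VN1_relative`.

```python
# Hypothetical toy resources; none of these entries comes from the real lexicons.
MORPHO = {  # word form -> (lemma, category, morphological features)
    "demande": [("demander", "v", {"pers": "1|3", "num": "sg"}),
                ("demande", "n", {"gen": "f", "num": "sg"})],
    "invitation": [("invitation", "n", {"gen": "f", "num": "sg"})],
}
SYNTAX = {  # (lemma, category) -> frames; independent of the formalism
    ("demander", "v"): ["N0VN1", "N0VN1aN2", "N0VS1aN2"],
    ("demande", "n"): ["CommonN"],
    ("invitation", "n"): ["CommonN", "NaN1deN2"],
}
GRAMMAR = {  # frame -> descriptions of the interaction grammar (formalism-specific)
    "N0VN1": ["N0VN1_canonical", "N0VN1_relative"],
    "N0VN1aN2": ["N0VN1aN2_canonical"],
    "N0VS1aN2": ["N0VS1aN2_canonical"],
    "CommonN": ["CommonN"],
    "NaN1deN2": ["NaN1deN2"],
}


def parser_lexicon_entries(form):
    """Combine the three resources into the entries the parser needs."""
    entries = []
    for lemma, cat, feats in MORPHO.get(form, []):
        for frame in SYNTAX.get((lemma, cat), []):
            for description in GRAMMAR.get(frame, []):
                entries.append((description, lemma, feats))
    return entries


for entry in parser_lexicon_entries("demande"):
    print(entry)
```

Swapping `GRAMMAR` for a table targeting another formalism would leave `MORPHO` and `SYNTAX` untouched. That is the kind of reuse the slide argues for.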
9 - A two-level grammar: source and object
• The principle is to consider two levels for the grammar:
  • a source grammar is written by a human in a high-level language well suited to expressing linguistic regularities;
  • the source grammar is compiled into an object grammar which is directly usable in an NLP system.
• Denys Duchier, Joseph Le Roux, Yannick Parmentier and Benoit Crabbé (LORIA) have developed a grammatical description language together with a compiler. The system is called XMG (eXtensible MetaGrammar).
• We used XMG to produce a French interaction grammar (740 descriptions).
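To give an idea of what compiling a source grammar involves, here is a toy Python sketch. Source classes are fragments, conjunctions of classes or disjunctions of classes, and compilation expands them into the set of object descriptions. This is only an analogy under our own assumptions. XMG's actual language and compiler combine tree descriptions, whereas here fragments are plain sets of constraint labels merged by set union. All class names are hypothetical.

```python
from itertools import product

# Hypothetical source grammar: a class is a fragment (a set of constraint
# labels), a conjunction of classes, or a disjunction of classes.
SOURCE = {
    "CanonicalSubject": ("frag", {"subject NP before verb"}),
    "CliticSubject": ("frag", {"subject clitic before verb"}),
    "Subject": ("or", ["CanonicalSubject", "CliticSubject"]),
    "CanonicalObject": ("frag", {"object NP after verb"}),
    "RelativizedObject": ("frag", {"object extracted by 'que'"}),
    "Object": ("or", ["CanonicalObject", "RelativizedObject"]),
    "Verb": ("frag", {"verb anchor"}),
    "N0VN1": ("and", ["Subject", "Verb", "Object"]),
}


def compile_class(name):
    """Expand a source class into the list of object descriptions it denotes."""
    kind, body = SOURCE[name]
    if kind == "frag":
        return [frozenset(body)]
    if kind == "or":
        # A disjunction contributes the alternatives of each of its members.
        return [d for member in body for d in compile_class(member)]
    # A conjunction takes one alternative per conjunct and merges them
    # (plain set union in this sketch).
    return [frozenset().union(*combo)
            for combo in product(*(compile_class(member) for member in body))]


for description in compile_class("N0VN1"):
    print(sorted(description))
```

Here the class `N0VN1` expands into 2 × 1 × 2 = 4 object descriptions. With more alternatives per conjunct, the object grammar grows multiplicatively while the source stays factorised.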
10 - Prospects
• To develop more efficient parsing strategies which integrate robustness.
• To integrate semantics.
• To extend the coverage of the French grammar.
• To improve the efficiency of the parser by using statistics.