LEXICALIZATION AND CATEGORIAL GRAMMARS: A STORY BAR-HILLEL MIGHT HAVE LIKED ARAVIND K. JOSHI UNIVERSITY OF PENNSYLVANIA PHILADELPHIA, PA 19104 USA June 1995
Outline • Introduction • Lexicalization • Weak Lexicalization and Strong Lexicalization • Strong lexicalization and Lexicalized Tree-Adjoining Grammars (LTAGs) • Strong Lexicalization and Categorial Grammars (CG) • Basic partial proof trees • Inference rules from proof trees to proof trees • Formal characterization of the inference rules • Relevance to parsing • Summary
Introduction • Equivalence of categorial grammars and context-free grammars (Bar-Hillel, Gaifman and Shamir 1960) • Fate of grammars in the 60’s that were shown to be equivalent or conjectured to be equivalent to CFGs • Non-transformational or minimally transformational grammars of the 70’s, 80’s and 90’s - GPSG, LFG, HPSG, various types of CGs, LTAG and others - GB, Minimalist Theory - LTAGs are, in a sense, transformational, reminiscent of ‘generalized transformations’ in the earliest formulation of transformational grammars
Introduction • Bar-Hillel et al. 1960 suggested that CFGs (and by implication grammars equivalent to CFGs) can be used for the so-called ‘kernel’ sentences of Chomsky • Categorial Grammars with partial proof trees CG (PPT), the system presented here, can be thought of as related to this suggestion of Bar-Hillel et al. 1960 • This relationship and Bar-Hillel’s strong interest in comparative studies of formal grammars are the basis for the second half of the title -- A story Bar-Hillel might have liked
Related work • Proof trees, Morrill et al. 1990 • Description trees, Vijay-Shanker 1993 • HPSG compilation into LTAG trees, Kasper et al. 1992/1995
Lexicalization • A grammar G is a lexicalized grammar if it consists of • a finite set of structures (strings, trees, dags, for example), each structure being associated with a lexical item, called its anchor • a finite set of operations for composing these structures • A grammar G strongly lexicalizes another grammar G’ if G is a lexicalized grammar and the structural descriptions (trees, for example) of G and G’ are exactly the same
Lexicalized grammars
• Context-free grammar (CFG), G
Non-lexical rules: S → NP VP; VP → V NP; VP → VP ADV
Lexical rules: NP → Harry; NP → peanuts; V → likes; ADV → passionately
Tree: [S [NP Harry] [VP [VP [V likes] [NP peanuts]] [ADV passionately]]]
CFGs can weakly lexicalize CFGs but not strongly
• Greibach Normal Form (GNF): CFG rules are of the form A → a B1 B2 ... Bn or A → a
• This lexicalization gives the same set of strings but not the same set of trees, i.e., not the same set of structural descriptions. Hence, it is a weak lexicalization.
• Converting a CFG to a categorial grammar (CG) gives only a weak lexicalization and not necessarily a strong lexicalization.
• Ajdukiewicz and Bar-Hillel categorial grammars, CG(AB), weakly lexicalize CFGs but not strongly.
Strong lexicalization of CFGs • Same set of strings and same set of trees or structural descriptions. • Tree substitution grammars • Increased domain of locality • Substitution as the combining operation
CG(AB) cannot strongly lexicalize CFGs
CFG, G: S → SS; S → a
CG(AB), G’: a: S; a: S/S
• G’ weakly lexicalizes G but not strongly.
• Not all trees of G are proof trees of G’ (assuming appropriate relabeling of nodes).
Note: Adding function composition helps in this example but, in general, it will not help.
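One way to see the gap between G and G’ is to count structures. The sketch below counts the derivation trees of G and the proof trees of G’ for the string a^n, assuming forward application (X/Y Y gives X) as the only rule, since the types of G’ contain only "/". The function names are hypothetical.

```python
from functools import lru_cache

def count_cfg_trees(n):
    """Number of derivation trees of G (S -> S S | a) with yield a^n."""
    @lru_cache(maxsize=None)
    def c(k):
        if k == 1:
            return 1                                   # S -> a
        return sum(c(i) * c(k - i) for i in range(1, k))  # S -> S S, split point i
    return c(n)

def count_cg_proofs(n):
    """Number of CG(AB) proof trees of S over a^n, with a: S and a: S/S."""
    @lru_cache(maxsize=None)
    def p(k, cat):
        if k == 1:
            return 1 if cat in ("S", "S/S") else 0     # the two lexical types of a
        if cat == "S":
            # forward application: S/S over the first i words, S over the rest
            return sum(p(i, "S/S") * p(k - i, "S") for i in range(1, k))
        return 0                                       # S/S is never derived from 2+ words
    return p(n, "S")

print([count_cfg_trees(n) for n in range(1, 5)])   # [1, 1, 2, 5]
print([count_cg_proofs(n) for n in range(1, 5)])   # [1, 1, 1, 1]
```

For a a a, G has two trees but G’ has a single (right-branching) proof tree, so some tree of G has no counterpart in G’.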
Strong lexicalization -- Tree substitution grammars
CFG, G: S → NP VP; VP → V NP; NP → Harry; NP → peanuts; V → likes
TSG, G’ (↓ marks a substitution node):
α1: [S NP↓ [VP [V likes] NP↓]]
α2: [NP Harry]
α3: [NP peanuts]
TSGs cannot strongly lexicalize CFGs
• Formal insufficiency of TSG
G: S → SS (non-lexical); S → a (lexical)
TSG, G’:
α1: [S [S a] S↓]
α2: [S S↓ [S a]]
α3: [S a]
TSGs cannot lexicalize CFGs
[Figure: the elementary trees α1, α2, α3 of the TSG G’ and a tree t]
• G’ can generate all strings of G but not all trees of G.
• TSGs cannot strongly lexicalize CFGs. Thus substitution alone is not enough.
TSGs are also linguistically inadequate
• Linguistic inadequacy of TSG
G: S → NP VP; VP → VP ADV; VP → V NP; NP → Harry / peanuts; V → likes; ADV → passionately
G’:
α1: [S NP↓ [VP [V likes] NP↓]]
α2: [NP Harry]
α3: [NP peanuts]
α4: [VP VP↓ [ADV passionately]]
• G’ is inadequate. It cannot achieve recursion on VP.
Linguistic inadequacy of TSGs
G’’:
α1: [S NP↓ [VP [V likes] NP↓]]
α2: [NP Harry]
α3: [NP peanuts]
α4: [VP VP↓ [ADV passionately]]
α5: [S NP↓ [VP VP↓ [ADV passionately]]]
α6: [VP [V likes] NP↓]
• Even when a CFG can be lexicalized by substitution alone, the lexical anchors may not be linguistically appropriate.
TSGs with substitution and adjoining -- LTAGs
G: S → SS; S → a
G’: [Figure: trees α1 and α2, each with a foot node S*, the tree α3, and the derived tree γ]
• Adjoining α2 to α3 at the S node, the root node, and then adjoining α1 to the S node of α2, the left daughter of the root node, we have γ.
• LTAGs strongly lexicalize CFGs.
• Adjoining is crucial for lexicalization.
Adjunction permits appropriate choice of lexical anchors
G3 (↓ marks a substitution node, * a foot node):
α1: [S NP↓ [VP [V likes] NP↓]]
α2: [NP Harry]
α3: [NP peanuts]
α4: [VP VP* [ADV passionately]]
• A tree rooted in S and anchored in ‘passionately’ is not needed.
• Lexical anchors as functors.
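The two combining operations can be sketched in a few lines. The code below is an illustrative encoding, not the formal definition: the `Node` class, the `mark` field and the "leftmost matching node" policy are assumptions of this sketch. It derives "Harry likes peanuts passionately" from the trees of G3, and also checks that the substitution-only variant of α4 (with VP↓ instead of VP*) finds no site in α1.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    mark: str = ""   # "subst" = substitution site, "foot" = foot node, "" = ordinary node

def words(t):
    """Left-to-right yield of lexical leaves (unfilled sites contribute nothing)."""
    if not t.children:
        return [] if t.mark else [t.label]
    return [w for c in t.children for w in words(c)]

def substitute(t, init):
    """Replace the leftmost substitution site labeled like init's root."""
    if t.mark == "subst" and t.label == init.label:
        return init, True
    kids, done = [], False
    for c in t.children:
        if not done:
            c, done = substitute(c, init)
        kids.append(c)
    return Node(t.label, kids, t.mark), done

def plug_foot(aux, subtree):
    """Copy aux, putting subtree in place of its foot node."""
    if aux.mark == "foot":
        return subtree
    return Node(aux.label, [plug_foot(c, subtree) for c in aux.children], aux.mark)

def adjoin(t, aux):
    """Adjoin aux at the first (pre-order) internal node labeled like aux's root."""
    if t.children and t.label == aux.label:
        return plug_foot(aux, t), True
    kids, done = [], False
    for c in t.children:
        if not done:
            c, done = adjoin(c, aux)
        kids.append(c)
    return Node(t.label, kids, t.mark), done

a1 = Node("S", [Node("NP", mark="subst"),
                Node("VP", [Node("V", [Node("likes")]), Node("NP", mark="subst")])])
a2 = Node("NP", [Node("Harry")])
a3 = Node("NP", [Node("peanuts")])
a4 = Node("VP", [Node("VP", mark="foot"), Node("ADV", [Node("passionately")])])

t, _ = substitute(a1, a2)          # subject
t, _ = substitute(t, a3)           # object
derived, _ = adjoin(t, a4)         # adverb adjoined at VP
print(" ".join(words(derived)))    # Harry likes peanuts passionately

# With substitution only, the adverb tree needs a VP substitution site; alpha1 has none.
tsg_a4 = Node("VP", [Node("VP", mark="subst"), Node("ADV", [Node("passionately")])])
_, fits = substitute(a1, tsg_a4)
print(fits)                        # False
```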
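Adjoining
[Figure: a tree γ containing a node X, an auxiliary tree β with root X and foot node X*, and the tree γ’ obtained by adjoining β to γ at X]
Summary of lexicalization
• LTAGs strongly lexicalize CFGs.
• Adjoining and, therefore, LTAGs arise out of lexicalization of CFGs.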
Lexicalized Tree-Adjoining Grammars (LTAGs) • Finite set of elementary trees anchored on lexical items • Elementary trees • Initial trees • Auxiliary trees • Operations • Substitution • Adjoining • Derivation • Derivation tree -- How elementary trees are put together. • Derived tree
Properties of LTAGs • Localization of dependencies • Syntactic locality • Agreement • Subcategorization • Filler-gap • Word order • Local scrambling • Long distance scrambling-- movement across clauses • Word clusters (flexible idioms) -- non-compositionality • Function -- argument
Properties of LTAGs • Extended domain of locality (EDL) • Factoring recursion from the domain of dependencies (FRD) • All interesting properties of LTAG follow from EDL and FRD • Mathematical - Computational: mild context-sensitivity, polynomial parsability, semi-linearity, etc. • Linguistic
Strong lexicalization: EDL, FRD
[Diagram: CFG to LTAG by strong lexicalization (EDL, FRD); CG (AB) to CG (PPT) by strong lexicalization (EDL, FRD); CFG and CG (AB): weak equivalence; LTAG and CG (PPT): weak equivalence?]
• CG (AB), although weakly equivalent to CFG, does not strongly lexicalize CFG. CG (AB) has function application only.
• In analogy to LTAG, we work with larger structures, Partial Proof Trees (PPT), and inference rules from proof trees to proof trees.
• CG (PPT) has properties (linguistic and mathematical) similar to LTAG.
Strong lexicalization: EDL, FRD
[Figure: basic PPT for likes: (NP\S)/NP with the assumption [NP] gives (NP\S); the assumption [NP] with (NP\S) gives S]
• Main idea
• Each lexical item is associated with one or more basic partial proof trees (BPPT) obtained by unfolding arguments.
• B(PPT) is the (finite) set of BPPTs -- the set of basic types.
• Informal description of the inference rule -- linking
How is B(PPT), the finite set of basic partial proof trees, constructed?
• Unfold arguments of the type associated with a lexical item in a CG (AB) by introducing assumptions.
• No unfolding past an argument which is not an argument of the lexical item.
• If a trace assumption is introduced while unfolding then it must be locally discharged, i.e., within the basic PPT which is being constructed.
• While unfolding we can interpolate, say, from X to Y where X is a conclusion node and Y is an assumption node.
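Unfolding arguments
[Figure: basic PPTs for man, apples, the (NP/N with the assumption [N], concluding NP) and likes ((NP\S)/NP with two [NP] assumptions, concluding (NP\S) and then S), linked to derive: the man likes the apples]
• Linking conclusion nodes to assumption nodes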
No unfolding past a non-argument
[Figure: basic PPT for passionately: (NP\S)\(NP*\S) with the assumption [(NP\S)] gives (NP*\S)]
• The subject NP marked by * is not an argument of ‘passionately’. This is a property of the lexical item and thus it can be marked on the type assigned to the lexical item by CG (AB).
• No unfolding past an argument marked by *.
• Thus unfolded arguments are only those which are the arguments of the lexical item.
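The unfolding step, including the stop at a starred argument, can be sketched as follows. This is a simplified illustration under stated assumptions: the `Fn` class and the functions `show` and `unfold` are hypothetical names, and trace assumptions and interpolation are not modeled. A type A\B takes its argument A on the left and B/A takes its argument A on the right, as in the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fn:
    result: object         # category produced
    arg: object            # category consumed
    side: str              # "right" for result/arg, "left" for arg\result
    starred: bool = False  # True if the argument is marked * (not an argument of the item)

def show(c, top=True):
    """Render a category in the slide notation, e.g. (NP\\S)/NP."""
    if isinstance(c, str):
        return c
    arg = show(c.arg, False) + ("*" if c.starred else "")
    res = show(c.result, False)
    s = f"{res}/{arg}" if c.side == "right" else f"{arg}\\{res}"
    return s if top else f"({s})"

def unfold(cat):
    """Peel off arguments, outermost first, stopping at a starred argument.
    Returns (assumptions, conclusion) of the resulting basic PPT."""
    assumptions = []
    while isinstance(cat, Fn) and not cat.starred:
        assumptions.append((cat.side, show(cat.arg)))
        cat = cat.result
    return assumptions, show(cat)

likes = Fn(Fn("S", "NP", "left"), "NP", "right")                    # (NP\S)/NP
passionately = Fn(Fn("S", "NP", "left", starred=True),
                  Fn("S", "NP", "left"), "left")                    # (NP\S)\(NP*\S)

print(show(likes), unfold(likes))
print(show(passionately), unfold(passionately))
```

For likes both NP arguments are unfolded and the conclusion is S; for passionately only the (NP\S) argument is unfolded and the conclusion stays at (NP*\S).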
Stretching and linking -- First informal inference rule
• A proof tree can be stretched at any node.
[Figure: a proof tree with premises u, v, w, an internal node X and conclusion Y; the proof tree is to be stretched at the node X]
Stretching a proof tree at node X
[Figure: the stretched proof tree, with the conclusion X separated from a new assumption node [X]]
• X is the conclusion from v.
• Y is the conclusion from u [X] w, i.e., from u, the assumption X, and w.
• Linking X to [X] we have the original proof tree.
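Stretching and linking can be read as inverse operations on trees. The sketch below assumes a hypothetical encoding of a proof tree as nested (label, children) tuples, with assumption leaves written [X]; it links to the leftmost matching assumption, which is enough for this example but is a simplification.

```python
def stretch(tree, path):
    """Split `tree` at the node reached by `path` (a tuple of child indices).
    Returns (upper, lower): the proof of X, and the tree with an assumption [X] in its place."""
    label, kids = tree
    if not path:
        return tree, (f"[{label}]", ())
    i = path[0]
    upper, new_kid = stretch(kids[i], path[1:])
    lower = (label, kids[:i] + (new_kid,) + kids[i + 1:])
    return upper, lower

def link(upper, lower):
    """Plug `upper` into the leftmost assumption leaf of `lower` that matches its conclusion."""
    target = f"[{upper[0]}]"

    def go(t):
        label, kids = t
        if label == target and not kids:
            return upper, True
        out, done = [], False
        for k in kids:
            if not done:
                k, done = go(k)
            out.append(k)
        return (label, tuple(out)), done

    result, done = go(lower)
    if not done:
        raise ValueError("no matching assumption node")
    return result

# Basic PPT for likes: [NP] and (NP\S) give S; likes and [NP] give (NP\S).
likes_ppt = ("S", (("[NP]", ()),
                   ("NP\\S", (("likes: (NP\\S)/NP", ()), ("[NP]", ())))))

upper, lower = stretch(likes_ppt, (1,))     # stretch at the (NP\S) node
print(lower)                                # ('S', (('[NP]', ()), ('[NP\\S]', ())))
print(link(upper, lower) == likes_ppt)      # True
```

Between stretching and linking, another PPT (for example the one for an adverb) can be linked in at the opened node, which is how the stretched tree is used in the examples.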
Stretching and linking -- an example
[Figure: the basic PPT for likes, with nodes (NP\S)/NP, [NP], [NP], (NP\S) and S]
• Stretching at the indicated node
Stretching and linking -- an example
[Figure: the stretched PPT for likes, with nodes (NP\S)/NP, [NP], [NP], the conclusion (NP\S), the new assumption [(NP\S)] and S]
Stretching and linking -- an example
[Figure: the stretched PPT for likes together with the PPT for passionately: (NP\S)\(NP*\S) with the assumption [(NP\S)] gives (NP*\S)]
• Linking conclusion nodes to assumption nodes and assuming that appropriate proof trees are linked to the two NP assumption nodes, we have S for: John likes apples passionately
Introduction and discharge of trace assumption likes e (NP\S)/NP [NP] Trace assumption [NP] [NP] (NP\S) S Local discharge of the trace assumption. The appropriate directionality by convention. (NP\S) S Apples Mary likes
An example using a PPT with trace assumption, stretching and linking apples Mary likes e (NP\S)/NP [NP] NP NP [NP] [NP] John (NP\S) NP thinks S [NP] (NP\S)/S [S] (NP\S) S [S] (NP\S) S Apples John thinks Mary likes
An example of a PPT with trace assumption, stretching and linking
[Figure: PPTs for John, Mary, calls (with a trace assumption e) and everyday, stretched and linked to conclude S]
Note: In a natural deduction type CG, a permutation operator is needed for this case, which adds power to the system.
John Mary calls everyday
Basic PPT for object relative clause meets e (NP\S)/NP [NP] wh Trace assumption [N] (N\N)/(NP\S) [NP] (NP\S) S Local discharge of the trace assumption. The appropriate directionality by convention. (NP\S) (N\N) N who Bill meets
Object relative clause, stretching and linking
[Figure: the basic PPT for the object relative clause with meets (trace assumption e) and wh, stretched and linked with the PPT for today, concluding N]
Note: In a natural deduction type CG, a permutation operator is needed for this case, which adds power to the system.
who Bill meets today
An example -- John tries to walk John walk(inf) NP tries [NPpro] (NPpro)\Sinf Sinf [NP] (NP\S)/Sinf [Sinf] (NP\S) S Note: Subject NP is an argument for tries. Hence, unfolding continues past NP in (NP\S). John tries to walk
Raising verbs -- subject NP is not an argument seems (NP*\S)/(NP\Sinf) [(NP\Sinf)] (NP*\S) Subject NP is not an argument of seems. Hence, unfolding does not continue past NP in (NP*\S).
Interpolation in a basic PPT
• Another basic PPT for walk(inf)
• Interpolation from (NP\Sinf) to (NP\S)
[Figure: PPT for walk(inf) with nodes [NP], (NP\Sinf), [(NP\S)] and S]
Interpolation and linking John NP walk(inf) [NP] (NP\Sinf) seems (NP*\S)/(NP\Sinf) [(NP\Sinf)] [(NP\S)] (NP*\S) S John seems to walk
Interpolation: extraction of an NP under a PP complement gives [NP] (NP\S)/PP/NP [NP] [PP] (NP\S)/PP (NP\S) S
Interpolation: extraction of an NP under a PP complement to e PP/NP [NP] [NP] PP Interpolation: From PP to S [S] Local discharge of trace assumption NP\S S
Interpolation: extraction of an NP from a PP complement Mary books John to e NP NP NP [NP] PP/NP [NP] gives PP [NP] (NP\S)/PP/NP [NP] [PP] (NP\S)/PP (NP\S) S [S] (NP\S) Mary John gives books to S
Formal representation of the inference rules
• Rules for the three types of operations on PPTs -- linking, stretching, and interpolation -- are from proof trees to proof trees.
• These operations are specified by inference rules that take the form of λ-operations, where the body of the λ-term is itself a proof.
• A version of typed label-selective λ-calculus (Garrigue and Ait-Kaci 1994)
• Arguments have both symbol and numeric labels: “the use of labels for argument selection enhances clarity and obviates the need of argument-shuffling combinators” (Garrigue and Ait-Kaci 1994)
Formal representation of the inference rules
• Although arguments must be applied along the correct channels, it does not matter in what order they are applied -- two reductions of Bob likes Hazel
• Stretching and linking can also be handled by β-reduction, where the proof tree to be stretched at a node becomes an abstraction over an inference rule -- higher-order β-reduction.
• A similar higher-order β-reduction is used to handle interpolation. The inference rule abstraction for interpolation is done during the course of building the basic PPT.
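The order-independence of labeled application can be illustrated loosely with Python keyword arguments. This is only an analogy, not the label-selective λ-calculus itself; the function `likes` and the labels `subj` and `obj` are hypothetical names chosen for the sketch.

```python
from functools import partial

def likes(subj, obj):
    """Hypothetical term for 'likes' with two labeled argument channels."""
    return ("likes", subj, obj)

# Reduction 1: supply the object first, then the subject.
r1 = partial(likes, obj="Hazel")(subj="Bob")
# Reduction 2: supply the subject first, then the object.
r2 = partial(likes, subj="Bob")(obj="Hazel")

print(r1 == r2)   # True: the labels, not the order, decide where each argument goes
```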
CG (PPT) is more powerful than CG (AB): A strictly non-context-free language generated by CG (PPT)
[Figure: basic PPTs for a, with types (S/C)/B and (S/C*)/C/B/(S/C), and the lexical items b: B and c: C]
• L is the language generated by this CG (PPT)
• L ∩ {a* b* c*} = {a^n b^n c^n | n ≥ 1}
CG(PPT) and crossing dependencies
[Figure: basic PPTs for a, with types S/B and (S/B*)/B/(S/B), each with a trace assumption e, and the lexical item b: B]
• Local discharge of the trace assumption
• L = {a^n b^n | n ≥ 1}
• The dependencies are as follows.
[Figure: dependencies between a a a . . . and b b b]