LEXICALIZATION

LEXICALIZATION AND CATEGORIAL GRAMMARS: A STORY BAR-HILLEL MIGHT HAVE LIKED • Aravind K. Joshi • University of Pennsylvania, Philadelphia, PA 19104, USA • June 1995

Outline • Introduction • Lexicalization • Weak Lexicalization and Strong Lexicalization • Strong lexicalization and Lexicalized Tree-Adjoining Grammars (LTAGs) • Strong Lexicalization and Categorial Grammars (CG) • Basic partial proof trees • Inference rules from proof trees to proof trees • Formal characterization of the inference rules • Relevance to parsing • Summary

Introduction • Equivalence of categorial grammars and context-free grammars (Bar-Hillel, Gaifman and Shamir 1960) • Fate of grammars in the 60’s that were shown to be equivalent or conjectured to be equivalent to CFGs • Non-transformational or minimally transformational grammars of the 70’s, 80’s and 90’s - GPSG, LFG, HPSG, various types of CGs, LTAG and others - GB, Minimalist Theory - LTAGs are, in a sense, transformational, reminiscent of ‘generalized transformations’ in the earliest formulation of transformational grammars

Introduction • Bar-Hillel et al. 1960 suggested that CFGs (and by implication grammars equivalent to CFGs) can be used for the so-called ‘kernel’ sentences of Chomsky • Categorial Grammars with partial proof trees CG (PPT), the system presented here, can be thought of as related to this suggestion of Bar-Hillel et al. 1960 • This relationship and Bar-Hillel’s strong interest in comparative studies of formal grammars are the basis for the second half of the title -- A story Bar-Hillel might have liked

Related work • Proof trees, Morrill et al. 1990 • Description trees, Vijay-Shanker 1993 • HPSG compilation into LTAG trees, Kasper et al. 1992/1995

Lexicalization • A grammar G is a lexicalized grammar if it consists of • a finite set of structures (strings, trees, dags, for example), each structure being associated with a lexical item, called its anchor • a finite set of operations for composing these structures • A grammar G strongly lexicalizes another grammar G’ if G is a lexicalized grammar and the structural descriptions (trees, for example) of G and G’ are exactly the same

Lexicalized grammars • Context-free grammar (CFG) G • Non-lexical rules: S → NP VP, VP → V NP, VP → VP ADV • Lexical rules: NP → Harry, NP → peanuts, V → likes, ADV → passionately • Tree: [S [NP Harry] [VP [VP [V likes] [NP peanuts]] [ADV passionately]]]
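A minimal sketch of the grammar on this slide, assuming a simple tuple encoding of rules and trees (the names RULES, is_lexical, licensed and tree_yield are illustrative, not from the talk). It checks that the tree for "Harry likes peanuts passionately" uses only rules of G and separates lexical from non-lexical rules.

```python
# Sketch only: the tuple encoding and helper names are illustrative assumptions.
RULES = [
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("VP", ("VP", "ADV")),
    ("NP", ("Harry",)),
    ("NP", ("peanuts",)),
    ("V", ("likes",)),
    ("ADV", ("passionately",)),
]
NONTERMINALS = {lhs for lhs, _ in RULES}


def is_lexical(rule):
    """A rule counts as lexical if its right-hand side contains a terminal."""
    _, rhs = rule
    return any(sym not in NONTERMINALS for sym in rhs)


# A tree is (label, child, ...); a leaf is a plain string (a word).
TREE = ("S",
        ("NP", "Harry"),
        ("VP",
         ("VP", ("V", "likes"), ("NP", "peanuts")),
         ("ADV", "passionately")))


def label(node):
    return node if isinstance(node, str) else node[0]


def licensed(node):
    """True if every internal node of the tree is an instance of some rule."""
    if isinstance(node, str):
        return True
    rule = (node[0], tuple(label(child) for child in node[1:]))
    return rule in RULES and all(licensed(child) for child in node[1:])


def tree_yield(node):
    """Left-to-right sequence of words at the leaves."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1:] for word in tree_yield(child)]


print(licensed(TREE), " ".join(tree_yield(TREE)))
print([rule for rule in RULES if not is_lexical(rule)])
```

The non-lexical rules are exactly the ones a lexicalized grammar has to absorb into structures anchored on words.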

CFGs can weakly lexicalize CFGs but not strongly • Greibach Normal Form (GNF): CFG rules are of the form A → a B1 B2 ... Bn or A → a • This lexicalization gives the same set of strings but not the same set of trees, i.e., not the same set of structural descriptions. Hence, it is a weak lexicalization. • Converting a CFG to a categorial grammar (CG) gives only a weak lexicalization and not necessarily a strong lexicalization. • Ajdukiewicz and Bar-Hillel categorial grammars, CG(AB), weakly lexicalize CFGs but not strongly.

Strong lexicalization of CFGs • Same set of strings and same set of trees or structural descriptions. • Tree substitution grammars • Increased domain of locality • Substitution as the combining operation

CG(AB) cannot strongly lexicalize CFGs • CFG G: S → S S, S → a • CG(AB) G’: a: S, a: S/S • G’ weakly lexicalizes G but not strongly. • Not all trees of G are proof trees of G’ (assuming appropriate relabeling of nodes). • Note: Adding function composition helps in this example but, in general, it will not help.
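The gap between weak and strong lexicalization on this slide can be checked by brute force. The sketch below assumes that trees are compared as bare bracketings (node labels are ignored, in the spirit of "appropriate relabeling"), that S/S is encoded as the pair ("S", "S"), and that only forward application is available; function composition is not modelled. The names cfg_trees and cg_proofs are illustrative.

```python
# Sketch only: bracketings are nested tuples over the terminal "a".

def cfg_trees(n):
    """All trees of the CFG S -> S S | a with n leaves, as bracketings."""
    if n == 1:
        return ["a"]
    return [(left, right)
            for k in range(1, n)
            for left in cfg_trees(k)
            for right in cfg_trees(n - k)]


# CG(AB) lexicon: a: S and a: S/S, where ("S", "S") stands for S/S.
LEXICON = {"a": ["S", ("S", "S")]}


def cg_proofs(words):
    """All (category, bracketing) pairs derivable by forward application."""
    if len(words) == 1:
        return [(cat, words[0]) for cat in LEXICON[words[0]]]
    proofs = []
    for k in range(1, len(words)):
        for lcat, ltree in cg_proofs(words[:k]):
            for rcat, rtree in cg_proofs(words[k:]):
                # X/Y applied to Y yields X.
                if isinstance(lcat, tuple) and lcat[1] == rcat:
                    proofs.append((lcat[0], (ltree, rtree)))
    return proofs


s_proofs = [tree for cat, tree in cg_proofs(["a", "a", "a"]) if cat == "S"]
print(cfg_trees(3))   # both the left-branching and the right-branching tree
print(s_proofs)       # only the right-branching bracketing
```

Both grammars accept aaa, but under these assumptions the left-branching tree of G has no counterpart among the proof trees of G’.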

Strong lexicalization -- Tree substitution grammars • CFG G: S → NP VP, VP → V NP, NP → Harry, NP → peanuts, V → likes • TSG G’: α1: [S NP↓ [VP [V likes] NP↓]], α2: [NP Harry], α3: [NP peanuts]
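Substitution, the only combining operation of a TSG, can be sketched as follows. The encoding is an assumption of this example: trees are nested tuples, and an open substitution node is a leaf string ending in "↓". The helper names substitute and words are hypothetical.

```python
# Sketch only: the tree encoding and helper names are illustrative assumptions.
SUBST = "↓"  # suffix marking an open substitution node, e.g. "NP↓"

LIKES = ("S", "NP↓", ("VP", ("V", "likes"), "NP↓"))
HARRY = ("NP", "Harry")
PEANUTS = ("NP", "peanuts")


def substitute(tree, initial):
    """Plug `initial` into the leftmost open node matching its root label.

    Returns (new_tree, done), where done says whether a site was found.
    """
    if isinstance(tree, str):
        if tree == initial[0] + SUBST:
            return initial, True
        return tree, False
    kids, done = [], False
    for child in tree[1:]:
        if not done:
            child, done = substitute(child, initial)
        kids.append(child)
    return (tree[0], *kids), done


def words(tree):
    """Words at the leaves, skipping substitution nodes that are still open."""
    if isinstance(tree, str):
        return [] if tree.endswith(SUBST) else [tree]
    return [w for child in tree[1:] for w in words(child)]


step1, _ = substitute(LIKES, HARRY)
derived, _ = substitute(step1, PEANUTS)
print(derived)
print(" ".join(words(derived)))
```

The derived tree is the same tree the CFG assigns to the sentence, which is what strong lexicalization requires.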

TSGs cannot strongly lexicalize CFGs • Formal insufficiency of TSG • G: S → S S (non-lexical), S → a (lexical) • TSG G’: α1: [S [S a] S↓], α2: [S S↓ [S a]], α3: [S a]

TSGs cannot strongly lexicalize CFGs • [Figure: the elementary trees α1, α2, α3 of the TSG G’ and a tree t of G] • G’ can generate all strings of G but not all trees of G. • TSGs cannot strongly lexicalize CFGs. Thus substitution alone is not enough.

TSGs are also linguistically inadequate • Linguistic inadequacy of TSG • G: S → NP VP, VP → VP ADV, VP → V NP, NP → Harry | peanuts, V → likes, ADV → passionately • G’: α1: [S NP↓ [VP [V likes] NP↓]], α2: [NP Harry], α3: [NP peanuts], α4: [VP VP↓ [ADV passionately]] • G’ is inadequate. It cannot achieve recursion on VP.

Linguistic inadequacy of TSGs • G’’: α1: [S NP↓ [VP [V likes] NP↓]], α2: [NP Harry], α3: [NP peanuts], α4: [VP VP↓ [ADV passionately]], α5: [S NP↓ [VP VP↓ [ADV passionately]]], α6: [VP [V likes] NP↓] • Even when a CFG can be lexicalized by substitution alone, the lexical anchors may not be linguistically appropriate.

TSGs with substitution and adjoining -- LTAGs • G: S → S S, S → a • [Figure: the elementary trees α1, α2, α3 of G’, with foot nodes marked S*, and a derived tree γ] • Adjoining α2 to α3 at the S node, the root node, and then adjoining α1 to the S node of α2, the left daughter of the root node, we have γ. • LTAGs strongly lexicalize CFGs. Adjoining is crucial for lexicalization.

Adjunction permits appropriate choice of lexical anchors • G3: α1: [S NP↓ [VP [V likes] NP↓]], α2: [NP Harry], α3: [NP peanuts], α4: [VP VP* [ADV passionately]] • A tree rooted in S and anchored in ‘passionately’ is not needed. • Lexical anchors as functors.

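Adjoining • [Figure: an auxiliary tree β with root X and foot node X* is adjoined at a node labeled X of a tree γ, giving the tree γ’] • Summary of lexicalization: LTAGs strongly lexicalize CFGs. Adjoining and, therefore, LTAGs arise out of lexicalization of CFGs.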
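Adjoining can be sketched on the S → S S, S → a example. The tree shapes below are assumptions chosen for illustration (an initial tree anchored on a and one auxiliary tree with its foot node on the left); they are not claimed to be the exact trees drawn on the slides, and the helper names are hypothetical.

```python
# Sketch only: tree shapes and helper names are illustrative assumptions.
FOOT = "S*"                        # foot node of the auxiliary tree

INITIAL = ("S", "a")               # initial tree anchored on a
AUX = ("S", FOOT, ("S", "a"))      # auxiliary tree: root S, foot S*, anchor a


def plug_foot(aux, subtree):
    """Copy of `aux` with its foot node replaced by `subtree`."""
    if aux == FOOT:
        return subtree
    if isinstance(aux, str):
        return aux
    return (aux[0], *(plug_foot(child, subtree) for child in aux[1:]))


def adjoin(tree, path, aux):
    """Adjoin `aux` at the node of `tree` addressed by `path`.

    `path` is a tuple of child indices; () addresses the root.
    """
    if path:
        kids = list(tree[1:])
        kids[path[0]] = adjoin(kids[path[0]], path[1:], aux)
        return (tree[0], *kids)
    if tree[0] != aux[0]:
        raise ValueError("root label of the auxiliary tree must match the node")
    # The subtree at the adjunction site moves down to the foot node.
    return plug_foot(aux, tree)


step1 = adjoin(INITIAL, (), AUX)       # adjoin at the root
derived = adjoin(step1, (0,), AUX)     # adjoin again at the left daughter
print(derived)
```

Under these assumptions the second adjunction produces a left-branching tree in which every S node dominates an anchor introduced by some elementary tree, which substitution alone could not do with a finite set of anchored trees.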

Lexicalized Tree-Adjoining Grammars (LTAGs) • Finite set of elementary trees anchored on lexical items • Elementary trees • Initial trees • Auxiliary trees • Operations • Substitution • Adjoining • Derivation • Derivation tree -- How elementary trees are put together. • Derived tree

Properties of LTAGs • Localization of dependencies • Syntactic locality • Agreement • Subcategorization • Filler-gap • Word order • Local scrambling • Long distance scrambling-- movement across clauses • Word clusters (flexible idioms) -- non-compositionality • Function -- argument

Properties of LTAGs • Extended domain of locality (EDL) • Factoring recursion from the domain of dependencies (FRD) • All interesting properties of LTAG follow from EDL and FRD • Mathematical - Computational: mild context-sensitivity, polynomial parsability, semi-linearity, etc. • Linguistic

Strong lexicalization: EDL, FRD • [Diagram: an arrow labeled "Strong Lex-EDL, FRD" from CFG to LTAG and another from CG (AB) to CG (PPT); CFG and CG (AB) are connected by "Weak equivalence"; LTAG and CG (PPT) are connected by "Weak equivalence?"] • CG (AB), although weakly equivalent to CFG, does not strongly lexicalize CFG. CG (AB) has function application only. • In analogy to LTAG, we work with larger structures, Partial Proof Trees (PPT), and inference rules from proof trees to proof trees. • CG (PPT) has properties (linguistic and mathematical) similar to LTAG.

Strong lexicalization: EDL, FRD • [Figure: basic partial proof tree for likes: likes: (NP\S)/NP and the assumption [NP] yield (NP\S); the assumption [NP] and (NP\S) yield S] • Main idea • Each lexical item is associated with one or more (basic) partial proof trees (BPPT) obtained by unfolding arguments. • B(PPT) is the (finite) set of BPPTs -- the set of basic types. • Informal description of the inference rule -- linking

How is B(PPT), the finite set of basic partial proof trees, constructed? • Unfold arguments of the type associated with a lexical item in a CG (AB) by introducing assumptions. • No unfolding past an argument which is not an argument of the lexical item. • If a trace assumption is introduced while unfolding, then it must be locally discharged, i.e., within the basic PPT which is being constructed. • While unfolding we can interpolate, say, from X to Y, where X is a conclusion node and Y is an assumption node.
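The first two construction steps (unfold arguments, stop at an argument marked *) can be sketched as below. This is a simplified model under stated assumptions: types are encoded with a hypothetical Fn class, X/Y takes Y on the right, Y\X takes Y on the left, and trace assumptions and interpolation are not modelled.

```python
# Sketch only: the Fn encoding and the helper names are illustrative assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class Fn:
    """A functor type with result `res` and argument `arg`."""
    res: object
    arg: object
    slash: str              # "/" (argument to the right) or "\\" (to the left)
    starred: bool = False   # True if the argument is marked *


def show(cat):
    """Render a type in the notation of the slides."""
    if isinstance(cat, str):
        return cat
    arg = show(cat.arg) + ("*" if cat.starred else "")
    if cat.slash == "/":
        return f"({show(cat.res)}/{arg})"
    return f"({arg}\\{show(cat.res)})"


def unfold(cat):
    """List of (assumption, conclusion) steps, stopping at a starred argument."""
    steps = []
    while isinstance(cat, Fn) and not cat.starred:
        steps.append((f"[{show(cat.arg)}]", show(cat.res)))
        cat = cat.res
    return steps


NP_S = Fn("S", "NP", "\\")                                          # (NP\S)
LIKES = Fn(NP_S, "NP", "/")                                         # (NP\S)/NP
PASSIONATELY = Fn(Fn("S", "NP", "\\", starred=True), NP_S, "\\")    # (NP\S)\(NP*\S)

print(unfold(LIKES))          # two steps: object NP, then subject NP
print(unfold(PASSIONATELY))   # one step: stops before the starred NP
```

For likes both NP arguments are unfolded and the tree ends in S; for passionately unfolding stops at (NP*\S), since the starred subject is not an argument of the adverb.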

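Unfolding arguments • [Figure: basic PPTs for man, apples, the (NP/N, with assumption [N]) and likes ((NP\S)/NP, with two [NP] assumptions, intermediate conclusion (NP\S) and conclusion S)] • the man likes the apples • Linking conclusion nodes to assumption nodes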

No unfolding past a non-argument • [Figure: basic PPT for passionately: the assumption [(NP\S)] and passionately: (NP\S)\(NP*\S) yield (NP*\S)] • The subject NP marked by * is not an argument of ‘passionately’. This is a property of the lexical item and thus it can be marked on the type assigned to the lexical item by CG (AB). • No unfolding past an argument marked by *. • Thus unfolded arguments are only those which are the arguments of the lexical item.

Stretching and linking -- First informal inference rule • A proof tree can be stretched at any node. • [Figure: a proof tree with leaves u, v, w, an internal node X and conclusion Y, to be stretched at the node X]

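Stretching a proof tree at node X • [Figure: the proof tree over u, v, w before stretching, and after stretching at X into a part concluding X and a part with the assumption [X] and conclusion Y] • X is the conclusion from v • Y is the conclusion from u [X] w, i.e., from u, the assumption X and w • Linking X to [X] we have the original proof tree.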
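Stretching and linking can be sketched on proof trees encoded as nested tuples (conclusion, premise, ...). The encoding, the helper names, and the decision not to check that a linked conclusion matches the assumption are all assumptions of this sketch; the likes and passionately trees follow the types used in the surrounding slides.

```python
# Sketch only: the encoding and helper names are illustrative assumptions.
# A proof tree is (conclusion, premise, ...); leaves are words or "[X]" assumptions.
LIKES = ("S", "[NP]", ("NP\\S", "likes", "[NP]"))
ADVERB = ("NP*\\S", "[NP\\S]", "passionately")


def stretch(tree, path):
    """Split `tree` at the node addressed by `path` (tuple of premise indices).

    Returns (sub, context): `sub` concludes X, and `context` is the rest of
    the tree with the assumption [X] left in place of `sub`.
    """
    if not path:
        return tree, f"[{tree[0]}]"
    i = path[0] + 1                     # premises start at index 1
    sub, ctx = stretch(tree[i], path[1:])
    return sub, tree[:i] + (ctx,) + tree[i + 1:]


def link(tree, label, proof):
    """Link `proof` to the leftmost assumption [label] in `tree`.

    No check is made that the conclusion of `proof` matches `label`.
    """
    target = f"[{label}]"
    done = False

    def walk(node):
        nonlocal done
        if isinstance(node, str):
            if node == target and not done:
                done = True
                return proof
            return node
        return (node[0], *[walk(child) for child in node[1:]])

    result = walk(tree)
    if not done:
        raise ValueError(f"no assumption {target} to link to")
    return result


def leaves(tree):
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in leaves(child)]


sub, context = stretch(LIKES, (1,))                  # stretch at the (NP\S) node
restored = link(context, "NP\\S", sub)               # linking X to [X]
modified = link(context, "NP\\S", link(ADVERB, "NP\\S", sub))
print(restored == LIKES)
print(leaves(modified))
```

Linking the stretched halves directly gives back the original tree, while linking them through the adverb's tree places passionately after the object, with the two NP assumptions still open.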

Stretching and linking -- an example • [Figure: the basic PPT for likes, (NP\S)/NP, with two [NP] assumptions, intermediate conclusion (NP\S) and conclusion S] • Stretching at the indicated node

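Stretching and linking -- an example • [Figure: the PPT for likes after stretching at the (NP\S) node, which now appears both as a conclusion (NP\S) and as an assumption [(NP\S)]]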

Stretching and linking -- an example • [Figure: the stretched PPT for likes together with the PPT for passionately, (NP\S)\(NP*\S), which has the assumption [(NP\S)] and the conclusion (NP*\S)] • Linking conclusion nodes to assumption nodes and assuming that appropriate proof trees are linked to the two NP assumption nodes, we have: John likes apples passionately

Introduction and discharge of trace assumption • [Figure: basic PPT for likes, (NP\S)/NP, whose object [NP] is a trace assumption (e); labels: "Trace assumption" and "Local discharge of the trace assumption. The appropriate directionality by convention."] • Apples Mary likes

An example using a PPT with trace assumption, stretching and linking • [Figure: PPTs for apples (NP), Mary (NP), John (NP), likes ((NP\S)/NP, with trace assumption e) and thinks ((NP\S)/S, with assumptions [NP] and [S]), combined by stretching and linking] • Apples John thinks Mary likes

An example of a PPT with trace assumption, stretching and linking • [Figure: PPTs for John (NP), Mary (NP), calls ((NP\S)/NP, with trace assumption e) and everyday ((NP*\S)\(NP\S)), combined by stretching and linking] • Note: In a natural deduction type CG, a permutation operator is needed for this case, which adds power to the system. • John Mary calls everyday

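Basic PPT for object relative clause • [Figure: basic PPT for meets, (NP\S)/NP, with a trace assumption [NP] (e), combined with wh of type (N\N)/(NP\S) and the assumption [N], giving conclusions (N\N) and N; labels: "Trace assumption" and "Local discharge of the trace assumption. The appropriate directionality by convention."] • who Bill meets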

Object relative clause, stretching and linking • [Figure: the basic PPT for the object relative clause (meets with trace assumption e, wh of type (N\N)/(NP\S)) stretched at (NP\S) and linked with the PPT for today, (NP*\S)\(NP\S)] • Note: In a natural deduction type CG, a permutation operator is needed for this case, which adds power to the system. • who Bill meets today

An example -- John tries to walk • [Figure: PPTs for John (NP), walk(inf) ((NPpro)\Sinf, with assumption [NPpro] and conclusion Sinf) and tries ((NP\S)/Sinf, with assumptions [NP] and [Sinf]), linked to conclude S] • Note: Subject NP is an argument for tries. Hence, unfolding continues past NP in (NP\S). • John tries to walk

Raising verbs -- subject NP is not an argument • [Figure: basic PPT for seems: seems: (NP*\S)/(NP\Sinf) and the assumption [(NP\Sinf)] yield (NP*\S)] • Subject NP is not an argument of seems. Hence, unfolding does not continue past NP in (NP*\S).

Interpolation in a basic PPT • Another basic PPT for walk(inf) • [Figure: PPT for walk(inf), (NP\Sinf), with the assumptions [NP] and [(NP\S)] and the conclusion S] • Interpolation from (NP\Sinf) to (NP\S)

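Interpolation and linking • [Figure: PPTs for John (NP), walk(inf) ((NP\Sinf), with interpolation to [(NP\S)]) and seems ((NP*\S)/(NP\Sinf), with assumption [(NP\Sinf)] and conclusion (NP*\S)), linked to conclude S] • John seems to walk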

Interpolation: extraction of an NP under a PP complement • [Figure: basic PPT for gives, (NP\S)/PP/NP, with assumptions [NP], [PP] and [NP], intermediate conclusions (NP\S)/PP and (NP\S), and conclusion S]

Interpolation: extraction of an NP under a PP complement • [Figure: basic PPT for to, PP/NP, with a trace assumption [NP] (e) and conclusion PP, continuing through [S] to NP\S and S] • Interpolation: From PP to S • Local discharge of trace assumption

Interpolation: extraction of an NP from a PP complement • [Figure: PPTs for Mary, books and John (NP), to (PP/NP, with trace assumption e) and gives ((NP\S)/PP/NP), combined by interpolation and linking] • Mary John gives books to

Formal representation of the inference rules • Rules for the three types of operations on PPTs -- linking, stretching, and interpolation -- are from proof trees to proof trees. • These operations are specified by inference rules that take the form of λ-operations, where the body of the λ-term is itself a proof. • A version of typed label-selective λ-calculus (Garrigue and Aït-Kaci 1994) • Arguments have both symbol and numeric labels • “the use of labels for argument selection enhances clarity and obviates the need of argument-shuffling combinators” (Garrigue and Aït-Kaci 1994)

Formal representation of the inference rules • Although arguments must be applied along the correct channels, it does not matter in what order they are applied -- two reductions of Bob likes Hazel • Stretching and linking can also be handled by β-reduction, where the proof tree to be stretched at a node becomes an abstraction over an inference rule -- higher-order β-reduction. • A similar higher-order β-reduction is used to handle interpolation. The inference rule abstraction for interpolation is done during the course of building the basic PPT.
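The point about channels and order of application can be illustrated with a loose analogy: Python keyword arguments combined with functools.partial. This is only an analogy under the assumption that labeled channels behave like named parameters; it is not an implementation of the label-selective λ-calculus, and the function likes and the labels subj and obj are hypothetical.

```python
# Loose analogy only: keyword arguments stand in for labeled channels.
from functools import partial


def likes(*, subj, obj):
    """A two-argument functor whose arguments are selected by label."""
    return f"likes({subj}, {obj})"


# Two reductions of "Bob likes Hazel": the arguments arrive in either order.
subject_first = partial(likes, subj="Bob")(obj="Hazel")
object_first = partial(likes, obj="Hazel")(subj="Bob")

print(subject_first)
print(object_first)
```

Because each argument travels along its own label, both orders of application end in the same term, with no argument-shuffling step.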

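CG (PPT) is more powerful than CG (AB): A strictly non-context-free language generated by CG (PPT) • [Figure: PPTs for a: (S/C)/B, with assumptions [B] and [C]; a: (S/C*)/C/B/(S/C), with assumptions [S/C], [B] and [C]; b: B; c: C] • L is the language generated by this CG(PPT) • $L \cap \{a^* b^* c^*\} = \{a^n b^n c^n \mid n \geq 1\}$ • Since context-free languages are closed under intersection with regular languages and $\{a^n b^n c^n \mid n \geq 1\}$ is not context-free, L is not context-free.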

CG(PPT) and crossing dependencies • [Figure: PPTs with trace assumptions (e) for a: S/B and a: (S/B*)/B/(S/B), with [B] and [S/B] assumptions, and for b: B; label: "Local discharge of the trace assumption"] • $L = \{a^n b^n \mid n \geq 1\}$ • The dependencies are as follows: a a a . . . b b b [Figure: dependency links between the a’s and the b’s]
