Combinatory Categorial Grammar 孙薇薇 @ Institute of Computer Science and Technology @ Peking University
Outline • Local, Nonlocal • TAG, HPSG, LFG • Combinatory categorial grammar
Grammar formalisms and linguistic theories • Linguistics aims to explain natural language: • What is universal grammar? • What are language-specific constraints? • Formalisms are mathematical theories: • They provide a language in which linguistic theories can be expressed (like calculus for physics) • They define elementary objects (trees, strings, feature structures) and recursive operations which generate complex objects from simple objects. • They do impose linguistic constraints (e.g. on the kinds of dependencies they can capture)
Lexicalized formalisms • Lexicalized formalisms: • TAG, HPSG, LFG and CCG • The lexicon: • pairs words with elementary objects • specifies all language-specific information • The grammatical operations: • are universal • define (and impose constraints on) recursion
TAG, HPSG, LFG and CCG They describe different kinds of linguistic objects: • TAG: trees • LFG is a multi-level theory based on a projection architecture relating different types of linguistic objects • trees, feature structures, … • HPSG: typed feature structures • CCG: (syntactic and semantic) types
(Lexicalized) Tree-Adjoining Grammar • TAG is a tree-rewriting formalism: • TAG defines operations (substitution and adjunction) on trees. • The elementary objects in TAG are trees (not strings) • TAG is lexicalized: • Each elementary tree is anchored to a lexical item (word) • "Extended domain of locality": the elementary tree contains all arguments of the anchor. • TAG requires a linguistic theory which specifies the shape of these elementary trees. • TAG is mildly context-sensitive: • can capture Dutch crossing dependencies • but is still efficiently parseable
TAG substitution (arguments): [figure: initial trees a2 and a3 substitute at the open Y nodes of a1, yielding a derived tree; the derivation tree records a1 with daughters a2 and a3]
TAG adjunction (modifiers): [figure: auxiliary tree b1, with root X and foot node X*, is spliced into a1 at an X node; the derivation tree records b1 adjoined to a1]
A small TAG lexicon: a1 = [S NP↓ [VP [VBZ eats] NP↓]], a2 = [NP John], a3 = [NP tapas], b1 = [VP [RB always] VP*]
A TAG derivation (substitution): substituting a2 and a3 into a1 yields the derived tree [S [NP John] [VP [VBZ eats] [NP tapas]]]
A TAG derivation (adjunction): adjoining b1 at the VP node then yields [S [NP John] [VP [RB always] [VP [VBZ eats] [NP tapas]]]]
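The two TAG operations can be sketched in a few lines of code. Below is a toy encoding (names and representation are illustrative, not from any TAG toolkit): trees are nested lists whose first element is the node label, a bare string ending in `!` marks an open substitution site, and a label ending in `*` marks the foot node of an auxiliary tree.

```python
def substitute(tree, filler):
    """Fill the leftmost open substitution site with the initial tree `filler`."""
    out, filled = [], False
    for node in tree:
        if not filled and isinstance(node, str) and node.endswith('!'):
            out.append(filler)
            filled = True
        elif not filled and isinstance(node, list):
            new = substitute(node, filler)
            filled = new != node          # a site inside this subtree was filled
            out.append(new)
        else:
            out.append(node)
    return out

def adjoin(tree, aux):
    """Adjoin auxiliary tree `aux` at the first node matching its root label;
    the foot node (root label + '*') is replaced by the original subtree."""
    root = aux[0]
    def fill_foot(t):
        return [tree if n == root + '*' else
                (fill_foot(n) if isinstance(n, list) else n) for n in t]
    if tree[0] == root:
        return fill_foot(aux)
    return [adjoin(n, aux) if isinstance(n, list) else n for n in tree]

# The slide's lexicon:
a1 = ['S', 'NP!', ['VP', ['VBZ', 'eats'], 'NP!']]   # anchored to "eats"
a2 = ['NP', 'John']
a3 = ['NP', 'tapas']
b1 = ['VP', ['RB', 'always'], 'VP*']                # auxiliary tree

derived = substitute(substitute(a1, a2), a3)        # "John eats tapas"
final = adjoin(derived, b1)                         # "John always eats tapas"
```

Substitution plugs arguments into the elementary tree; adjunction splices the modifier in without disturbing the rest of the structure, which is why the derivation tree (a1 with daughters a2, a3, b1) stays flat even as the derived tree grows.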
Head-Driven Phrase Structure Grammar (HPSG) • HPSG is a unification-/constraint-based theory of grammar • Syntactic/semantic constraints are uniformly denoted by signs, which are represented with feature structures • Two components of HPSG • Lexical entries represent word-specific constraints • elementary objects • Principles express generic grammatical regularities • grammatical operations
Sign • A sign is a formal representation combining phonological form with syntactic and semantic constraints: • PHON (string): phonological form • SYNSEM|LOCAL|CAT|HEAD: the syntactic head (MOD holds modifying constraints) • SYNSEM|LOCAL|CAT|VAL (SPR, SUBJ, COMPS lists): valence, i.e. subcategorization frames • SYNSEM|LOCAL|CONT: semantic representations • SYNSEM|NONLOCAL (QUE, REL, SLASH lists): non-local dependencies • DTRS: daughter structures
Lexical entries • Lexical entries express word-specific constraints
Principles • Principles describe generic regularities of grammar • Not corresponding to construction rules • Head Feature Principle • The value of HEAD must be percolated from the head daughter • Valence Principle • Subcats not consumed are percolated to the mother • Immediate Dominance (ID) Principle • A mother and her immediate daughters must satisfy one of the ID schemas • Many other principles: percolation of NONLOCAL features, semantics construction, etc.
Syntactic Structure • Lexical entries determine syntactic/semantic constraints of words: John [HEAD noun, SUBJ <>, COMPS <>], saw [HEAD verb, SUBJ <HEAD noun>, COMPS <HEAD noun>], Mary [HEAD noun, SUBJ <>, COMPS <>]
Syntactic Structure • Principles determine generic constraints of grammar: the head-complement schema (mother [HEAD 1, SUBJ 2, COMPS 4], head daughter [HEAD 1, SUBJ 2, COMPS <3|4>], complement daughter 3) is unified with the signs for "saw" and "Mary"
Syntactic Structure • Principle application produces phrasal signs: unification yields the VP "saw Mary" with [HEAD verb, SUBJ <HEAD noun>, COMPS <>]
Syntactic Structure • Recursive applications of principles yield the sentence "John saw Mary" with [HEAD verb, SUBJ <>, COMPS <>]
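The operation driving these steps is unification, which can be sketched as follows. Feature structures are plain nested dicts here; real HPSG signs are typed and use structure sharing (the boxed indices on the slides), which this toy version omits.

```python
def unify(a, b):
    """Return the unification of two feature structures, or None on a clash."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for feat, b_val in b.items():
            if feat in out:
                merged = unify(out[feat], b_val)
                if merged is None:
                    return None          # feature clash: no unifier exists
                out[feat] = merged
            else:
                out[feat] = b_val        # feature only in b: just add it
        return out
    return a if a == b else None         # atomic values must be identical

# Head-complement combination for "saw Mary": the verb's COMPS constraint
# must unify with the complement's sign.
saw = {'HEAD': 'verb', 'SUBJ': ({'HEAD': 'noun'},), 'COMPS': ({'HEAD': 'noun'},)}
mary = {'HEAD': 'noun', 'SUBJ': (), 'COMPS': ()}

comp = unify(saw['COMPS'][0], mary)                 # succeeds: Mary is a noun
clash = unify({'HEAD': 'verb'}, {'HEAD': 'noun'})   # fails: returns None
```

Unification is commutative and monotonic: constraints only accumulate, which is why lexical entries and principles can apply in any order and still produce the same phrasal sign.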
Lexical-Functional Grammar (LFG) Two (basic) levels of representation: • C-structure: represents surface syntactic configurations • word order, annotated phrase-structures • trees • F-structure: represents abstract grammatical functions • SUBJ, OBJ, OBL, PRED, COMP, ADJ, … • AVM • F-structure approximates a basic predicate-argument structure, i.e. a dependency representation
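As an illustration of that last point, here is an f-structure for "John saw Mary" encoded as a nested attribute-value matrix (a plain dict); the attribute names follow the slide's inventory, but this particular encoding and the helper below are made up for illustration.

```python
f_structure = {
    'PRED': 'see<SUBJ, OBJ>',     # predicate with its grammatical functions
    'TENSE': 'past',
    'SUBJ': {'PRED': 'John'},
    'OBJ':  {'PRED': 'Mary'},
}

def dependencies(f):
    """Read off head-argument pairs, showing how an f-structure
    approximates a dependency representation."""
    head = f['PRED'].split('<')[0]
    return [(head, gf, f[gf]['PRED'].split('<')[0])
            for gf in ('SUBJ', 'OBJ', 'OBL', 'COMP') if gf in f]

deps = dependencies(f_structure)
```

The same f-structure can be projected from quite different c-structures (e.g. different word orders in a free word-order language), which is the point of LFG's two-level architecture.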
Outline • Local, Nonlocal • TAG, HPSG, LFG • Combinatory categorial grammar
Motivation for (C)CG • Only a "minimal" extension to CFGs → formalism is also well-understood from a logical standpoint • Transparent interface to (compositional) semantics • Cross-linguistic generalizations can be made easily • the same set of rules always applies • Flexible constituency
Combinatory Categorial Grammar • Categories: specify subcat lists of words/constituents. • Combinatory rules: specify how constituents can combine. • The lexicon: specifies which categories a word can have. • Derivations: spell out process of combining constituents.
CCG categories • Simple categories: NP, S, PP • Complex categories: functions which return a result when combined with an argument: VP or intransitive verb: S\NP; transitive verb: (S\NP)/NP; adverb: (S\NP)\(S\NP); PPs: ((S\NP)\(S\NP))/NP, (NP\NP)/NP • Every category has a semantic interpretation
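This category inventory can be sketched as a small recursive datatype: atomic categories are strings, and a complex category pairs a result, a slash direction, and an argument. The class name is made up for illustration.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Complex:
    result: 'Category'
    slash: str          # '/' seeks its argument to the right, '\\' to the left
    argument: 'Category'

    def __str__(self):
        def wrap(c):
            # parenthesize complex subcategories, as in (S\NP)/NP
            return f'({c})' if isinstance(c, Complex) else c
        return f'{wrap(self.result)}{self.slash}{wrap(self.argument)}'

Category = Union[str, Complex]

iv = Complex('S', '\\', 'NP')    # intransitive verb / VP: S\NP
tv = Complex(iv, '/', 'NP')      # transitive verb: (S\NP)/NP
adv = Complex(iv, '\\', iv)      # adverb: (S\NP)\(S\NP)
```

Note how the transitive verb and the adverb are built from the same `iv` category; in CCG the category itself carries the full subcategorization information that a CFG would spread across rules.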
The combinatory rules • Function application: λx.f(x), a ⇒ f(a) X/Y Y -> X (>) Y X\Y -> X (<) • Function composition: λx.f(x), λy.g(y) ⇒ λx.f(g(x)) X/Y Y/Z -> X/Z (>B) Y\Z X\Y -> X\Z (<B) X/Y Y\Z -> X\Z (>Bx) Y/Z X\Y -> X/Z (<Bx) • Type-raising: a ⇒ λf.f(a) X -> T/(T\X) (>T) X -> T\(T/X) (<T)
Function application • Combines a function with its argument to yield a result: (S\NP)/NP NP -> S\NP ("eats" + "tapas" ⇒ "eats tapas"); NP S\NP -> S ("John" + "eats tapas" ⇒ "John eats tapas") • Used in all variants of categorial grammar
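The two application rules can be sketched over categories written as plain strings. The `split` helper below is a naive, illustrative device (not from any CCG toolkit) that cuts a complex category at its outermost slash.

```python
def split(cat):
    """Split 'X/Y' or 'X\\Y' at the last slash outside parentheses,
    e.g. '(S\\NP)/NP' -> ('S\\NP', '/', 'NP'); returns None for atoms."""
    depth = 0
    for i in range(len(cat) - 1, -1, -1):
        if cat[i] == ')':
            depth += 1
        elif cat[i] == '(':
            depth -= 1
        elif cat[i] in '/\\' and depth == 0:
            result, arg = cat[:i], cat[i + 1:]
            if result.startswith('(') and result.endswith(')'):
                result = result[1:-1]   # drop the outer parentheses
            return result, cat[i], arg
    return None

def forward_apply(f, a):
    """>  :  X/Y  Y  ->  X"""
    parts = split(f)
    return parts[0] if parts and parts[1] == '/' and parts[2] == a else None

def backward_apply(a, f):
    """<  :  Y  X\\Y  ->  X"""
    parts = split(f)
    return parts[0] if parts and parts[1] == '\\' and parts[2] == a else None

vp = forward_apply('(S\\NP)/NP', 'NP')   # "eats" + "tapas"  ->  S\NP
s = backward_apply('NP', vp)             # "John" + "eats tapas"  ->  S
```

Each rule simply checks that the function category seeks the argument category in the right direction and, if so, returns the result category.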
Type-raising and function composition • Type-raising: turns an argument into a function. Corresponds to case: NP -> S/(S\NP) (nominative); NP -> (S\NP)\((S\NP)/NP) (accusative) • Function composition: composes two functions (complex categories): (S\NP)/PP PP/NP -> (S\NP)/NP; S/(S\NP) (S\NP)/NP -> S/NP
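These two rules can be sketched over categories encoded as nested `(result, slash, argument)` tuples with strings as atoms; the encoding is illustrative, not standard.

```python
def compose(f, g):
    """>B  :  X/Y  Y/Z  ->  X/Z"""
    if f[1] == '/' and g[1] == '/' and f[2] == g[0]:
        return (f[0], '/', g[2])
    return None                          # rule does not apply

def type_raise(x, t):
    """>T  :  X  ->  T/(T\\X)"""
    return (t, '/', (t, '\\', x))

# The slide's example: S/(S\NP) composed with (S\NP)/NP gives S/NP.
subj = type_raise('NP', 'S')             # nominative subject: S/(S\NP)
tv = (('S', '\\', 'NP'), '/', 'NP')      # transitive verb: (S\NP)/NP
result = compose(subj, tv)               # S/NP
```

The resulting S/NP is a sentence missing its object to the right, the non-standard constituent that makes CCG analyses of wh-movement and right-node raising possible without traces.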
Type-raising and Composition • Wh-movement • Right-node raising [derivation figures not recoverable]
CCG: semantics • Every syntactic category and rule has a semantic counterpart:
The CCG lexicon • Pairs words with their syntactic categories (and semantic interpretation): eats ⊢ (S\NP)/NP : λx.λy.eats′ x y; eats ⊢ S\NP : λx.eats′ x • The main bottleneck for wide-coverage CCG parsing
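The syntax-semantics pairing can be sketched by making each lexical entry a `(category, meaning)` pair, with categories as nested `(result, slash, argument)` tuples and meanings as Python functions; application then combines both dimensions at once. All names here are illustrative.

```python
lexicon = {
    'John':  ('NP', 'john'),
    'tapas': ('NP', 'tapas'),
    # (S\NP)/NP : λx.λy.eats'(y, x) -- object applied first, then subject
    'eats':  ((('S', '\\', 'NP'), '/', 'NP'),
              lambda x: lambda y: ('eats', y, x)),
}

def fapply(fn, arg):
    """Forward application: X/Y  Y  ->  X, applying the semantics too."""
    (cat_f, sem_f), (cat_a, sem_a) = fn, arg
    if isinstance(cat_f, tuple) and cat_f[1] == '/' and cat_f[2] == cat_a:
        return (cat_f[0], sem_f(sem_a))
    return None

def bapply(arg, fn):
    """Backward application: Y  X\\Y  ->  X."""
    (cat_a, sem_a), (cat_f, sem_f) = arg, fn
    if isinstance(cat_f, tuple) and cat_f[1] == '\\' and cat_f[2] == cat_a:
        return (cat_f[0], sem_f(sem_a))
    return None

vp = fapply(lexicon['eats'], lexicon['tapas'])   # "eats tapas" : S\NP
s = bapply(lexicon['John'], vp)                  # "John eats tapas" : S
```

Because every combinatory rule has a fixed semantic counterpart, the meaning falls out of the derivation for free; this is the "transparent interface" the slides refer to.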
Summary • CCG is a lexicalized grammar formalism • The "rules" are extremely general, just like HPSG schemata • CCG is nearly context-free • Weakly equivalent to TAG • CCG has a flexible constituent structure • CCG has a transparent syntax-semantics interface • Every syntactic category and combinatory rule has a semantic interpretation • Movement and traces don't exist • CCG rules are type-driven, not structure-driven • E.g. intransitive verbs and VPs are indistinguishable