Computational Linguistics Introduction Context Free Grammars CLINT-LN Parsing
Chomsky Hierarchy
Weak Equivalence
• A grammar should generate all and only the sentences in the language under investigation.
• Let H be the language under investigation and G be the grammar we are developing.
• The grammar should generate all sentences in the language, i.e. for any s in H, s is also in L(G).
• The grammar should generate only sentences in the language, i.e. for any s in L(G), s is also in H.
All and Only
[Figure: G, L(G) and H, with L(G) = H]
Overgeneration
[Figure: L(G) and H, with L(G) the larger set]
Overgeneration
• Basic problem: L(G) is larger than H.
• There are sentences generated by the grammar that are not in H.
• The “only” constraint is violated.
• The grammar is too weak: it is not restrictive enough.
• Example: a grammar which ignores number and gender.
Undergeneration
[Figure: H and L(G), with H the larger set]
Undergeneration
• Basic problem: H is larger than L(G).
• There are sentences in H that are not generated by the grammar.
• The “all” constraint is violated.
• The grammar is too strong: it is too restrictive.
• Examples (for H = NL, natural language):
  • a grammar which lacks recursion;
  • a finite state grammar.
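The two failure modes can be sketched as set differences. This is only an illustration on finite samples: `compare`, `H` and `L_G` are hypothetical names and toy data, since a real H and L(G) are usually infinite and can only be sampled.

```python
def compare(generated, target):
    """Compare a finite sample of L(G) with a finite sample of H."""
    generated, target = set(generated), set(target)
    return {
        # In L(G) but not in H: the "only" constraint is violated.
        "overgenerated": generated - target,
        # In H but not in L(G): the "all" constraint is violated.
        "undergenerated": target - generated,
    }


# Hypothetical toy data: H is a hand-picked sample of acceptable sentences.
H = {"the dog barks", "the dogs bark"}
L_G = {"the dog barks", "the dog bark", "the dogs barks"}

report = compare(L_G, H)
print(report["overgenerated"])
print(report["undergenerated"])
```

A grammar meets the "all and only" goal on the sample when both sets are empty.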
Weak and Strong Equivalence
• A grammar/lexicon G generates a characteristic language L(G).
• Grammars G1 and G2 are said to be weakly equivalent if L(G1) = L(G2).
• A grammar G also assigns one or more phrase structures to any s in L(G).
• Weakly equivalent grammars G1 and G2 are said to be strongly equivalent if, in addition, they assign identical phrase structures to any s in L(G1).
Weak Equivalence
Grammar 1: A → a, A → aA
Grammar 2: A → a, A → Aa
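A small sketch can make the difference between weak and strong equivalence concrete for these two grammars. It assumes the slide shows the rule pairs A → a, A → aA and A → a, A → Aa. The helper names (`language`, `tree_g1`, `tree_g2`) are hypothetical, and the comparison only covers strings up to a chosen length, so it illustrates weak equivalence rather than proving it.

```python
from collections import deque


def language(rules, start, max_len):
    """Enumerate all strings of length <= max_len derivable from `start`.

    `rules` maps a nonterminal to a list of right-hand sides (tuples).
    Assumes no empty right-hand sides, so sentential forms never shrink.
    """
    seen, strings = set(), set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        if form in seen or len(form) > max_len:
            continue
        seen.add(form)
        # Locate the leftmost nonterminal, if there is one.
        idx = next((i for i, sym in enumerate(form) if sym in rules), None)
        if idx is None:
            strings.add("".join(form))
            continue
        for rhs in rules[form[idx]]:
            queue.append(form[:idx] + tuple(rhs) + form[idx + 1:])
    return strings


G1 = {"A": [("a",), ("a", "A")]}  # right-recursive
G2 = {"A": [("a",), ("A", "a")]}  # left-recursive


def tree_g1(n):
    """Bracketed structure G1 assigns to a string of n a's (right-branching)."""
    return "[A a]" if n == 1 else f"[A a {tree_g1(n - 1)}]"


def tree_g2(n):
    """Bracketed structure G2 assigns to a string of n a's (left-branching)."""
    return "[A a]" if n == 1 else f"[A {tree_g2(n - 1)} a]"


print(language(G1, "A", 4) == language(G2, "A", 4))
print(tree_g1(3))
print(tree_g2(3))
```

Both grammars yield the same strings up to the bound, but the structures differ, so they are weakly but not strongly equivalent.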
Appropriate Structure
• The structure assigned by the grammar should be appropriate.
• The structure should:
  • be understandable;
  • allow us to make generalisations;
  • reflect the underlying meaning of the sentence.
Ambiguity
• A grammar is ambiguous if it assigns two or more structures to the same sentence.
• The grammar should not generate too many possible structures for the same sentence.
• There is a tradeoff between ambiguity and clarity: too much detail can obscure the design principles.
• Too little detail means that the grammar is undercommitted.
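One way to measure how ambiguous a grammar is for a given sentence is to count its parse trees. The sketch below assumes a grammar in Chomsky normal form and uses a hypothetical noun-compound rule n → n n; `count_parses`, `LEXICAL` and `BINARY` are illustrative names, not part of the course material.

```python
from functools import lru_cache


def count_parses(words, lexical, binary, start):
    """Count the distinct parse trees of `words` under a CNF grammar.

    `lexical` is a set of (category, word) pairs;
    `binary` is a collection of (parent, left, right) rules.
    """
    words = tuple(words)

    @lru_cache(maxsize=None)
    def count(sym, i, j):
        # A span of one word is covered by a lexical rule or not at all.
        if j - i == 1:
            return 1 if (sym, words[i]) in lexical else 0
        total = 0
        for parent, left, right in binary:
            if parent != sym:
                continue
            # Try every split point between the two daughters.
            for k in range(i + 1, j):
                total += count(left, i, k) * count(right, k, j)
        return total

    return count(start, 0, len(words))


# Hypothetical toy grammar: every word is a noun, and n -> n n.
NOUNS = ["plastic", "cup", "holder", "factory"]
LEXICAL = {("n", w) for w in NOUNS}
BINARY = [("n", "n", "n")]

print(count_parses(NOUNS[:3], LEXICAL, BINARY, start="n"))
print(count_parses(NOUNS, LEXICAL, BINARY, start="n"))
```

Even this one-rule grammar assigns several structures to a short compound, and the count grows quickly with sentence length.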
Limitations of CF Grammars
• Simple CF grammars tend to overgenerate.
• The only mechanism available to control overgeneration is to invent new categories.
• Proliferation of categories soon becomes intractable. Problems include:
  • size of grammar;
  • understandability of grammar.
Criteria for Evaluating Grammars
• Does it undergenerate?
• Does it overgenerate?
• Does it assign appropriate structures to sentences it generates?
• Is it simple to understand? How many rules are there?
• Does it contain generalisations or special cases?
• How ambiguous is it? How many structures for a given sentence?
CF Phrase Structure Rules
s → np vp
np → D N
vp → V
vp → V np
(4 rules)
• A nice grammar, but it overgenerates.
• Solution: invent more categories (nps, nppl, vps, vppl, etc.).
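To see the overgeneration concretely, the four rules can be expanded exhaustively over a small lexicon. The lexicon here is a hypothetical assumption (the slides do not give one), and `expand` is an illustrative helper that only works because this grammar is non-recursive.

```python
from itertools import product

RULES = {
    "s": [["np", "vp"]],
    "np": [["D", "N"]],
    "vp": [["V"], ["V", "np"]],
}

# Hypothetical lexicon: singular and plural forms share one category.
LEXICON = {
    "D": ["the"],
    "N": ["dog", "dogs"],
    "V": ["sees", "see"],
}


def expand(symbol):
    """Return every word sequence derivable from `symbol`."""
    if symbol in LEXICON:
        return [[word] for word in LEXICON[symbol]]
    results = []
    for rhs in RULES[symbol]:
        # Combine every expansion of each daughter, left to right.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([w for part in parts for w in part])
    return results


sentences = {" ".join(words) for words in expand("s")}
print(len(sentences))
print("the dog see" in sentences)
```

Because N and V carry no number information, agreement violations such as "the dog see" and "the dogs sees" are generated alongside the well-formed sentences.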
CF Phrase Structure Rules with Number Agreement
s -> nps vps
s -> nppl vppl
nps -> DS NS
nppl -> DPL NPL
vps -> VS
vps -> VS nps
vps -> VS nppl
vppl -> VPPL
vppl -> VPPL nps
vppl -> VPPL nppl
(10 rules)
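The growth from 4 to 10 rules follows a pattern that can be sketched mechanically: each rule is copied once per agreement value, and the transitive vp rule once per pair of subject and object values. The function `split_rules` and its category naming (for example `Ds`, `Vpl`) are hypothetical and differ slightly from the names on the slide. The gender values are likewise an assumption, added only to show how fast the rule count grows.

```python
from itertools import product


def split_rules(values):
    """Copy the 4-rule grammar once per agreement value.

    Returns a list of (lhs, rhs) pairs, where rhs is a tuple of categories.
    """
    rules = []
    for v in values:
        rules.append(("s", (f"np{v}", f"vp{v}")))
        rules.append((f"np{v}", (f"D{v}", f"N{v}")))
        rules.append((f"vp{v}", (f"V{v}",)))
    # The object np need not agree with the verb, so the transitive
    # rule is copied for every (subject value, object value) pair.
    for subj, obj in product(values, repeat=2):
        rules.append((f"vp{subj}", (f"V{subj}", f"np{obj}")))
    return rules


number_only = split_rules(["s", "pl"])

# Hypothetical extension: two numbers combined with three genders.
combined = [n + g for n, g in product(["s", "pl"], ["m", "f", "n"])]
number_and_gender = split_rules(combined)

print(len(number_only))
print(len(number_and_gender))
```

With k agreement values the scheme yields 3k + k² rules, which is why inventing categories soon becomes hard to manage.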
Constraints and Information Structures
• PATR2 is a special grammar formalism which augments CF rules with constraints between constituents.
• The basic idea is that each constituent is associated with an information structure.
• We then express constraints between information structures.
Example of a PATR rule with Number Constraints
constituents:
s -> np vp
constraints:
<np num> = <vp num>
<s num> = <np num>
Example of a Grammar with Number Constraints
s -> np vp
  <np num> = <vp num>
  <s num> = <np num>
np -> D N
  <np num> = <D num>
  <D num> = <N num>
vp -> V
  <vp num> = <V num>
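The effect of these constraints can be imitated in a few lines of Python. This is a minimal sketch under simplifying assumptions, not PATR itself: information structures are reduced to a single atomic `num` value, `None` stands for "unspecified", and the lexicon and the names `unify`, `parse_np`, `parse_vp` and `parse_s` are hypothetical.

```python
FAIL = object()  # sentinel meaning "constraints cannot be satisfied"


def unify(a, b):
    """Unify two atomic feature values; None means 'unspecified'."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a == b else FAIL


# Hypothetical lexicon: word -> (category, num).
LEXICON = {
    "the": ("D", None),
    "dog": ("N", "sg"),
    "dogs": ("N", "pl"),
    "barks": ("V", "sg"),
    "bark": ("V", "pl"),
}


def lookup(word):
    return LEXICON.get(word, ("?", None))


def parse_np(words):
    """np -> D N, with <np num> = <D num> and <D num> = <N num>."""
    if len(words) != 2:
        return FAIL
    (d_cat, d_num), (n_cat, n_num) = lookup(words[0]), lookup(words[1])
    if (d_cat, n_cat) != ("D", "N"):
        return FAIL
    return unify(d_num, n_num)


def parse_vp(words):
    """vp -> V, with <vp num> = <V num>."""
    if len(words) != 1:
        return FAIL
    cat, num = lookup(words[0])
    return num if cat == "V" else FAIL


def parse_s(sentence):
    """s -> np vp, with <np num> = <vp num> and <s num> = <np num>."""
    words = sentence.split()
    np_num, vp_num = parse_np(words[:2]), parse_vp(words[2:])
    if np_num is FAIL or vp_num is FAIL:
        return FAIL
    return unify(np_num, vp_num)


print(parse_s("the dog barks"))
print(parse_s("the dogs barks") is FAIL)
```

The three rules stay unchanged however many number values the lexicon distinguishes, because agreement is checked by the constraints rather than encoded in the category names.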
Summary
• Pure CFGs become unwieldy when we try to constrain them to incorporate, for example, agreement information.
• PATR2 deals with this problem by associating information structures and constraints with each rule constituent.
• Information structures are often referred to as F-structures.