CSA2050: Introduction to Computational Linguistics
• Evaluation Criteria for CFGs
• Limitations of CFGs
• Introduction to PATR2
CSA2050: CFG Limitations
Weak and Strong Equivalence
• A grammar/lexicon G generates a characteristic language L(G).
• Grammars G1 and G2 are said to be weakly equivalent if L(G1) = L(G2).
• A grammar G also assigns one or more phrase structures to any s in L(G).
• Weakly equivalent grammars G1 and G2 are said to be strongly equivalent if, in addition, they assign identical phrase structures to any s in L(G1).
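The difference between weak and strong equivalence can be sketched in Python. The two toy grammars below (G1 right-branching, G2 left-branching) are hypothetical illustrations and not part of the course material; the helper names `trees`, `leaves` and `language` are likewise assumptions. Because derivations are cut off at a fixed depth, the comparison is only evidence about a finite sample, not a proof of equivalence.

```python
def trees(grammar, symbol, depth):
    """Yield all parse trees for `symbol` using at most `depth` nested rule applications."""
    if symbol not in grammar:          # terminal symbol: the tree is the word itself
        yield symbol
        return
    if depth == 0:                     # depth budget exhausted
        return
    for rhs in grammar[symbol]:
        # Build every combination of subtrees for the right-hand side.
        partial = [[]]
        for child in rhs:
            partial = [p + [t] for p in partial
                       for t in trees(grammar, child, depth - 1)]
        for children in partial:
            yield (symbol, *children)


def leaves(tree):
    """Return the words at the leaves of a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in leaves(child)]


def language(grammar, depth):
    """The set of strings generated from S within the depth bound."""
    return {" ".join(leaves(t)) for t in trees(grammar, "S", depth)}


G1 = {"S": [["a", "S"], ["a"]]}   # right-branching: S -> a S | a
G2 = {"S": [["S", "a"], ["a"]]}   # left-branching:  S -> S a | a

same_strings = language(G1, 4) == language(G2, 4)              # weak equivalence (on this sample)
same_trees = set(trees(G1, "S", 4)) == set(trees(G2, "S", 4))  # strong equivalence (on this sample)
```

On this sample the two grammars generate the same strings but assign them different trees, which is the situation described as weakly but not strongly equivalent.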
Weak Equivalence
• A grammar should generate all and only the sentences in the language under investigation.
• Let H be the language under investigation and G be the grammar we are developing.
• The grammar should generate all sentences in the language, i.e. for any s in H, s is also in L(G).
• The grammar should generate only sentences in the language, i.e. for any s in L(G), s is also in H.
Overgeneration
• Basic problem: L(G) is larger than H.
• There are sentences generated by the grammar that are not in H.
• The “only” constraint is violated.
• The grammar is too weak, i.e. too permissive.
• Example: a grammar which ignores number and gender.
Undergeneration
• Basic problem: H is larger than L(G).
• There are sentences in H that are not generated by the grammar.
• The “all” constraint is violated.
• The grammar is too strong, i.e. too restrictive.
• Example: a grammar which lacks recursion.
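Overgeneration and undergeneration can be read as two set differences between L(G) and H. The sketch below assumes both can be approximated by small finite samples (real languages are generally infinite); the function name `evaluate` and the example sentences are hypothetical.

```python
def evaluate(generated, target):
    """Compare a sample of L(G) (`generated`) with a sample of H (`target`)."""
    overgenerated = generated - target    # in L(G) but not in H: violates "only"
    undergenerated = target - generated   # in H but not in L(G): violates "all"
    return overgenerated, undergenerated


# Hypothetical toy data: H is a tiny fragment of English.
H = {"the dog barks", "the dogs bark"}
L_G = {"the dog barks", "the dog bark"}

over, under = evaluate(L_G, H)
```

Here `over` holds the sentence that ignores number agreement and `under` holds the sentence in H that the grammar fails to generate.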
Appropriate Structure
• The structure assigned by the grammar should be appropriate.
• The structure should:
  • Be understandable.
  • Allow us to make generalisations.
  • Reflect the underlying meaning of the sentence.
Ambiguity
• A grammar is ambiguous if it assigns two or more structures to the same sentence.
• The grammar should not generate too many possible structures for the same sentence.
• There is a tradeoff between ambiguity and clarity: too much detail can obscure the design principles.
• Too little detail means that the grammar is undercommitted.
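One way to measure ambiguity is to count the structures a grammar assigns to a sentence. The following is a minimal sketch that assumes the grammar has no empty right-hand sides and no unary cycles; the function `count_parses` and the toy grammar with prepositional phrases are hypothetical and are not the course grammar.

```python
from functools import lru_cache


def count_parses(grammar, words, start="s"):
    """Count the distinct parse trees that `grammar` assigns to `words`."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        # Number of ways `symbol` can derive words[i:j].
        if symbol not in grammar:                      # terminal symbol
            return 1 if j == i + 1 and words[i] == symbol else 0
        return sum(count_seq(tuple(rhs), i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        # Number of ways the symbol sequence `rhs` can derive words[i:j].
        if len(rhs) == 1:
            return count(rhs[0], i, j)
        # The first symbol covers words[i:k]; the rest must cover words[k:j],
        # leaving at least one word for each remaining symbol.
        return sum(count(rhs[0], i, k) * count_seq(rhs[1:], k, j)
                   for k in range(i + 1, j - len(rhs) + 2))

    return count(start, 0, len(words))


# Hypothetical toy grammar with prepositional-phrase attachment.
grammar = {
    "s":  [["np", "vp"]],
    "np": [["D", "N"], ["np", "pp"]],
    "vp": [["V", "np"], ["vp", "pp"]],
    "pp": [["P", "np"]],
    "D":  [["the"]],
    "N":  [["man"], ["dog"], ["telescope"]],
    "V":  [["saw"]],
    "P":  [["with"]],
}

plain = count_parses(grammar, "the man saw the dog".split())
attached = count_parses(grammar, "the man saw the dog with the telescope".split())
```

With this grammar the second sentence receives two structures, because the prepositional phrase can attach either to the noun phrase or to the verb phrase.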
Limitations of CF Grammars
• Simple CF grammars tend to overgenerate.
• The only mechanism available to control overgeneration is to invent new categories.
• Proliferation of categories soon becomes intractable. Problems include:
  • Size of grammar
  • Understandability of grammar
Criteria for Evaluating Grammars
• Does it undergenerate?
• Does it overgenerate?
• Does it assign appropriate structures to sentences it generates?
• Is it simple to understand? How many rules are there?
• Does it contain generalisations or special cases?
• How ambiguous is it? How many structures for a given sentence?
CF Phrase Structure Rules
s -> np vp
np -> D N
vp -> V
vp -> V np
(4 rules)
• Nice grammar, but it overgenerates.
• Solution: invent more categories (nps, nppl, vps, vppl, etc.).
CF Phrase Structure Rules with Number Agreement
s -> nps vps
s -> nppl vppl
nps -> DS NS
nppl -> DPL NPL
vps -> VS
vps -> VS nps
vps -> VS nppl
vppl -> VPPL
vppl -> VPPL nps
vppl -> VPPL nppl
(10 rules)
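The effect of splitting categories by number can be checked by enumerating both languages. This sketch encodes the 4-rule grammar and the 10-rule grammar as Python dictionaries and adds a small lexicon; the lexicon (this/these, dog/dogs, sees/see) and the helper `generate` are assumptions made for illustration. The enumeration only terminates because neither grammar is recursive.

```python
from itertools import product


def generate(grammar, symbol):
    """Return the set of word tuples derivable from `symbol` (non-recursive grammars only)."""
    if symbol not in grammar:                  # terminal symbol
        return {(symbol,)}
    result = set()
    for rhs in grammar[symbol]:
        # Combine every choice of expansion for each symbol on the right-hand side.
        for parts in product(*(generate(grammar, child) for child in rhs)):
            result.add(tuple(w for part in parts for w in part))
    return result


# The 4-rule grammar plus a hypothetical lexicon that ignores number.
simple = {
    "s": [["np", "vp"]],
    "np": [["D", "N"]],
    "vp": [["V"], ["V", "np"]],
    "D": [["this"], ["these"]],
    "N": [["dog"], ["dogs"]],
    "V": [["sees"], ["see"]],
}

# The 10-rule grammar with number agreement and the same words, split by number.
agree = {
    "s": [["nps", "vps"], ["nppl", "vppl"]],
    "nps": [["DS", "NS"]],
    "nppl": [["DPL", "NPL"]],
    "vps": [["VS"], ["VS", "nps"], ["VS", "nppl"]],
    "vppl": [["VPPL"], ["VPPL", "nps"], ["VPPL", "nppl"]],
    "DS": [["this"]], "DPL": [["these"]],
    "NS": [["dog"]], "NPL": [["dogs"]],
    "VS": [["sees"]], "VPPL": [["see"]],
}

simple_lang = generate(simple, "s")
agree_lang = generate(agree, "s")
overgenerated = simple_lang - agree_lang   # strings such as "this dogs see"
```

Every sentence of the agreement grammar is also generated by the simple grammar, while the simple grammar additionally produces strings that violate agreement. The price is that four phrase structure rules grew to ten for a single two-valued feature.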
Constraints and Information Structures
• PATR2 handles this problem by augmenting CF rules with constraints between constituents.
• The basic idea is that each constituent of a CF rule is associated with an information structure.
• We then express constraints between information structures.
Example of a PATR Rule with Number Constraints
Rule s -> np vp
  <np num> = <vp num>
  <s num> = <np num>
Example of a Grammar with Number Constraints
s -> np vp
  <np num> = <vp num>
  <s num> = <np num>
np -> D N
  <np num> = <D num>
  <D num> = <N num>
vp -> V
  <vp num> = <V num>
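The way such constraints rule out agreement violations can be sketched in Python. This is a simplified illustration, not PATR2's actual implementation: information structures are flat dictionaries with atomic values, each category is assumed to occur at most once per rule, and the function `apply_rule`, the rule encoding and the sample lexical entries are all hypothetical.

```python
def apply_rule(rule, children):
    """Build the parent's information structure from `children` (a list of dicts).

    Returns the parent's features, or None if some constraint cannot be satisfied.
    """
    lhs, rhs, equations = rule
    # One information structure per constituent; the parent starts out empty.
    fs = {lhs: {}}
    for name, child in zip(rhs, children):
        fs[name] = dict(child)
    changed = True
    while changed:                       # propagate values until nothing new is learned
        changed = False
        for (c1, f1), (c2, f2) in equations:
            v1, v2 = fs[c1].get(f1), fs[c2].get(f2)
            if v1 is not None and v2 is not None and v1 != v2:
                return None              # clash, e.g. sg against pl
            if v1 is None and v2 is not None:
                fs[c1][f1] = v2
                changed = True
            elif v2 is None and v1 is not None:
                fs[c2][f2] = v1
                changed = True
    return fs[lhs]


# The three rules of the grammar above; each equation pairs two (constituent, feature) paths.
S_RULE = ("s", ["np", "vp"], [(("np", "num"), ("vp", "num")),
                              (("s", "num"), ("np", "num"))])
NP_RULE = ("np", ["D", "N"], [(("np", "num"), ("D", "num")),
                              (("D", "num"), ("N", "num"))])
VP_RULE = ("vp", ["V"], [(("vp", "num"), ("V", "num"))])

# Hypothetical lexical entries: "this dog barks" versus "this dogs".
np_ok = apply_rule(NP_RULE, [{"num": "sg"}, {"num": "sg"}])
vp_ok = apply_rule(VP_RULE, [{"num": "sg"}])
s_ok = apply_rule(S_RULE, [np_ok, vp_ok])
np_bad = apply_rule(NP_RULE, [{"num": "sg"}, {"num": "pl"}])

# A determiner unspecified for number (such as "the") takes its value from the noun.
np_the = apply_rule(NP_RULE, [{}, {"num": "pl"}])
```

The grammar keeps one rule per construction, and number is checked by the equations rather than by multiplying categories.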
Summary
• Pure CFGs become unwieldy when we try to constrain them to incorporate, for example, agreement information.
• PATR2 deals with this problem by associating information structures and constraints with each rule constituent.
• Information structures are often referred to as feature structures (F-structures).