
Computational Linguistics Introduction

Computational Linguistics Introduction. Context Free Grammars. Chomsky Hierarchy. Weak Equivalence. A grammar should generate all and only the sentences in the language under investigation. Let H be the language under investigation and G be the grammar we are developing.

jorden-park

Presentation Transcript


  1. Computational Linguistics Introduction Context Free Grammars CLINT-LN Parsing

  2. Chomsky Hierarchy

  3. Weak Equivalence
     • A grammar should generate all and only the sentences in the language under investigation.
     • Let H be the language under investigation and G be the grammar we are developing.
     • The grammar should generate all sentences in the language, i.e. for any s in H, s is also in L(G).
     • The grammar should generate only sentences in the language, i.e. for any s in L(G), s is also in H.

  4. All and Only
     [Diagram: G generates L(G), and L(G) = H]

  5. Overgeneration
     [Diagram: H lies inside L(G)]

  6. Overgeneration
     • Basic problem: L(G) is larger than H.
     • There are sentences generated by the grammar that are not in H.
     • The "only" constraint is violated.
     • The grammar is too weak (its rules are too permissive).
     • Example: a grammar which ignores number and gender.

  7. Undergeneration
     [Diagram: L(G) lies inside H]

  8. Undergeneration
     • Basic problem: H is larger than L(G).
     • There are sentences in H that are not generated by the grammar.
     • The "all" constraint is violated.
     • The grammar is too strong (its rules are too restrictive).
     • Examples (for H = a natural language):
       • a grammar which lacks recursion;
       • a finite state grammar.
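The "all" and "only" conditions can be checked on finite samples with plain set differences. The following is a minimal sketch, not part of the original slides: the function name `compare` and the toy sentence sets are assumptions made for illustration, and a real L(G) or H is usually infinite, so only samples can be compared this way.

```python
def compare(generated, target):
    """Compare a finite sample of L(G) with a finite sample of H."""
    generated, target = set(generated), set(target)
    return {
        # In L(G) but not in H: the "only" constraint is violated.
        "overgenerated": generated - target,
        # In H but not in L(G): the "all" constraint is violated.
        "undergenerated": target - generated,
    }

# Hypothetical toy samples, invented for this example.
H = {"the dog barks", "the dogs bark"}
L_G = {"the dog barks", "the dog bark"}

report = compare(L_G, H)
print(report)
```

When both sets in the report are empty, the samples are consistent with L(G) = H, which is the "all and only" situation.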

  9. Weak and Strong Equivalence
     • A grammar/lexicon G generates a characteristic language L(G).
     • Grammars G1 and G2 are said to be weakly equivalent if L(G1) = L(G2).
     • A grammar G also assigns one or more phrase structures to any s in L(G).
     • Weakly equivalent grammars G1 and G2 are said to be strongly equivalent if, in addition, they assign identical phrase structures to any s in L(G1).

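  10. Weak Equivalence
      G1: A → a, A → aA
      G2: A → a, A → Aa
      Both grammars generate the same strings (one or more a's), so they are weakly equivalent. G1 assigns right-branching structures and G2 left-branching ones, so they are not strongly equivalent.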
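The two small grammars on this slide (A → a | aA and A → a | Aa) can be compared directly. The sketch below is an illustration rather than anything from the slides: the helper names `right_branching`, `left_branching`, and `yield_of` are hypothetical, and trees are written as bracketed strings for simplicity.

```python
def right_branching(n):
    """Structure assigned by A -> a | a A to a string of n a's."""
    return "[A a]" if n == 1 else "[A a " + right_branching(n - 1) + "]"

def left_branching(n):
    """Structure assigned by A -> a | A a to a string of n a's."""
    return "[A a]" if n == 1 else "[A " + left_branching(n - 1) + " a]"

def yield_of(tree):
    """Read the terminal string off a bracketed tree."""
    tokens = tree.replace("[", " ").replace("]", " ").split()
    return "".join(tok for tok in tokens if tok == "a")

for n in range(1, 5):
    r, l = right_branching(n), left_branching(n)
    # Same string (weak equivalence), possibly different structure.
    print(yield_of(r), r, l, "same structure" if r == l else "different structure")
```

For every length the two trees have the same yield, but from length 2 onwards the bracketings differ, which is the gap between weak and strong equivalence.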

  11. Appropriate Structure
      • The structure assigned by the grammar should be appropriate.
      • The structure should:
        • be understandable;
        • allow us to make generalisations;
        • reflect the underlying meaning of the sentence.

  12. Ambiguity
      • A grammar is ambiguous if it assigns two or more structures to the same sentence.
      • The grammar should not generate too many possible structures for the same sentence.
      • There is a tradeoff between ambiguity and clarity: too much detail can obscure the design principles.
      • Too little detail means that the grammar is undercommitted.
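"How many structures for the same sentence" can be made concrete by counting parses. The sketch below assumes a deliberately ambiguous toy grammar (S -> S S, S -> a) that does not appear in the slides, and a hypothetical function `count_parses` that counts trees with a memoised CYK-style recursion over rules in binary or lexical form.

```python
from functools import lru_cache

# Hypothetical toy grammar, chosen because it is highly ambiguous.
BINARY = {("S", "S"): ["S"]}   # S -> S S
LEXICAL = {"a": ["S"]}         # S -> a

def count_parses(words, start="S"):
    """Count the distinct parse trees the grammar assigns to words."""
    words = tuple(words)
    if not words:
        return 0

    @lru_cache(maxsize=None)
    def count(cat, i, j):
        # Number of trees rooted in cat that span words[i:j].
        if j - i == 1:
            return 1 if cat in LEXICAL.get(words[i], []) else 0
        total = 0
        for k in range(i + 1, j):  # every split point
            for (left, right), parents in BINARY.items():
                if cat in parents:
                    total += count(left, i, k) * count(right, k, j)
        return total

    return count(start, 0, len(words))

for n in range(1, 6):
    print(n, count_parses("a" * n))
```

With this grammar the number of structures grows quickly with sentence length, which is the kind of behaviour the "not too many structures" criterion is meant to rule out.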

  13. Limitations of CF Grammars
      • Simple CF grammars tend to overgenerate.
      • The only mechanism available to control overgeneration is to invent new categories.
      • Proliferation of categories soon becomes intractable. Problems include:
        • size of grammar;
        • understandability of grammar.

  14. Criteria for Evaluating Grammars
      • Does it undergenerate?
      • Does it overgenerate?
      • Does it assign appropriate structures to sentences it generates?
      • Is it simple to understand? How many rules are there?
      • Does it contain generalisations or special cases?
      • How ambiguous is it? How many structures for a given sentence?

  15. CF Phrase Structure Rules
      s → np vp
      np → D N
      vp → V
      vp → V np
      (4 rules)
      • Nice grammar, but it overgenerates.
      • Solution: invent more categories (nps, nppl, vps, vppl, etc.).
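The overgeneration of the four-rule grammar can be seen by enumerating everything it derives from a small lexicon. This is a sketch under assumptions: the lexicon is invented for illustration, the determiner category is written D, and the helper `expand` is hypothetical and only terminates because this grammar has no recursion.

```python
from itertools import product

RULES = {
    "s": [["np", "vp"]],
    "np": [["D", "N"]],
    "vp": [["V"], ["V", "np"]],
}

# Hypothetical lexicon, not taken from the slides.
LEXICON = {
    "D": ["the"],
    "N": ["dog", "dogs"],
    "V": ["sees", "see"],
}

def expand(symbol):
    """Return every word sequence derivable from symbol."""
    if symbol in LEXICON:
        return [[word] for word in LEXICON[symbol]]
    results = []
    for rhs in RULES[symbol]:
        # Combine every expansion of each right-hand-side symbol.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results

sentences = {" ".join(words) for words in expand("s")}
for sentence in sorted(sentences):
    print(sentence)
```

The output mixes well-formed sentences such as "the dog sees" with strings such as "the dog see" and "the dogs sees", because nothing in the rules relates the number of the subject to the number of the verb.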

  16. CF Phrase Structure Rules with Number Agreement
      s -> nps vps
      s -> nppl vppl
      nps -> DS NS
      nppl -> DPL NPL
      vps -> VS
      vps -> VS nps
      vps -> VS nppl
      vppl -> VPPL
      vppl -> VPPL nps
      vppl -> VPPL nppl
      (10 rules)

  17. Constraints and Information Structures
      • PATR2 is a special grammar formalism which augments CF rules with constraints between constituents.
      • The basic idea is that each constituent is associated with an information structure.
      • We then express constraints between information structures.

  18. Example of a PATR rule with Number Constraints
      constituents
        s -> np vp
      constraints
        <np num> = <vp num>
        <s num> = <np num>

  19. Example of a Grammar with Number Constraints
      s -> np vp
        <np num> = <vp num>
        <s num> = <np num>
      np -> D N
        <np num> = <D num>
        <D num> = <N num>
      vp -> V
        <vp num> = <V num>
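The effect of the number constraints can be imitated in a few lines. This is a rough sketch, not an implementation of PATR: it handles only the three rules shown (so only "D N V" sentences), treats `num` as a single atomic value, and uses a hypothetical lexicon and hypothetical helpers `unify` and `accepts`. Real PATR systems unify full information structures.

```python
# Hypothetical lexicon: word -> (category, num). None means "unspecified".
LEXICON = {
    "the": ("D", None),
    "this": ("D", "sg"),
    "these": ("D", "pl"),
    "dog": ("N", "sg"),
    "dogs": ("N", "pl"),
    "barks": ("V", "sg"),
    "bark": ("V", "pl"),
}

class Clash(Exception):
    """Raised when two num values cannot be made equal."""

def unify(a, b):
    """Unify two atomic num values; None is compatible with anything."""
    if a is None:
        return b
    if b is None or a == b:
        return a
    raise Clash(f"{a} != {b}")

def accepts(words):
    """Check a 'D N V' sentence against the rules and their constraints."""
    if len(words) != 3:
        return False
    try:
        (d_cat, d_num), (n_cat, n_num), (v_cat, v_num) = (LEXICON[w] for w in words)
    except KeyError:
        return False
    if (d_cat, n_cat, v_cat) != ("D", "N", "V"):
        return False
    try:
        np_num = unify(d_num, n_num)  # <np num> = <D num>, <D num> = <N num>
        vp_num = v_num                # <vp num> = <V num>
        unify(np_num, vp_num)         # <np num> = <vp num>, <s num> = <np num>
    except Clash:
        return False
    return True

for text in ["the dog barks", "the dogs bark", "the dog bark", "these dog barks"]:
    print(text, accepts(text.split()))
```

The three context-free rules stay as they are; agreement is carried entirely by the equations, so no nps/nppl style categories are needed.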

  20. Summary
      • Pure CFGs become unwieldy when we try to constrain them to incorporate, for example, agreement information.
      • PATR2 deals with this problem by associating information structures and constraints with each rule constituent.
      • Information structures are often referred to as F-structures.
