
CPSC 503 Computational Linguistics



Presentation Transcript


  1. CPSC 503 Computational Linguistics Lecture 8 Giuseppe Carenini CPSC503 Winter 2008

  2. Knowledge-Formalisms Map State Machines (and prob. versions) (Finite State Automata,Finite State Transducers, Markov Models) Morphology Logical formalisms (First-Order Logics) Syntax Rule systems (and prob. versions) (e.g., (Prob.) Context-Free Grammars) Semantics Pragmatics Discourse and Dialogue AI planners CPSC503 Winter 2008

  3. Today 1/10 • Finish CFG for Syntax of NL (problems) • Parsing • The Earley Algorithm • Partial Parsing: Chunking • Dependency Grammars / Parsing CPSC503 Winter 2008

  4. Problems with CFGs • Agreement • Subcategorization CPSC503 Winter 2008

  5. Agreement • In English, • Determiners and nouns have to agree in number • Subjects and verbs have to agree in person and number • Many languages have agreement systems that are far more complex than this (e.g., gender). CPSC503 Winter 2008

  6. Agreement • Grammatical: This dog / Those dogs / This dog eats / You have it / Those dogs eat • Ungrammatical: *This dogs / *Those dog / *This dog eat / *You has it / *Those dogs eats CPSC503 Winter 2008

  7. Possible CFG Solution • OLD Grammar: S -> NP VP • NP -> Det Nom • VP -> V NP • … • NEW Grammar: SgS -> SgNP SgVP • PlS -> PlNP PlVP • SgNP -> SgDet SgNom • PlNP -> PlDet PlNom • SgVP -> SgV NP • PlVP -> PlV NP • … • Sg = singular, Pl = plural CPSC503 Winter 2008

  8. CFG Solution for Agreement • It works and stays within the power of CFGs • But it doesn’t scale all that well (explosion in the number of rules) CPSC503 Winter 2008
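The rule explosion on the previous two slides can be made concrete with a small sketch. The encoding below is my own toy illustration (not from the course materials): each RHS symbol carries a flag saying whether it shares the parent's number feature (note that the object NP in VP -> V NP does not agree with the verb), and splitting every rule per feature value doubles the grammar; k binary features multiply it by 2**k.

```python
# Toy sketch: mechanizing the Sg/Pl rule splitting from the previous slide.
base_rules = [
    ("S",  [("NP", True), ("VP", True)]),    # subject NP agrees with VP
    ("NP", [("Det", True), ("Nom", True)]),
    ("VP", [("V", True), ("NP", False)]),    # object NP does not agree
]

def split_by_number(rules):
    """Return Sg/Pl copies of every rule, renaming the agreeing symbols."""
    out = []
    for lhs, rhs in rules:
        for num in ("Sg", "Pl"):
            out.append((num + lhs,
                        [num + cat if agrees else cat for cat, agrees in rhs]))
    return out

new_rules = split_by_number(base_rules)
for lhs, rhs in new_rules:
    print(lhs, "->", " ".join(rhs))
print(len(base_rules), "rules became", len(new_rules))
```

Adding further agreement features (person, gender, …) repeats the doubling, which is exactly the scaling problem the slide points out.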

  9. Subcategorization • Def. It expresses constraints that a predicate (verb here) places on the number and type of its arguments(see first table) • *John sneezed the book • *I prefer United has a flight • *Give with a flight CPSC503 Winter 2008

  10. Subcategorization • Sneeze: John sneezed • Find: Please find [a flight to NY]NP • Give: Give [me]NP[a cheaper fare]NP • Help: Can you help [me]NP[with a flight]PP • Prefer: I prefer [to leave earlier]TO-VP • Told: I was told [United has a flight]S • … CPSC503 Winter 2008

  11. So? • So the various rules for VPs overgenerate. • They allow strings containing verbs and arguments that don’t go together • For example: • VP -> V NP therefore *sneezed the book • VP -> V S therefore *go she will go there CPSC503 Winter 2008
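The overgeneration can be shown with a throwaway generator over VP -> V NP; the two-word lexicon below is my own illustrative choice, not the course's. Because the rule ignores subcategorization, the intransitive "sneezed" combines with an object just as freely as "found".

```python
from itertools import product

# Enumerate all terminal strings licensed by a flat RHS of preterminals.
lexicon = {"V": ["found", "sneezed"], "NP": ["the book"]}

def expand(rhs):
    """Yield every string the rule RHS can derive with this lexicon."""
    for combo in product(*(lexicon[cat] for cat in rhs)):
        yield " ".join(combo)

vp_strings = list(expand(["V", "NP"]))
print(vp_strings)
# ['found the book', 'sneezed the book']  <- the second is overgenerated
```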

  12. Possible CFG Solution • OLD Grammar: • VP -> V • VP -> V NP • VP -> V NP PP • … • NEW Grammar: • VP -> IntransV • VP -> TransV NP • VP -> TransPPto NP PPto • … • TransPPto -> hand, give, … • This solution has the same problem as the one for agreement CPSC503 Winter 2008

  13. CFG for NLP: summary • CFGs cover most syntactic structure in English. • But there are problems (overgeneration) • That can be dealt with adequately, although not elegantly, by staying within the CFG framework. • There are simpler, more elegant, solutions that take us out of the CFG framework: LFG, XTAGS… see Chpt 15 “Features and Unification” CPSC503 Winter 2008

  14. Today 1/10 • Finish CFG for Syntax of NL (problems) • Parsing • The Earley Algorithm • Partial Parsing: Chunking • Dependency Grammars / Parsing CPSC503 Winter 2008

  15. Parsing with CFGs • Input: a sequence of words (e.g., “I prefer a morning flight”) • Output: valid parse trees • The parser, given a CFG, assigns valid trees: each covers all and only the elements of the input and has an S at the top CPSC503 Winter 2008

  16. CFG Parsing as Search • The CFG defines the search space of possible parse trees • S -> NP VP • S -> Aux NP VP • NP -> Det Noun • VP -> Verb • Det -> a • Noun -> flight • Verb -> left, arrive • Aux -> do, does • Parsing: find all trees that cover all and only the words in the input CPSC503 Winter 2008

  17. Constraints on Search • Input: sequence of words • Output: valid parse trees • The CFG defines the search space • Search strategies: • Top-down or goal-directed • Bottom-up or data-directed CPSC503 Winter 2008

  18. Top-Down Parsing • Since we’re trying to find trees rooted with an S (Sentences), start with the rules that give us an S. • Then work your way down from there to the words. CPSC503 Winter 2008
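A minimal recursive-descent recognizer in this top-down style can be sketched as follows; the grammar fragment echoes slide 16, but the function names and the dictionary encoding are my own assumptions. It expands from S downward and checks the words only at the leaves.

```python
# Toy grammar: symbols with an entry are non-terminals; anything else is a word.
GRAMMAR = {
    "S":    [["NP", "VP"], ["Aux", "NP", "VP"]],
    "NP":   [["Det", "Noun"]],
    "VP":   [["Verb"], ["Verb", "NP"]],
    "Det":  [["a"]],
    "Noun": [["flight"]],
    "Verb": [["left"]],
    "Aux":  [["does"]],
}

def spans(cat, words, i):
    """Yield every end position j such that cat derives words[i:j]."""
    if cat not in GRAMMAR:                  # terminal: must match the next word
        if i < len(words) and words[i] == cat:
            yield i + 1
        return
    for rhs in GRAMMAR[cat]:                # try each expansion, leftmost-first
        ends = [i]
        for sym in rhs:
            ends = [j for e in ends for j in spans(sym, words, e)]
        yield from ends

def recognize(sentence):
    words = sentence.split()
    return len(words) in spans("S", words, 0)

print(recognize("a flight left"))   # True
print(recognize("flight left a"))   # False
```

Note that this naive strategy would loop forever on a left-recursive rule such as NP -> NP PP, which is exactly the problem slide 31 raises.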

  19. Top-Down Space • Next step: when POS categories are reached, reject trees whose leaves fail to match all words in the input CPSC503 Winter 2008

  20. Bottom-Up Parsing • Of course, we also want trees that cover the input words. So start with trees that link up with the words in the right way. • Then work your way up from there. CPSC503 Winter 2008

  21. Bottom-Up Space • Two more steps (tree diagrams omitted) CPSC503 Winter 2008

  22. Top-Down vs. Bottom-Up • Top-down • Only searches for trees that can be answers • But suggests trees that are not consistent with the words • Bottom-up • Only forms trees consistent with the words • Suggest trees that make no sense globally CPSC503 Winter 2008

  23. So Combine Them • Top-down: control strategy to generate trees • Bottom-up: to filter out inappropriate parses • Top-down control strategy: • Depth-first vs. breadth-first • Which node to try to expand next? (left-most) • Which grammar rule to use to expand a node? (textual order) CPSC503 Winter 2008

  24. Top-Down, Depth-First, Left-to-Right Search Sample sentence: “Does this flight include a meal?” CPSC503 Winter 2008

  25. Example “Does this flight include a meal?” CPSC503 Winter 2008

  26. Example “Does this flight include a meal?” CPSC503 Winter 2008

  27. Example “Does this flight include a meal?” CPSC503 Winter 2008

  28. Adding Bottom-up Filtering • The following sequence was a waste of time because an NP cannot generate a parse tree starting with an Aux CPSC503 Winter 2008

  29. Bottom-Up Filtering CPSC503 Winter 2008

  30. Problems with TD-BU-filtering • Left recursion • Ambiguity • Repeated Parsing • SOLUTION: Earley Algorithm • (once again dynamic programming!) CPSC503 Winter 2008

  31. (1) Left-Recursion These rules appear in most English grammars S -> S and S VP -> VP PP NP -> NP PP CPSC503 Winter 2008
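One classical way around this problem (at the cost of changing the tree shapes) is to eliminate direct left recursion: A -> A alpha | beta becomes A -> beta A' and A' -> alpha A' | epsilon. The function below is a sketch of that standard transformation under my own list-of-lists rule encoding.

```python
def remove_direct_left_recursion(cat, rules):
    """Rewrite A -> A alpha | beta as A -> beta A'; A' -> alpha A' | eps."""
    rec  = [rhs[1:] for rhs in rules if rhs and rhs[0] == cat]   # the alphas
    base = [rhs for rhs in rules if not rhs or rhs[0] != cat]    # the betas
    if not rec:
        return {cat: rules}          # nothing to do
    new = cat + "'"
    return {
        cat: [beta + [new] for beta in base],
        new: [alpha + [new] for alpha in rec] + [[]],            # [] = epsilon
    }

print(remove_direct_left_recursion("NP", [["NP", "PP"], ["Det", "Nom"]]))
# {'NP': [['Det', 'Nom', "NP'"]], "NP'": [['PP', "NP'"], []]}
```

A top-down parser on the rewritten grammar no longer loops, but the flat right-branching NP' chains no longer match the intended constituent structure, which is one reason the lecture turns to dynamic programming instead.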

  32. (2) Structural Ambiguity Three basic kinds: Attachment/Coordination/NP-bracketing “I shot an elephant in my pajamas” CPSC503 Winter 2008

  33. (3) Repeated Work • Parsing is hard, and slow. It’s wasteful to redo stuff over and over and over. • Consider an attempt to top-down parse the following as an NP “A flight from Indi to Houston on TWA” CPSC503 Winter 2008

  34. starts from…. • NP -> Det Nom • NP -> NP PP • Nom -> Noun • …… • fails and backtracks CPSC503 Winter 2008

  35. restarts from…. • NP -> Det Nom • NP -> NP PP • Nom -> Noun • fails and backtracks CPSC503 Winter 2008

  36. restarts from…. • fails and backtracks… CPSC503 Winter 2008

  37. restarts from…. • Success! CPSC503 Winter 2008

  38. But…. (figure: the same sub-trees are rebuilt 1, 2, 3, 4 times) CPSC503 Winter 2008

  39. Dynamic Programming Fills tables with solution to subproblems • Parsing: sub-trees consistent with the input, once discovered, are stored and can be reused • Does not fall prey to left-recursion • Stores ambiguous parse compactly • Does not do (avoidable) repeated work CPSC503 Winter 2008

  40. Earley Parsing O(N3) • Fills a table in a single sweep over the input words • Table is length N +1; N is number of words • Table entries represent: • Predicted constituents • In-progress constituents • Completed constituents and their locations CPSC503 Winter 2008

  41. For Next Time • Read 12.7 • Read in Chapter 13 (Parsing): 13.4.2, 13.5 • Optional: Read Chapter 16 (Features and Unification) – skip algorithms and implementation CPSC503 Winter 2008

  42. Final Project: Decision • Two ways: select an NLP task / problem, or a technique used in NLP, that truly interests you • Tasks: summarization of …… , computing similarity between two terms/sentences (skim through the textbook) • Techniques: extensions / variations / combinations of what we saw in class – Max Entropy Classifiers or MM, Dirichlet Multinomial Distributions CPSC503 Winter 2008

  43. Final Project: goals (and hopefully contributions) • Improve on a proposed solution by using a possibly more effective technique or by combining multiple techniques • Propose a novel (minimally novel is OK) solution • Apply a technique which has been used for NLP task A to a different NLP task B • Apply a technique to a different dataset or to a different language • Propose a different evaluation measure CPSC503 Winter 2008

  44. Final Project: Examples / Ideas • Look on the course WebPage CPSC503 Winter 2008

  45. Today 1/10 • Finish CFG for Syntax of NL • Parsing • The Earley Algorithm • Partial Parsing: Chunking CPSC503 Winter 2008

  46. States The table entries are called states and express: • what is predicted from that point • what has been recognized up to that point Representation: dotted rules + location S -> · VP [0,0] A VP is predicted at the start of the sentence NP -> Det · Nominal [1,2] An NP is in progress; the Det goes from 1 to 2 VP -> V NP · [0,3] A VP has been found starting at 0 and ending at 3 CPSC503 Winter 2008
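One straightforward way to represent these dotted-rule states in code is a small immutable record; the class and method names below are my own, not from the course. The dot position plus the [start, end] span captures everything the slide describes.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class State:
    lhs: str                # left-hand side of the rule
    rhs: Tuple[str, ...]    # right-hand side symbols
    dot: int                # how many RHS symbols have been recognized
    start: int              # chart position where this constituent began
    end: int                # chart position of the dot

    def complete(self):
        return self.dot == len(self.rhs)

    def next_sym(self):
        return None if self.complete() else self.rhs[self.dot]

# The three example states from the slide:
s1 = State("S", ("VP",), 0, 0, 0)                 # S -> · VP [0,0]   predicted
s2 = State("NP", ("Det", "Nominal"), 1, 1, 2)     # NP -> Det · Nominal [1,2]
s3 = State("VP", ("V", "NP"), 2, 0, 3)            # VP -> V NP · [0,3]  found
print(s3.complete(), s2.next_sym())  # True Nominal
```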

  47. Graphically • S -> · VP [0,0] • NP -> Det · Nominal [1,2] • VP -> V NP ·[0,3] CPSC503 Winter 2008

  48. Earley: answer • Answer found by looking in the table in the right place. • The following state should be in the final column: • S -> α · [0,n] • i.e., an S state that spans from 0 to n and is complete (dot at the end of the rule). CPSC503 Winter 2008

  49. Earley Parsing Procedure • So sweep through the table from 0 to n in order, applying one of three operators to each state: • predictor: add top-down predictions to the chart • scanner: read input and add corresponding state to chart • completer: move dot to right when new constituent found • Results (new states) added to current or next set of states in chart • No backtracking and no states removed CPSC503 Winter 2008

  50. Predictor • Intuition: new states represent top-down expectations • Applied when non-part-of-speech non-terminals are to the right of a dot: S -> · VP [0,0] • Adds new states to the end of the current chart • One new state for each expansion of the non-terminal in the grammar: VP -> · V [0,0] and VP -> · V NP [0,0] CPSC503 Winter 2008
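The predictor, scanner, and completer can be put together into a compact recognizer. The sketch below is my own illustrative implementation, not the course's reference code: states are plain (lhs, rhs, dot, start) tuples, the chart has N + 1 columns, a dummy GAMMA -> S state seeds column 0, and the scanner advances a state directly over a POS symbol when the lexicon agrees.

```python
GRAMMAR = {
    "S":       [("NP", "VP"), ("Aux", "NP", "VP")],
    "NP":      [("Det", "Nominal")],
    "Nominal": [("Noun",), ("Noun", "Nominal")],
    "VP":      [("Verb",), ("Verb", "NP")],
}
LEXICON = {"does": "Aux", "this": "Det", "a": "Det",
           "flight": "Noun", "meal": "Noun", "include": "Verb"}

def earley_recognize(words):
    n = len(words)
    chart = [[] for _ in range(n + 1)]       # chart[i] = states ending at i

    def add(state, i):
        if state not in chart[i]:            # no duplicates, nothing removed
            chart[i].append(state)

    add(("GAMMA", ("S",), 0, 0), 0)          # dummy start state
    for i in range(n + 1):
        for lhs, rhs, dot, start in chart[i]:        # grows as we iterate
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in GRAMMAR:                   # PREDICTOR
                    for prod in GRAMMAR[nxt]:
                        add((nxt, prod, 0, i), i)
                elif i < n and LEXICON.get(words[i]) == nxt:   # SCANNER
                    add((lhs, rhs, dot + 1, start), i + 1)
            else:                                    # COMPLETER
                for l2, r2, d2, s2 in chart[start]:
                    if d2 < len(r2) and r2[d2] == lhs:
                        add((l2, r2, d2 + 1, s2), i)
    return ("GAMMA", ("S",), 1, 0) in chart[n]       # complete S over [0,n]

print(earley_recognize("does this flight include a meal".split()))  # True
```

As promised on slide 49, the sweep is a single left-to-right pass with no backtracking; each column holds at most O(N) distinct spans per rule, giving the O(N^3) bound of slide 40.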
