1 / 18

Natural Language Parsing: Syntax and Parsing in Artificial Intelligence

Learn about the process of parsing natural language syntax using formal grammars and language models. Understand the challenges of bottom-up and top-down parsing, as well as ambiguity and attachment problems. Explore the Earley Parser as a solution for efficient parsing.

czabel
Download Presentation

Natural Language Parsing: Syntax and Parsing in Artificial Intelligence

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 74.419 Artificial Intelligence 2004 Natural Language Processing- Syntax and Parsing - Language Syntax Parsing

  2. Natural Language - General • "Communication is the intentional exchange of information brought about by the production and perception of signs drawn from a shared system of conventional signs." [Russell & Norvig, p.651] • (Natural) Language characterized by • a sign system • common or shared set of signs • a systematic procedure to produce combinations of signs • a shared meaning of signs and combinations of signs

  3. Natural Language - Parsing • Natural Language syntactically described by a formal language, usually a (context-free) grammar: • the start-symbol S≡sentence • non-terminals ≡syntactic constituents • terminals ≡lexical entries/ words • rules ≡grammar rules • Parsing • derive the syntactic structure of a sentence based on a language model (grammar) • construct a parse tree, i.e. the derivation of the sentence based on the grammar (rewrite system)

  4. Sample Grammar • Grammar (S, NT, T, P) – • Part-of-Speech NT, syntactic Constituents  NT • S → NP VP statement • S →Aux NP VP question • S → VP command • NP →Det Nominal • NP →Proper-Noun • Nominal →Noun | Noun Nominal | Nominal PP • VP →Verb | Verb NP | Verb PP | Verb NP PP • PP →Prep NP • Det→ that | this | a • Noun→book | flight | meal | money • Proper-Noun→ Houston | American Airlines | TWA • Verb→ book | include | prefer • Aux→ does • Prep→ from | to | on • Task: Parse"Does this flight include a meal?"

  5. Sample Parse Tree • Task: Parse"Does this flight include a meal?" • S • Aux NP VP • Det Nominal Verb NP • Noun Det Nominal • does this flight include a meal

  6. Bottom-up – from word-nodes to sentence-symbol • Top-down Parsing – from sentence-symbol to words • S • Aux NP VP • Det Nominal Verb NP • Noun Det Nominal • does this flight include a meal Bottom-up and Top-down Parsing

  7. Problems with Bottom-up and Top-down Parsing Problems with left-recursive rules like NP → NP PP: don’t know how many times recursion is needed Pure Bottom-up or Top-down Parsing is inefficient because it generates and explores too many structures which in the end turn out to be invalid (several grammar rules applicable → ‘interim’ ambiguity). Combine top-down and bottom-up approach: Start with sentence; use rules top-down (look-ahead); read input; try to find shortest path from input to highest unparsed constituent (from left to right). → Chart-Parsing / Earley-Parser

  8. General Problems in Parsing - Ambiguity Ambiguity “One morning, I shot an elephant in my pajamas. How he got into my pajamas, I don’t know.” Groucho Marx syntactical/structural ambiguity – several parse trees are possible e.g. above sentence semantic/lexical ambiguity – several word meanings e.g. bank (where you get money) and (river) bank even different word categories possible (interim) e.g. “He books the flight.” vs. “The books are here.“ or “Fruit flies from the balcony” vs. “Fruit flies are on the balcony.”

  9. General Problems in Parsing - Attachment Attachment in particular PP (prepositional phrase) binding; often referred to as ‘binding problem’ “One morning, I shot an elephant in my pajamas.” (S ... (NP (PNoun I)(VP (Verbshot) (NP (Det an (Nominal (Noun elephant))) (PPin my pajamas))...) rule VP → Verb NP PP (S ... (NP (PNoun I)) (VP (Verb shot) (NP (Det an) (Nominal (Nominal (Nounelephant) (PPin my pajamas)... ) rule VP → Verb NPand NP → Det Nominaland Nominal → Nominal PPand Nominal → Noun

  10. Chart Parsing / Early Algorithm Earley-Parser based on Chart-Parsing Essence: Integrate top-down and bottom-up parsing. Keep recognized sub-structures (sub-trees) for shared use during parsing. Top-down: Start with S-symbol. Generate all applicable rules for S. Go further down with left-most constituent in rules and add rules for these constituents until you encounter a left-most node on the RHS which is to a word category. Bottom-up: Read input word and compare. If word matches, mark as recognized and move on to the next category in the rule(s).

  11. Chart Parsing 1 Chart Sequence of n input words; n+1 nodes marked 0 to n. Arc indicates recognized part of rule. The • marks indicates recognized constituents in rules. Jurafsky & Martin, Figure 10.15, p.380

  12. Chart Parsing / Earley Parser 1 Chart Sequence of input words; n+1 nodes marked 0 to n. State of chart represents possible rules and recognized constituents. Interim state S → • VP, [0,0] • top-down look at rule S → VP • nothing of RHS of rule yet recognized (• is far left) • arc at beginning, no coverage (cover no input word; beginning of arc at 0 and end of arc at 0)

  13. Chart Parsing / Earley Parser 2 Interim states NP → Det • Nominal, [1,2] • top-down look at rule NP → Det • Nominal • Det recognized (• after Det) • arc covers on input word which is between node 1 and node 2 • look next for Nominal NP → Det Nominal • , [1,3] • if Nominal recognized, move • after Nominal • move end of arc to cover Nominal • structure is completely recognized; arc inactive; mark NP recognized in other (higher) rules.

  14. Earley Algorithm - Functions predictor generates new rules for partly recognized RHS with constituent right of • (top-down mode) scanner if word category (POS) is found right of the • , the Scanner reads the next input word and adds a rule for it to the chart (bottom-up mode) completer if rule is completely recognized (the • is far right), the recognition state of earlier rules in the chart advances: the • is moved over the recognized constituent (bottom-up recognition).

  15. Additional References Jurafsky, D. & J. H. Martin, Speech and Language Processing, Prentice-Hall, 2000. (Chapters 9 and 10) Earley Algorithm Jurafsky & Martin, Figure 10.16, p.384 Earley Algorithm - Examples Jurafsky & Martin, Figures 10.17 and 10.18

More Related