
CS626-460: Language Technology for the Web/Natural Language Processing

CS626-460: Language Technology for the Web/Natural Language Processing. Pushpak Bhattacharyya, CSE Dept., IIT Bombay. Constituent Parsing and Algorithms (with major contributions from Dr. Rajat Mohanty).


Presentation Transcript


  1. CS626-460: Language Technology for the Web/Natural Language Processing • Pushpak Bhattacharyya, CSE Dept., IIT Bombay • Constituent Parsing and Algorithms (with major contributions from Dr. Rajat Mohanty)

  2. Syntax • Syntax is the study of the combination of words into phrases, clauses and sentences. • Syntax describes how sentences and their constituents are structured.

  3. Grammar • A finite set of rules • that generates all and only the sentences of a language • that assigns an appropriate structural description to each one.

  4. Grammatical Analysis Techniques • Two main devices: • Breaking up a String: Sequential, Hierarchical, Transformational • Labeling the Constituents: Morphological, Categorial, Functional

  5. Hierarchical Breaking up and Categorial Labeling • Poor John ran away. • Tree: [S [NP [A Poor] [N John]] [VP [V ran] [Adv away]]]
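
The hierarchical breakdown with categorial labels on this slide can be held in a small data structure. The sketch below is only an illustration, not part of the lecture: it assumes a tree is a nested tuple `(label, child, ...)` with words as plain strings, and the helper names `leaves` and `bracket` are hypothetical.

```python
# Tree for "Poor John ran away" as nested tuples: (label, child, child, ...).
# A leaf is a plain string (the word itself).
TREE = ("S",
        ("NP", ("A", "Poor"), ("N", "John")),
        ("VP", ("V", "ran"), ("Adv", "away")))


def leaves(node):
    """Return the words of the tree from left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def bracket(node):
    """Render the tree as a labeled bracketing string."""
    if isinstance(node, str):
        return node
    inner = " ".join(bracket(child) for child in node[1:])
    return "[" + node[0] + " " + inner + "]"


if __name__ == "__main__":
    print(leaves(TREE))
    print(bracket(TREE))
```

Reading the leaves left to right gives back the original string, which is a quick sanity check that a labeling really covers the sentence.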

  6. Hierarchical Breaking up and Functional Labeling • Immediate Constituent (IC) Analysis • Construction types in terms of the function of the constituents: • Predication (subject + predicate) • Modification (modifier + head) • Complementation (verbal + complement) • Subordination (subordinator + dependent unit) • Coordination (independent unit + coordinator)

  7. An Example • In the morning, the sky looked much brighter. • Tree: [S [Modifier [Subordinator In] [DU [Modifier the] [Head morning]]] [Head [Subject [Modifier the] [Head sky]] [Predicate [Verbal looked] [Complement [Modifier much] [Head brighter]]]]]

  8. Noun Phrases • John: [NP [N John]] • the student: [NP [Det the] [N student]] • the intelligent student: [NP [Det the] [AdjP intelligent] [N student]]

  9. Phrases

  10. Noun Phrase • his first five PhD students: [NP [Det his] [Ord first] [Quant five] [N PhD] [N students]]

  11. Noun Phrase • the five best students of my class: [NP [Det the] [Quant five] [AP best] [N students] [PP of my class]]

  12. Verb Phrases • can sing: [VP [Aux can] [V sing]] • can hit the ball: [VP [Aux can] [V hit] [NP the ball]]

  13. Verb Phrase • can give a flower to Mary: [VP [Aux can] [V give] [NP a flower] [PP to Mary]]

  14. Verb Phrase • may make John the chairman: [VP [Aux may] [V make] [NP John] [NP the chairman]]

  15. Verb Phrase • may find the book very interesting: [VP [Aux may] [V find] [NP the book] [AP very interesting]]

  16. Prepositional Phrases • in the classroom: [PP [P in] [NP the classroom]] • near the river: [PP [P near] [NP the river]]

  17. Adjective Phrases • intelligent: [AP [A intelligent]] • very honest: [AP [Degree very] [A honest]] • fond of sweets: [AP [A fond] [PP of sweets]]

  18. Adjective Phrase • very worried that she might have done badly in the assignment: [AP [Degree very] [A worried] [S’ that she might have done badly in the assignment]]

  19. A segment of English Grammar • S’ → (C) S • S → {NP/S’} VP • VP → (AP+) V (AP+) ({NP/S’}) (AP+) (PP+) (AP+) • NP → (D) (AP+) N (PP+) • PP → P NP • AP → (AP) A

  20. PSG Parse Tree • John wrote those words in the Book of Proverbs. • Tree: [S [NP [PropN John]] [VP [V wrote] [NP those words] [PP [P in] [NP [NP the Book] [PP of Proverbs]]]]]

  21. Penn Treebank • John wrote those words in the Book of Proverbs. (S (NP-SBJ (NP John)) (VP wrote (NP those words) (PP-LOC in (NP (NP-TTL (NP the Book) (PP of (NP Proverbs)))

  22. PSG Parse Tree • Official trading in the shares will start in Paris on Nov 6. • Tree: [S [NP [NP [AP [A official]] [N trading]] [PP [P in] [NP the shares]]] [VP [Aux will] [V start] [PP in Paris] [PP on Nov 6]]]

  23. Penn POS Tags • Official trading in the shares will start in Paris on Nov 6. [ Official/JJ trading/NN ] in/IN [ the/DT shares/NNS ] will/MD start/VB in/IN [ Paris/NNP ] on/IN [ Nov./NNP 6/CD ]
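
The `word/TAG` notation on this slide is easy to read into (word, tag) pairs. The following is a minimal sketch under the assumption that tokens are separated by whitespace and that the square brackets are stand-alone chunk markers that can be skipped; the function name `parse_tagged` is hypothetical.

```python
# The tagged sentence exactly as written on the slide.
TAGGED = ("[ Official/JJ trading/NN ] in/IN [ the/DT shares/NNS ] will/MD "
          "start/VB in/IN [ Paris/NNP ] on/IN [ Nov./NNP 6/CD ]")


def parse_tagged(text):
    """Split 'word/TAG' tokens into (word, tag) pairs, skipping chunk brackets."""
    pairs = []
    for token in text.split():
        if token in ("[", "]"):
            continue  # chunk boundary marker, not a word
        # rpartition splits on the LAST slash, so words containing '/' survive.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


if __name__ == "__main__":
    for word, tag in parse_tagged(TAGGED):
        print(word, tag)
```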

  24. Penn Treebank • Official trading in the shares will start in Paris on Nov 6. ( (S (NP-SBJ (NP Official trading) (PP in (NP the shares))) (VP will (VP start (PP-LOC in (NP Paris)) (PP-TMP on (NP (NP Nov 6)
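
Parenthesized Treebank-style annotations can be read into nested lists with a small recursive reader. The sketch below is an assumption-laden illustration rather than an official Treebank tool: it handles only balanced input, the function name `parse_brackets` is hypothetical, and the sample is the subject noun phrase copied from the slide.

```python
def parse_brackets(text):
    """Parse a balanced '(LABEL child ...)' string into nested lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        node = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(read())      # nested constituent
            else:
                node.append(tokens[pos])  # label or word
                pos += 1
        pos += 1  # consume ')'
        return node

    return read()


SAMPLE = "(NP-SBJ (NP Official trading) (PP in (NP the shares)))"

if __name__ == "__main__":
    print(parse_brackets(SAMPLE))
```

The first element of each list is the constituent label; the remaining elements are words or sub-constituents. Unbalanced input makes the reader run past the end of the token list and raise an `IndexError`.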

  25. Penn POS Tag Set • Adjective: JJ • Adverb: RB • Cardinal Number: CD • Determiner: DT • Preposition: IN • Coordinating Conjunction: CC • Subordinating Conjunction: IN • Singular Noun: NN • Plural Noun: NNS • Personal Pronoun: PRP • Proper Noun: NNP • Verb base form: VB • Modal verb: MD • Verb (3sg Pres): VBZ • Wh-determiner: WDT • Wh-pronoun: WP

  26. Basic Parsing Strategy

  27. A Fragment of English Grammar • S → NP VP • VP → V NP • NP → NNP | ART N • NNP → Ram • V → ate | saw • ART → a | an | the • N → rice | apple | movie
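
One possible way to write this grammar fragment down in code is a dictionary from each left-hand side to its list of alternatives. This is a sketch under that assumed representation; the names `GRAMMAR`, `nonterminals`, and `terminals` are illustrative and do not come from the slides.

```python
# Each nonterminal maps to a list of alternatives; each alternative is a
# list of symbols. Symbols that never appear as a key are terminals (words).
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "VP":  [["V", "NP"]],
    "NP":  [["NNP"], ["ART", "N"]],
    "NNP": [["Ram"]],
    "V":   [["ate"], ["saw"]],
    "ART": [["a"], ["an"], ["the"]],
    "N":   [["rice"], ["apple"], ["movie"]],
}


def nonterminals(grammar):
    """Symbols that can be rewritten."""
    return set(grammar)


def terminals(grammar):
    """Symbols that appear on a right-hand side but are never rewritten."""
    used = {sym for alts in grammar.values() for rhs in alts for sym in rhs}
    return used - set(grammar)


if __name__ == "__main__":
    print(sorted(nonterminals(GRAMMAR)))
    print(sorted(terminals(GRAMMAR)))
```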

  28. Derivation • S is a special symbol called the start symbol.
      S => NP VP (rewrite S)
      => NNP VP (rewrite NP)
      => Ram VP (rewrite NNP)
      => Ram V NP (rewrite VP)
      => Ram ate NP (rewrite V)
      => Ram ate ART N (rewrite NP)
      => Ram ate the N (rewrite ART)
      => Ram ate the rice (rewrite N)
      • Multiple Choice Points
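
The derivation on this slide rewrites the leftmost nonterminal at every step, and each step is a choice point because a symbol may have several alternatives. The sketch below replays such a derivation from an explicit list of choices. It assumes the dictionary representation of the grammar shown in the code, and the function name `leftmost_derivation` is hypothetical.

```python
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "VP":  [["V", "NP"]],
    "NP":  [["NNP"], ["ART", "N"]],
    "NNP": [["Ram"]],
    "V":   [["ate"], ["saw"]],
    "ART": [["a"], ["an"], ["the"]],
    "N":   [["rice"], ["apple"], ["movie"]],
}


def leftmost_derivation(grammar, start, choices):
    """Rewrite the leftmost nonterminal once per entry in `choices`.

    choices[i] is the index of the alternative picked at step i.
    Returns every sentential form, starting with the start symbol.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Position of the leftmost symbol that can still be rewritten.
        idx = next(i for i, sym in enumerate(form) if sym in grammar)
        form = form[:idx] + grammar[form[idx]][choice] + form[idx + 1:]
        steps.append(" ".join(form))
    return steps


if __name__ == "__main__":
    # Choices for: S, NP->NNP, NNP->Ram, VP, V->ate, NP->ART N, ART->the, N->rice
    for step in leftmost_derivation(GRAMMAR, "S", [0, 0, 0, 0, 0, 1, 2, 0]):
        print(step)
```

A different choice list yields a different sentence, or a dead end if the words do not match the input, which is why a parser has to search over these choice points.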

  29. Two Strategies : Top-Down & Bottom-Up • Top down : Start with S and generate the sentence. • Bottom up : Start with the words in the sentence and use the rewrite rules backwards to reduce the sequence of symbols to produce S. • Previous slide showed top-down strategy.

  30. Bottom-Up Derivation
      Ram ate the rice
      => NNP ate the rice (rewrite Ram)
      => NNP V the rice (rewrite ate)
      => NNP V ART rice (rewrite the)
      => NNP V ART N (rewrite rice)
      => NP V ART N (rewrite NNP)
      => NP V NP (rewrite ART N)
      => NP VP (rewrite V NP)
      => S (rewrite NP VP)
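
The bottom-up derivation can be replayed mechanically by using the rules backwards. The sketch below is a deliberately simple, greedy version with no backtracking, so it is an assumption that it suffices for this tiny unambiguous grammar; it is not a general parser. The names `LEXICON`, `PHRASE_RULES`, and `bottom_up_derivation` are illustrative.

```python
# Lexical rules, read backwards: word -> category.
LEXICON = {"Ram": "NNP", "ate": "V", "saw": "V", "a": "ART", "an": "ART",
           "the": "ART", "rice": "N", "apple": "N", "movie": "N"}

# Phrase rules as (left-hand side, right-hand side).
PHRASE_RULES = [("S", ["NP", "VP"]), ("VP", ["V", "NP"]),
                ("NP", ["NNP"]), ("NP", ["ART", "N"])]


def bottom_up_derivation(words):
    """Return the list of sentential forms from the words up to S, or None."""
    form = list(words)
    steps = [" ".join(form)]
    # Phase 1: replace each word by its category, left to right.
    for i, word in enumerate(words):
        form[i] = LEXICON[word]
        steps.append(" ".join(form))
    # Phase 2: repeatedly reduce the leftmost span matching a right-hand side.
    while form != ["S"]:
        reduced = False
        for i in range(len(form)):
            for lhs, rhs in PHRASE_RULES:
                if form[i:i + len(rhs)] == rhs:
                    form[i:i + len(rhs)] = [lhs]
                    steps.append(" ".join(form))
                    reduced = True
                    break
            if reduced:
                break
        if not reduced:
            return None  # stuck: greedy reduction found nothing to rewrite
    return steps


if __name__ == "__main__":
    for step in bottom_up_derivation("Ram ate the rice".split()):
        print(step)
```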

  31. Parsing Algorithm • A procedure that “searches” through the grammatical rules to find a combination that generates a tree which stands for the structure of the sentence.

  32. Top-Down Parsing (using A*) • DFS on the AND-OR graph • Data structures: • Open List (OL): Nodes to be expanded • Closed List (CL): Expanded Nodes • Input List (IL): Words of sentence to be parsed • Moving Head (MH): Walks over the IL

  33. Trace of Top-Down Parsing • Initial Condition (T0) • OL: S • CL: (empty) • IL: Ram ate the rice • MH: at the start of IL

  34. Trace of Top-Down Parsing • T1 • OL: NP VP • CL: S • IL: Ram ate the rice

  35. Trace of Top-Down Parsing • T2 • OL: NNP ART N VP • CL: S NP • IL: Ram ate the rice

  36. Trace of Top-Down Parsing • T3 • OL: ART N VP • CL: S NP NNP • IL: Ram ate the rice • MH: portion of input consumed

  37. Trace of Top-Down Parsing • T4 • OL: N VP • CL: S NP NNP ART* • IL: Ram ate the rice • (* indicates ‘useless’ expansion)

  38. Trace of Top-Down Parsing • T5 • OL: VP • CL: S NP NNP ART* N* • IL: Ram ate the rice

  39. Trace of Top-Down Parsing • T6 • OL: V NP • CL: S NP NNP ART* N* • IL: Ram ate the rice

  40. Trace of Top-Down Parsing • T7 • OL: NP • CL: S NP NNP ART* N* V • IL: Ram ate the rice

  41. Trace of Top-Down Parsing • T8 • OL: NNP ART N • CL: S NP NNP ART* N* V NP • IL: Ram ate the rice

  42. Trace of Top-Down Parsing • T9 • OL: ART N • CL: S NP NNP ART* N* V NNP* • IL: Ram ate the rice

  43. Trace of Top-Down Parsing • T10 • OL: N • CL: S NP NNP ART* N* V NNP ART • IL: Ram ate the rice

  44. Trace of Top-Down Parsing • T11 • OL: (empty) • CL: S NP NNP ART* N* V NNP ART N • IL: Ram ate the rice • Successful Termination: OL empty AND MH at the end of IL.
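
The trace can be approximated in code. The sketch below is a simplified depth-first recognizer, not a faithful reproduction of the slides' bookkeeping: it assumes each search state carries its own open list and moving-head position and that alternatives are tried one at a time with backtracking, whereas the slides put all alternatives on a single open list and star the useless ones. The name `top_down_parse` is hypothetical.

```python
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "VP":  [["V", "NP"]],
    "NP":  [["NNP"], ["ART", "N"]],
    "NNP": [["Ram"]],
    "V":   [["ate"], ["saw"]],
    "ART": [["a"], ["an"], ["the"]],
    "N":   [["rice"], ["apple"], ["movie"]],
}


def top_down_parse(grammar, start, words):
    """Depth-first top-down recognition.

    Returns (accepted, closed), where `closed` lists the nonterminals
    expanded during the search, in order (a rough analogue of CL).
    """
    # Each state: (open list of symbols still to process, moving-head index).
    agenda = [([start], 0)]
    closed = []
    while agenda:
        open_list, head = agenda.pop()  # LIFO order gives depth-first search
        if not open_list:
            if head == len(words):
                return True, closed     # OL empty AND MH at the end of IL
            continue                    # input left over: dead end
        symbol, rest = open_list[0], open_list[1:]
        if symbol in grammar:
            closed.append(symbol)
            # Push alternatives in reverse so the first one is tried first.
            for rhs in reversed(grammar[symbol]):
                agenda.append((rhs + rest, head))
        elif head < len(words) and words[head] == symbol:
            agenda.append((rest, head + 1))  # word matched: advance MH
    return False, closed


if __name__ == "__main__":
    print(top_down_parse(GRAMMAR, "S", "Ram ate the rice".split()))
```

This sketch would loop forever on a left-recursive rule such as NP → NP PP, since the head never advances before the same symbol is expanded again; the grammar fragment used here has no such rule.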

  45. Bottom-Up Parsing Basic idea: • Refer to words from the lexicon. • Obtain all POSs for each word. • Keep combining until S is obtained. (to be continued)
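
The three steps of the basic idea can be sketched as an exhaustive search over reductions. Since the slide is marked "to be continued", the code below is not the lecture's algorithm; it is one assumed way to realize the idea, with a lexicon that may list several POS tags per word and hypothetical names `LEXICON`, `RULES`, and `bottom_up_recognize`.

```python
from itertools import product

# Lexicon lookup: each word maps to ALL of its possible categories.
LEXICON = {"Ram": ["NNP"], "ate": ["V"], "saw": ["V"], "a": ["ART"],
           "an": ["ART"], "the": ["ART"], "rice": ["N"], "apple": ["N"],
           "movie": ["N"]}

# Phrase rules as (left-hand side, right-hand side tuple).
RULES = [("S", ("NP", "VP")), ("VP", ("V", "NP")),
         ("NP", ("NNP",)), ("NP", ("ART", "N"))]


def bottom_up_recognize(words):
    """Return True if some sequence of reductions turns the words into S."""
    if any(word not in LEXICON for word in words):
        return False
    # Step 1 and 2: every combination of POS tags for the words.
    frontier = [tuple(tags) for tags in product(*(LEXICON[w] for w in words))]
    seen = set(frontier)
    # Step 3: keep combining until S is obtained or nothing new appears.
    while frontier:
        form = frontier.pop()
        if form == ("S",):
            return True
        for i in range(len(form)):
            for lhs, rhs in RULES:
                if form[i:i + len(rhs)] == rhs:
                    new_form = form[:i] + (lhs,) + form[i + len(rhs):]
                    if new_form not in seen:
                        seen.add(new_form)
                        frontier.append(new_form)
    return False


if __name__ == "__main__":
    print(bottom_up_recognize("Ram ate the rice".split()))
```

Tracking the forms already seen keeps the search finite, at the cost of memory that grows quickly with sentence length; chart-based methods avoid that by sharing sub-results.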
