470 likes | 593 Views
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 28– Grammar; Constituency, Dependency). Pushpak Bhattacharyya CSE Dept., IIT Bombay 21 st March, 2011. Grammar. A finite set of rules that generates only and all sentences of a language.
E N D
CS460/626 : Natural Language Processing/Speech, NLP and the Web(Lecture 28– Grammar; Constituency, Dependency) Pushpak BhattacharyyaCSE Dept., IIT Bombay 21st March, 2011
Grammar • A finite set of rules • that generates only and all sentences of a language. • that assigns an appropriate structural description to each one.
Grammatical Analysis Techniques Labeling the Constituents • Two main devices Breaking up a String • Sequential • Hierarchical • Transformational • Morphological • Categorial • Functional
Breaking up and Labeling • Sequential Breaking up • Sequential Breaking up and Morphological Labeling • Sequential Breaking up and Categorial Labeling • Sequential Breaking up and Functional Labeling • Hierarchical Breaking up • Hierarchical Breaking up and Categorial Labeling • Hierarchical Breaking up and Functional Labeling
Sequential Breaking up • That student solved the problems. student that + + solve + + + + s ed the problem
Sequential Breaking up and Morphological Labeling • That student solved the problems. student that solve s ed the problem word word stem affix affix word stem
Sequential Breaking up and Categorial Labeling • This boy can solve the problem. boy this can solve the problem N Det Aux Det V N • They called her a taxi. They call ed her a taxi Pron V Pron Det N Affix
Sequential Breaking up and Functional Labeling They called her a taxi Subject Verbal Direct Object Indirect Object her a taxi They called Indirect Object Direct Object Subject Verbal
Hierarchical Breaking up • Old men and women Old men and women Old men and women men and women and women Old men Old men and women Old men
Hierarchical Breaking up and Categorial Labeling • Poor John ran away. S NP VP A N V Adv Poor John ran away
Hierarchical Breaking up and Functional Labeling • Immediate Constituent (IC) Analysis • Construction types in terms of the function of the constituents: • Predication (subject + predicate) • Modification (modifier + head) • Complementation (verbal + complement) • Subordination (subordinator + dependent unit) • Coordination (independent unit + coordinator)
Predication • [Birds]subject [fly]predicate S Subject Predicate Birds fly
Modification • [A]modifier [flower]head • John [slept]head [in the room]modifier S Subject Predicate John Head Modifier In the room slept
Complementation • He [saw]verbal [a lake]complement S Subject Predicate He Verbal Complement saw alake ICON 2004 Tutorial
Subordination • John slept [in]subordinator [the room]dependentunit S Subject Predicate John Head Modifier Subordinator Dependent Unit slept in theroom
Coordination • [John came in time] independent unit [but]coordinator [Mary was not ready] independent unit S Independent Unit Independent Unit Coordinator John came in time but Mary was not ready
An Example S • In the morning, the sky looked much brighter. Modifier Head Subject Predicate Subordinator DU Modifier Modifier Head Head Verbal Complement Modifier Head In the morning, the sky looked much brighter
Hierarchical Breaking up and Categorial / Functional Labeling • Hierarchical Breaking up coupled with Categorial /Functional Labeling is a very powerful device. • But there are ambiguities which demand something more powerful. • E.g., Love of God • Someone loves God • God loves someone
Hierarchical Breaking up Categorial Labeling Functional Labeling Love of God Love of God Head Modifier Noun Phrase Prepositional Phrase Sub DU love of God love of God
Types of Generative Grammar • Finite State Model (sequential) • Phrase Structure Model (sequential + hierarchical) + (categorial) • Transformational Model (sequential + hierarchical + transformational) + (categorial + functional)
Phrase Structure Grammar (PSG) A phrase-structure grammar G consists of a four tuple (V, T, S, P), where • V is a finite set of alphabets (or vocabulary) • E.g., N, V, A, Adv, P, NP, VP, AP, AdvP, PP, student, sing, etc. • T is a finite set of terminal symbols: T V • E.g., student, sing,etc. • S is a distinguished non-terminal symbol, also called start symbol: S V • P is a set of productions.
Noun Phrases • John • the student • the intelligent student NP NP NP N N N Det Det AdjP student John student the the intelligent
Noun Phrase • his first five PhD students NP N Quant Det Ord N students five his first PhD
Noun Phrase • The five best students of my class NP PP Quant Det AP N five the best students of my class
Verb Phrases • can sing • can hit the ball VP VP V Aux NP Aux V can sing the ball can hit
Verb Phrase • Can give a flower to Mary VP NP Aux V PP a flower can give to Mary
Verb Phrase • may make John the chairman VP NP Aux V NP John may make thechairman
Verb Phrase • may find the book very interesting VP NP Aux V AP veryinteresting thebook may find
Prepositional Phrases • in the classroom • near the river PP PP NP NP P P in near theclassroom theriver
Adjective Phrases • intelligent • very honest • fond of sweets AP AP AP A PP Degree A A very fond honest ofsweets intelligent
Adjective Phrase • very worried that she might have done badly in the assignment AP Degree A S’ very worried that she might have done badly in the assignment
Phrase Structure Rules • The boy hit the ball. • Rewrite Rules: (i) S NP VP (ii) NP Det N (iii) VP V NP • Det the • N man, ball • V hit • We interpret each rule X Yas the instruction rewrite X as Y.
Derivation • The boy hit the ball. • Sentence NP + VP (i) Det + N + VP (ii) Det + N + V + NP (iii) The + N + V + NP (iv) The + boy + V + NP (v) The + boy + hit + NP (vi) The + boy + hit + Det + N (ii) The + boy + hit + the + N (iv) The + boy + hit + the + ball(v)
PSG Parse Tree • The boy hit the ball. S NP VP Det N V NP the Det N boy hit the ball
PSG Parse Tree S • John wrote those words in the Book of Proverbs. NP VP PropN PP V NP NP P NP PP John in thosewords wrote ofproverbs thebook
Penn POS Tags • John wrote those words in the Book of Proverbs. [John/NNP ] wrote/VBD [ those/DT words/NNS ] in/IN [ the/DT Book/NN ] of/IN [ Proverbs/NNS ]
Penn Treebank • John wrote those words in the Book of Proverbs. (S (NP-SBJ (NP John)) (VP wrote (NP those words) (PP-LOC in (NP (NP-TTL (NP the Book) (PP of (NP Proverbs)))
PSG Parse Tree S • Official trading in the shares will start in Paris on Nov 6. NP VP NP PP Aux V PP PP NP N P AP A will start onNov6 inParis trading official in theshares
Penn POS Tags • Official trading in the shares will start in Paris on Nov 6. [ Official/JJ trading/NN ] in/IN [ the/DT shares/NNS ] will/MD start/VB in/IN [ Paris/NNP ] on/IN [ Nov./NNP 6/CD ]
Penn Treebank • Official trading in the shares will start in Paris on Nov 6. ( (S (NP-SBJ (NP Official trading) (PP in (NP the shares))) (VP will (VP start (PP-LOC in (NP Paris)) (PP-TMP on (NP (NP Nov 6)
Penn POS Tag Sset • Adjective: JJ • Adverb: RB • Cardinal Number: CD • Determiner: DT • Preposition: IN • Coordinating Conjunction CC • Subordinating Conjunction: IN • Singular Noun: NN • Plural Noun: NNS • Personal Pronoun: PP • Proper Noun: NP • Verb base form: VB • Modal verb: MD • Verb (3sg Pres): VBZ • Wh-determiner: WDT • Wh-pronoun: WP
Constituency Grammar • Categorical • Uses part of speech • Context Free Grammar (CFG) • Basic elements Phrases
Dependency Grammar • Functional • Context Free Grammar • Basic elements Units of Predication/ Modification/ Complementation/ Subordination/ Co-ordination
Bridge between Constituency and Dependency parse • Constituency uses phrases • Dependencies consist of Head-modifier combination • This is a cricket bat. • Cricket (Category: N, Functional: Adj) • Bat (Category: N, Functional: N) • For languages which are free word order we use dependency parser to uncover the relations between the words. • Raam ne Shaamkodekha. (Ram saw Shyam) • Shaamko Ram ne dekha. (Ram saw Shyam) • Case markers cling to the nouns they subordinate
Example of CG and DG output Birds Fly. S S NP VP Subject Predicate N V Birds fly Birds fly
Some probabilistic parsers and why they are used • Stanford, Collins, Charniack, RASP • Why Probabilistic parsers • For a single sentence we can have multiple parses. • Probability for the parse is calculated and then the parse with the highest probability is selected. • This is needed in many applications of NLP, that need parsing.