An Introduction to Natural Language Syntax Rajat Mohanty rkm@cse.iitb.ac.in CS-460/IT-632 Department of Computer Science and Engineering Indian Institute of Technology, Bombay
Outline • Grammatical Analysis • Finite State Grammar • Phrase Structure Grammar • Transformational Grammar • Natural Language Phenomena
A Ubiquitous Task for NLP • Sequence labeling can be carried out at different levels. • In written text: • Words • Phrases • Sentences • Paragraphs
Names for Labeling Tasks • Words: Part of Speech tagging • Phrases: Chunking • Sentences: Parsing • Paragraphs: Co-reference annotation
Example (Words: POS Tagging) <s> The dispute shows clearly the global power of Japan's financial titans.</s> <s>[ The/DT dispute/NN ] shows/VBZ clearly/RB [ the/DT global/JJ power/NN ] of/IN [ Japan/NNP 's/POS financial/JJ titans/NNS ] ./. </s>
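The word/TAG notation in the tagged sentence above can be read back into (word, tag) pairs with a few lines of Python. This is only a minimal sketch: `parse_tagged` is a hypothetical helper name, not part of any tagger's API, and it assumes that a tag never contains a slash.

```python
import re

TAGGED = ("<s>[ The/DT dispute/NN ] shows/VBZ clearly/RB "
          "[ the/DT global/JJ power/NN ] of/IN "
          "[ Japan/NNP 's/POS financial/JJ titans/NNS ] ./. </s>")

def parse_tagged(text):
    """Return a list of (word, tag) pairs from slash-tagged text."""
    # Drop the sentence markers and the chunk brackets; only word/TAG tokens remain.
    cleaned = re.sub(r"</?s>|[\[\]]", " ", text)
    # Split on the LAST slash so that './.' becomes ('.', '.').
    return [tuple(token.rsplit("/", 1)) for token in cleaned.split()]

pairs = parse_tagged(TAGGED)
for word, tag in pairs:
    print(f"{word}\t{tag}")
```

Splitting on the last slash rather than the first is a deliberate choice here, so that a token whose word part is itself punctuation still separates cleanly.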
Example (Phrases: Chunking) [The dispute] shows clearly [the global power] of [Japan's financial titans]
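The square brackets in the tagged version of this sentence (on the POS tagging slide) already delimit the chunks, so they can be pulled out mechanically. The sketch below assumes that bracket convention and that chunks are not nested; `extract_chunks` is a hypothetical name used only for illustration.

```python
import re

TAGGED = ("[ The/DT dispute/NN ] shows/VBZ clearly/RB "
          "[ the/DT global/JJ power/NN ] of/IN "
          "[ Japan/NNP 's/POS financial/JJ titans/NNS ] ./.")

def extract_chunks(tagged):
    """Return each bracketed chunk as a list of words (tags stripped)."""
    chunks = []
    # Each match is the text between one '[' and the next ']'.
    for group in re.findall(r"\[([^\]]*)\]", tagged):
        chunks.append([token.rsplit("/", 1)[0] for token in group.split()])
    return chunks

chunks = extract_chunks(TAGGED)
for chunk in chunks:
    print(" ".join(chunk))
```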
Example (Sentences: Parsing) ( (S (NP-SBJ The dispute) (VP shows (ADVP-MNR clearly) (NP (NP the global power) (PP of (NP (NP Japan 's) financial titans)))) .))
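A bracketed parse like the one above is just a nested list written with parentheses. As a rough sketch (the function names `parse_brackets` and `leaves` are illustrative assumptions, not an existing library API), it can be read into nested Python lists and its words recovered as follows.

```python
PARSE = ("( (S (NP-SBJ The dispute) (VP shows (ADVP-MNR clearly) "
         "(NP (NP the global power) (PP of (NP (NP Japan 's) financial titans)))) .))")

def parse_brackets(text):
    """Turn a parenthesised parse into nested lists of strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            node = []
            while tokens[pos] != ")":
                node.append(read())
            pos += 1                      # consume ")"
            return node
        token = tokens[pos]
        pos += 1
        return token

    return read()

def leaves(node):
    """Collect the words of a tree, skipping the category labels."""
    if isinstance(node, str):
        return [node]
    # A leading string is the node's label; the unlabeled outer wrapper has none.
    children = node[1:] if node and isinstance(node[0], str) else node
    words = []
    for child in children:
        words.extend(leaves(child))
    return words

tree = parse_brackets(PARSE)
print(tree[0][0])              # label of the root constituent
print(" ".join(leaves(tree)))  # the words in their original order
```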
Parse Tree [Figure: parse tree of "The dispute shows the global power of Japan's financial titans", with nodes labeled S, NP, VP, Det, N, V, JJ, PP]
Example (Sentences: Co-referencing) ( (S (NP-SBJ-1 The banks) (VP (ADVP-MNR badly) want (S (NP-SBJ *-1) (VP to (VP break (PP into (NP (NP all aspects) (PP of (NP the securities business))))))))))
What is Grammar? • A theory of language • A theory of the competence of a native speaker (in the context of a natural language) • A finite set of rules • that generates all and only the sentences of a language • that assigns an appropriate structural description to each one • An explicit model of competence
What are the requirements? • An explicit model of competence • Should be able to generate an infinite set of grammatical sentences of the language • Should not generate any ungrammatical ones • Should be able to account for ambiguities (i.e., if a sentence is understood to have two meanings, the grammar should give it two different structural descriptions) • If two sentences are understood to have the same meaning, the grammar should give the same structure for both at some level • If two sentences are understood to have different internal relationships, the grammar should assign them different structural descriptions
What is Syntax? • Syntax is the study of the combination of words into phrases, clauses and sentences • Syntax describes how sentences and their constituents are structured
Grammatical Analysis Techniques • Two main devices
• Breaking up a String: • Sequential • Hierarchical • Transformational
• Labeling the Constituents: • Morphological • Categorial • Functional
• A grammar may combine any of these devices for grammatical analysis.
Breaking up and Labeling • Sequential Breaking up • Sequential Breaking up and Morphological Labeling • Sequential Breaking up and Categorial Labeling • Sequential Breaking up and Functional Labeling • Hierarchical Breaking up • Hierarchical Breaking up and Categorial Labeling • Hierarchical Breaking up and Functional Labeling
Sequential Breaking up • That student solved the problems. • that + student + solve + ed + the + problem + s
Sequential Breaking up and Morphological Labeling • That student solved the problems. • that (word) + student (word) + solve (stem) + ed (affix) + the (word) + problem (stem) + s (affix)
Sequential Breaking up and Categorial Labeling • This boy can solve the problem. • this (Det) boy (N) can (Aux) solve (V) the (Det) problem (N) • They called her a taxi. • They (Pron) call (V) ed (Affix) her (Pron) a (Det) taxi (N)
Sequential Breaking up and Functional Labeling • They called her a taxi. • Analysis 1: They (Subject) called (Verbal) her (Direct Object) a taxi (Indirect Object) • Analysis 2: They (Subject) called (Verbal) her (Indirect Object) a taxi (Direct Object)
Hierarchical Breaking up • Old men and women • Analysis 1: [Old [men and women]] • Analysis 2: [[Old men] and women]
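To make concrete how one word sequence can receive more than one hierarchical analysis, here is a small sketch that enumerates every bracketing of "old men and women". The three rules (NP → N, NP → Adj NP, NP → NP Conj NP), the lexicon, and the name `noun_phrases` are assumptions introduced for this illustration; they do not come from the slides.

```python
# Hypothetical toy lexicon: word -> category.
LEXICON = {"old": "Adj", "men": "N", "and": "Conj", "women": "N"}

def noun_phrases(words):
    """Return every NP bracketing of `words` under the toy rules
    NP -> N | Adj NP | NP Conj NP, as nested tuples."""
    tags = [LEXICON[w] for w in words]
    trees = []
    # NP -> N
    if len(words) == 1 and tags[0] == "N":
        trees.append(words[0])
    # NP -> Adj NP
    if len(words) > 1 and tags[0] == "Adj":
        for rest in noun_phrases(words[1:]):
            trees.append((words[0], rest))
    # NP -> NP Conj NP, trying every conjunction as the split point
    for i in range(1, len(words) - 1):
        if tags[i] == "Conj":
            for left in noun_phrases(words[:i]):
                for right in noun_phrases(words[i + 1:]):
                    trees.append((left, words[i], right))
    return trees

analyses = noun_phrases(["old", "men", "and", "women"])
for tree in analyses:
    print(tree)
```

The two tuples printed correspond to the reading in which "old" modifies the whole coordination and the reading in which it modifies only "men".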
Hierarchical Breaking up and Categorial Labeling • Poor John ran away. • [S [NP [A Poor] [N John]] [VP [V ran] [Adv away]]]
Hierarchical Breaking up and Functional Labeling • Immediate Constituent (IC) Analysis • Construction types in terms of the function of the constituents: • Predication (subject + predicate) • Modification (modifier + head) • Complementation (verbal + complement) • Subordination (subordinator + dependent unit) • Coordination (independent unit + coordinator)
Predication • [Birds]subject [fly]predicate • [S [Subject Birds] [Predicate fly]]
Modification • [A]modifier [flower]head • John [slept]head [in the room]modifier • [S [Subject John] [Predicate [Head slept] [Modifier in the room]]]
Complementation • He [saw]verbal [a lake]complement • [S [Subject He] [Predicate [Verbal saw] [Complement a lake]]]
Subordination • John slept [in]subordinator [the room]dependent unit • [S [Subject John] [Predicate [Head slept] [Modifier [Subordinator in] [Dependent Unit the room]]]]
Coordination • [John came in time]independent unit [but]coordinator [Mary was not ready]independent unit • [S [Independent Unit John came in time] [Coordinator but] [Independent Unit Mary was not ready]]
An Example • In the morning, the sky looked much brighter. • [S [Modifier [Subordinator In] [DU [Modifier the] [Head morning]]] [Head [Subject [Modifier the] [Head sky]] [Predicate [Verbal looked] [Complement [Modifier much] [Head brighter]]]]]
Hierarchical Breaking up and Categorial / Functional Labeling • Hierarchical breaking up coupled with categorial / functional labeling is a very powerful device. • But there are ambiguities which demand something more powerful. • E.g., "love of God" can mean: • Someone loves God • God loves someone
Love of God • Hierarchical breaking up: [love [of God]] • Categorial labeling: [Noun Phrase love [Prepositional Phrase of God]] • Functional labeling: [Head love] [Modifier [Sub of] [DU God]]
Types of Generative Grammar • Finite State Model (sequential) • Phrase Structure Model (sequential + hierarchical) + (categorial) • Transformational Model (sequential + hierarchical + transformational) + (categorial + functional)
Finite State Model • The machine begins in the initial state, runs through a sequence of states (producing a word with each transition), and ends in the final state (producing a sentence). • [Figure: state diagram whose transitions are labeled THE, OLD, MAN, MEN, COMES, COME] • Example sentences: THE MAN COMES, THE MEN COME
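The finite state model described above can be sketched as a transition table. The state numbers and the placement of the OLD loop are assumptions (the slide's diagram did not survive extraction); only the six words come from the slide. `accepts` and `generate` are hypothetical helper names.

```python
# (state, word) -> next state. State 0 is initial, state 4 is final.
TRANSITIONS = {
    (0, "the"): 1,
    (1, "old"): 1,    # assumed loop: any number of "old"
    (1, "man"): 2,
    (1, "men"): 3,
    (2, "comes"): 4,
    (3, "come"): 4,
}
START, FINAL = 0, 4

def accepts(sentence):
    """True if the machine can read the sentence and stop in the final state."""
    state = START
    for word in sentence.lower().split():
        state = TRANSITIONS.get((state, word))
        if state is None:          # no transition for this word from this state
            return False
    return state == FINAL

def generate(max_words):
    """List every sentence of at most `max_words` words the machine produces."""
    results = []

    def walk(state, words):
        if state == FINAL:
            results.append(" ".join(words))
            return
        if len(words) == max_words:
            return
        for (source, word), target in TRANSITIONS.items():
            if source == state:
                walk(target, words + [word])

    walk(START, [])
    return results

print(accepts("The old old man comes"))
print(accepts("The man come"))
print(generate(4))
```

Because of the loop, the machine produces infinitely many sentences from a finite table, which is why `generate` needs a length bound.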
Phrase Structure Grammar (PSG) A phrase-structure grammar G is a four-tuple (V, T, S, P), where • V is a finite set of symbols (the alphabet or vocabulary) • E.g., N, V, A, Adv, P, NP, VP, AP, AdvP, PP, student, sing, etc. • T is a finite set of terminal symbols: T ⊂ V • E.g., student, sing, etc. • S is a distinguished non-terminal symbol, also called the start symbol: S ∈ V • P is a set of production rules
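The four-tuple definition can be written down directly as a data structure. The sketch below is one possible encoding, not a standard one: the class and field names are assumptions, and it further assumes that every rule has a single symbol on its left-hand side (a context-free restriction the definition above does not state). The sample grammar uses the rules for "The boy hit the ball" from these slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PhraseStructureGrammar:
    vocabulary: frozenset    # V: all symbols
    terminals: frozenset     # T: subset of V
    start: str               # S: a non-terminal in V
    productions: tuple       # P: pairs (lhs, rhs), rhs a tuple of symbols

    def __post_init__(self):
        if not self.terminals <= self.vocabulary:
            raise ValueError("T must be a subset of V")
        if self.start not in self.vocabulary - self.terminals:
            raise ValueError("S must be a non-terminal symbol of V")
        for lhs, rhs in self.productions:
            # Assumption: a single non-terminal on the left-hand side.
            if lhs not in self.vocabulary - self.terminals:
                raise ValueError(f"left-hand side {lhs!r} is not a non-terminal")
            if any(symbol not in self.vocabulary for symbol in rhs):
                raise ValueError(f"unknown symbol in rule for {lhs!r}")

toy = PhraseStructureGrammar(
    vocabulary=frozenset({"S", "NP", "VP", "Det", "N", "V",
                          "the", "boy", "ball", "hit"}),
    terminals=frozenset({"the", "boy", "ball", "hit"}),
    start="S",
    productions=(
        ("S", ("NP", "VP")), ("NP", ("Det", "N")), ("VP", ("V", "NP")),
        ("Det", ("the",)), ("N", ("boy",)), ("N", ("ball",)), ("V", ("hit",)),
    ),
)
print(sorted(toy.vocabulary - toy.terminals))  # the non-terminal symbols
```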
Noun Phrases • John: [NP [N John]] • the student: [NP [Det the] [N student]] • the intelligent student: [NP [Det the] [AdjP intelligent] [N student]]
Noun Phrase • his first five PhD students • [NP [Det his] [Ord first] [Quant five] [N PhD] [N students]]
Noun Phrase • the five best students of my class • [NP [Det the] [Quant five] [AP best] [N students] [PP of my class]]
Verb Phrases • can sing: [VP [Aux can] [V sing]] • can hit the ball: [VP [Aux can] [V hit] [NP the ball]]
Verb Phrase • can give a flower to Mary • [VP [Aux can] [V give] [NP a flower] [PP to Mary]]
Verb Phrase • may make John the chairman • [VP [Aux may] [V make] [NP John] [NP the chairman]]
Verb Phrase • may find the book very interesting • [VP [Aux may] [V find] [NP the book] [AP very interesting]]
Prepositional Phrases • in the classroom: [PP [P in] [NP the classroom]] • near the river: [PP [P near] [NP the river]]
Adjective Phrases • intelligent: [AP [A intelligent]] • very honest: [AP [Degree very] [A honest]] • fond of sweets: [AP [A fond] [PP of sweets]]
Adjective Phrase • very worried that she might have done badly in the assignment • [AP [Degree very] [A worried] [S' that she might have done badly in the assignment]]
Phrase Structure Rules • The boy hit the ball. • Rewrite Rules: • S → NP VP • NP → Det N • VP → V NP • Det → the • N → boy, ball • V → hit • We interpret each rule X → Y as the instruction "rewrite X as Y".
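Because these six rewrite rules contain no recursion, the full set of sentences they generate is small enough to list. The following sketch enumerates it; the dictionary layout and the function name `expand` are illustrative assumptions, and the approach would not terminate on a recursive grammar.

```python
from itertools import product

# The rewrite rules from the slide: symbol -> list of alternative right-hand sides.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["boy"], ["ball"]],
    "V": [["hit"]],
}

def expand(symbol):
    """Return every word sequence that `symbol` can be rewritten into."""
    if symbol not in RULES:               # a terminal: nothing left to rewrite
        return [[symbol]]
    results = []
    for rhs in RULES[symbol]:
        # Combine every expansion of each right-hand-side symbol, left to right.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results

sentences = [" ".join(words) for words in expand("S")]
for sentence in sentences:
    print(sentence)
```

The output includes "the ball hit the boy", which shows that the rules capture only structure, not plausibility.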
Derivation • The boy hit the ball.
• Sentence
• NP + VP (1) S → NP VP
• Det + N + VP (2) NP → Det N
• Det + N + V + NP (3) VP → V NP
• The + N + V + NP (4) Det → the
• The + boy + V + NP (5) N → boy
• The + boy + hit + NP (6) V → hit
• The + boy + hit + Det + N (2) NP → Det N
• The + boy + hit + the + N (4) Det → the
• The + boy + hit + the + ball (5) N → ball
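The derivation above can be replayed mechanically: each step rewrites the leftmost occurrence of the rule's left-hand side. This is a minimal sketch of that procedure; `derive` is a hypothetical name, and the "leftmost occurrence" policy is an assumption that happens to reproduce the order of steps on the slide.

```python
def derive(start, steps):
    """Apply (lhs, rhs) rules in order, each to the leftmost occurrence of lhs.
    Returns the list of intermediate forms, starting with [start]."""
    form = [start]
    history = [list(form)]
    for lhs, rhs in steps:
        i = form.index(lhs)        # raises ValueError if the rule cannot apply
        form[i:i + 1] = rhs        # replace that one symbol by the right-hand side
        history.append(list(form))
    return history

STEPS = [
    ("S", ["NP", "VP"]),     # (1)
    ("NP", ["Det", "N"]),    # (2)
    ("VP", ["V", "NP"]),     # (3)
    ("Det", ["the"]),        # (4)
    ("N", ["boy"]),          # (5)
    ("V", ["hit"]),          # (6)
    ("NP", ["Det", "N"]),    # (2)
    ("Det", ["the"]),        # (4)
    ("N", ["ball"]),         # (5)
]

history = derive("S", STEPS)
for form in history:
    print(" + ".join(form))
```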
PSG Parse Tree • The boy hit the ball. • [S [NP [Det the] [N boy]] [VP [V hit] [NP [Det the] [N ball]]]]
PSG Parse Tree • John wrote those words in the Book of Proverbs. • [S [NP [PropN John]] [VP [V wrote] [NP those words] [PP [P in] [NP [NP the Book] [PP of Proverbs]]]]]
Transformational Grammar • A generative grammar that makes use of all three kinds of breaking up (sequential, hierarchical, transformational) and two kinds of labeling (categorial, functional) is called a Transformational Grammar (Universal Grammar).
Other Grammar Formalisms • Lexical Functional Grammar (LFG) • Generalised Phrase Structure Grammar (GPSG) • Tree Adjoining Grammar (TAG) • Categorial Grammar (CG) • Head-driven Phrase Structure Grammar (HPSG) • Systemic Functional Grammar (SFG)