Syntax: Structural Descriptions of Sentences
Why Study Syntax? • Syntax provides • systematic rules for forming new sentences in a language. • a way to verify whether a sentence is legitimate in a language. • a step closer to the “meaning” of a sentence. • Who did what to whom → semantics • Applications • Improving precision in search applications (same words, different meaning): • Yankees beat Red Sox • Red Sox beat Yankees • Paraphrasing • John loves Mary = Mary is loved by John • Information Extraction • Fill in a form by extracting information from a document.
Structure of Words • What are words? • Orthographic tokens separated by white space. • In some languages the distinction between words and sentences is less clear. • Chinese, Japanese: no white space between words • nowhitespace → no white space / no whites pace / now hit esp ace • Turkish: a single word can represent a complete “sentence” • E.g.: uygarlastiramadiklarimizdanmissinizcasina • Morphology: the structure of words • Basic elements: morphemes • Morphological rules: how to combine morphemes • Syntax: the structure of sentences • Rules for ordering words in a sentence • Elementary units: phrases and clauses
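The “nowhitespace” ambiguity can be reproduced with a small recursive segmenter. This is a minimal sketch, assuming a hypothetical nine-word lexicon chosen to match the example. It enumerates every segmentation rather than picking the best one, which a real segmenter would do with a much larger dictionary and a scoring model.

```python
from functools import lru_cache

# Hypothetical toy lexicon chosen to reproduce the slide's example.
LEXICON = {"no", "now", "white", "whites", "space", "pace", "hit", "esp", "ace"}

def segmentations(text):
    """Return every way to split `text` into a sequence of lexicon words."""

    @lru_cache(maxsize=None)
    def split_from(i):
        if i == len(text):
            return ((),)  # exactly one way to segment the empty suffix
        results = []
        for j in range(i + 1, len(text) + 1):
            word = text[i:j]
            if word in LEXICON:
                # Combine this word with every segmentation of the remainder.
                for rest in split_from(j):
                    results.append((word,) + rest)
        return tuple(results)

    return [list(seg) for seg in split_from(0)]

for seg in segmentations("nowhitespace"):
    print(" ".join(seg))
```

The three segmentations printed are the three readings listed on the slide. Memoizing `split_from` keeps the search polynomial in the string length even when many prefixes are words.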
Morphology and Syntax • Interplay between syntax and morphology • How much information does a language allow to be packed into a word, and how easy is it to unpack? • More information → less rigid syntax → freer word order • Hindi: “John likes Mary” – all six orders are possible, due to rich morphological information. • John-nom Mary-acc likes • English expresses relations between words through word order. • Morphologically rich languages have freer word order. • However, some parts have rigid word order. • Noun groups in Hindi: “one yellow book”
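A toy sketch of the contrast: when each word carries a case suffix, roles can be read off the words themselves in any of the six orders, whereas an English-style reader has to rely on position. The `-nom`/`-acc` gloss format and the function names are assumptions for illustration, not a real Hindi analyzer.

```python
from itertools import permutations

def roles_from_case(tokens):
    """Recover roles from case suffixes, ignoring word order.

    Assumes glosses like 'John-nom' / 'Mary-acc'; a token without a
    case suffix is taken to be the verb.
    """
    roles = {}
    for token in tokens:
        word, _, case = token.partition("-")
        if case == "nom":
            roles["subject"] = word
        elif case == "acc":
            roles["object"] = word
        else:
            roles["verb"] = word
    return roles

def roles_from_position(tokens):
    """English-style recovery: roles come from fixed Subject-Verb-Object order."""
    subject, verb, obj = tokens
    return {"subject": subject, "verb": verb, "object": obj}

glossed = ["John-nom", "Mary-acc", "likes"]
for order in permutations(glossed):
    print(" ".join(order), "->", roles_from_case(order))
```

Every permutation yields the same roles. With `roles_from_position`, swapping John and Mary swaps who likes whom.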
Outline • Constituency • How does this notion arise? • Types of constituents • Representation: Tree Structure • Formal device: Context-Free Grammars • Derived tree and derivation tree • Grammar Equivalence • Strong and weak generative capacity • Chomsky Normal Form • Other Formal Frameworks (Tree-Adjoining Grammar) • Other topics in syntax • Dependency • Spoken language syntax • Structural Priming
Constituency • Words are grouped into part-of-speech groups • Similar morphological inflections • New words can join a group and take its inflections (“blog” → “blogged”, “xerox” → “xeroxed”) • Nouns, Verbs, Determiners, Adjectives, etc. • Certain sequences of words in a sentence are grouped as constituents • Distributionally similar behavior • Cohesive units (move around in a sentence as a unit) • In the morning I take a walk • I take a walk in the morning • Substrings are typed: “Clause”, “Noun Phrase”, “Verb Phrase”, “Preposition Phrase”, etc.
Constituency – contd. • Examples of constituents: • Noun phrase: • the dog, two big light blue vans • Preposition phrase: • in the box, under the bridge • Clause: • the dog bit the man, John thought the dog bit the man • The type of a constituent is derived from the “head word” of the constituent.
Constituent Structure • Decomposition of a sentence into its constituents. • Attaching constituents to each other to reflect relations among words: emergence of tree structure • John saw the man with the telescope • (S (NP John) saw (NP (NP the man) (PP with (NP the telescope)))) (the man has the telescope) • (S (NP John) saw (NP the man) (PP with (NP the telescope))) (the telescope is used for seeing) • Select a sentence from a newspaper text and provide its constituent structure. • Evidence of another constituent – verb phrase (“VP”) • Substrings involving a verb move around and can be referred to as a unit. • VP-fronting (and quickly clean the carpet he did!) • VP-ellipsis (He cleaned the carpets quickly, and so did she) • Adjuncts can appear before and after the VP, but not inside it (He often eats beans, *he eats often beans)
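The two bracketings can be checked mechanically. This sketch, with hypothetical helper names, parses a bracketed string into nested `(label, children)` tuples and rejects unbalanced brackets. It then reports which node directly dominates the PP: an NP for the first reading and S for the second. The bracketings on the slide omit the VP node.

```python
def parse_bracketed(text):
    """Parse a bracketing like '(S (NP John) saw ...)' into nested
    (label, children) tuples; bare words stay plain strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label, pos = tokens[pos + 1], pos + 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("unbalanced brackets: tokens left after the tree")
    return tree

def pp_attachment(tree, parent=None):
    """Return the label of the node that directly dominates the first PP."""
    label, children = tree
    if label == "PP":
        return parent
    for child in children:
        if isinstance(child, tuple):
            found = pp_attachment(child, label)
            if found is not None:
                return found
    return None

noun_reading = parse_bracketed("(S (NP John) saw (NP (NP the man) (PP with (NP the telescope))))")
verb_reading = parse_bracketed("(S (NP John) saw (NP the man) (PP with (NP the telescope)))")
print(pp_attachment(noun_reading), pp_attachment(verb_reading))
```

A bracketing with an extra closing parenthesis raises `ValueError`, which is a cheap sanity check when writing trees by hand.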
Relations among Words • Types of relations between words • Arguments: subject, object, indirect object, prepositional object • Adjuncts: temporal, locative, causal, manner, … • Function words • Subcategorization: the list of arguments of a word (typically a verb) • with features about their realization (POS, perhaps case, verb form, etc.) • For English, the argument order is Subject-Object-IndirectObj • Examples: • like: NP-NP (“John likes Mary”), NP-VP(to-inf) (“John likes to watch movies”) • think: NP-S (“John thought Mary was going to the party”) • put: NP-NP-PP (“John put the book on the table”) • Adjuncts are optional (typically modifiers of an action) • John put the book on the table at 3pm yesterday • There are words with “demands” and words that fill those “demands”. • Demands are typed (NP, VP, PP, S)
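Subcategorization frames can be stored as a lexicon and used to separate arguments from adjuncts. The sketch below assumes a hypothetical frame inventory taken from the examples above. It also uses a deliberately naive rule: the longest prefix of constituent categories that matches a frame counts as the arguments, and everything after it counts as adjuncts.

```python
# Hypothetical frame lexicon built from the examples: each frame lists argument
# categories in surface order, subject first. Adjuncts never appear in a frame.
SUBCAT = {
    "like": [("NP", "NP"), ("NP", "VP[to-inf]")],
    "think": [("NP", "S")],
    "put": [("NP", "NP", "PP")],
}

def satisfies_frame(verb, categories):
    """True if the argument categories exactly match one of the verb's frames."""
    return tuple(categories) in SUBCAT.get(verb, [])

def split_arguments(verb, categories):
    """Naive split: the longest frame matching a prefix of `categories` gives the
    arguments; the remaining constituents are treated as adjuncts."""
    for frame in sorted(SUBCAT.get(verb, []), key=len, reverse=True):
        if tuple(categories[:len(frame)]) == frame:
            return list(frame), list(categories[len(frame):])
    raise ValueError(f"no frame of {verb!r} matches {categories}")

# John put the book on the table at 3pm yesterday
print(split_arguments("put", ["NP", "NP", "PP", "PP", "NP"]))
print(satisfies_frame("put", ["NP", "NP"]))  # *John put the book
```

The first call separates the three arguments of "put" from the two adjuncts ("at 3pm", "yesterday"). The second call shows that "put" rejects a frame missing its PP.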
English Syntax: A Sample • Sentence types: • Declarative (John closed the door) • Imperative (close the door!!) • Yes-No-Question (can you close the door?) • Wh-question (who closed the door? What did John close?) • Clause types: • Infinitival (to read a book) • Gerundive (reading of a book) • Relative Clause (that has a green cover)
English Syntax: A Sample – contd. • Noun Phrase: • Before the head noun: • Pre-determiner Determiner Post-determiner (Adjective|Noun) Noun • After the head noun (modifiers): • Preposition phrases • Relative clauses (the book that has only one sentence) • Gerundive (the flight arriving after 10pm) • Auxiliary verbs • Order: Modal (could, might, will, should, …) < perfect (have) < progressive (be) < passive (be) • “might have been destroyed” • Large wide-coverage grammars have been developed or are under development • XTAG (www.cis.upenn.edu/~xtag), HPSG, LFG
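The auxiliary ordering can be expressed as ranks that must strictly increase along the verb group. This is a rough sketch, assuming small hand-listed word sets and a heuristic for "be". The heuristic treats "be" as progressive when the next word ends in “-ing” and as passive otherwise. Real grammars use morphological features instead.

```python
RANK = {"modal": 0, "perfect": 1, "progressive": 2, "passive": 3}
# Hypothetical, hand-listed word sets; not a complete inventory of English.
MODALS = {"can", "could", "may", "might", "must", "shall", "should", "will", "would"}
HAVE = {"have", "has", "had", "having"}
BE = {"be", "am", "is", "are", "was", "were", "been", "being"}

def aux_types(verb_group):
    """Label every word except the last (the main verb) with its auxiliary type."""
    types = []
    for word, following in zip(verb_group, verb_group[1:]):
        if word in MODALS:
            types.append("modal")
        elif word in HAVE:
            types.append("perfect")
        elif word in BE:
            # Heuristic: be + -ing form is progressive, otherwise passive.
            types.append("progressive" if following.endswith("ing") else "passive")
        else:
            raise ValueError(f"{word!r} is not a known auxiliary")
    return types

def well_ordered(verb_group):
    """True if auxiliaries follow modal < perfect < progressive < passive."""
    ranks = [RANK[t] for t in aux_types(verb_group)]
    return all(a < b for a, b in zip(ranks, ranks[1:]))

print(aux_types("might have been destroyed".split()))
print(well_ordered("might have been destroyed".split()))
print(well_ordered("have might destroyed".split()))
```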
Two Representations of Syntactic Structure • Phrase structure: illustrates the constituents and their types. • Dependency structure: relations between words without intervening structure. [Figure: phrase-structure tree (nodes S, NP, DetP, Adv) and dependency tree (labels arg0, arg1, adj, fw) for an example sentence with the words boy, reads, book, slowly and their determiners]
Context-Free Grammars • String rewriting systems • Transform one string into another (until termination) • $G = (V, T, P, S)$ • where $V$: vocabulary of non-terminals • $T$: vocabulary of terminals • $S$: start symbol • $P$: set of productions of the form $A \to \beta$, where $A \in V$ and $\beta \in (V \cup T)^*$ • Derivation: repeatedly rewrite a non-terminal using a production of the grammar until no non-terminals remain in the string. • Start with $S$ • Sample context-free grammar, derivation and derived structure.
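A leftmost derivation can be simulated directly from the definition of $G = (V, T, P, S)$. The toy grammar and the `choices` interface are assumptions for illustration. `choices` gives the index of the production applied at each step. Symbols that do not appear as keys of `P` are treated as terminals.

```python
# Hypothetical toy grammar: each non-terminal maps to its alternative
# right-hand sides; any symbol that is not a key of P is a terminal.
P = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "N": [["boy"], ["book"]],
    "V": [["reads"]],
}

def leftmost_derivation(choices, start="S"):
    """Rewrite the leftmost non-terminal at each step until none remain.

    `choices[k]` is the index of the production applied at step k.
    Returns every sentential form from the start symbol to the final string.
    """
    form = [start]
    forms = [form]
    picks = iter(choices)
    while any(symbol in P for symbol in form):
        i = next(i for i, symbol in enumerate(form) if symbol in P)
        rhs = P[form[i]][next(picks)]
        form = form[:i] + rhs + form[i + 1:]
        forms.append(form)
    return forms

for form in leftmost_derivation([0, 0, 0, 0, 0, 0, 0, 1, 1]):
    print(" ".join(form))
```

With choices [0, 0, 0, 0, 0, 0, 0, 1, 1] the derivation ends in “the boy reads a book”. Choosing VP → V instead (index 1 at the fifth step, i.e. [0, 0, 0, 0, 1, 0]) yields the shorter “the boy reads”.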
Two Representations • String rewriting system: we derive a string (= derived structure) • But the derivation history is represented by a phrase-structure tree (= derivation structure)! • Grammar Equivalence • Different grammars can generate the same set of strings (weak equivalence) • Different grammars can have the same set of derivation trees (strong equivalence) • Strong equivalence implies weak equivalence • CFG Normal Forms: • Chomsky Normal Form: $A \to B\,C$ or $A \to a$ (with $A, B, C \in V$, $a \in T$) • Greibach Normal Form: $A \to a\,\beta$ (with $a \in T$, $\beta \in V^*$) • Convert a grammar into CNF and GNF
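One step of CNF conversion is binarization: every rule with more than two right-hand symbols is replaced by a chain of binary rules over fresh non-terminals. The sketch below uses hypothetical function names and a `Parent|Rest` naming scheme for fresh symbols. It binarizes a toy grammar, checks the CNF shape, and confirms that both grammars generate the same strings. They are weakly but not strongly equivalent, since the VP subtree gains an extra node. String enumeration assumes a non-recursive grammar. A full conversion would also remove ε-rules and unit rules.

```python
from itertools import product

def binarize(grammar):
    """Replace each rule A -> X1 X2 ... Xn (n > 2) by a chain of binary rules
    over fresh non-terminals named like 'A|X2_..._Xn'."""
    new = {}

    def add(lhs, rhs):
        bucket = new.setdefault(lhs, [])
        if rhs not in bucket:
            bucket.append(rhs)

    for lhs, rhss in grammar.items():
        for rhs in rhss:
            head = lhs
            while len(rhs) > 2:
                fresh = f"{lhs}|{'_'.join(rhs[1:])}"
                add(head, [rhs[0], fresh])
                head, rhs = fresh, rhs[1:]
            add(head, list(rhs))
    return new

def is_cnf(grammar):
    """Every rule must be A -> B C (two non-terminals) or A -> a (one terminal)."""
    def is_nt(symbol):
        return symbol in grammar
    return all(
        (len(rhs) == 2 and is_nt(rhs[0]) and is_nt(rhs[1]))
        or (len(rhs) == 1 and not is_nt(rhs[0]))
        for rhss in grammar.values()
        for rhs in rhss
    )

def strings(grammar, symbol):
    """All terminal strings derivable from `symbol` (assumes no recursion)."""
    if symbol not in grammar:
        return {(symbol,)}
    result = set()
    for rhs in grammar[symbol]:
        for parts in product(*(strings(grammar, s) for s in rhs)):
            result.add(tuple(word for part in parts for word in part))
    return result

G = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP", "PP"]],
    "PP": [["P", "NP"]],
    "NP": [["John"], ["Mary"], ["telescopes"]],
    "V": [["saw"]],
    "P": [["with"]],
}
B = binarize(G)
print(is_cnf(G), is_cnf(B))                # False True
print(strings(G, "S") == strings(B, "S"))  # True: same string language
```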
Penn Treebank (PTB) • Syntactically annotated corpus (phrase structure) • Contains 1 million words of Wall Street Journal sentences marked up with syntactic structure. • Can be converted into a dependency treebank. • Conversion requires head-percolation tables • Completely flat structure in NPs • brown bag lunch, pink-and-yellow child seat • Represents a particular linguistic theory • PropBank • PTB with some grammatical relations made explicit
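Head percolation can be sketched as follows. At each node a head table picks the head child, and the node inherits that child's head word. Every sibling's head word then becomes a dependent of it. The head table here is a small hypothetical fragment, not the actual tables used for the PTB. Trees are nested `(label, children)` tuples with POS preterminals, and words double as node identifiers for brevity.

```python
# Hypothetical head-table fragment: for each phrase label, the search direction
# and the child categories to look for, in priority order.
HEAD_TABLE = {
    "S": ("left", ["VP", "S"]),
    "VP": ("left", ["VBD", "VBZ", "VB", "VP"]),
    "NP": ("right", ["NN", "NNS", "NNP", "NP"]),
    "PP": ("left", ["IN"]),
}

def head_index(label, children):
    """Pick the head child of a phrase using HEAD_TABLE."""
    direction, priorities = HEAD_TABLE.get(label, ("left", []))
    order = list(range(len(children)))
    if direction == "right":
        order.reverse()
    for category in priorities:
        for i in order:
            if children[i][0] == category:
                return i
    return order[0]  # fallback: first child in the search direction

def to_dependencies(tree, arcs=None):
    """Return (head word of tree, list of (head, dependent) word pairs)."""
    if arcs is None:
        arcs = []
    label, children = tree
    if isinstance(children[0], str):  # preterminal, e.g. ("NN", ["man"])
        return children[0], arcs
    heads = [to_dependencies(child, arcs)[0] for child in children]
    h = head_index(label, children)
    for i, word in enumerate(heads):
        if i != h:
            arcs.append((heads[h], word))
    return heads[h], arcs

tree = ("S", [
    ("NP", [("NNP", ["John"])]),
    ("VP", [
        ("VBD", ["saw"]),
        ("NP", [
            ("NP", [("DT", ["the"]), ("NN", ["man"])]),
            ("PP", [("IN", ["with"]),
                    ("NP", [("DT", ["the"]), ("NN", ["telescope"])])]),
        ]),
    ]),
])
root, arcs = to_dependencies(tree)
print(root, arcs)
```

Applied to a flat NP such as `("NP", [("JJ", ["brown"]), ("NN", ["bag"]), ("NN", ["lunch"])])`, both brown and bag come out as dependents of lunch. The flat bracketing gives the converter no way to attach brown to bag.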
Unification • Mechanism needed to pass and check constraints. • Constraints, syntactic and semantic: • Subject-verb agreement • S → NP VP • the boy reads / the boys read / * the boys reads • Subject/auxiliary inversion (yes-no questions) • S → AuxVerb NP VP • Do you have flights / * does you have flights • Selectional restrictions: • An apple reads a book • Need a mechanism to encode these constraints • Refine the non-terminal set to encode these constraints: • S → 3sgAux 3sgNP VP; 3sgAux → does | has | … • S → Non3sgAux Non3sgNP VP; Non3sgAux → do | have | can | … • We need to split the NP rule into 3sgNP and Non3sgNP. • The size of the grammar grows; • can we factor these constraints out of the structure of the rules?
Unification – contd. • Attribute-value matrices: • boy: [Cat N, Number sg, Person 3] • boys: [Cat N, Number pl, Person 3] • reads: [Cat V, Subj agr [Number sg, Person 3]] • read: [Cat V, Subj agr [Number sg, Person 1|2] or [Number pl, Person 3]] • Check constraints (S → NP VP): NP.number = VP.subj.agr.number; NP.person = VP.subj.agr.person • Percolate constraints (VP → V): VP.subj.agr.number = V.subj.agr.number; VP.subj.agr.person = V.subj.agr.person • The boy reads / * the boys reads / the boys read
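Unification of attribute-value matrices can be sketched with nested dictionaries. Two structures unify if every shared feature has compatible values, and the result merges their features. The lexicon mirrors the matrices above, with the disjunction Person 1|2 expanded into separate alternatives. Since the VP rule just percolates the verb's subject agreement upward, the S-level check is made against the verb's entry directly. Function and variable names are illustrative assumptions.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts whose leaves are atomic strings).

    Returns the merged structure, or None if any shared feature has clashing values.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # clash somewhere below this feature
                result[feature] = merged
            else:
                result[feature] = value
        return result
    return a if a == b else None  # atomic values must be identical


# Lexical entries mirroring the attribute-value matrices (agreement part only).
NOUN_AGR = {
    "boy": {"number": "sg", "person": "3"},
    "boys": {"number": "pl", "person": "3"},
}
# Each verb lists alternative subject-agreement structures; "Person 1|2" is expanded.
VERB_SUBJ_AGR = {
    "reads": [{"number": "sg", "person": "3"}],
    "read": [
        {"number": "sg", "person": "1"},
        {"number": "sg", "person": "2"},
        {"number": "pl", "person": "3"},
    ],
}


def subject_agrees(noun, verb):
    """Check constraint of S -> NP VP: the NP's agreement must unify with the
    subject agreement the VP inherited from its verb."""
    return any(unify(NOUN_AGR[noun], agr) is not None for agr in VERB_SUBJ_AGR[verb])


for noun, verb in [("boy", "reads"), ("boys", "reads"), ("boys", "read")]:
    mark = "" if subject_agrees(noun, verb) else "* "
    print(f"{mark}the {noun} {verb}")
```

The loop reproduces the judgments on the slide: “the boy reads”, “* the boys reads”, “the boys read”.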
Structural Priming • The structure of preceding sentences affects (speeds up or slows down) the reading times of subsequent sentences. • Dative alternation • The woman gave her car to the church • The woman gave the church her car • Which of these forms is primed depends on the structure of the prime: • V NP NP: gave the church her car • V NP PP: gave her car to the church
Spoken Language Syntax • Not as “clean” as written text; disfluencies are rampant: • edits (restarts, repairs) • filled pauses • ungrammaticality • Sentence → utterance • “Clean up” the utterance first before understanding it.
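A crude version of the clean-up step drops filled pauses, then collapses immediate word repetitions as if they were restarts. The filler inventory and the repetition heuristic are assumptions for illustration. Real repair detection needs a trained model.

```python
# Hypothetical filler inventory; real systems learn these from annotated speech.
FILLED_PAUSES = {"uh", "um", "er", "uhm"}

def clean_utterance(tokens):
    """Drop filled pauses, then collapse immediate word repetitions
    ('to to' -> 'to'), a crude stand-in for restart detection."""
    no_fillers = [t for t in tokens if t.lower() not in FILLED_PAUSES]
    cleaned = []
    for token in no_fillers:
        if cleaned and cleaned[-1].lower() == token.lower():
            continue  # treat an exact repeat as a restart and keep one copy
        cleaned.append(token)
    return cleaned

print(" ".join(clean_utterance("I uh I want a flight to to Boston".split())))
```

The heuristic is deliberately naive. It also collapses legitimate repeats such as “he had had enough”. It also misses repairs that change words, such as “to Boston uh to Denver”.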