NATURAL LANGUAGE PROCESSING. SYNTAX: CONTEXT-FREE GRAMMARS
Syntax • Syntax: from Greek syntaxis “setting out together, arrangement” • Refers to the way words are arranged together, and the relationships between them. • Distinction: • Prescriptive grammar: how people ought to talk • Descriptive grammar: how they do talk • The goal of syntax is to model the knowledge that people unconsciously have about the grammar of their native language
Key ideas of syntax • Constituency • Subcategorization • Grammatical relations • Plus one part we won’t have time for: • Movement/long-distance dependency
Context-Free Grammars (CFG) • Capture constituency and ordering • Ordering: • What are the rules that govern the ordering of words and bigger units in the language? • Constituency: How words group into units and how the various kinds of units behave
Constituency • E.g., Noun phrases (NPs) • Three parties from Brooklyn • A high-class spot such as Mindy’s • The Broadway coppers • They • Harry the Horse • The reason he comes into the Hot Box • How do we know these form a constituent?
Constituency (II) • They can all appear before a verb: • Three parties from Brooklyn arrive… • A high-class spot such as Mindy’s attracts… • The Broadway coppers love… • They sit • But individual words can’t always appear before verbs: • *from arrive… • *as attracts… • *the is • *spot is… • Must be able to state generalizations like: • Noun phrases occur before verbs
Constituency (III) • Preposing and postposing: • On September 17th, I’d like to fly from Atlanta to Denver • I’d like to fly on September 17th from Atlanta to Denver • I’d like to fly from Atlanta to Denver on September 17th. • But not: • *On September, I’d like to fly 17th from Atlanta to Denver • *On I’d like to fly September 17th from Atlanta to Denver
Indicating constituents: brackets, trees • [Figure: parse tree for “I prefer a morning flight”, with nodes S, NP, VP, Nom, Pro, Verb, Det, Noun] • Bracketed notation: [S [NP [PRO I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [N flight]]]]]
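To make the equivalence between brackets and trees concrete, here is a minimal sketch that reads the bracketed notation into a nested structure. The function names `parse_brackets` and `leaves` and the `(label, children)` tuple layout are illustrative assumptions, not part of the lecture.

```python
def parse_brackets(text):
    """Parse '[S [NP ...] ...]' into nested (label, children) tuples."""
    # Put spaces around brackets so a plain split() yields clean tokens.
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "[", "a constituent must start with '['"
        pos += 1
        label = tokens[pos]          # category label, e.g. S, NP, VP
        pos += 1
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(node())       # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1                     # consume the closing ']'
        return (label, children)

    return node()


def leaves(tree):
    """Return the words of the tree, left to right."""
    _, children = tree
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


tree = parse_brackets(
    "[S [NP [PRO I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [N flight]]]]]"
)
print(tree[0])       # label of the root constituent
print(leaves(tree))  # the words covered by the root
```

Reading off the leaves recovers the original sentence, which is the sense in which the bracketing and the tree carry the same information.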
Beyond regular languages: Context-Free Grammars • S -> NP VP • NP -> Det Nominal • Nominal -> Noun • VP -> V • Det -> the • Det -> a • Noun -> flight • V -> left
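A small sketch of this toy grammar as data: the rules S -> NP VP, NP -> Det Nominal, Nominal -> Noun, VP -> V, Det -> the | a, Noun -> flight, V -> left are stored in a dictionary, and every sentence they generate is enumerated. The dictionary layout and the `expand` helper are assumptions made for illustration; the enumeration only terminates because this particular grammar has no recursive rule.

```python
from itertools import product

# Each non-terminal maps to a list of right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "VP": [["V"]],
    "Det": [["the"], ["a"]],
    "Noun": [["flight"]],
    "V": [["left"]],
}


def expand(symbol):
    """Yield every word sequence derivable from symbol (non-recursive grammars only)."""
    if symbol not in GRAMMAR:      # anything without a rule is a terminal
        yield [symbol]
        return
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each right-hand-side symbol.
        for parts in product(*(list(expand(s)) for s in rhs)):
            yield [word for part in parts for word in part]


sentences = [" ".join(words) for words in expand("S")]
print(sentences)
```

With only two determiners and one noun and verb, the generated language has exactly two sentences.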
CFGs: set of rules • S -> NP VP • This says that there are units called S, NP, and VP in this language • That an S consists of an NP followed immediately by a VP • Doesn’t say that that’s the only kind of S • Nor does it say that this is the only place that NPs and VPs occur
Generativity • As with FSAs you can view these rules as either analysis or synthesis machines • Generate strings in the language • Reject strings not in the language • Impose structures (trees) on strings in the language • How can we define grammatical vs. ungrammatical sentences?
Derivations • A derivation is a sequence of rules applied to a string that accounts for that string • Covers all the elements in the string • Covers only the elements in the string
Derivations as Trees • [Figure: parse tree for “I prefer a morning flight”, with nodes S, NP, VP, Nom, Pro, Verb, Det, Noun]
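The following sketch replays one derivation of “I prefer a morning flight” as a sequence of rule applications, always rewriting the leftmost non-terminal. The rule list is read off the tree on this slide (with Nom -> Noun Noun, as in the bracketed notation); the function name `leftmost_derivation` is an illustrative assumption.

```python
NONTERMINALS = {"S", "NP", "VP", "Nom", "Pro", "Verb", "Det", "Noun"}

# Rules in the order a leftmost derivation applies them.
RULES = [
    ("S", ["NP", "VP"]),
    ("NP", ["Pro"]),
    ("Pro", ["I"]),
    ("VP", ["Verb", "NP"]),
    ("Verb", ["prefer"]),
    ("NP", ["Det", "Nom"]),
    ("Det", ["a"]),
    ("Nom", ["Noun", "Noun"]),
    ("Noun", ["morning"]),
    ("Noun", ["flight"]),
]


def leftmost_derivation(start, rules):
    """Apply each rule to the leftmost non-terminal; return all intermediate strings."""
    form = [start]
    steps = [form]
    for lhs, rhs in rules:
        # Index of the leftmost symbol that is still a non-terminal.
        i = next(k for k, sym in enumerate(form) if sym in NONTERMINALS)
        if form[i] != lhs:
            raise ValueError(f"rule {lhs} -> {rhs} does not apply to {form}")
        form = form[:i] + rhs + form[i + 1:]
        steps.append(form)
    return steps


steps = leftmost_derivation("S", RULES)
for step in steps:
    print(" ".join(step))
```

The last line printed contains only terminals, so the derivation covers all the elements in the string and only those elements.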
CFGs more formally • A context-free grammar has 4 parameters (“is a 4-tuple”) • A set of non-terminal symbols (“variables”) N • A set of terminal symbols Σ (disjoint from N) • A set of productions P, each of the form • A -> α • Where A is a non-terminal and α is a string of symbols from the infinite set of strings (Σ ∪ N)* • A designated start symbol S
Defining a CF language via derivation • A string A derives a string B if • A can be rewritten as B via some series of rule applications • More formally: • If A -> β is a production of P • α and γ are any strings in the set (Σ ∪ N)* • Then we say that • αAγ directly derives αβγ, or αAγ ⇒ αβγ • Derivation is a generalization of direct derivation • Let α1, α2, …, αm be strings in (Σ ∪ N)*, m >= 1, s.t. • α1 ⇒ α2, α2 ⇒ α3, …, αm-1 ⇒ αm • We say that α1 derives αm, or α1 ⇒* αm • We then formally define the language LG generated by grammar G • A set of strings composed of terminal symbols derived from S • LG = {w | w is in Σ* and S ⇒* w}
Derivations and languages • The language LG GENERATED by a CFG G is the set of strings of TERMINAL symbols that can be derived from the start symbol S using the production rules in G • LG = {w | w is in Σ* and S derives w} • The strings in LG are called GRAMMATICAL • The strings not in LG are called UNGRAMMATICAL
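Membership in LG can be tested mechanically. The sketch below is one simple way to do it, not the parsing algorithm of the later lectures: it checks whether the start symbol derives exactly the given words by trying every way of splitting the words among the symbols of a right-hand side. It assumes a grammar with no empty productions and no cycles of unit rules; the toy grammar and the names `derives`, `covers`, and `is_grammatical` are illustrative.

```python
from functools import lru_cache

# Toy grammar; tuples are used so that arguments are hashable for caching.
GRAMMAR = {
    "S": (("NP", "VP"),),
    "NP": (("Det", "Nominal"),),
    "Nominal": (("Noun",),),
    "VP": (("V",),),
    "Det": (("the",), ("a",)),
    "Noun": (("flight",),),
    "V": (("left",),),
}


@lru_cache(maxsize=None)
def derives(symbol, words):
    """True if symbol derives exactly the tuple of words."""
    if symbol not in GRAMMAR:                 # terminal symbol
        return words == (symbol,)
    return any(covers(rhs, words) for rhs in GRAMMAR[symbol])


@lru_cache(maxsize=None)
def covers(rhs, words):
    """True if the symbols in rhs, in order, derive exactly words."""
    if not rhs:
        return not words
    first, rest = rhs[0], rhs[1:]
    # Every symbol must cover at least one word (no empty productions).
    for i in range(1, len(words) - len(rest) + 1):
        if derives(first, words[:i]) and covers(rest, words[i:]):
            return True
    return False


def is_grammatical(sentence):
    return derives("S", tuple(sentence.split()))


print(is_grammatical("the flight left"))
print(is_grammatical("flight the left"))
```

A string is GRAMMATICAL with respect to this grammar exactly when `is_grammatical` returns True for it.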
Grammar development • One of the most basic skills in NLE is the ability to write a CFG for some fragment of a language (e.g., dates) • We’ll briefly cover some of the issues to be addressed when writing small CFGs
Basic types of phrases • Sentences • Noun phrases • Verb phrases • Prepositional phrases
NPs • NP -> Pronoun • I came, you saw it, they conquered • NP -> Proper-Noun • Los Angeles is west of Texas • John Hennessy is the president of Stanford • NP -> Det Noun • The president • NP -> Nominal • Nominal -> Noun Noun • A morning flight to Denver
Noun phrases: premodifiers • NP -> (Det) (Card) (Ord) (Quant) (AP) Nominal • Det: Determiners • a flight • Optional: I’m looking for flights to Denver • Card: Cardinal numbers (one stop) • Ord: Ordinal numbers (the first flight) • Quant: Quantifiers (most flights to Denver leave in the morning) • AP: Adjective phrases (three very expensive seats)
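The parentheses in the premodifier rule are shorthand for optional constituents. As a rough sketch of what that shorthand stands for, the code below expands the rule into the plain CFG rules it abbreviates; `expand_optional` is a hypothetical helper written for this illustration.

```python
from itertools import product


def expand_optional(lhs, rhs):
    """Expand a rule whose right-hand side contains optional '(X)' symbols."""
    choices = []
    for sym in rhs.split():
        if sym.startswith("(") and sym.endswith(")"):
            choices.append([[sym[1:-1]], []])   # symbol present, or absent
        else:
            choices.append([[sym]])             # obligatory symbol
    return [
        (lhs, [s for part in combo for s in part])
        for combo in product(*choices)
    ]


rules = expand_optional("NP", "(Det) (Card) (Ord) (Quant) (AP) Nominal")
print(len(rules))  # five optional slots give 2**5 plain rules
for lhs, rhs in rules[:3]:
    print(lhs, "->", " ".join(rhs))
```

Five optional slots stand for 32 ordinary rules, which is why the abbreviated form is preferred when writing grammars by hand.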
Noun phrases: postmodifiers • Nominal -> Noun • Nominal -> Nominal PP (PP) (PP) • Nominal -> Nominal GerundVP • Nominal -> Nominal RelClause
PPs • PP -> Preposition NP • From LA • To the store • On Tuesday morning • With lunch
Recursion • Nominal -> Nominal PP (PP) (PP) • is an example of a RECURSIVE rule • Other examples: • NP -> NP PP • VP -> VP PP • Recursion is a powerful device, but it can have bad consequences (see lectures on parsing)
Recursion • NP -> NP PP • [[Flights] [from Denver]] • [[[Flights] [from Denver]] [to Miami]] • [[[[Flights] [from Denver]] [to Miami]] [in February]] • [[[[[Flights] [from Denver]] [to Miami]] [in February]] [on a Friday]] • Etc.
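A short sketch of how repeated application of NP -> NP PP produces the nested bracketings on this slide. The helper `nest` is an illustrative assumption: each prepositional phrase wraps the noun phrase built so far in one more layer.

```python
def nest(head, pps):
    """Apply NP -> NP PP once per prepositional phrase, left to right."""
    np = f"[{head}]"
    for pp in pps:
        np = f"[{np} [{pp}]]"   # the old NP becomes the left child of a new NP
    return np


pps = ["from Denver", "to Miami", "in February", "on a Friday"]
for n in range(1, len(pps) + 1):
    print(nest("Flights", pps[:n]))
```

Because the rule can reapply to its own output, there is no upper bound on the number of PPs, and so no upper bound on the length of an NP.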
Coordination • NP -> NP and NP • John and Mary left • VP -> VP and VP • John talks softly and carries a big stick • S -> S and / but / S • Kim is a lawyer but Sandy is reading medicine. • In fact, English probably has a general rule • XP -> XP and XP
Implications of recursion and context-freeness • If you have a rule like • VP -> V NP • It only cares that the thing after the verb is an NP • It doesn’t have to know about the internal affairs of that NP
The point • VP -> V NP • (I) hate • flights from Denver • flights from Denver to Miami • flights from Denver to Miami in February • flights from Denver to Miami in February on a Friday • flights from Denver to Miami in February on a Friday under $300 • flights from Denver to Miami in February on a Friday under $300 with lunch
Problems • Agreement • Subcategorization • Movement (for want of a better term)
Agreement • This dog • Those dogs • This dog eats • Those dogs eat • *This dogs • *Those dog • *This dog eat • *Those dogs eats
Possible CFG Solution • S -> NP VP • NP -> Det Nominal • VP -> V NP • … • SgS -> SgNP SgVP • PlS -> PlNP PlVP • SgNP -> SgDet SgNom • PlNP -> PlDet PlNom • PlVP -> PlV NP • SgVP -> SgV NP • …
CFG Solution for Agreement • It works and stays within the power of CFGs • But it’s ugly • And it doesn’t scale all that well
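To see why this approach scales poorly, the sketch below instantiates rule templates once per combination of feature values. It is a simplification in which every symbol of a rule carries the feature prefix (so it covers rules like SgS -> SgNP SgVP but not SgVP -> SgV NP); the `expand_features` helper and the added person feature are hypothetical, introduced only to show the growth.

```python
from itertools import product


def expand_features(templates, features):
    """Copy each rule template once per combination of feature values."""
    names = list(features)
    rules = []
    for combo in product(*(features[name] for name in names)):
        prefix = "".join(combo)               # e.g. "Sg" or "Pl3"
        for lhs, rhs in templates:
            rules.append((prefix + lhs, [prefix + sym for sym in rhs]))
    return rules


TEMPLATES = [("S", ["NP", "VP"]), ("NP", ["Det", "Nom"])]

number_only = expand_features(TEMPLATES, {"number": ["Sg", "Pl"]})
with_person = expand_features(
    TEMPLATES, {"number": ["Sg", "Pl"], "person": ["1", "2", "3"]}
)

for lhs, rhs in number_only:
    print(lhs, "->", " ".join(rhs))
print(len(number_only), len(with_person))
```

Two templates become 4 rules with number alone and 12 once a three-valued person feature is added: the rule count is multiplied by the number of value combinations for every feature that must agree.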
Subcategorization • Sneeze: John sneezed • *John sneezed the book • Say: You said [United has a flight]S • Prefer: I prefer [to leave earlier]TO-VP • *I prefer United has a flight • Give: Give [me]NP[a cheaper fare]NP • Help: Can you help [me]NP[with a flight]PP • *Give with a flight
Subcategorization • Subcat expresses the constraints that a predicate (verb for now) places on the number and syntactic types of arguments it wants to take (occur with).
So? • So the various rules for VPs overgenerate • They permit the presence of strings containing verbs and arguments that don’t go together • For example: • VP -> V NP • therefore “sneezed the book” is a VP, since “sneeze” is a verb and “the book” is a valid NP
Possible CFG Solution • VP -> V • VP -> V NP • VP -> V NP PP • … • VP -> IntransV • VP -> TransV NP • VP -> TransVwPP NP PP • …
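Splitting V into subclasses amounts to recording, for each verb, which complement sequences it accepts. Here is a minimal sketch of that idea as a lookup table, using the frames from the subcategorization examples earlier (plus an NP frame for “prefer”, from “I prefer a morning flight”). The `SUBCAT` table and the `licensed` function are illustrative assumptions, and the frames listed are far from complete.

```python
# Verb -> list of complement frames it accepts (each frame is a list of categories).
SUBCAT = {
    "sneeze": [[]],                  # John sneezed
    "say": [["S"]],                  # You said [United has a flight]
    "prefer": [["TO-VP"], ["NP"]],   # I prefer [to leave earlier] / [a morning flight]
    "give": [["NP", "NP"]],          # Give [me] [a cheaper fare]
    "help": [["NP", "PP"]],          # Can you help [me] [with a flight]
}


def licensed(verb, complements):
    """True if the verb's lexical entry allows exactly these complements."""
    return list(complements) in SUBCAT.get(verb, [])


print(licensed("sneeze", []))        # John sneezed
print(licensed("sneeze", ["NP"]))    # *John sneezed the book
print(licensed("prefer", ["S"]))     # *I prefer United has a flight
print(licensed("give", ["PP"]))      # *Give with a flight
```

A rule such as VP -> TransV NP does the same job inside the grammar, at the cost of one verb category and one VP rule per frame.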
Forward Pointer • It turns out that verb subcategorization facts will provide a key element for semantic analysis (determining who did what to whom in an event).
Movement • Core example • My travel agent booked the flight • [[My travel agent]NP [booked [the flight]NP]VP]S • i.e. “book” is a straightforward transitive verb. It expects a single NP arg within the VP as an argument, and a single NP arg as the subject.
Movement • What about? • Which flight do you want me to have the travel agent book? • The direct object argument to “book” isn’t appearing in the right place. It is in fact a long way from where it’s supposed to appear. • And note that it’s separated from its verb by 2 other verbs.
CFGs: a summary • CFGs appear to be just about what we need to account for a lot of basic syntactic structure in English. • But there are problems • These can be dealt with adequately, although not elegantly, while staying within the CFG framework. • There are simpler, more elegant solutions that take us out of the CFG framework (beyond its formal power). • Syntactic theories: HPSG, LFG, CCG, Minimalism, etc.
Other syntactic stuff • Grammatical relations • Subject • I booked a flight to New York • The flight was booked by my agent • Object • I booked a flight to New York • Complement • I said that I wanted to leave
Dependency parsing • Word to word links instead of constituency • Based on the European rather than American traditions • But dates back to the Greeks • The original notions of Subject, Object and the progenitor of subcategorization (called ‘valence’) came out of Dependency theory. • Dependency parsing is quite popular as a computational model since relationships between words are quite useful
Dependency parsing • Example sentence: Bills on ports and immigration were submitted by Senator Brownback • Parse tree: nesting of multi-word constituents • Typed dependency parse: grammatical relations between individual words • [Figure: phrase-structure tree (S, NP, VP, PP over tags NNS, IN, NN, CC, VBD, VBN, NNP) next to the typed dependency graph with relations nsubjpass, auxpass, agent, nn, prep_on, conj_and]
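A typed dependency parse can be stored as a flat set of (relation, head, dependent) triples. The sketch below does this for the example sentence; the triples are a hand transcription of the relation labels in the figure, so the exact attachments should be treated as an assumption, and `dependents` is an illustrative helper.

```python
# (relation, head, dependent) triples for:
# "Bills on ports and immigration were submitted by Senator Brownback"
DEPS = {
    ("nsubjpass", "submitted", "Bills"),
    ("auxpass", "submitted", "were"),
    ("agent", "submitted", "Brownback"),
    ("nn", "Brownback", "Senator"),
    ("prep_on", "Bills", "ports"),
    ("conj_and", "ports", "immigration"),
}


def dependents(head, relation=None):
    """Return the sorted dependents of head, optionally restricted to one relation."""
    return sorted(d for r, h, d in DEPS if h == head and relation in (None, r))


print(dependents("submitted"))            # all words linked directly to the verb
print(dependents("submitted", "agent"))   # who did the submitting
print(dependents("Bills"))                # what the bills are about
```

The triples are word-to-word links with no record of nesting or linear order, which is the contrast with the parse tree that the slide draws.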
Why are dependency parses useful? • Example: multi-document summarization Need to identify sentences from different documents that each say roughly the same thing: phrase structure trees of paraphrasing sentences which differ in word order can be significantly different but dependency representations will be very similar