Syntax. Construction of phrases and sentences from morphemes and words. Usually the word syntax refers to the way words are arranged together. Syntactic structure and rules that determine syntactic structure.
Syntax • Construction of phrases and sentences from morphemes and words. Usually the word syntax refers to the way words are arranged together. • Syntactic structure and rules that determine syntactic structure. • There are various models for representing syntactic structure computationally. Most of them are based on Context-Free Grammar, a formalism powerful enough to model many phenomena occurring in natural language, and yet computationally tractable.
Syntax • Three important notions related to syntax: • Constituency refers to groups of words behaving as one single unit, called constituent. • Grammatical relations refer to notions about the role of words in a sentence and the relations between such roles. E.g. Notions about the subject and the object of a sentence. • Subcategorization refers to the relations between words and phrases and the syntactical preferences of words. E.g. The verb want can be followed by an infinitive, but not the verb find. I want to fly to Detroit * I found to fly to Detroit
Constituency • How do words group together? • Noun phrases: • three parties from Brooklyn • a high-class spot such as Mindy’s • they • the reason he comes into the Hot Box • Certain linguistic evidence leads us to believe that these words group together (form a constituent).
Constituency • Words belonging to the same group appear in similar syntactic environments. E.g. noun phrases can be followed by a verb: three parties from Brooklyn arrive... a high-class spot such as Mindy’s attracts... but * from arrive... * as attracts... • Often such structures cannot be broken up inside a sentence. E.g. On September 17th, I’d like to fly from Atlanta to Denver. I’d like to fly on September 17th from Atlanta to Denver. but, * On September, I’d like to fly 17th from Atlanta to Denver.
Context-Free Grammars • Context-free grammars (CFG) (or Phrase-Structure Grammars) are a formalism for modeling constituent structure. • A CFG consists of a set of rules (or productions), each of which expresses the ways that symbols of a language can be grouped and ordered together, and a lexicon of words and symbols. • The symbols that correspond to words in the language are called terminal symbols, while the symbols that express generalizations of these are called non-terminal symbols.
Context-Free Grammars • E.g. NP -> Det Nominal (1) NP -> ProperNoun (2) Nominal -> Noun | Noun Nominal (3) Det -> a (4) Det -> the (5) Noun -> flight (6) • Terminals: a, the, flight • Non-terminals: NP, Det, Nominal, ProperNoun, Noun
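The example grammar above can be written down as plain data. The following is a minimal sketch, assuming a dictionary that maps each left-hand side to its alternative right-hand sides; the names `GRAMMAR` and `symbols` are hypothetical, and the capitalization test used to tell categories from words is an assumption of this sketch, not part of the CFG formalism.

```python
# Rules (1)-(6) from the slide: left-hand side -> list of right-hand sides.
GRAMMAR = {
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "Det": [["a"], ["the"]],
    "Noun": [["flight"]],
}

def symbols(grammar):
    """Split the symbols used by a grammar into non-terminals and terminals."""
    used = {sym for rhss in grammar.values() for rhs in rhss for sym in rhs}
    # Assumption: capitalized symbols are categories, lowercase ones are words.
    nonterminals = set(grammar) | {s for s in used if s[0].isupper()}
    terminals = used - nonterminals
    return nonterminals, terminals

nonterminals, terminals = symbols(GRAMMAR)
print(sorted(terminals))
print(sorted(nonterminals))
```

Note that ProperNoun counts as a non-terminal even though this small grammar lists no words for it.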
Derivations • A CFG is both a device for generating sentences and a device for assigning structure to a given sentence. An arrow -> can be thought of as meaning “rewrite the symbol on the left with the string of symbols on the right”. A sequence of such rewrites is called a derivation, e.g. NP (1)-> Det Nominal (3)-> Det Noun (4),(6)-> a flight • We say that a flight can be derived from the symbol NP. A derivation can also be represented as a tree.
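The derivation of a flight can be replayed step by step. This is a small sketch, assuming sentential forms are stored as Python lists; the helper `rewrite` and the list `steps` are hypothetical names chosen for illustration.

```python
def rewrite(form, lhs, rhs):
    """Replace the leftmost occurrence of lhs in the sentential form with rhs."""
    i = form.index(lhs)
    return form[:i] + rhs + form[i + 1:]

# The rule applications from the slide, one rewrite per step.
steps = [
    ("NP", ["Det", "Nominal"]),  # rule (1)
    ("Nominal", ["Noun"]),       # rule (3)
    ("Det", ["a"]),              # rule (4)
    ("Noun", ["flight"]),        # rule (6)
]

form = ["NP"]
history = [form]
for lhs, rhs in steps:
    form = rewrite(form, lhs, rhs)
    history.append(form)

for f in history:
    print(" ".join(f))
```

Each printed line is one sentential form, starting from NP and ending with the words a flight.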
Example Parse Tree [S[NP [PRO I] ] [VP [V prefer] [NP [Det a] [Nom [N morning] [N flight]]]]]
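The bracketed tree above maps naturally onto nested data. The sketch below assumes a tree is a tuple whose first element is the label and whose remaining elements are subtrees or words; `TREE`, `leaves` and `bracket` are hypothetical names.

```python
# (label, child, child, ...) where a child is a subtree or a word.
TREE = ("S",
        ("NP", ("PRO", "I")),
        ("VP",
         ("V", "prefer"),
         ("NP",
          ("Det", "a"),
          ("Nom", ("N", "morning"), ("N", "flight")))))

def leaves(tree):
    """Return the words covered by a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words

def bracket(tree):
    """Render a tree in labeled bracket notation."""
    if isinstance(tree, str):
        return tree
    return "[" + tree[0] + " " + " ".join(bracket(c) for c in tree[1:]) + "]"

print(" ".join(leaves(TREE)))
print(bracket(TREE))
```

Reading the leaves from left to right recovers the sentence, which is what it means for a tree to cover the input.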
Context-Free Grammars • A CFG defines a formal language. All sentences that can be derived by the CFG, starting from a designated non-terminal symbol (the start symbol), belong to the language and are called grammatical. Sentences that cannot be derived are called ungrammatical. • The problem of mapping from a string of words to its parse tree is called parsing.
Agreement • [NP What flights] leave in the morning. • [NP What flight] leaves in the morning. • * [NP What flight] leave in the morning. • How can a CFG handle this agreement phenomenon? One solution is to expand the grammar with multiple sets of rules, one rule for each case. E.g. S -> Aux NP VP is split into S -> 3sgAux 3sgNP VP and S -> Non3sgAux Non3sgNP VP where 3sgAux -> does | has | can | ... Non3sgAux -> do | have | can | ... In a similar way, NP must be split into 3sgNP and Non3sgNP.
Agreement • This method for dealing with agreement doubles the size of the grammar. In many other languages the problem is far more complicated. E.g. in Greek there is also gender agreement and case agreement. • A more elegant way to deal with the problem of agreement is through Unification Grammars, which allow the non-terminal symbols of the grammar to be parameterized with feature structures.
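The idea of parameterizing symbols with features can be illustrated with a deliberately simplified sketch. It assumes flat feature structures stored as dictionaries, whereas real unification grammars use nested structures with shared values; `unify`, `agrees` and the feature names `number` and `person` are hypothetical.

```python
def unify(a, b):
    """Unify two flat feature structures; return None if they clash."""
    result = dict(a)
    for feature, value in b.items():
        if feature in result and result[feature] != value:
            return None  # incompatible values, unification fails
        result[feature] = value
    return result

# Illustrative feature structures for the examples on the slide.
NP_FEATURES = {
    "what flights": {"number": "pl"},
    "what flight": {"number": "sg"},
}
VERB_FEATURES = {
    "leave": {"number": "pl"},
    "leaves": {"number": "sg", "person": 3},
}

def agrees(np, verb):
    """A subject and a verb agree if their features unify."""
    return unify(NP_FEATURES[np], VERB_FEATURES[verb]) is not None

print(agrees("what flights", "leave"))
print(agrees("what flight", "leaves"))
print(agrees("what flight", "leave"))
```

With features, a single rule S -> NP VP plus an agreement constraint can replace the separate 3sg and Non3sg rule sets.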
Verb Phrase and Subcategorization • A verb phrase consists of a verb and a number of other constituents. VP -> Verb disappear VP -> Verb NP prefer a morning flight VP -> Verb NP PP leave Boston in the morning • Or a verb may be followed by more complicated complements: You [VP [V said] [S you had a 266 dollar fare]] [VP [V Tell] [NP me] [S how to get from the airport to downtown]] I [VP [V want] [VP to arrange three flights]] • But not every verb is compatible with every possible complement. I want to fly to Detroit * I found to fly to Detroit
Verb Phrase and Subcategorization • We say that verbs subcategorize for different complements. Traditional grammars distinguish between transitive and intransitive verbs. Modern grammars distinguish up to 100 different categories. The possible sets of complements are called subcategorization frames.
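Subcategorization frames can be sketched as a lookup table from verbs to the complement sequences they accept. The frames listed below are illustrative assumptions based on the examples in these slides, not a complete lexicon; `SUBCAT`, `licenses` and the label `VPto` (an infinitival verb phrase) are hypothetical names.

```python
# Each verb maps to the set of complement sequences (frames) it allows.
SUBCAT = {
    "disappear": {()},
    "prefer": {("NP",)},
    "leave": {("NP", "PP")},
    "say": {("S",)},
    "tell": {("NP", "S")},
    "want": {("VPto",)},
    "find": {("NP",)},
}

def licenses(verb, complements):
    """True if the verb subcategorizes for this sequence of complements."""
    return tuple(complements) in SUBCAT.get(verb, set())

print(licenses("want", ["VPto"]))   # I want to fly to Detroit
print(licenses("find", ["VPto"]))   # * I found to fly to Detroit
```

A parser can consult such a table to reject verb phrases whose complements do not match any frame of the verb.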
Spoken Language Syntax • There are several differences between spoken and written language syntax. Usually in spoken language the term utterance is used instead of the term sentence. • In speech we deal with: • Pauses instead of punctuation. • Non-verbal events: [uh], [mm], [clear throat]. • Disfluencies.
Finite State vs. Context Free Grammars • Why do we need to resort to CFGs to model constituency in syntax? Are the finite-state models we used for morphology inadequate? • The problem is recursion. • Generally, it is not possible to fully model syntax using FSAs, but it is often possible to approximate the behavior of CFGs with FSAs (e.g. by restricting the depth of the recursion).
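The effect of restricting recursion depth can be sketched with nested brackets standing in for nested constituents. Unbounded nesting needs a context-free grammar, but once the depth is capped the recognizer needs only a fixed number of states. The function name `make_bounded_acceptor` and the use of "(" and ")" as stand-ins are assumptions of this illustration.

```python
def make_bounded_acceptor(max_depth):
    """Build a recognizer for nested brackets up to a fixed depth.

    The only memory is `depth`, which ranges over 0..max_depth, so the
    recognizer is a finite-state automaton with max_depth + 1 states.
    """
    def accepts(symbols):
        depth = 0  # current state of the automaton
        for s in symbols:
            if s == "(":
                depth += 1
                if depth > max_depth:
                    return False  # deeper than the automaton can track
            elif s == ")":
                depth -= 1
                if depth < 0:
                    return False  # closing bracket without an opener
            else:
                return False      # symbol outside the alphabet
        return depth == 0
    return accepts

accepts_depth_2 = make_bounded_acceptor(2)
print(accepts_depth_2("(())"))
print(accepts_depth_2("((()))"))
```

The second string is well nested but exceeds the depth limit, which is exactly the kind of input the finite-state approximation gives up on.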
Grammars and Human Processing • Do people actually use CFGs in their mental processing of language? We are not certain. • Early studies showed that when people heard an interruption (e.g. a click) in the middle of a constituent, they often misperceived it as occurring at a constituent boundary. But this might have been because the constituent also formed a semantic unit.
Grammars and Human Processing • Other studies showed that when humans were presented with a certain constituent structure, e.g. IBM moved [NP a bigger computer] [PP to the Sears store], they were more likely to use a similar structure afterwards, such as: The wealthy widow gave [NP her Mercedes] [PP to the church] instead of: The wealthy widow gave [NP the church] [NP her Mercedes] • Some researchers claim that natural language syntax can be described by formal languages and is separate from semantic or pragmatic information (modularist position). • Others claim that it is impossible to model syntactic knowledge without including additional knowledge (e.g. semantic, intonational, pragmatic, social, interactional).
Parsing • Syntactic Parsing is the task of recognizing an input sentence and assigning some syntactic structure to it. CFGs are just a declarative formalism. In order to compute how a parse tree will be assigned to a sentence we require a parsing algorithm. • Applications of parse trees: Word processing (grammar checkers), semantic analysis, machine translation, question answering, information extraction, speech recognition, ...
Parsing as Search • Syntactic parsing can be seen as a search through all possible parse trees to find the correct parse for the sentence. The search space is defined by the grammar.
Parsing as Search The correct parse tree for the sentence: Book that flight
Parsing as Search • The goal of the search is to find all trees whose root is the start symbol S and which cover exactly the words in the input. There are two kinds of constraints: one that comes from the data and one that comes from the grammar. • When the search is based on the grammar constraints, we have a top-down or goal-directed search. • When the search is based on the data constraints, we have a bottom-up or data-directed search.
Top-Down Parsing • A top-down parser tries to build a parse tree by building from the root node S down to the leaves.
Bottom-Up Parsing A bottom-up parser starts with the input and tries to build a tree rooted in the start symbol S, which covers all the input.
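A naive bottom-up recognizer can be sketched as a search over reductions: starting from the words, any substring that matches the right-hand side of a rule is replaced by the rule's left-hand side until only S remains. The grammar below is a small toy grammar assumed for illustration, and `RULES` and `bottom_up_recognize` are hypothetical names. The sketch only answers yes or no; it does not build the tree.

```python
# Toy grammar as (left-hand side, right-hand side) pairs.
RULES = [
    ("S", ("NP", "VP")), ("S", ("Aux", "NP", "VP")), ("S", ("VP",)),
    ("NP", ("Det", "Nominal")), ("NP", ("ProperNoun",)),
    ("Nominal", ("Noun",)), ("Nominal", ("Noun", "Nominal")),
    ("VP", ("Verb",)), ("VP", ("Verb", "NP")),
    ("Det", ("that",)), ("Noun", ("book",)), ("Noun", ("flight",)),
    ("Verb", ("book",)), ("Aux", ("does",)),
]

def bottom_up_recognize(words, rules=RULES, start="S"):
    """Search over all sequences of reductions, starting from the input."""
    agenda = [tuple(words)]
    seen = set(agenda)  # avoid revisiting the same sentential form
    while agenda:
        form = agenda.pop()
        if form == (start,):
            return True
        for lhs, rhs in rules:
            n = len(rhs)
            for i in range(len(form) - n + 1):
                if form[i:i + n] == rhs:
                    reduced = form[:i] + (lhs,) + form[i + n:]
                    if reduced not in seen:
                        seen.add(reduced)
                        agenda.append(reduced)
    return False

print(bottom_up_recognize("book that flight".split()))
print(bottom_up_recognize("that flight".split()))
```

Because book is both a Noun and a Verb in this grammar, the search also explores reductions that start from the Noun reading and never reach S.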
Top-Down vs. Bottom-Up Parsing • Top-down parsing does not waste time exploring trees that cannot result in an S, or subtrees that cannot exist in an S-rooted tree. Bottom-up parsing generates a large number of trees that have no chance of ever leading to an S. • But top-down parsing also wastes considerable time examining S trees that are not consistent with the input, since it starts generating trees without examining the input. Bottom-up parsers never suggest trees that are not (at least locally) consistent with the input. • Each approach fails to take advantage of all the constraints of the problem. The best results are given by parsers that incorporate features from both top-down and bottom-up parsers.
A Basic Top-Down Parser • When building a parser we make decisions about the search. Such decisions concern the search strategy, the choice of which node of the tree to expand, and the order in which the grammar rules are applied. We can build a simple top-down parser based on a depth-first search strategy, by expanding the left-most node and by applying grammar rules in the order in which they appear in the grammar. • Such an algorithm maintains an agenda of search states. Each state consists of a partially parsed tree along with a pointer to the next input word in the sentence. The search is performed by taking a state from the agenda and producing a new set of states by applying the possible grammar rules.
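The agenda-based search described above can be sketched as follows. To stay short, this version is a recognizer: each state stores only the symbols still to be expanded plus the input position, rather than the full partial tree. The toy grammar and the names `GRAMMAR` and `top_down_recognize` are assumptions made for illustration.

```python
# Toy grammar; rule order matters because rules are tried in this order.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"]],
    "Det": [["that"], ["this"], ["a"]],
    "Noun": [["book"], ["flight"], ["meal"]],
    "Verb": [["book"], ["include"], ["prefer"]],
    "Aux": [["does"]],
    "ProperNoun": [["Houston"], ["TWA"]],
}

def top_down_recognize(words, grammar=GRAMMAR, start="S"):
    """Depth-first, left-to-right, top-down recognition."""
    # A state is (symbols still to expand, index of the next input word).
    agenda = [((start,), 0)]
    while agenda:
        symbols, pos = agenda.pop()  # LIFO agenda gives depth-first search
        if not symbols:
            if pos == len(words):
                return True          # everything expanded, all input used
            continue
        first, rest = symbols[0], symbols[1:]
        if first in grammar:
            # Push in reverse so the first rule in the grammar is tried first.
            for rhs in reversed(grammar[first]):
                agenda.append((tuple(rhs) + rest, pos))
        elif pos < len(words) and words[pos] == first:
            agenda.append((rest, pos + 1))  # matched a word, move on
    return False

print(top_down_recognize("book that flight".split()))
print(top_down_recognize("book flight that".split()))
```

This grammar has no left-recursive rules, so the search terminates; with a left-recursive grammar the same loop could expand the left-most symbol forever.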
Bottom-Up Filtering • The top-down parser expands nodes along the left edge of the tree until it reaches the bottom-left of the tree. If the parse is to succeed, the current input word must be the first word in the derivation from the node that the parser is currently processing. This leads to the idea of bottom-up filtering. • The parser should not consider a grammar rule if the current input word cannot serve as the first word along the left edge of some derivation of the rule. e.g. S -> NP VP S -> Aux NP VP S -> VP If the input word is Does (Aux), the only rule that can lead to an Aux is the rule S -> Aux NP VP. Therefore the parser doesn’t need to examine the other two rules.
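The filter can be sketched by computing, for each symbol, the parts of speech that can appear at the left edge of its derivations. The toy grammar and the names `GRAMMAR`, `POS`, `left_corners` and `viable_rules` are assumptions for illustration, and the sketch assumes no rule has an empty right-hand side.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}
POS = {"Det", "Noun", "Verb", "Aux", "ProperNoun"}  # parts of speech

def left_corners(symbol, grammar=GRAMMAR, seen=frozenset()):
    """Parts of speech that can begin a derivation from symbol."""
    if symbol in POS:
        return {symbol}
    if symbol in seen:
        return set()  # already being explored, avoid looping
    seen = seen | {symbol}
    result = set()
    for rhs in grammar.get(symbol, []):
        result |= left_corners(rhs[0], grammar, seen)
    return result

def viable_rules(lhs, word_pos, grammar=GRAMMAR):
    """Keep only rules whose left edge can start with the current word's POS."""
    return [rhs for rhs in grammar[lhs]
            if word_pos in left_corners(rhs[0], grammar)]

print(viable_rules("S", "Aux"))
print(viable_rules("S", "Verb"))
```

In practice the left-corner sets would be computed once and stored in a table rather than recomputed for every rule.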
Left-Recursion • Depth-first search often leads to infinite loops when exploring infinite spaces. This occurs in top-down, depth-first parsing when the grammar is left-recursive. A grammar is left-recursive if it contains a non-terminal symbol that has a derivation that includes itself anywhere along its leftmost branch. e.g. NP -> Det Nominal Det -> NP ’s • Immediately left-recursive rules are rules of the form A -> A b, e.g. NP -> NP PP, S -> S and S
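Left recursion, including the indirect kind in the NP and Det example, can be detected by following the first symbol of every rule. This is a sketch that assumes no rule has an empty right-hand side; `left_recursive_symbols` is a hypothetical name.

```python
def left_recursive_symbols(grammar):
    """Return the non-terminals that can reach themselves along a left edge."""
    # Direct left-corner relation: the first symbol of each right-hand side.
    first = {lhs: {rhs[0] for rhs in rhss if rhs}
             for lhs, rhss in grammar.items()}
    result = set()
    for start in grammar:
        stack, reached = list(first[start]), set()
        while stack:
            sym = stack.pop()
            if sym in reached:
                continue
            reached.add(sym)
            stack.extend(first.get(sym, ()))
        if start in reached:
            result.add(start)
    return result

indirect = {
    "NP": [["Det", "Nominal"]],
    "Det": [["NP", "'s"]],
    "Nominal": [["Noun"]],
}
direct = {"NP": [["NP", "PP"], ["Det", "Nominal"]]}

print(sorted(left_recursive_symbols(indirect)))
print(sorted(left_recursive_symbols(direct)))
```

In the indirect case both NP and Det are reported, because each one reappears on the left edge of the other's expansion.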
Left-Recursion • Solutions: • Rewrite the grammar, eliminating left recursion. This is theoretically possible, but the new grammar may not be intuitive or natural in describing syntactic structures. • Restrict the depth of the search.
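The grammar rewrite can be sketched for the immediate case using the standard textbook transformation: rules A -> A α | β become A -> β A' and A' -> α A' | ε. The function name, the primed symbol naming, and the use of an empty list for ε are assumptions of this sketch; indirect left recursion needs additional steps that are not shown.

```python
def eliminate_immediate_left_recursion(lhs, rhss):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | empty."""
    recursive = [rhs[1:] for rhs in rhss if rhs and rhs[0] == lhs]
    others = [rhs for rhs in rhss if not rhs or rhs[0] != lhs]
    if not recursive:
        return {lhs: rhss}  # nothing to do
    new = lhs + "'"         # fresh non-terminal, e.g. NP'
    return {
        lhs: [beta + [new] for beta in others],
        # The empty list stands for the empty string (epsilon).
        new: [alpha + [new] for alpha in recursive] + [[]],
    }

rewritten = eliminate_immediate_left_recursion(
    "NP", [["NP", "PP"], ["Det", "Nominal"]])
for lhs, rhss in rewritten.items():
    for rhs in rhss:
        print(lhs, "->", " ".join(rhs) if rhs else "ε")
```

The rewritten grammar accepts the same strings, but the trees it assigns attach the PPs to the right instead of the left, which illustrates why the result may feel less natural as a description of syntactic structure.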
Ambiguity • Structural ambiguity occurs when a grammar assigns more than one possible parse tree to a sentence. There are various types of structural ambiguity. • Attachment ambiguity occurs when a particular constituent can be attached to the parse tree in more than one way. E.g. • I shot an elephant in my pajamas. • We saw the Eiffel Tower flying to Paris. • Coordination ambiguity occurs when there are different sets of phrases that can be joined by a conjunction such as and. • [old [men and women]] or [old men] and [women] • Noun phrase bracketing ambiguity. • [Dead [poets’ society]] or [[Dead poets’] society]
Ambiguity • Choosing the correct parse of a sentence among the possible parses is a task that requires additional semantic and statistical information. A parser without such information should return all possible parses. • However, a sentence may often lead to a huge number of parses. Sentences with many PP attachments like Show me the meal on Flight UA 386 from San Francisco to Denver. lead to an exponential number of parses.
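To get a feel for how quickly the number of analyses can grow, one simple model is to count the ways a sequence of items can be grouped into binary brackets. For n + 1 items this count is the n-th Catalan number. Treating each extra PP as one more item to bracket is an assumption of this illustration rather than an exact count for any particular grammar.

```python
from math import comb

def catalan(n):
    """n-th Catalan number: the number of binary bracketings of n + 1 items."""
    return comb(2 * n, n) // (n + 1)

for n in range(1, 9):
    print(n, catalan(n))
```

The values grow roughly by a factor of four at each step, so listing every parse quickly becomes impractical even for moderately long sentences.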
Repeated Parsing of Subtrees • The parser often builds valid trees for a portion of the input and then discards them during backtracking because they fail to cover all of the input. Later, the parser has to rebuild the same trees again in the search. • The table shows how many times each constituent of the example phrase “A flight from Indianapolis to Houston on TWA” is built.
A flight: 4
from Indianapolis: 3
Houston: 2
on TWA: 1
A flight from Indianapolis: 3
A flight from Indianapolis to Houston: 2
A flight from Indianapolis to Houston on TWA: 1
The Earley Parser • The Earley parser deals successfully with the aforementioned problems. The Earley parser is based on the dynamic programming paradigm, according to which a problem is solved by solving its sub-problems and then combining their solutions to solve the whole problem. • The core of the Earley algorithm is a chart of N+1 entries (N is the length of the input). For each word position the chart contains a list of states representing the partial parse trees generated so far. Each state contains a grammar rule corresponding to a subtree, information about the progress in completing the subtree, and the position of the subtree with respect to the input.
The Earley Parser • By keeping the partial parses in the chart, the Earley parser doesn’t have to rebuild the trees during backtracking, so there is no unnecessary repeated parsing of subtrees. • Additionally, the chart, which implicitly stores all the possible parses of the sentence, is built in polynomial time, O(N^3). • Of course, if the number of parses is exponential, the algorithm will need exponential time to return them all.
S8: Verb -> book S14: Det -> that S18: Noun -> flight S21: NP -> Det NOMINAL S22: VP -> Verb NP S23: S -> VP
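The chart-based approach can be sketched as a compact Earley recognizer. It follows the usual predictor, scanner and completer steps, but it is an illustrative sketch and not the exact algorithm behind the numbered states above: it omits backpointers, so it only reports whether the input is grammatical. The toy grammar, the lexicon, the dummy start symbol `GAMMA` and the function name are assumptions, and the grammar is assumed to have no empty rules.

```python
GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal"), ("ProperNoun",)],
    "Nominal": [("Noun",), ("Noun", "Nominal")],
    "VP": [("Verb",), ("Verb", "NP")],
}
LEXICON = {  # word -> possible parts of speech
    "book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"},
    "does": {"Aux"}, "houston": {"ProperNoun"},
}

def earley_recognize(words, grammar=GRAMMAR, lexicon=LEXICON, start="S"):
    n = len(words)
    # chart[i] lists states (lhs, rhs, dot, origin) that end at position i.
    chart = [[] for _ in range(n + 1)]

    def add(i, state):
        if state not in chart[i]:
            chart[i].append(state)

    add(0, ("GAMMA", (start,), 0, 0))  # dummy start state
    for i in range(n + 1):
        j = 0
        while j < len(chart[i]):       # chart[i] may grow while we scan it
            lhs, rhs, dot, origin = chart[i][j]
            j += 1
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in grammar:     # predictor: expand a non-terminal
                    for prod in grammar[nxt]:
                        add(i, (nxt, prod, 0, i))
                elif i < n and nxt in lexicon.get(words[i], ()):
                    # scanner: the next word has the expected part of speech
                    add(i + 1, (nxt, (words[i],), 1, i))
            else:                      # completer: advance waiting states
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(i, (l2, r2, d2 + 1, o2))
    return ("GAMMA", (start,), 1, 0) in chart[n]

print(earley_recognize("book that flight".split()))
print(earley_recognize("that flight".split()))
```

Each state is added to a chart entry at most once, which is what prevents the repeated rebuilding of subtrees seen in the backtracking parser.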
Finite-State Parsing • Often an application doesn’t require a full parse, and a partial parse or shallow parse is sufficient. In such cases, instead of using a CFG, systems use cascades of finite-state automata. Instead of returning a full parse of a sentence, such FSA grammars can be used to detect noun groups, verb groups, etc. • When such systems require recursion (e.g. the definition of NPs may require other NPs for relative clauses), the recursion is limited by using cascades of FSAs. One level finds NPs without recursion, the next level combines them into NPs with one level of recursion, and so on.
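The first level of such a cascade can be sketched with a regular expression over part-of-speech tags. The tag names, the one-letter encoding and the pattern for a noun group (an optional determiner, any number of adjectives, one or more nouns) are assumptions made for illustration; the input is assumed to be already tagged.

```python
import re

# Encode each tag as one character so a regex can scan the tag sequence.
TAG_CODES = {"Det": "D", "Adj": "A", "Noun": "N"}
NOUN_GROUP = re.compile(r"D?A*N+")

def chunk_noun_groups(tagged):
    """Return the noun groups in a list of (word, tag) pairs."""
    tag_string = "".join(TAG_CODES.get(tag, "x") for _, tag in tagged)
    groups = []
    for m in NOUN_GROUP.finditer(tag_string):
        # Character offsets in tag_string are also word offsets in tagged.
        words = [word for word, _ in tagged[m.start():m.end()]]
        groups.append(" ".join(words))
    return groups

sentence = [
    ("the", "Det"), ("morning", "Noun"), ("flight", "Noun"),
    ("leaves", "Verb"), ("from", "Prep"),
    ("a", "Det"), ("small", "Adj"), ("airport", "Noun"),
]
print(chunk_noun_groups(sentence))
```

A second level of the cascade could run a similar pattern over the output of this one, for example joining a noun group, a preposition and another noun group into a larger NP.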