10. Lexicalized and Probabilistic Parsing -Speech and Language Processing- Presenter: 정영임 Date: 2007. 10. 6.
Table of Contents • 12.1 Probabilistic Context-Free Grammars • 12.2 Problems with PCFGs • 12.3 Probabilistic Lexicalized CFGs
Introduction • Goal • To build probabilistic models of sophisticated syntactic information • To use this probabilistic information in an efficient probabilistic parser • Uses of a probabilistic parser • Disambiguation • The Earley algorithm can represent the ambiguities of sentences, but it cannot resolve them • A probabilistic grammar can choose the most probable interpretation • Language modeling • For speech recognizers, N-gram models have been used to predict upcoming words and help constrain the search for words • A probabilistic version of a more sophisticated grammar can provide additional predictive power for speech recognition
12.1 Probabilistic Context-Free Grammars • Probabilistic Context-Free Grammar (PCFG) • A PCFG is also known as a Stochastic Context-Free Grammar (SCFG) • A PCFG is a 5-tuple G = (N, ∑, P, S, D): 1. A set of non-terminal symbols (or "variables") N 2. A set of terminal symbols ∑ (disjoint from N) 3. A set of productions P, each of the form A → β, where A is a non-terminal and β is a string of symbols from the infinite set of strings (∑ ∪ N)* 4. A designated start symbol S 5. A function D assigning a probability to each rule in P, written P(A → β) or P(A → β | A)
12.1 Probabilistic Context-Free Grammars • Sample PCFG for a miniature grammar
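The sample grammar figure did not survive extraction; as a minimal sketch (with toy rules invented here, not the book's miniature grammar), a PCFG can be stored as a mapping from each non-terminal to its weighted expansions:

```python
# A hypothetical toy PCFG (not the book's sample grammar). Each
# non-terminal maps to a list of (right-hand side, probability) pairs;
# the probabilities of one non-terminal's expansions must sum to 1.
toy_pcfg = {
    "S":   [(("NP", "VP"), 1.0)],
    "NP":  [(("Det", "N"), 0.6), (("N",), 0.4)],
    "VP":  [(("V", "NP"), 0.7), (("V",), 0.3)],
    "Det": [(("the",), 1.0)],
    "N":   [(("dog",), 0.5), (("cat",), 0.5)],
    "V":   [(("saw",), 1.0)],
}

def check_proper(grammar):
    """Verify that each non-terminal's expansion probabilities sum to 1."""
    return all(abs(sum(p for _, p in rules) - 1.0) < 1e-9
               for rules in grammar.values())

print(check_proper(toy_pcfg))  # True
```

The sum-to-1 check matters for the consistency discussion later in the chapter: it is a necessary (though not sufficient) condition for the grammar to define a probability distribution over sentences.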
12.1 Probabilistic Context-Free Grammars • Probability of a particular parse T • The product of the probabilities of all the rules r used to expand each node n in the parse tree: P(T) = ∏_{n∈T} p(r(n)) • By the definition of conditional probability, P(T, S) = P(T) P(S | T) • Since a parse tree includes all the words of the sentence, P(S | T) = 1, and hence P(T, S) = P(T)
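The product-of-rules definition of P(T) can be sketched in code; the tiny grammar and tree below are invented for illustration:

```python
# Hypothetical toy grammar: non-terminal -> {rhs tuple: probability}.
grammar = {
    "S":   {("NP", "VP"): 1.0},
    "NP":  {("she",): 0.4, ("Det", "N"): 0.6},
    "VP":  {("V", "NP"): 1.0},
    "Det": {("the",): 1.0},
    "N":   {("book",): 1.0},
    "V":   {("read",): 1.0},
}

def tree_probability(tree):
    """P(T): the product of the probabilities of the rule used to
    expand every non-terminal node. Trees are (label, children...)
    tuples; leaves are plain strings."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = grammar[label][rhs]          # probability of the rule at this node
    for c in children:
        if not isinstance(c, str):   # recurse into non-terminal children
            p *= tree_probability(c)
    return p

t = ("S", ("NP", "she"),
          ("VP", ("V", "read"), ("NP", ("Det", "the"), ("N", "book"))))
print(tree_probability(t))  # 1.0 * 0.4 * 1.0 * 1.0 * 0.6 * 1.0 * 1.0 ≈ 0.24
```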
12.1 Probabilistic Context-Free Grammars • [Figure: two candidate parse trees for an ambiguous sentence; the tree with the higher probability is preferred]
12.1 Probabilistic Context-Free Grammars • Formalization of selecting the parse with the highest probability • The best tree T̂(S) for a sentence S out of the set of parse trees for S (which we'll call τ(S)): T̂(S) = argmax_{T∈τ(S)} P(T | S) • By the definition of conditional probability, P(T | S) = P(T, S) / P(S); since P(S) is constant for each tree, we can eliminate it: T̂(S) = argmax_{T∈τ(S)} P(T, S) • Since P(T, S) = P(T): T̂(S) = argmax_{T∈τ(S)} P(T)
12.1 Probabilistic Context-Free Grammars • Probability of an ambiguous sentence • The sum of the probabilities of all the parse trees for the sentence: P(S) = ∑_{T∈τ(S)} P(T)
Other issues with PCFGs • Prefix probabilities • Jelinek and Lafferty (1991) give an algorithm for efficiently computing the probability of a prefix of a sentence • Stolcke (1995) describes how the standard Earley parser can be augmented to compute these prefix probabilities • Jurafsky et al. (1995) describe an application of a version of this algorithm as the language model for a speech recognizer • Consistency • A PCFG is said to be consistent if the sum of the probabilities of all sentences in the language equals 1 • Certain kinds of recursive rules cause a grammar to be inconsistent by causing infinitely looping derivations for some sentences • Booth and Thompson (1973) give more details on consistent and inconsistent grammars
Probabilistic CYK Parsing of PCFGs • The parsing problem for PCFGs • Can be interpreted as computing the most-likely parse for a given sentence • Algorithms for computing the most-likely parse • Augmented Earley algorithm (Stolcke, 1995) • The probabilistic Earley algorithm is somewhat complex to present • Probabilistic CYK (Cocke-Younger-Kasami) algorithm • The CYK algorithm is worth understanding
Probabilistic CYK Parsing of PCFGs • Probabilistic CYK (Cocke-Younger-Kasami) algorithm • The CYK algorithm is essentially a bottom-up parser using a dynamic programming table • Bottom-up parsing makes it more efficient when processing lexicalized grammars • Probabilistic CYK parsing was first described by Ney (1991) • The CYK parsing algorithm presented here follows Collins (1999) and Aho and Ullman (1972)
Probabilistic CYK Parsing of PCFGs • Input, output, and data structure of Probabilistic CYK
Probabilistic CYK Parsing of PCFGs • Pseudocode for Probabilistic CYK algorithm
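The pseudocode figure did not survive extraction; as a substitute, here is a minimal Python sketch of probabilistic CYK for a grammar in Chomsky normal form, with toy rules invented for illustration (not the book's pseudocode):

```python
from collections import defaultdict

# Hypothetical CNF grammar split into lexical rules A -> word and
# binary rules A -> B C, each with a probability.
lexical = {
    ("Det", "the"): 1.0,
    ("N", "dog"): 0.5, ("N", "cat"): 0.5,
    ("V", "saw"): 1.0,
}
binary = {
    ("S", "NP", "VP"): 1.0,
    ("NP", "Det", "N"): 1.0,
    ("VP", "V", "NP"): 1.0,
}

def cyk(words):
    """Return the max probability of an S spanning the whole input,
    plus backpointers for recovering the best parse."""
    n = len(words)
    # best[i][j] maps non-terminal -> max probability of deriving words[i:j]
    best = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    back = [[dict() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                 # fill length-1 spans
        for (A, word), p in lexical.items():
            if word == w and p > best[i][i + 1][A]:
                best[i][i + 1][A] = p
                back[i][i + 1][A] = w
    for span in range(2, n + 1):                  # longer spans, bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):             # split point
                for (A, B, C), p in binary.items():
                    q = p * best[i][k][B] * best[k][j][C]
                    if q > best[i][j][A]:
                        best[i][j][A] = q
                        back[i][j][A] = (k, B, C)
    return best[0][n]["S"], back

prob, _ = cyk("the dog saw the cat".split())
print(prob)  # 0.25
```

The table entry for the whole span under the start symbol S holds the probability of the best parse; following the backpointers from `back[0][n]["S"]` reconstructs that tree.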
Learning PCFG Probabilities • Where do PCFG probabilities come from? • Obtaining the PCFG probabilities from a treebank • Treebank: a corpus of already-parsed sentences • E.g., the Penn Treebank (Marcus et al., 1993): the Brown Corpus, the Wall Street Journal, and parts of the Switchboard corpus • Probability of each expansion of a non-terminal • By counting the number of times that expansion occurs and normalizing: P(A → β | A) = Count(A → β) / Count(A)
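The count-and-normalize estimate can be sketched as follows; the three-tree treebank below is invented for illustration:

```python
from collections import Counter

# A hypothetical miniature treebank: trees are (label, children...)
# tuples with plain strings as leaves.
treebank = [
    ("S", ("NP", "she"), ("VP", ("V", "runs"))),
    ("S", ("NP", "he"),  ("VP", ("V", "runs"))),
    ("S", ("NP", "she"), ("VP", ("V", "sleeps"))),
]

rule_counts = Counter()   # counts of (lhs, rhs) rules
lhs_counts = Counter()    # counts of each non-terminal being expanded

def count_rules(tree):
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rule_counts[(label, rhs)] += 1
    lhs_counts[label] += 1
    for c in children:
        if not isinstance(c, str):
            count_rules(c)

for t in treebank:
    count_rules(t)

# P(A -> beta | A) = Count(A -> beta) / Count(A)
probs = {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}
print(probs[("NP", ("she",))])  # 2/3: "she" in 2 of the 3 NP expansions
```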
Learning PCFG Probabilities • Where do PCFG probabilities come from? • Learning the PCFG probabilities by first parsing a (raw) corpus • Unambiguous sentences • Parse the corpus • Increment a counter for every rule in the parse • Then normalize to get probabilities • Ambiguous sentences • We need to keep a separate count for each parse of a sentence and weight each partial count by the probability of the parse it appears in • The standard algorithm for computing this is the Inside-Outside algorithm, proposed by Baker (1979) • Cf. Manning and Schütze (1999) for a complete description