Parsing III: Probabilistic Parsing and Conclusions
Probabilistic CFGs
• Also known as stochastic grammars
• Date back to Booth (1969)
• Have grown in popularity with the growth of corpus linguistics
Probabilistic CFGs
Essentially the same as ordinary CFGs, except that each rule has a probability associated with it:
S → NP VP .80
S → aux NP VP .15
S → VP .05
NP → det n .20
NP → det adj n .35
NP → n .20
NP → adj n .15
NP → pro .10
• Notice that the probabilities for each set of rules (those sharing a left-hand side) sum to 1
Probabilistic CFGs
• Probabilities are used to calculate the probability of a given derivation
• Defined as the product of the probabilities of the rules used in the derivation
• Can be used to choose between competing derivations
  • as the parse progresses (to determine which rules to try first), as an efficiency measure
  • or at the end, as a way of disambiguating, or of expressing confidence in the results
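As a minimal sketch, the toy grammar above and the product-of-rules definition can be written directly in Python (the `(lhs, rhs)` encoding of rules is my own, not from the slides):

```python
# Toy PCFG from the slide above, encoded as (lhs, rhs-tuple) -> probability.
PCFG = {
    ("S", ("NP", "VP")): 0.80,
    ("S", ("aux", "NP", "VP")): 0.15,
    ("S", ("VP",)): 0.05,
    ("NP", ("det", "n")): 0.20,
    ("NP", ("det", "adj", "n")): 0.35,
    ("NP", ("n",)): 0.20,
    ("NP", ("adj", "n")): 0.15,
    ("NP", ("pro",)): 0.10,
}

def derivation_probability(rules):
    """P(derivation) = product of the probabilities of the rules used."""
    p = 1.0
    for rule in rules:
        p *= PCFG[rule]
    return p

# e.g. a derivation using S -> NP VP and NP -> det n scores 0.80 * 0.20
print(derivation_probability([("S", ("NP", "VP")), ("NP", ("det", "n"))]))
```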
Where do the probabilities come from?
• Use a corpus of already-parsed sentences: a “treebank”
• The best-known example is the Penn Treebank
  • Marcus et al. (1993)
  • Available from the Linguistic Data Consortium
  • Based on the Brown corpus + 1m words of Wall Street Journal text + the Switchboard corpus
• Count all occurrences of each variation of a rule (e.g. each NP expansion) and divide by the total number of NP rules
• Very laborious, so of course it is done automatically
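The counting step described above is just maximum-likelihood estimation, and can be sketched in a few lines (the flat list-of-rules input is an assumed simplification; in practice the rules would be read off treebank trees):

```python
from collections import Counter

def estimate_rule_probs(treebank_rules):
    """MLE: count each rule, then divide by the total count of all
    rules sharing the same left-hand side.
    treebank_rules: list of (lhs, rhs) pairs collected from parsed trees."""
    rule_counts = Counter(treebank_rules)
    lhs_counts = Counter(lhs for lhs, _ in treebank_rules)
    return {(lhs, rhs): n / lhs_counts[lhs]
            for (lhs, rhs), n in rule_counts.items()}

# Hypothetical counts: three uses of NP -> det n against one of NP -> pro
rules = [("NP", ("det", "n"))] * 3 + [("NP", ("pro",))]
probs = estimate_rule_probs(rules)
```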
Where do the probabilities come from?
• Create your own treebank
• Easy if all sentences are unambiguous: just count the (successful) rule applications
• When there are ambiguities, rules which contribute to the ambiguity have to be counted separately and weighted
Where do the probabilities come from?
• Learn them as you go along
• Again, this assumes some way of identifying the correct parse in cases of ambiguity
• Each time a rule is successfully used, its probability is adjusted
• You have to start with some estimated probabilities, e.g. all equal
• Does need human intervention, otherwise rules become self-fulfilling prophecies
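One simple way to realise “start equal, adjust on success” is to keep per-rule counts and renormalise on demand; this sketch (my own, not from the slides) initialises every rule with a count of 1, so all expansions of a category start out equally likely:

```python
class AdaptivePCFG:
    """Start from uniform probabilities and adjust as rules succeed.
    Hypothetical sketch: counts start at 1 ('all equal'), and each
    successful use of a rule increments its count."""
    def __init__(self, rules_by_lhs):
        # rules_by_lhs: e.g. {"NP": [("det", "n"), ("pro",)]}
        self.counts = {lhs: {rhs: 1 for rhs in rhss}
                       for lhs, rhss in rules_by_lhs.items()}

    def record_success(self, lhs, rhs):
        self.counts[lhs][rhs] += 1

    def prob(self, lhs, rhs):
        total = sum(self.counts[lhs].values())
        return self.counts[lhs][rhs] / total
```

Note the self-fulfilling-prophecy risk the slide mentions: without a human check on which parse was actually correct, whichever rule wins early keeps getting reinforced.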
Problems with PCFGs
• PCFGs assume that all rules are essentially independent
  • But, e.g., in English “NP → pro” is more likely in subject position
• It is difficult to incorporate lexical information
  • Pre-terminal rules can inherit important information from words, which helps to make choices higher up the parse tree; e.g. lexical choice can help determine PP attachment
Probabilistic Lexicalised CFGs
• One solution is to identify, in each rule, one of the elements on the RHS (one daughter) as more important than the others: the “head”
• This is quite intuitive, e.g. the n in an NP rule, though often controversial from a linguistic point of view
• The head must be a lexical item
• The head value is percolated up the parse tree
• An added advantage is that the phrase-structure tree has the feel of a dependency tree
The two trees for “the man shot an elephant”, plain and lexicalised (heads in parentheses):

[S [NP [det the] [n man]] [VP [v shot] [NP [det an] [n elephant]]]]
[S(shot) [NP(man) [det the] [n man]] [VP(shot) [v shot] [NP(elephant) [det an] [n elephant]]]]
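Head percolation itself is a simple recursion: each rule names its head daughter, and the head word is copied up from there. A sketch, assuming a tiny hand-written head table and a `(label, children)` tree encoding of my own:

```python
# Hypothetical head table: which daughter category is the head of each rule.
HEAD_OF = {"S": "VP", "VP": "v", "NP": "n"}

def lexical_head(node):
    """Percolate the head word up: a pre-terminal's head is its word;
    a phrase's head is the head of its head daughter."""
    label, children = node
    if isinstance(children, str):        # pre-terminal: (pos, word)
        return children
    head_label = HEAD_OF[label]
    for child in children:
        if child[0] == head_label:
            return lexical_head(child)

tree = ("S", [("NP", [("det", "the"), ("n", "man")]),
              ("VP", [("v", "shot"),
                      ("NP", [("det", "an"), ("n", "elephant")])])])
# lexical_head(tree) yields the head of the whole sentence, "shot"
```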
Dependency Parsing
• Not much different from PSG parsing
• Grammar rules still need to be stated as A → B c
  • except that one daughter is identified as the head, e.g. A → x h y
• As the structure is built, the trees are headed by “h” rather than “A”
• Can be probabilistic or not
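The connection to lexicalised trees can be made concrete: given a tree whose nodes already carry their lexical heads, the dependency arcs fall out directly, since each non-head daughter's head word depends on the mother's head word. A sketch under an assumed `(label, head, children)` encoding of my own:

```python
def arcs(node):
    """Read dependency arcs (dependent, governor) off a head-annotated
    tree: every daughter whose head differs from the mother's head
    depends on the mother's head word."""
    label, head, children = node
    out = []
    for child in children:
        if len(child) == 2:              # leaf: (pos, word)
            if child[1] != head:
                out.append((child[1], head))
        else:                            # phrase: (label, head, children)
            _, child_head, _ = child
            if child_head != head:
                out.append((child_head, head))
            out.extend(arcs(child))
    return out

tree = ("S", "shot",
        [("NP", "man", [("det", "the"), ("n", "man")]),
         ("VP", "shot", [("v", "shot"),
                         ("NP", "elephant", [("det", "an"), ("n", "elephant")])])])
# arcs(tree) links man->shot, the->man, elephant->shot, an->elephant
```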
Conclusion 1
• Basic parsing approaches (without constraints) are not practical in real applications
• Whatever approach is taken, bear in mind that the lexicon is the real bottleneck
• There is a real trade-off between coverage and efficiency, so it is a good idea either to sacrifice broad coverage (e.g. domain-specific parsers, controlled language) or to use a scheme that minimises the disadvantages (e.g. probabilistic parsing)
Conclusion 2
• From a computational perspective, a parser provides
  • a formalism for writing linguistic rules
  • an implementation which can apply the rules to an input text
• Also, as necessary:
  • an interface to allow grammar development and testing (e.g. tracing rules, showing trees)
  • an interface with the application of which it is a part (which may be hidden from the end-user)
• All of the above tailored to meet the needs of the application