Parsing III: Probabilistic Parsing and Conclusions
Probabilistic CFGs
• Also known as stochastic grammars
• Date back to Booth (1969)
• Have grown in popularity with the growth of corpus linguistics
Probabilistic CFGs
Essentially the same as ordinary CFGs, except that each rule has a probability associated with it:
S → NP VP .80
S → aux NP VP .15
S → VP .05
NP → det n .20
NP → det adj n .35
NP → n .20
NP → adj n .15
NP → pro .10
• Notice that the probabilities for each set of rules (those sharing a left-hand side) sum to 1
Probabilistic CFGs
• Probabilities are used to calculate the probability of a given derivation
• Defined as the product of the probabilities of the rules used in the derivation
• Can be used to choose between competing derivations
  • as the parse progresses (to determine which rules to try first), as an efficiency measure
  • or at the end, as a way of disambiguating, or of expressing confidence in the results
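As a minimal sketch, the toy grammar above and the product-of-rules definition can be written directly in Python (the `(lhs, rhs)` encoding of rules is my own, not from the slides):

```python
# Toy PCFG from the slide above, encoded as (lhs, rhs-tuple) -> probability.
PCFG = {
    ("S", ("NP", "VP")): 0.80,
    ("S", ("aux", "NP", "VP")): 0.15,
    ("S", ("VP",)): 0.05,
    ("NP", ("det", "n")): 0.20,
    ("NP", ("det", "adj", "n")): 0.35,
    ("NP", ("n",)): 0.20,
    ("NP", ("adj", "n")): 0.15,
    ("NP", ("pro",)): 0.10,
}

def derivation_probability(rules):
    """P(derivation) = product of the probabilities of the rules used."""
    p = 1.0
    for rule in rules:
        p *= PCFG[rule]
    return p

# e.g. a derivation using S -> NP VP and NP -> det n scores 0.80 * 0.20
print(derivation_probability([("S", ("NP", "VP")), ("NP", ("det", "n"))]))
```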
Where do the probabilities come from?
• Use a corpus of already-parsed sentences: a “treebank”
• The best-known example is the Penn Treebank
  • Marcus et al. (1993)
  • Available from the Linguistic Data Consortium
  • Based on the Brown corpus + 1m words of Wall Street Journal text + the Switchboard corpus
• Count all occurrences of each variation of a rule (e.g. each NP expansion) and divide by the total number of NP rules
• Very laborious, so of course it is done automatically
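The counting step described above is just maximum-likelihood estimation, and can be sketched in a few lines (the flat list-of-rules input is an assumed simplification; in practice the rules would be read off treebank trees):

```python
from collections import Counter

def estimate_rule_probs(treebank_rules):
    """MLE: count each rule, then divide by the total count of all
    rules sharing the same left-hand side.
    treebank_rules: list of (lhs, rhs) pairs collected from parsed trees."""
    rule_counts = Counter(treebank_rules)
    lhs_counts = Counter(lhs for lhs, _ in treebank_rules)
    return {(lhs, rhs): n / lhs_counts[lhs]
            for (lhs, rhs), n in rule_counts.items()}

# Hypothetical counts: three uses of NP -> det n against one of NP -> pro
rules = [("NP", ("det", "n"))] * 3 + [("NP", ("pro",))]
probs = estimate_rule_probs(rules)
```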
Where do the probabilities come from?
• Create your own treebank
• Easy if all sentences are unambiguous: just count the (successful) rule applications
• When there are ambiguities, rules which contribute to the ambiguity have to be counted separately and weighted
Where do the probabilities come from?
• Learn them as you go along
• Again, this assumes some way of identifying the correct parse in cases of ambiguity
• Each time a rule is successfully used, its probability is adjusted
• You have to start with some estimated probabilities, e.g. all equal
• Does need human intervention, otherwise rules become self-fulfilling prophecies
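One simple way to realise “start equal, adjust on success” is to keep per-rule counts and renormalise on demand; this sketch (my own, not from the slides) initialises every rule with a count of 1, so all expansions of a category start out equally likely:

```python
class AdaptivePCFG:
    """Start from uniform probabilities and adjust as rules succeed.
    Hypothetical sketch: counts start at 1 ('all equal'), and each
    successful use of a rule increments its count."""
    def __init__(self, rules_by_lhs):
        # rules_by_lhs: e.g. {"NP": [("det", "n"), ("pro",)]}
        self.counts = {lhs: {rhs: 1 for rhs in rhss}
                       for lhs, rhss in rules_by_lhs.items()}

    def record_success(self, lhs, rhs):
        self.counts[lhs][rhs] += 1

    def prob(self, lhs, rhs):
        total = sum(self.counts[lhs].values())
        return self.counts[lhs][rhs] / total
```

Note the self-fulfilling-prophecy risk the slide mentions: without a human check on which parse was actually correct, whichever rule wins early keeps getting reinforced.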
Problems with PCFGs
• PCFGs assume that all rules are essentially independent
  • But, e.g., in English “NP → pro” is more likely in subject position
• It is difficult to incorporate lexical information
  • Pre-terminal rules can inherit important information from words, which helps to make choices higher up the parse tree; e.g. lexical choice can help determine PP attachment
Probabilistic Lexicalised CFGs
• One solution is to identify, in each rule, one of the elements on the RHS (one daughter) as more important than the others: the “head”
• This is quite intuitive, e.g. the n in an NP rule, though often controversial from a linguistic point of view
• The head must be a lexical item
• The head value is percolated up the parse tree
• An added advantage is that the phrase-structure tree has the feel of a dependency tree
The two trees for “the man shot an elephant”, plain and lexicalised (heads in parentheses):

[S [NP [det the] [n man]] [VP [v shot] [NP [det an] [n elephant]]]]
[S(shot) [NP(man) [det the] [n man]] [VP(shot) [v shot] [NP(elephant) [det an] [n elephant]]]]
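Head percolation itself is a simple recursion: each rule names its head daughter, and the head word is copied up from there. A sketch, assuming a tiny hand-written head table and a `(label, children)` tree encoding of my own:

```python
# Hypothetical head table: which daughter category is the head of each rule.
HEAD_OF = {"S": "VP", "VP": "v", "NP": "n"}

def lexical_head(node):
    """Percolate the head word up: a pre-terminal's head is its word;
    a phrase's head is the head of its head daughter."""
    label, children = node
    if isinstance(children, str):        # pre-terminal: (pos, word)
        return children
    head_label = HEAD_OF[label]
    for child in children:
        if child[0] == head_label:
            return lexical_head(child)

tree = ("S", [("NP", [("det", "the"), ("n", "man")]),
              ("VP", [("v", "shot"),
                      ("NP", [("det", "an"), ("n", "elephant")])])])
# lexical_head(tree) yields the head of the whole sentence, "shot"
```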
Dependency Parsing
• Not much different from PSG parsing
• Grammar rules still need to be stated as A → B c
  • except that one daughter is identified as the head, e.g. A → x h y
• As the structure is built, the trees are headed by “h” rather than “A”
• Can be probabilistic or not
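The connection to lexicalised trees can be made concrete: given a tree whose nodes already carry their lexical heads, the dependency arcs fall out directly, since each non-head daughter's head word depends on the mother's head word. A sketch under an assumed `(label, head, children)` encoding of my own:

```python
def arcs(node):
    """Read dependency arcs (dependent, governor) off a head-annotated
    tree: every daughter whose head differs from the mother's head
    depends on the mother's head word."""
    label, head, children = node
    out = []
    for child in children:
        if len(child) == 2:              # leaf: (pos, word)
            if child[1] != head:
                out.append((child[1], head))
        else:                            # phrase: (label, head, children)
            _, child_head, _ = child
            if child_head != head:
                out.append((child_head, head))
            out.extend(arcs(child))
    return out

tree = ("S", "shot",
        [("NP", "man", [("det", "the"), ("n", "man")]),
         ("VP", "shot", [("v", "shot"),
                         ("NP", "elephant", [("det", "an"), ("n", "elephant")])])])
# arcs(tree) links man->shot, the->man, elephant->shot, an->elephant
```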
Conclusion 1
• Basic parsing approaches (without constraints) are not practical in real applications
• Whatever approach is taken, bear in mind that the lexicon is the real bottleneck
• There is a real trade-off between coverage and efficiency, so it is a good idea either to sacrifice broad coverage (e.g. domain-specific parsers, controlled language) or to use a scheme that minimises the disadvantages (e.g. probabilistic parsing)
Conclusion 2
• From a computational perspective, a parser provides
  • a formalism for writing linguistic rules
  • an implementation which can apply the rules to an input text
• Also, as necessary:
  • an interface to allow grammar development and testing (e.g. tracing rules, showing trees)
  • an interface with the application of which it is a part (which may be hidden from the end-user)
• All of the above tailored to meet the needs of the application