Adaptor Grammars. Ehsan Khoddammohammadi Recent Advances in Parsing Technology WS 2012/13 Saarland University. Outline. Definition and motivation behind unsupervised grammar learning Non-parametric Bayesian statistics Adaptor grammars vs. PCFG
Adaptor Grammars Ehsan Khoddammohammadi Recent Advances in Parsing Technology WS 2012/13 Saarland University
Outline • Definition and motivation behind unsupervised grammar learning • Non-parametric Bayesian statistics • Adaptor grammars vs. PCFG • A short introduction to the Chinese Restaurant Process • Applications of adaptor grammars
Unsupervised Learning • How many categories of objects? • How many features does an object have? • How many words and rules are in a language?
Grammar Induction Goal: • Study how a grammar and parses can be learnt from terminal strings alone Motivation: • Helps us understand human language acquisition • Induces parsers for low-resource languages
Nonparametric Bayesian statistics • Learning the things people learn requires using rich, unbounded hypothesis spaces • Language learning is non-parametric inference: there is no (obvious) bound on the number of words, grammatical rules, or morphemes • Use stochastic processes to define priors on infinite hypothesis spaces
Nonparametric Bayesian statistics • Likelihood: how well the grammar describes the data • Prior: encodes our knowledge or expectations about grammars before seeing the data • Universal Grammar (very specific) • Shorter grammars (general constraints) • Posterior: expresses the learner's uncertainty about which grammar is correct (a distribution over grammars) • Posterior ∝ Likelihood × Prior: $P(\text{Grammar} \mid \text{Data}) \propto P(\text{Data} \mid \text{Grammar})\, P(\text{Grammar})$
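To make the relation between posterior, likelihood, and prior concrete, here is a minimal sketch that normalizes likelihood × prior over two candidate grammars. The grammar names and all numbers are hypothetical, chosen only for illustration; they do not come from the talk.

```python
import math

def posterior(log_likelihoods, log_priors):
    """Return the normalized posterior over hypotheses.

    Works in log space: score = log-likelihood + log-prior,
    then exponentiates and normalizes so the values sum to 1.
    """
    scores = {g: log_likelihoods[g] + log_priors[g] for g in log_likelihoods}
    top = max(scores.values())  # subtract the max for numerical stability
    total = sum(math.exp(s - top) for s in scores.values())
    return {g: math.exp(s - top) / total for g, s in scores.items()}

# Hypothetical values: the larger grammar fits the data better,
# but the prior prefers the shorter grammar.
log_lik = {"small_grammar": math.log(0.02), "large_grammar": math.log(0.05)}
log_prior = {"small_grammar": math.log(0.8), "large_grammar": math.log(0.2)}

post = posterior(log_lik, log_prior)
for name, p in post.items():
    print(f"{name}: {p:.3f}")
```

In this toy setting the prior outweighs the likelihood advantage of the larger grammar, so the shorter grammar ends up with more posterior mass.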
Is a PCFG good enough for our purpose? • A PCFG can be learnt within a Bayesian framework, but … • The set of rules is fixed in standard PCFG estimation • PCFG rules are “too small” to be effective units of generalization How can we solve this problem?
Two Non-parametric Bayesian extensions to PCFGs • Let the set of non-terminals grow unboundedly: • Start with a short, un-lexicalized grammar • Split-join of non-terminals • Let the set of rules grow unboundedly: • Generate new rules whenever you need them • Learn sub-trees and their probabilities (bigger units of generalization)
Adaptor Grammars • CFG rules are used to generate the trees, as in a CFG • We have two types of non-terminals: • Un-adapted (normal) non-terminals • Pick a rule and recursively expand its children, as in a PCFG • Adapted non-terminals, which either • Pick a rule and recursively expand its children, or • Regenerate a previously generated tree (with probability proportional to the number of times it has already been generated) We have a Chinese Restaurant Process for each adapted non-terminal
The Story of Adaptor Grammars • In a PCFG, rules are applied independently of each other. • The trees generated by an adaptor grammar are not independent: if an adapted sub-tree has been used frequently in the past, it is more likely to be used again. • An un-adapted nonterminal $A$ expands using $A \to \beta$ with probability proportional to $\theta_{A \to \beta}$ • An adapted nonterminal $A$ expands: • to a sub-tree $\tau$ rooted in $A$ with probability proportional to the number of times $\tau$ was previously generated • using $A \to \beta$ with probability proportional to $\alpha_A \theta_{A \to \beta}$ • $\alpha_A$ is the prior parameter.
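The generative story above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy grammar (`Word`, `Stem`, `Suffix`), its rule probabilities, and the names `RULES`, `ADAPTED`, `generate`, and `yield_of` are hypothetical and not taken from the talk. An adapted nonterminal reuses a cached tree with weight equal to its count, or expands with a rule with weight `alpha`.

```python
import random

# Hypothetical toy grammar: nonterminal -> list of (right-hand side, probability).
RULES = {
    "Word": [(("Stem", "Suffix"), 1.0)],
    "Stem": [(("c", "a", "t"), 0.5), (("d", "o", "g"), 0.5)],
    "Suffix": [(("s",), 0.5), ((), 0.5)],
}
# Adapted nonterminals and their (assumed) prior parameter alpha.
ADAPTED = {"Word": 1.0}

def expand_with_rule(sym, cache, rng):
    """Pick a rule for sym by its probability and expand the children recursively."""
    options, probs = zip(*RULES[sym])
    rhs = rng.choices(options, weights=probs)[0]
    return (sym, tuple(generate(child, cache, rng) for child in rhs))

def generate(sym, cache, rng):
    """Generate a tree rooted in sym; cache maps adapted symbol -> {tree: count}."""
    if sym not in RULES:
        return sym  # terminal symbol
    if sym not in ADAPTED:
        return expand_with_rule(sym, cache, rng)
    counts = cache.setdefault(sym, {})
    trees = list(counts)
    # Reuse tree t with weight counts[t]; build a fresh tree with weight alpha.
    weights = [counts[t] for t in trees] + [ADAPTED[sym]]
    pick = rng.choices(range(len(weights)), weights=weights)[0]
    tree = trees[pick] if pick < len(trees) else expand_with_rule(sym, cache, rng)
    counts[tree] = counts.get(tree, 0) + 1
    return tree

def yield_of(tree):
    """Concatenate the terminals at the leaves of a tree."""
    if isinstance(tree, str):
        return tree
    return "".join(yield_of(child) for child in tree[1])

rng = random.Random(0)
cache = {}
words = [yield_of(generate("Word", cache, rng)) for _ in range(50)]
print(words[:10])
print({yield_of(t): c for t, c in cache["Word"].items()})
```

Because reuse is weighted by counts, whichever `Word` trees happen to be generated early tend to dominate the later samples, which is the "rich get richer" behaviour; only `Word` is adapted here, so there is no recursion among adapted nonterminals.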
Properties of Adaptor Grammars • In adaptor grammars: • The probabilities of adapted sub-trees are learnt separately; they are not just the product of the probabilities of their rules. • “Rich get richer” (Zipf distribution) • Useful compound structures can be more probable than their parts. • There is no recursion amongst adapted non-terminals (an adapted non-terminal never expands to itself)
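The Chinese Restaurant Process • $n$ customers walk into a restaurant; customer $i$ chooses table $z_i$ with probability $P(z_i = k \mid z_1, \dots, z_{i-1}) = \frac{n_k}{i - 1 + \alpha}$ for an occupied table $k$ currently seating $n_k$ customers, and $\frac{\alpha}{i - 1 + \alpha}$ for the next unoccupied table • Defines an exchangeable distribution over seating arrangements (including counts on tables)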
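A minimal simulation of the seating process, assuming the standard CRP with a single concentration parameter `alpha`: each customer joins an occupied table with weight equal to its current count, or opens a new table with weight `alpha`. The function name `crp_seating` and the parameter values are illustrative.

```python
import random

def crp_seating(n, alpha, rng):
    """Seat n customers one at a time; return (table assignments, table counts)."""
    counts = []       # counts[k] = number of customers at table k
    assignments = []  # assignments[i] = table chosen by the i-th customer
    for _ in range(n):
        # Occupied table k has weight counts[k]; a new table has weight alpha.
        # Dividing by (customers already seated + alpha) gives the probabilities.
        weights = counts + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(counts):
            counts.append(1)   # open a new table
        else:
            counts[table] += 1
        assignments.append(table)
    return assignments, counts

rng = random.Random(1)
assignments, counts = crp_seating(n=100, alpha=1.0, rng=rng)
print("tables used:", len(counts))
print("customers per table:", counts)
```

Larger values of `alpha` make new tables more likely; with small `alpha` a few tables tend to collect most of the customers.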
Applications of Adaptor Grammars Not used for parsing, because grammar induction is hard. • Word segmentation • Learning concatenative morphology • Learning the structure of named-entity NPs • Topic modeling
Unsupervised Word Segmentation • Input: phoneme sequences with sentence boundaries • Task: identify words
Performance • Evaluated on the Brent corpus
Morphology • Input: raw text • Task: identify stems and morphemes, and decompose a word into its morphological components • Adaptor grammars can only be applied to simple concatenative morphology.
Adaptor grammar for morphological analysis • Generated words: cats, dogs, cats
Performance • For a more sophisticated model: • 116,129 tokens: 70% correctly segmented • 7,170 verb types: 66% correctly segmented
Inference • The distribution over adapted trees is exchangeable, so Gibbs sampling can be used • A variational inference method is also available for learning adaptor grammars. Covering this part is beyond the objectives of this talk.
Conclusion • We are interested in inducing grammars without supervision for two reasons: • Language acquisition • Low-resource languages • PCFG rules are too small to support larger generalizations • Learning the things people learn requires using rich, unbounded hypothesis spaces • Adaptor grammars use the CRP to learn rules from these unbounded hypothesis spaces
References • M. Johnson et al., Adaptor Grammars: A Framework for Specifying Compositional Nonparametric Bayesian Models, Advances in Neural Information Processing Systems, 2007 • Mark Johnson, Using adaptor grammars to identify synergies in the unsupervised acquisition of linguistic structure, ACL-08: HLT, 2008 • Tom Griffiths, Inferring Structure from Data, Machine Learning Summer School, Sardinia, 2010