Self-training with Products of Latent Variable Grammars Zhongqiang Huang, Mary Harper, and Slav Petrov
Overview • Motivation and Prior Related Research • Experimental Setup • Results • Analysis • Conclusions
PCFG-LA Parser [Matsuzaki et al. ’05] [Petrov et al. ’06] [Petrov & Klein ’07]
[Figure: latent-variable grammar pipeline relating the sentence, its parse tree, the grammar parameters, and the latent derivations.]
PCFG-LA Parser • Hierarchical splitting (& merging) • Typical learning curve • Grammar order selection: use the development set
[Figure: learning curve; parsing accuracy vs. increased model complexity. The n-th grammar is the one trained after the n-th split-merge round.]
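In code, grammar order selection is just a sweep over split-merge rounds, keeping the grammar that scores best on the dev set. A minimal sketch, assuming hypothetical `train_split_merge` and `f_score` helpers in place of a real PCFG-LA trainer and evaluator:

```python
# Hypothetical sketch of grammar order selection; train_split_merge and
# f_score stand in for a real PCFG-LA trainer and an F-score evaluator.

def select_grammar_order(treebank, dev_set, max_rounds=7):
    """Pick the split-merge round whose grammar scores best on dev."""
    best_f, best_grammar = float("-inf"), None
    grammar = None
    for n in range(1, max_rounds + 1):
        # Each round splits every symbol's latent subcategories in two,
        # retrains with EM, then merges back the least useful splits.
        grammar = train_split_merge(treebank, rounds=1, init=grammar)
        f = f_score(grammar, dev_set)
        if f > best_f:
            best_f, best_grammar = f, grammar
    return best_grammar
```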
Max-Rule Decoding (Single Grammar) [Goodman ’98, Matsuzaki et al. ’05, Petrov & Klein ’07]
[Figure: example parse with S, NP, and VP nodes; the posterior of each coarse rule is obtained by summing over its latent annotations.]
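The core computation sums inside and outside scores over all latent subcategory assignments of a rule. A schematic sketch, assuming hypothetical `subcats`, chart tables `inside`/`outside`, a `rule_prob` table, and the sentence likelihood `Z` from a latent-variable chart parser:

```python
# Posterior of one coarse rule A -> B C over span (i, k, j), obtained by
# marginalizing out the latent subcategories x, y, z. All tables
# (inside, outside, rule_prob) and subcats() are assumed helpers.

def rule_posterior(A, B, C, i, k, j, inside, outside, rule_prob, Z):
    total = 0.0
    for x in subcats(A):
        for y in subcats(B):
            for z in subcats(C):
                total += (outside[A, x, i, j]
                          * rule_prob[(A, x), (B, y), (C, z)]
                          * inside[B, y, i, k]
                          * inside[C, z, k, j])
    return total / Z  # posterior probability of the unannotated rule
```

Max-rule decoding then searches, with a Viterbi-style dynamic program, for the tree that maximizes the product of these rule posteriors.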
Variability [Petrov, ’10]
[Figure: F scores of individual grammars trained with different random seeds vary noticeably from run to run.]
Max-Rule Decoding (Multiple Grammars) [Petrov, ’10]
[Figure: several grammars are trained from the treebank with different random seeds; their rule posteriors are combined during decoding.]
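In a product model, each candidate rule is scored by the product of its posteriors under the individual grammars, i.e. the sum of their log posteriors. A sketch, reusing the hypothetical single-grammar machinery from above:

```python
import math

# Product-model scoring sketch: combine one rule's posteriors across
# independently trained grammars. rule_posterior_under is an assumed
# helper that runs the single-grammar computation for grammar g.

def product_rule_score(rule, span, grammars):
    log_score = 0.0
    for g in grammars:
        p = rule_posterior_under(g, rule, span)
        log_score += math.log(max(p, 1e-300))  # guard against underflow
    return log_score  # maximized over trees by the same dynamic program
```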
Product Model Results [Petrov, ’10]
[Figure: F scores of the product model compared with the individual grammars.]
Self-training (ST)
[Figure: a model is trained on hand-labeled data (with the grammar order selected on the dev set), used to label unlabeled data, and a new model is then trained on the automatically labeled data.]
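The loop itself is simple. A minimal sketch, with `train`, `select_with_dev`, and `parse` as hypothetical stand-ins for the trainer, dev-set model selection, and the parser used to label raw text:

```python
# Minimal self-training sketch; train, select_with_dev, and parse are
# hypothetical stand-ins, not the actual training pipeline.

def self_train(hand_labeled, unlabeled, dev_set):
    # 1. Train on the hand-labeled treebank; pick the grammar order on dev.
    base = select_with_dev(train(hand_labeled), dev_set)
    # 2. Label the raw text with the base model.
    auto_labeled = [parse(base, sentence) for sentence in unlabeled]
    # 3. Retrain on hand-labeled plus automatically labeled trees.
    return train(hand_labeled + auto_labeled)
```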
WSJ Self-Training Results [Huang & Harper, ’09]
[Figure: F scores of self-trained grammars on WSJ across split-merge rounds.]
Self-Trained Grammar Variability
[Figure: two panels, self-trained round 6 and self-trained round 7, showing the spread across individual self-trained grammars.]
Summary • Two issues: Variability & Over-fitting • Product model • Makes use of variability • Over-fitting remains in individual grammars • Self-training • Alleviates over-fitting • Variability remains in individual grammars • Next step: combine self-training with product models
Experimental Setup • Two genres: • WSJ: Sections 2-21 for training, Section 22 for dev, Section 23 for test; 176.9K sentences per self-trained grammar • Broadcast News: WSJ + 80% of BN for training, 10% for dev, 10% for test (see paper) • Training scenarios: each trains 10 models with different seeds and combines them using max-rule decoding (see the sketch below) • Regular: treebank training with up to 7 split-merge iterations • Self-training: three methods with up to 7 split-merge iterations
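A schematic of the train-and-combine scenario, assuming hypothetical `train_grammar` and `max_rule_decode` helpers:

```python
import random

# Sketch of the training scenario: 10 grammars trained with different
# random seeds, then combined by max-rule decoding at parse time.
# train_grammar and max_rule_decode are assumed helpers.

def train_models(treebank, n_models=10, rounds=7):
    grammars = []
    for seed in range(n_models):
        random.seed(seed)  # different EM initialization for each model
        grammars.append(train_grammar(treebank, split_merge_rounds=rounds))
    return grammars

def parse_with_product(grammars, sentence):
    return max_rule_decode(grammars, sentence)
```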
ST-Reg • Multiple grammars?
[Figure: hand-labeled data plus a single automatically labeled set (labeled by the round-6 product, selected with the dev set) are used to train multiple grammars, which are combined in a product.]
ST-Prod • Use more data?
[Figure: multiple grammars trained on the hand-labeled data are combined in a product, which labels the unlabeled data; the resulting single automatically labeled set (from the round-6 product) then trains multiple grammars that are again combined in a product.]
ST-Prod-Mult
[Figure: the round-6 product produces 10 different automatically labeled sets; each set trains one grammar, and the 10 grammars are combined in a product.]
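The three variants differ only in which model labels the unlabeled data and how many automatically labeled sets are produced. A schematic contrast, under the assumption that each `train` call uses its own random seed and that all helpers (`train`, `label_with`, `product_of`) are hypothetical:

```python
# Schematic contrast of the three self-training setups; every helper
# here is hypothetical, and each train() call gets its own random seed.

def st_reg(hand, unlabeled, base):
    auto = label_with(base, unlabeled)               # one auto-labeled set
    return [train(hand + auto) for _ in range(10)]   # 10 grammars -> product

def st_prod(hand, unlabeled):
    product = product_of([train(hand) for _ in range(10)])
    auto = label_with(product, unlabeled)            # labeled by a product
    return [train(hand + auto) for _ in range(10)]

def st_prod_mult(hand, unlabeled_sets):
    product = product_of([train(hand) for _ in range(10)])
    # 10 different auto-labeled sets -> 10 more diverse grammars
    return [train(hand + label_with(product, u)) for u in unlabeled_sets]
```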
Analysis of Rule Variance • To quantify the diversity among the learned grammars, we measure the average empirical variance of the rules' log posterior probabilities across the grammars, over a held-out set S (see the formula below).
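A reconstruction of the formula in LaTeX, transcribing the prose directly: with $n$ grammars, rules $r$ in the held-out set $S$, and $p_i(r)$ the posterior probability of rule $r$ under grammar $i$ (the exact normalization in the paper may differ):

```latex
\mathrm{Var} \;=\; \frac{1}{|S|} \sum_{r \in S} \frac{1}{n-1}
  \sum_{i=1}^{n} \Bigl( \log p_i(r) - \overline{\log p(r)} \Bigr)^{2},
\qquad
\overline{\log p(r)} \;=\; \frac{1}{n} \sum_{j=1}^{n} \log p_j(r)
```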
English Test Set Results (WSJ 23)
[Table: F scores on WSJ Section 23 grouped by system type. Single parsers: [Charniak ’00], [Petrov et al. ’06], [Carreras et al. ’08]; rerankers: [Charniak & Johnson ’05], [Huang ’08], [McClosky et al. ’06]; parser combinations: [Sagae & Lavie ’06], [Fossum & Knight ’09], [Zhang et al. ’09]; products: [Petrov ’10] and This Work; This Work also appears as a single self-trained grammar alongside [Huang & Harper ’08].]
Conclusions • Very high parse accuracies can be achieved by combining self-training and product models on newswire and broadcast news parsing tasks. • Two important factors: • Accuracy of the model used to parse the unlabeled data • Diversity of the individual grammars