Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing
Hao Zhang; Chris Quirk; Robert C. Moore; Daniel Gildea
Presented by Zhonghua Li. Mentor: Jun Lang. 2011-10-21, I2R SMT-Reading Group
Paper info • Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing • ACL-08 long paper • Cited: 37 • Authors: Hao Zhang, Chris Quirk, Robert C. Moore, Daniel Gildea
Core Ideas • Variational Bayes • Tic-tac-toe pruning • Word-to-phrase bootstrapping
Outline • Paper presentation • Pipeline • Model • Training • Parsing (pruning) • Results • Shortcomings • Discussion
Summary of the Pipeline • Run IBM Model 1 on sentence-aligned data • Use tic-tac-toe pruning to prune the bitext space • Train a word-based ITG with Variational Bayes and take the Viterbi alignment • Apply non-compositional constraints to restrict the space of phrase pairs • Train a phrasal ITG with VB; a final Viterbi pass gives the phrasal alignment
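Step 1 of the pipeline is plain IBM Model 1 EM. A minimal runnable sketch on a toy corpus (my illustration of the standard algorithm, not the authors' code; the corpus and variable names are invented):

```python
from collections import defaultdict

# Toy parallel corpus: (foreign, English) sentence pairs.
bitext = [("das haus".split(), "the house".split()),
          ("das buch".split(), "the book".split()),
          ("ein buch".split(), "a book".split())]

f_vocab = {f for fs, _ in bitext for f in fs}
e_vocab = {e for _, es in bitext for e in es}
# Uniform initialization of the translation table t(e|f).
t = {(e, f): 1.0 / len(e_vocab) for e in e_vocab for f in f_vocab}

for _ in range(10):                          # EM iterations
    count, total = defaultdict(float), defaultdict(float)
    for fs, es in bitext:
        for e in es:                         # E-step: link posteriors
            z = sum(t[(e, f)] for f in fs)
            for f in fs:
                count[(e, f)] += t[(e, f)] / z
                total[f] += t[(e, f)] / z
    for e, f in t:                           # M-step: renormalize
        t[(e, f)] = count[(e, f)] / total[f]

print(max(e_vocab, key=lambda e: t[(e, "haus")]))  # -> house
```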
Review: Inside-Outside Algorithm (slide credit: Shujie Liu) • The forward-backward algorithm is used not only for HMMs but for any state-space model • The inside-outside algorithm is a special case of the forward-backward algorithm • [Figure: HMM trellis over states X1 … XN next to a parse whose root spans the bispan 0/0 to T/V]
VB Algorithm for Training SITGs - E1 (adapted from Shujie Liu's slides) • Inside probabilities β over bispans s/u-t/v • Initialization: β(s/u-(s+1)/(u+1)) = P(e_{s+1}/f_{u+1}) for terminal phrase pairs • Recursion: a bispan i = s/u-t/v is built from children j = s/u-S/U and k = S/U-t/v (straight) or j = s/U-S/v and k = S/u-t/U (inverted): β(s/u-t/v) = Σ_{S,U} [ P([]) β(s/u-S/U) β(S/U-t/v) + P(⟨⟩) β(s/U-S/v) β(S/u-t/U) ]
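To make the recursion concrete, here is a runnable toy inside pass for a one-nonterminal ITG with rules X → [X X] | ⟨X X⟩ | e/f. The probabilities and lexical table are invented, and null and one-to-many links are omitted; a sketch of the textbook recursion, not the paper's implementation:

```python
import itertools
from collections import defaultdict

p_mono, p_swap = 0.4, 0.1        # toy rule probabilities (assumptions)

def b(e, f):                     # toy terminal phrase-pair probabilities
    return 0.5 if (e, f) in {("I", "je"), ("sleep", "dors")} else 1e-6

def inside(E, F):
    n, m = len(E), len(F)
    beta = defaultdict(float)    # beta[(s, t, u, v)] for bispan s/u-t/v
    # Initialization: 1x1 bispans emit a terminal pair.
    for s, u in itertools.product(range(n), range(m)):
        beta[(s, s + 1, u, u + 1)] = b(E[s], F[u])
    # Recursion: visit bispans small-to-large so children are ready.
    spans = [(s, t, u, v)
             for s in range(n) for t in range(s + 1, n + 1)
             for u in range(m) for v in range(u + 1, m + 1)]
    spans.sort(key=lambda x: (x[1] - x[0]) + (x[3] - x[2]))
    for s, t, u, v in spans:
        for S in range(s + 1, t):
            for U in range(u + 1, v):
                # straight: children s/u-S/U and S/U-t/v
                beta[(s, t, u, v)] += p_mono * beta[(s, S, u, U)] * beta[(S, t, U, v)]
                # inverted: children s/U-S/v and S/u-t/U
                beta[(s, t, u, v)] += p_swap * beta[(s, S, U, v)] * beta[(S, t, u, U)]
    return beta

beta = inside("I sleep".split(), "je dors".split())
print(beta[(0, 2, 0, 2)])        # sentence-pair inside score
```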
VB Algorithm for Training SITGs - E2 (adapted from Shujie Liu's slides) • Outside probabilities α • Initialization: α(0/0-T/V) = 1 for the root bispan • Recursion: a child bispan j = s/u-t/v collects mass from every parent i and sibling k it can combine with, e.g. a straight parent i = S/U-t/v splitting into sibling k = S/U-s/u and child j contributes P([]) · α(S/U-t/v) · β(S/U-s/u); the sum runs over straight and inverted combinations with the sibling on either side
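The outside pass mirrors the inside pass top-down. This sketch reuses beta, p_mono, p_swap, and defaultdict from the inside sketch above, and again simplifies to one nonterminal:

```python
def outside(E, F, beta):
    n, m = len(E), len(F)
    alpha = defaultdict(float)
    alpha[(0, n, 0, m)] = 1.0            # initialization: root bispan
    # Visit bispans large-to-small, pushing mass down to children.
    for s, t, u, v in sorted(beta, key=lambda x: -((x[1]-x[0]) + (x[3]-x[2]))):
        a = alpha[(s, t, u, v)]
        if a == 0.0:
            continue
        for S in range(s + 1, t):
            for U in range(u + 1, v):
                # straight parent: each child gets alpha * sibling's beta
                alpha[(s, S, u, U)] += p_mono * a * beta[(S, t, U, v)]
                alpha[(S, t, U, v)] += p_mono * a * beta[(s, S, u, U)]
                # inverted parent
                alpha[(s, S, U, v)] += p_swap * a * beta[(S, t, u, U)]
                alpha[(S, t, u, U)] += p_swap * a * beta[(s, S, U, v)]
    return alpha

# Bispan posterior = beta * alpha / beta[root]; these are the E-step
# statistics fed to the VB M-step on the next slide.
alpha = outside("I sleep".split(), "je dors".split(), beta)
print(alpha[(0, 1, 0, 1)] * beta[(0, 1, 0, 1)] / beta[(0, 2, 0, 2)])  # ~1.0
```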
VB Algorithm for Training SITGs - M • The M-step replaces EM's relative-frequency estimate with a digamma-discounted mean-field update: P(X→r) = exp(ψ(c(X→r) + α)) / exp(ψ(Σ_{r'} c(X→r') + s·α)) • s = 3 is the number of right-hand sides for X • m is the number of observed phrase pairs, playing the role of s in the update for the terminal phrase-pair distribution • ψ is the digamma function
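A sketch of the update, assuming it is the standard mean-field VB step for a multinomial under a symmetric Dirichlet prior, which is what the slide's s, m, and ψ suggest (scipy supplies the digamma):

```python
from math import exp
from scipy.special import digamma

def vb_update(expected_counts, alpha):
    """Replace EM's count/total with exp(psi(c+a)) / exp(psi(total+s*a))."""
    s = len(expected_counts)             # number of right-hand sides
    total = sum(expected_counts)
    return [exp(digamma(c + alpha) - digamma(total + s * alpha))
            for c in expected_counts]

# Three right-hand sides for X ([..], <..>, terminal), sparse prior.
print(vb_update([4.0, 1.0, 5.0], alpha=1e-2))
```

Unlike EM's relative frequencies, these weights sum to less than one: the digamma discount hits low-count rules hardest, which is what pushes VB toward sparser grammars.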
Pruning • Tic-tac-toe pruning (Hao Zhang 2005) • Fast tic-tac-toe pruning (Hao Zhang 2008) • High-precision alignment pruning (Haghighi et al., ACL 2009): prune all bitext cells that would invalidate more than 8 high-precision alignment links • 1-1 alignment posterior pruning (Haghighi et al., ACL 2009): prune all 1-1 bitext cells whose posterior is below 10^-4 in both HMM models
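A sketch of the last bullet only: keep a 1-1 cell when at least one directional HMM gives its link a posterior above the cutoff. The posterior matrices are assumed precomputed by the two directional aligners; names are invented:

```python
# Keep cell (i, j) unless BOTH directional posteriors fall below cutoff.
def prune_cells(posterior_ef, posterior_fe, cutoff=1e-4):
    n, m = len(posterior_ef), len(posterior_ef[0])
    return {(i, j)
            for i in range(n) for j in range(m)
            if posterior_ef[i][j] >= cutoff or posterior_fe[i][j] >= cutoff}

# Toy 2x2 posteriors: only cells with some support survive.
keep = prune_cells([[0.9, 1e-6], [1e-6, 0.8]],
                   [[0.7, 1e-6], [1e-5, 0.9]])
print(sorted(keep))   # [(0, 0), (1, 1)]
```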
Non-compositional Phrases Constraint • e(i,j): the number of alignment links emitted from the English substring e_i..e_j • f(l,m): the number of links emitted from the foreign substring f_l..f_m • Both counts are taken from the word-level Viterbi alignment and decide which bispans may be emitted as phrase pairs
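A sketch of the two counters, assuming the word-level Viterbi alignment is given as a set of (English index, foreign index) links; the helper names are mine, not the paper's:

```python
# e_count(i, j): links emitted from English words i..j-1 (0-based spans).
def e_count(links, i, j):
    return sum(1 for ei, fj in links if i <= ei < j)

# f_count(l, m): links emitted from foreign words l..m-1.
def f_count(links, l, m):
    return sum(1 for ei, fj in links if l <= fj < m)

# Toy Viterbi word alignment.
links = [(0, 1), (1, 0), (2, 2)]
print(e_count(links, 0, 2), f_count(links, 0, 2))   # 2 2
```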
Word Alignment Evaluation • Both models trained for 10 iterations • EM: the lowest AER, 0.40, is reached after the second iteration; by iteration 10 it rises to 0.42 • VB: with α_C = 10^-9, AER falls to about 0.35 by iteration 10
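For reference, AER is computed from sure links S and possible links P (with S ⊆ P) as AER = 1 - (|A∩S| + |A∩P|) / (|A| + |S|); a small self-contained example with made-up links:

```python
# Alignment Error Rate (Och & Ney): lower is better.
def aer(A, S, P):
    A, S, P = set(A), set(S), set(P)
    return 1.0 - (len(A & S) + len(A & P)) / (len(A) + len(S))

A = {(0, 0), (1, 2), (2, 1)}          # predicted links
S = {(0, 0), (2, 1)}                  # sure gold links
P = S | {(1, 2), (1, 1)}              # possible gold links
print(round(aer(A, S, P), 3))         # 0.0: every prediction is allowed
```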
End-to-end Evaluation • NIST Chinese-English training data • NIST 2002 evaluation sets for tuning and evaluation • The 10-reference development set was used for MERT • The 4-reference test set was used for evaluation
Shortcomings • The grammar is not perfect • ITG ordering decisions are context-independent • Phrasal pairs are sparse
Grammar is not perfect • Over-counting problem: alternative ITG parse trees can produce the same word alignment, so one alignment's probability is split across many derivations • [Figure: many trees in ITG parse-tree space map to a single point in word-alignment space]
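A quick way to see the scale of the problem: a purely monotone alignment of n words is generated by every binary bracketing using straight rules, so it has Catalan(n-1) distinct ITG derivations, all describing the same alignment matrix. A tiny counter:

```python
from functools import lru_cache

# Number of binary trees over n leaves = Catalan(n-1).
@lru_cache(None)
def derivations(n):
    if n == 1:
        return 1
    return sum(derivations(k) * derivations(n - k) for k in range(1, n))

print([derivations(n) for n in range(1, 7)])   # [1, 1, 2, 5, 14, 42]
```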
A better-constrained grammar • A series of nested constituents with the same orientation always gets a left-heavy derivation, so the second parse tree of the previous example is not generated • Example grammar: B → ⟨A B⟩, B → ⟨C C⟩, A → [C C], C → 1/3, C → 2/4, C → 3/2, C → 4/1
Thanks Q&A