Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing. Hao Zhang; Chris Quirk; Robert C. Moore; Daniel Gildea. Zhonghua Li (Mentor: Jun Lang). 2011-10-21, I2R SMT-Reading Group.
Paper info • Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing • ACL-08 long paper • Cited: 37 • Authors: Hao Zhang, Chris Quirk, Robert C. Moore, Daniel Gildea
Core Ideas • Variational Bayes • Tic-tac-toe pruning • Word-to-phrase bootstrapping
Outline • Paper presentation • Pipeline • Model • Training • Parsing (Pruning) • Results • Shortcomings • Discussion
Summary of the Pipeline • Run IBM Model 1 on sentence-aligned data • Use tic-tac-toe pruning to prune the bitext space • Train a word-based ITG with Variational Bayes and extract the Viterbi alignment • Apply non-compositional constraints to restrict the space of phrase pairs • Train a phrasal ITG with VB, then run a Viterbi pass to get the phrasal alignment
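The first pipeline step, IBM Model 1, can be sketched in a few lines. This is a minimal textbook EM loop without a NULL word, not the setup used in the paper; the function name `train_ibm1` and the toy corpus are illustrative assumptions.

```python
from collections import defaultdict

def train_ibm1(corpus, iterations=10):
    """corpus: list of (source_words, target_words); returns t[(e, f)] = P(e | f).

    Hypothetical helper for illustration: plain IBM Model 1 EM, no NULL word.
    """
    src_vocab = {f for fs, _ in corpus for f in fs}
    tgt_vocab = {e for _, es in corpus for e in es}
    # Uniform initialisation of the lexical translation table.
    t = {(e, f): 1.0 / len(tgt_vocab) for e in tgt_vocab for f in src_vocab}
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of each (e, f) link
        total = defaultdict(float)  # expected count of f generating any e
        # E-step: distribute each target word over the source words.
        for fs, es in corpus:
            for e in es:
                z = sum(t[(e, f)] for f in fs)
                for f in fs:
                    p = t[(e, f)] / z
                    count[(e, f)] += p
                    total[f] += p
        # M-step: renormalise the expected counts per source word.
        t = {(e, f): count.get((e, f), 0.0) / total[f] for (e, f) in t}
    return t

# Toy corpus (an assumption, not data from the paper).
toy_corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
table = train_ibm1(toy_corpus)
```

In the pipeline above, scores of this kind only serve as a cheap figure of merit for pruning the bitext space before ITG training.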
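Review: Inside-Outside Algorithm (slide credit: Shujie Liu) • The forward-backward algorithm is used not only for HMMs but for any state-space model • The inside-outside algorithm is the tree-structured counterpart of forward-backward; forward-backward is the special case for linear chains [Figure: chain X1 … Xn-1, Zn, Xn+1 … XN for forward-backward; ITG chart with the root spanning 0/0 to T/V and nonterminal i spanning s/u to t/v for inside-outside]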
VB Algorithm for Training SITGs - E-step (1) (adapted from Liu) • Inside probabilities: initialization and recursion [Figure: nonterminal i spanning s/u to t/v, with children j spanning s/u to S/U and k spanning S/U to t/v]
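The inside recursion can be illustrated with a small sketch. It assumes a simplified grammar X → [X X] | <X X> | e/f with one-to-one terminals only (no null alignments, no phrase pairs), so it is not the exact grammar of the paper; `itg_inside`, `lex` and the rule probabilities are hypothetical names and values.

```python
from functools import lru_cache

def itg_inside(e, f, lex, p_straight, p_inverted, p_term):
    """Inside probability of the sentence pair (e, f) under a toy ITG.

    Spans are half-open: source e[s:t], target f[u:v].
    lex[(e_word, f_word)] is an assumed lexical probability table.
    """
    @lru_cache(maxsize=None)
    def beta(s, t, u, v):
        # Initialization: a 1x1 cell is produced by a terminal rule.
        if t - s == 1 and v - u == 1:
            return p_term * lex.get((e[s], f[u]), 0.0)
        total = 0.0
        # Recursion: sum over all split points S (source) and U (target).
        for S in range(s + 1, t):
            for U in range(u + 1, v):
                # Straight rule: left child takes the lower-left block.
                total += p_straight * beta(s, S, u, U) * beta(S, t, U, v)
                # Inverted rule: left child takes the upper target block.
                total += p_inverted * beta(s, S, U, v) * beta(S, t, u, U)
        return total

    return beta(0, len(e), 0, len(f))

# Toy numbers (assumptions for illustration only).
lex = {("a", "x"): 0.9, ("b", "y"): 0.8, ("a", "y"): 0.1, ("b", "x"): 0.2}
prob = itg_inside(["a", "b"], ["x", "y"], lex, 0.4, 0.2, 0.4)
```

Without pruning, the four nested span indices plus two split points give the O(n^6) cost that motivates the pruning step.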
VB Algorithm for Training SITGs - E-step (2) (adapted from Liu) • Outside probabilities: initialization and recursion [Figure: nonterminal i(s/u-t/v) under parent j, with left sibling k(S/U-s/u)]
VB Algorithm for Training SITGs - E-step (2) (adapted from Liu) • Outside probabilities: initialization and recursion [Figure: the two cases of the outside recursion under parent j: i(s/u-t/v) with left sibling k(S/U-s/u), and i(S/U-s/u) with right sibling k(s/u-t/v)]
VB Algorithm for Training SITGs - M-step • s = 3 is the number of right-hand sides for X • m is the number of observed phrase pairs • ψ is the digamma function
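The role of s, m and ψ can be seen in the standard Variational Bayes update for a multinomial with a symmetric Dirichlet prior: each rule weight is exp(ψ(expected count + α)) divided by exp(ψ(total expected count + K·α)), where K is the number of competing right-hand sides (s for X, m for the lexical rules). The sketch below assumes this standard form; `digamma` is a hand-written approximation and `vb_rule_weights` is a hypothetical helper name.

```python
import math

def digamma(x):
    """Approximate psi(x) for x > 0 via recurrence plus an asymptotic series."""
    result = 0.0
    # Shift x upward with psi(x) = psi(x + 1) - 1/x until the series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 * inv
    result -= inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result

def vb_rule_weights(expected_counts, alpha):
    """VB M-step for one left-hand side.

    expected_counts: dict mapping each right-hand side to its expected count
    (every competing right-hand side must be present, even with count 0).
    alpha: symmetric Dirichlet hyperparameter (assumed shared by all rules).
    """
    k = len(expected_counts)  # plays the role of s (or m) on the slide
    denom = math.exp(digamma(sum(expected_counts.values()) + k * alpha))
    return {rhs: math.exp(digamma(c + alpha)) / denom
            for rhs, c in expected_counts.items()}

# Illustrative expected counts (assumptions, not figures from the paper).
weights = vb_rule_weights({"[X X]": 10.0, "<X X>": 5.0, "C": 1.0}, alpha=1e-9)
```

The resulting weights sum to less than one, and rules with very small expected counts are pushed towards zero, which is how a tiny prior favours a sparse phrase table.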
Pruning • Tic-tac-toe pruning (Hao Zhang 2005) • Fast tic-tac-toe pruning (Hao Zhang 2008) • High-precision alignment pruning (Haghighi, ACL 2009) • Prune all bitext cells that would invalidate more than 8 of the high-precision alignments • 1-1 alignment posterior pruning (Haghighi, ACL 2009) • Prune all 1-1 bitext cells that have a posterior below 10^-4 in both HMM models
Non-compositional Phrases Constraint • e(i,j): number of links emitted from the substring e_i … e_j • f(l,m): number of links emitted from the substring f_l … f_m
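The slide only defines the two link counts; the exact constraint built on them is not recoverable from the slide text. As a rough sketch, the counts can be computed from a word alignment and combined with the usual phrase-extraction consistency test (no link crosses the cell boundary). That test is an assumption here, not necessarily the paper's rule, and all function names are hypothetical.

```python
def e_links(links, i, j):
    """e(i, j): number of links whose source word lies in the span [i, j)."""
    return sum(1 for a, _ in links if i <= a < j)

def f_links(links, l, m):
    """f(l, m): number of links whose target word lies in the span [l, m)."""
    return sum(1 for _, b in links if l <= b < m)

def cell_is_link_consistent(links, i, j, l, m):
    """True if every link touching the cell (i, j, l, m) lies fully inside it.

    Assumed criterion: links from the source span, links from the target
    span, and links inside the cell must all be the same set, so the three
    counts must agree.
    """
    inside = sum(1 for a, b in links if i <= a < j and l <= b < m)
    return e_links(links, i, j) == inside == f_links(links, l, m)

# Toy Viterbi alignment as (source index, target index) pairs, 0-based.
viterbi_links = {(0, 0), (1, 2), (2, 1)}
ok = cell_is_link_consistent(viterbi_links, 1, 3, 1, 3)
bad = cell_is_link_consistent(viterbi_links, 0, 2, 0, 2)
```

The second cell shows why the in-cell count is needed in this sketch: there e(i,j) and f(l,m) are both 2, yet one link from each side leaves the cell.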
Word Alignment Evaluation • Both EM and VB are trained for 10 iterations • EM: the lowest AER, 0.40, is reached after the second iteration; by iteration 10 the AER has risen to 0.42 • VB: with α_C = 1e-9, the AER is close to 0.35 at iteration 10
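For reference, AER (alignment error rate) compares a hypothesised alignment A against sure links S and possible links P, with S a subset of P: AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). A minimal sketch follows; the function name and the toy link sets are illustrative.

```python
def alignment_error_rate(hypothesis, sure, possible):
    """AER over sets of (source index, target index) links.

    'possible' is expected to contain every 'sure' link; the union below
    enforces that in case the caller passes only the extra possible links.
    """
    possible = set(possible) | set(sure)
    hypothesis, sure = set(hypothesis), set(sure)
    matched = len(hypothesis & sure) + len(hypothesis & possible)
    return 1.0 - matched / (len(hypothesis) + len(sure))

# Toy example: one of the two hypothesised links is wrong.
aer = alignment_error_rate({(0, 0), (1, 2)}, sure={(0, 0), (1, 1)}, possible=set())
```

Lower is better, so the reported move from 0.40 (EM) to about 0.35 (VB) is an improvement.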
End-to-end Evaluation • NIST Chinese-English training data • NIST 2002 evaluation datasets for tuning and evaluation • The 10-reference development set was used for MERT • The 4-reference test set was used for evaluation
Shortcomings • Grammar is not perfect • ITG ordering is context independent • Phrasal pairs are sparse
Grammar is not perfect • Over-counting problem • Different ITG parse trees can yield the same word alignment; this is called the over-counting problem [Figure: mapping from ITG parse tree space to word alignment space, with the example "I am rich !"]
A better-constrained grammar • A series of nested constituents with the same orientation always has a left-heavy derivation • The second parse tree of the previous example is therefore not generated [Figure: example derivation, marked "?": B -> <A B>, B -> <C C>, A -> [C C], C -> 1/3, C -> 2/4, C -> 3/2, C -> 4/1]
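The over-counting problem and the effect of the left-heavy restriction can be checked by counting derivations. The sketch below counts binary ITG trees for a word-level permutation, once with the plain grammar and once with the restriction that the right child of a straight node is not straight and the right child of an inverted node is not inverted. Labels A (straight), B (inverted) and C (terminal) follow the slide; the function name, and reading the slide's terminal rules as the permutation 3 4 2 1, are assumptions.

```python
from functools import lru_cache

def count_itg_parses(perm, canonical=False):
    """Number of ITG derivations that produce the permutation 'perm'.

    perm[k] is the target position of source word k. With canonical=True
    only left-heavy derivations are counted.
    """
    n = len(perm)

    def block(i, j):
        # Target range covered by source span [i, j), or None if not contiguous.
        seg = perm[i:j]
        lo, hi = min(seg), max(seg)
        return (lo, hi) if hi - lo + 1 == j - i else None

    @lru_cache(maxsize=None)
    def count(i, j, label):
        if j - i == 1:
            return 1 if label == "C" else 0
        if label == "C":
            return 0
        total = 0
        for k in range(i + 1, j):
            left, right = block(i, k), block(k, j)
            if left is None or right is None:
                continue
            straight = left[1] + 1 == right[0]
            inverted = right[1] + 1 == left[0]
            if (label == "A" and not straight) or (label == "B" and not inverted):
                continue
            right_labels = "ABC"
            if canonical:
                # Left-heavy: the right child may not repeat the parent's orientation.
                right_labels = "BC" if label == "A" else "AC"
            left_total = sum(count(i, k, x) for x in "ABC")
            right_total = sum(count(k, j, x) for x in right_labels)
            total += left_total * right_total
        return total

    return sum(count(0, n, x) for x in "ABC")

plain = count_itg_parses([1, 2, 3, 4])                  # monotone alignment
unique = count_itg_parses([1, 2, 3, 4], canonical=True)
```

For a monotone four-word alignment the plain grammar yields several trees for a single alignment, while the restricted grammar keeps one. Under the same reading, the permutation 3 4 2 1 has two plain derivations, and the one of the form B -> <A B> is the one the restriction removes.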
Thanks Q&A