Soft Syntactic Constraints for Hierarchical Phrase-Based Translation
Yuval Marton and Philip Resnik
Department of Linguistics and the Laboratory for Computational Linguistics and Information Processing (CLIP) at the Institute for Advanced Computer Studies (UMIACS), University of Maryland, College Park, MD 20742-7505, USA
{ymarton, resnik}@umiacs.umd.edu
ACL'08, Columbus, Ohio, June 2008
Why Has Source-side Syntax Not Helped SMT as Much as Target-side Syntax?
• Most previous work: syntactic representations → data-driven patterns
• Chiang 05, ours: data-driven patterns → syntactic constraints
• Why the failure in the latter direction?
  • Noisy / inaccurate parsing info?
  • Too coarse a usage of syntax info?
• We argue the latter: rule granularity and constraint conditions are key.
• We show that adding (soft) syntactic constraints to data-driven patterns yields substantial improvements.
Outline
• Background
• Hiero
• Soft Syntactic Constraints
• Adding Syntax
• Rule Granularity
• Constraint Conditions
• Experiments
• Conclusions + Future Work
Knowledge and Constraints
• Syntactic-tree-based vs. data-driven
• Formal vs. linguistic syntax (Chiang 2005)
  • Formal syntax (e.g., synchronous CFG)
  • Linguistic syntax (parses)
• Hard vs. soft constraints
  • Hard constraint: limits the possible space (only rules compatible with the constraint are allowed)
  • Soft constraint: skews the space towards the constraint (but clear patterns in the data 'win' even if incompatible with the constraint)
• Soft syntactic constraint: boost the weight of data-driven rules that are compatible with parsing info (see the sketch below).
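To make the hard/soft distinction concrete, here is a minimal sketch (not the authors' implementation); `is_constituent_compatible` and the weight `w_syn` are hypothetical names used only for illustration.

```python
def apply_hard_constraint(rules, is_constituent_compatible):
    """Hard constraint: rules incompatible with the parse are simply dropped."""
    return [r for r in rules if is_constituent_compatible(r)]


def apply_soft_constraint(base_score, rule, is_constituent_compatible, w_syn=0.5):
    """Soft constraint: every rule stays in the search space, but compatible
    rules get a score boost; strongly attested data-driven rules can still
    win without it."""
    return base_score + (w_syn if is_constituent_compatible(rule) else 0.0)
```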
Hiero
• Chiang 2005, 2007
• Weighted synchronous CFG
• Unnamed non-terminals: X → <e, f>, e.g., X → <今年 X1, X1 this year>
• Translation model features, e.g., log p(e|f)
• Log-linear model (sketched below): + rule penalty feature, "glue" rules
[Slide figure: example Chinese–English word alignment (的竞选 'election', 投票 'voted', 在初选 'in the primaries')]
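As a rough illustration of the log-linear model, the sketch below scores a derivation as a weighted sum of (log) feature values plus a per-rule penalty; the feature names and weights are placeholders for illustration, not the tuned Hiero configuration.

```python
# Illustrative feature weights (placeholders, not tuned values).
WEIGHTS = {
    "lm": 1.0,            # language model log-probability
    "log_p_e_f": 1.0,     # log p(e|f)
    "log_p_f_e": 0.5,     # log p(f|e)
    "log_plex_e_f": 0.3,  # lexical weight log p_lex(e|f)
    "log_plex_f_e": 0.3,  # lexical weight log p_lex(f|e)
    "rule_penalty": -0.3, # counts the rules used in the derivation
}


def derivation_score(rule_features, lm_logprob):
    """Log-linear score of one derivation.

    rule_features: a list with one dict per rule, mapping feature name -> log value.
    """
    score = WEIGHTS["lm"] * lm_logprob
    for feats in rule_features:
        for name, value in feats.items():
            score += WEIGHTS.get(name, 0.0) * value
        score += WEIGHTS["rule_penalty"]  # one penalty unit per rule used
    return score
```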
Soft Syntactic Constraints
• Chiang's 2005 constituency feature (sketched below)
  • Boosts a rule's score if the rule's source side matches a constituent span
  • Constituency-incompatible emergent patterns can still 'win' (in spite of no boost)
• A good idea, but a negative result
• But what if…
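A minimal sketch of a Chiang-2005-style constituency feature, assuming the source parse is available as a set of constituent spans; this is an illustration, not the original code.

```python
def constituency_feature(rule_src_span, constituent_spans):
    """Fires (1.0) when the source span covered by a rule exactly matches the
    span of some constituent in the source-side parse; otherwise 0.0, so
    incompatible rules are merely un-boosted, never filtered out."""
    return 1.0 if rule_src_span in constituent_spans else 0.0


# Example: spans as (start, end) word indices over the source sentence.
spans = {(0, 3), (3, 5), (0, 5)}
assert constituency_feature((3, 5), spans) == 1.0  # matches a constituent
assert constituency_feature((2, 5), spans) == 0.0  # no matching constituent
```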
Rule Granularity
• Chiang: a single weight for all constituents (parse tags)
• … But what if we can assign a separate feature and weight for each constituent?
  • E.g., NP-only (NP=)
  • Or VP-only (VP=)
Constraint Conditions
• VP-only, revisited:
  • We saw VP-match (VP=): an exact match of a VP sub-tree span
  • We can also incur a cost for crossing constituent boundaries: e.g., VP-cross (VP+); see the sketch below
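A minimal sketch of the two label-specific conditions, assuming constituents are given as (label, start, end) triples over half-open source spans; the exact crossing test is an assumption made for illustration.

```python
def label_match(rule_span, constituents, label):
    """Label= (e.g., VP=): the rule's source span exactly matches the span
    of a constituent carrying this label."""
    return any(lab == label and (ci, cj) == rule_span
               for lab, ci, cj in constituents)


def label_cross(rule_span, constituents, label):
    """Label+ (e.g., VP+): the rule's source span crosses the boundary of a
    constituent carrying this label (the spans overlap, but neither one
    contains the other)."""
    i, j = rule_span
    for lab, ci, cj in constituents:
        if lab != label:
            continue
        overlaps = max(i, ci) < min(j, cj)
        nested = (ci <= i and j <= cj) or (i <= ci and cj <= j)
        if overlaps and not nested:
            return True
    return False
```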
Feature Space
• {NP, VP, IP, CP, …} × {match =, cross-boundary +}
• Basic translation models:
  • For each feature, add it (and only it) to the default feature set, assigning it a separate weight.
• Feature "combo" translation models:
  • NP2 (double feature): add both NP+ and NP=, each with a separate weight
  • NP_ (conflated feature): ties the weights of NP+ and NP=
  • XP=, XP+, XP2, XP_: conflate all labels that correspond to "standard" X-bar Theory XP constituents, in each condition
  • All-labels= (Chiang's), All-labels+, All-labels_, All-labels2
• A sketch of the resulting feature inventory follows this slide.
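The sketch below enumerates that feature inventory; the naming (NP=, NP+, NP2, NP_, XP…, All-labels…) follows the slides, while the construction itself is only an illustration.

```python
MATCH, CROSS = "=", "+"
XP_LABELS = ["CP", "IP", "NP", "VP", "PP", "ADJP", "ADVP", "QP", "LCP", "DNP"]


def basic_models(labels):
    """One model per single feature: each (label, condition) pair is added
    alone to the default feature set, with its own weight."""
    return [f"{lab}{cond}" for lab in labels for cond in (MATCH, CROSS)]


def combo_model(label):
    """Label2: both Label= and Label+ are added, each with a separate weight."""
    return [f"{label}{MATCH}", f"{label}{CROSS}"]


def conflated_model(label):
    """Label_: Label= and Label+ are added under a single tied weight."""
    return {f"{label}_": [f"{label}{MATCH}", f"{label}{CROSS}"]}

# XP=, XP+, XP2, XP_ conflate the "standard" XP labels in each condition;
# the All-labels variants do the same over every parse label.
```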
Settings
• Hiero default feature set for the baseline
  • The Chinese baseline also included a specialized number-translation feature (Chiang 2007)
• LM: SRI Language Modeling Toolkit (Stolcke, 2002) with modified Kneser-Ney smoothing (Chen & Goodman, 1998)
• Word-level alignments: GIZA++ (Och & Ney, 2000)
• Source-side parses:
  • Chinese: Huang et al. (2008)
  • Arabic: Stanford Parser v.2007-08-19 (Klein & Manning 2003)
• Optimized using MERT (Och 2003), with BLEU (Papineni et al. 2002) and the NIST-implemented "shortest" effective reference length
• Dev sets: Chinese NIST MT03; Arabic NIST MT02
Chinese-English
• Replicated Chiang's 2005 constituency feature (negative result)
• NP=, QP+, VP+: up to 0.74 BLEU points better
• XP+, IP2, all-labels_, VP2, NP_: up to 1.65 BLEU points better
• Validated on the NIST MT08 test set
• *, **: significantly better than the baseline; +, ++: better than Chiang-05
Arabic-English
• New result for Chiang's constituency feature (MT06, MT08)
• PP+, AdvP=: up to 1.40 BLEU points better than Chiang's and the baseline
• AP2, AdvP2: up to 1.94 points better
• Validated on the NIST MT08 test set
• *, **: significantly better than the baseline; +, ++: better than Chiang-05
Discussion
• Direct contribution
  • The feature better translates related phrases (not shown here)
• Indirect contribution
  • Translation of other parts can be (and is) influenced ("to appoint a representative of syria to the united nations" vs. "to appoint syria to the united nations representative")
• Feature combinations do not always help
  • In fact, some combos do worse than each feature alone.
• Within-language consistency across test sets
  • Chinese: NP, VP, IP, (XP, all-labels)
  • Arabic: PP, AP, AdvP, (IP, VP, XP)
• Across-language variation, but IP & VP do well.
Conclusion: Our Approach
• A data-driven approach (statistical MT)
• using formal syntax (SCFG)
• while adding soft constraints (weights) of linguistic syntax (parses)
• with fine-grained constituent features (NP, VP, …)
• and constraint conditions (match =, cross +)
Main Contributions
• First improvement in Hiero from (soft) syntactic information
• The previous negative result (Chiang 2005) was not (or not only) due to noisy parses
• Finer syntactic rule resolution helps (NP, VP, …)
• Finer (soft) constraint conditions help (NP=, NP+, VP=, VP+, …)
• Selective application: parse labels that are not "standard" XP constituent labels seem to be more noisy than helpful
• Feature combos do not always help (and might do worse)
• Inter-language variation, but IP and VP generally do well cross-linguistically
• Within-language consistency (across test sets)
Future Work
• Why do feature combos' contributions sometimes cancel each other out?
  • We found no simple correlation between finer-grained feature scores (and/or boundary condition) and combination or conflation scores.
• Why did no NP variant yield much gain in Arabic?
• Exploit other forms of soft constraints
Thanks • This work was supported in part by DARPA prime agreement HR0011-06-2-0001. • Thanks to David Chiang and Adam Lopez for making their source code available; • Thanks to the Stanford Parser team and Mary Harper for making their parsers available; • Thanks to David Chiang, Amy Weinberg, and CLIP Laboratory colleagues, particularly Adam Lopez, Chris Dyer, and Smaranda Muresan, for discussion and invaluable assistance.
Hiero Default Feature Set and the "Standard" XP Label Set
• Hiero default feature set:
  • LM, p(e|f), p(f|e), plex(e|f), plex(f|e), rule (phrase) penalty, and glue-rule feature weights
  • Chinese only: number-translation feature
• "Standard" linguistic labels: {CP, IP, NP, VP, PP, ADJP, ADVP, QP, LCP, DNP}
  • Excluding non-maximal-projection labels such as VV, NNP, etc.
  • Excluding labels such as PRN (parentheses), FRAG (fragment), etc.
• XP=: the disjunction of {CP=, IP=, …, DNP=} (see the sketch below)
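A minimal sketch of the conflated XP= feature as a disjunction over the "standard" label set; `label_match` is the hypothetical label-specific test used for illustration.

```python
XP_LABELS = ["CP", "IP", "NP", "VP", "PP", "ADJP", "ADVP", "QP", "LCP", "DNP"]


def label_match(rule_span, constituents, label):
    """Label=: the rule's source span exactly matches a constituent with this label."""
    return any(lab == label and (ci, cj) == rule_span
               for lab, ci, cj in constituents)


def xp_match_feature(rule_span, constituents):
    """XP=: fires if the rule's source span matches a constituent carrying
    any of the "standard" XP labels."""
    fired = any(label_match(rule_span, constituents, lab) for lab in XP_LABELS)
    return 1.0 if fired else 0.0
```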
PP+ Example: Arabic MT06
Note that this example might be misleading, as it is not necessarily a representative example of the feature's contribution.