Pre-ordering Dependency Subtrees for Phrase-based SMT
Intern: Arianna Bisazza. Mentors: Alex Ceausu, John Tinsley
Dependency subtree pre-ordering
• What if…
  • we can’t/don’t want to change the decoding process
  • we have dependency parses available
• …one way to go:
  • pre-order input parse trees, then translate normally
• Main research problems:
  • how to pre-order? (ordering model)
  • and what to pre-order? (rule selection)
Dependency subtree pre-ordering
“Die Budapester Staat anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet.”
(gloss: the Budapest Prosecutor’s Office has its investigation on the accident initiated)
• Permute subtrees (a node + its children)
• Each subtree processed independently
[Figure: dependency parse of the example sentence in its original order]
Dependency subtree pre-ordering
“Die Budapester Staat anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet.”
(gloss: the Budapest Prosecutor’s Office has its investigation on the accident initiated)
• Permute subtrees (a node + its children); see the sketch below
• Each subtree processed independently
[Figure: the same dependency parse after subtree permutation, with eingeleitet|VVPP moved next to hat|VAFIN]
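To make the permutation step concrete, here is a minimal sketch of re-linearizing one subtree (a head plus its immediate children). It is illustrative only: the Node class, its field names and the way the target order is supplied are assumptions, not the original implementation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    word: str                     # surface form, e.g. "eingeleitet"
    pos: str                      # POS tag, e.g. "VVPP"
    rel: str                      # relation to the head, e.g. "OC"
    children: List["Node"] = field(default_factory=list)

def permute_subtree(head: Node, permutation: List[int]) -> List[Node]:
    """Re-linearize one subtree: the head and its immediate children are
    rearranged according to `permutation`, an index list over the sequence
    [head, child_1, ..., child_n]. Each child's own subtree is left intact
    and is permuted independently in its own step."""
    sequence = [head] + head.children
    return [sequence[i] for i in permutation]
```

Applying this to every node of the parse, each subtree independently, yields the pre-ordered source sentence that is then translated normally.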
Pre-ordering model (1) – MLE
• Baseline model: maximum likelihood estimation (MLE, relative frequency-based); see the sketch below
• Subtree representation: relation type and POS tag
[Figure: example reordering probabilities for a subtree containing the nodes OA|NN and _OC|VVPP: one permutation with Prob = 0.75, the other with Prob = 0.25]
• Limitations:
  • ambiguity due to coarse word classification (only a few relation/POS tags)
  • coverage: many unseen or low-count subtrees
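A minimal sketch of such a relative-frequency model, assuming the training data has already been reduced to pairs of original and reference-reordered label sequences (the function and variable names are illustrative, not the original code):

```python
from collections import Counter, defaultdict

def train_mle_reorderer(subtree_pairs):
    """subtree_pairs: iterable of (original, reordered) tuples of
    relation|POS labels, e.g. (('OA|NN', '_OC|VVPP'), ('_OC|VVPP', 'OA|NN')).
    Returns P(reordered | original) estimated by relative frequency."""
    counts = defaultdict(Counter)
    for original, reordered in subtree_pairs:
        counts[tuple(original)][tuple(reordered)] += 1
    return {orig: {reo: n / sum(c.values()) for reo, n in c.items()}
            for orig, c in counts.items()}

# Hypothetical result, matching the probabilities on the slide:
# model[('OA|NN', '_OC|VVPP')] == {('_OC|VVPP', 'OA|NN'): 0.75,
#                                  ('OA|NN', '_OC|VVPP'): 0.25}
```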
Pre-ordering model (2) – SMT
• Idea: learn to reorder by SMT!
• Train a phrase-based system on pairs of original/pre-ordered source-language node sequences (subtrees), e.g.:
  ORIGINAL:     SB|NN _ROOT|VAFIN OC|VVPP PUNC|$.    NK|ART NK|NN NK|NN _SB|NN    OA|NN _OC|VVPP ...
  PRE-ORDERED:  SB|NN _ROOT|VAFIN OC|VVPP PUNC|$.    NK|ART NK|NN NK|NN _SB|NN    _OC|VVPP OA|NN ...
• Advantages:
  • generalization: all node sequences can be processed
  • model flexibility: represent different features as “factors”
  • tune different model weights by MERT
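The training data for this reordering SMT system can be produced by writing each subtree twice, once in original and once in target order, as one parallel “sentence” pair per line. A hedged sketch; the file names and the helper itself are assumptions:

```python
def write_reordering_corpus(subtree_pairs,
                            src_path="subtrees.orig",
                            trg_path="subtrees.preord"):
    """Write parallel source/target files for training a phrase-based
    reordering model: one subtree (space-separated node labels) per line."""
    with open(src_path, "w", encoding="utf-8") as src, \
         open(trg_path, "w", encoding="utf-8") as trg:
        for original_nodes, preordered_nodes in subtree_pairs:
            src.write(" ".join(original_nodes) + "\n")
            trg.write(" ".join(preordered_nodes) + "\n")
```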
Pre-ordering model (2) – SMT
• Each feature type is represented as a factor (see the sketch below), for example:
  ORIGINAL: SB|NN|anwaltschaft _ROOT|VAFIN|hat OC|VVPP|eingeleitet PUNC|$.|.    NK|ART|die NK|NN|Budapester NK|NN|Staat _SB|NN|anwaltschaft    OA|NN|Ermittlungen _OC|VVPP|eingeleitet ...
• Possible models:
  • original-to-preordered phrase table
  • “target” (pre-ordered) n-gram language models
  • lexicalized reordering models at the level of relation type, POS tags or words, etc.
  • all models log-linearly combined
  • weights tuned by MERT, optimizing a reordering score (KRS)
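The pipe-separated factor layout shown above (relation|POS|word, with a leading underscore marking the subtree head) can be built with a small helper. Only the factor layout comes from the slide; the helper is a sketch:

```python
def factored_token(rel: str, pos: str, word: str, is_head: bool = False) -> str:
    """Build a relation|POS|word token; the subtree head is marked with a
    leading underscore, e.g. _ROOT|VAFIN|hat."""
    return ("_" if is_head else "") + f"{rel}|{pos}|{word}"

nodes = [("SB", "NN", "anwaltschaft"), ("ROOT", "VAFIN", "hat"),
         ("OC", "VVPP", "eingeleitet"), ("PUNC", "$.", ".")]
line = " ".join(factored_token(r, p, w, is_head=(r == "ROOT"))
                for r, p, w in nodes)
# -> "SB|NN|anwaltschaft _ROOT|VAFIN|hat OC|VVPP|eingeleitet PUNC|$.|."
```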
Evaluation
• Training/dev/test: 495K/2.5K/2.5K sentences from the WMT-12 De-En training data
• 1.6M/8K/9K train/dev/test subtrees rooted at verb nodes
• 4.8M/23K/24K train/dev/test subtrees with more than one node
Selective pre-ordering
• Not all subtrees need to be pre-ordered (especially in language pairs like German-English)
• How to select them?
• Approach: compute the average distortion gain on training data, then pre-order only the subtrees with a high distortion gain (see the sketch below)
• Pre-ordering performance, with two different thresholds
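A hedged sketch of the selection step. The exact definition of distortion gain is not given on the slides, so the measure below (the distortion that the oracle permutation would remove, averaged per subtree type) and the threshold value are assumptions:

```python
from collections import defaultdict

def distortion(permutation):
    """Total absolute displacement of the nodes under a permutation."""
    return sum(abs(i - p) for i, p in enumerate(permutation))

def select_subtree_types(training_examples, threshold=1.0):
    """training_examples: iterable of (subtree_type, oracle_permutation),
    where subtree_type is a tuple of relation|POS labels and the permutation
    gives the reference order of [head, child_1, ...]. Only subtree types
    whose average distortion gain exceeds the threshold get pre-ordered."""
    gains = defaultdict(list)
    for subtree_type, permutation in training_examples:
        gains[tuple(subtree_type)].append(distortion(permutation))
    return {t for t, g in gains.items() if sum(g) / len(g) > threshold}
```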
MT experiments
Using WMT-12 De-En training and test data
MT output examples (1)
ORI:  nach dem steilen Abfall am Morgen konnte die Prager Börse die Verluste korrigieren .
REO:  nach dem steilen Abfall am Morgen die Prager Börse konnte die Verluste korrigieren .
REF:  after a sharp drop in the morning , the Prague Stock Market corrected its losses .
BASE: after the sharp falls on the morning , the Prague Stock Exchange to correct the losses .
NEW:  after the sharp falls on the morning the Prague Stock Exchange was able to correct the losses .
MT output examples (2)
ORI:  … über einen Plan , der funktionieren wird und der auf dem Markt auch wirksam sein muss .
REO:  … über einen Plan , der wird funktionieren und der muss sein auch wirksam auf dem Markt .
REF:  … on a plan which will function and which also must be effective on the market .
BASE: … on a plan that will work and on the market also needs to be effective .
NEW:  … on a plan that will work and must also be effective on the market .
MT output examples (3)
ORI:  die Kongress Abgeordneten müssen nämlich noch einige Details der Vereinbarung aushandeln , ehe sie die Endfassung des Gesetzes veröffentlichen und darüber abstimmen dürfen .
REO:  die Kongress Abgeordneten müssen nämlich aushandeln , ehe sie veröffentlichen die Endfassung des Gesetzes und dürfen darüber abstimmen noch einige Details der Vereinbarung .
REF:  that is , the members of congress have to complete some details of the agreement before they can make the final version of the law public and vote on it .
BASE: members of Congress : some details must still negotiate the agreement before they publish the final version of the law and able to vote on it .
NEW:  members of Congress must negotiate before they publish the final version of the law and must still vote on some details of the agreement .
Conclusions & TODOs
• Pre-ordering with an SMT-like system always outperforms the MLE baseline, but gains are small
• Evaluation issue: reference reorderings are very noisy!
• When the input is pre-ordered, BLEU improves but KRS decreases... more error analysis needed!
  • Possible reason: the SMT system must be re-trained (or at least re-tuned) on pre-ordered data
• More thresholds for rule selection should be tested
• … other suggestions?