CLUKI XII: April 24, 2009
Using Percolated Dependencies in PBSMT
Ankit K. Srivastava and Andy Way
Dublin City University
Parsing I: Constituency Structure
Vinken will join the board as a nonexecutive director Nov 29
(ROOT
  (S
    (NP (NNP Vinken))
    (VP (MD will)
      (VP (VB join)
        (NP (DT the) (NN board))
        (PP (IN as)
          (NP (DT a) (JJ nonexecutive) (NN director)))
        (NP (NNP Nov) (CD 29))))))
Parsing II: Dependency Structure
Vinken will join the board as a nonexecutive director Nov 29
HEAD        DEPENDENT
join        Vinken
join        will
board       the
join        board
join        as
director    a
director    nonexecutive
as          director
29          Nov
join        29
Parsing III: Head Percolation
• It is straightforward to convert a constituency tree to an unlabeled dependency tree (Gaifman 1965)
• Use head percolation tables to identify the head child in a constituency representation (Magerman 1995)
• The dependency tree is obtained by recursively applying head-child and non-head-child heuristics (Xia & Palmer 2001)
Example:
(NP (DT the) (NN board))
Head percolation table entry: NP right NN/NNP/CD/JJ
(NP-board (DT the) (NN board))
"the" is dependent on "board"
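The percolation step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. Only the NP entry of the rule table comes from the slide; the ROOT, S, VP and PP entries are assumptions chosen for this one sentence. The rule format (a direction plus priority groups, where the first child found in the search direction whose label is in the group becomes the head) is also one possible reading of "NP right NN/NNP/CD/JJ". Words double as node identifiers here, which only works because no word repeats in the sentence.

```python
import re

# Head rules as (search direction, priority groups of child labels).
# Only the NP entry is taken from the slide; the rest are assumed for this example.
HEAD_RULES = {
    "NP": ("right", [{"NN", "NNP", "CD", "JJ"}]),
    "ROOT": ("left", [{"S"}]),
    "S": ("right", [{"VP"}]),
    "VP": ("left", [{"VP"}, {"VB"}, {"MD"}]),
    "PP": ("left", [{"IN"}]),
}


def parse_brackets(text):
    """Parse a bracketed tree into (label, children); a preterminal has children [word]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        pos += 1  # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # skip ")"
        return (label, children)

    return read()


def head_index(label, children):
    """Return the index of the head child according to HEAD_RULES."""
    direction, groups = HEAD_RULES.get(label, ("left", []))
    indices = list(range(len(children)))
    if direction == "right":
        indices.reverse()
    for group in groups:
        for i in indices:
            if children[i][0] in group:
                return i
    return indices[0]  # fallback: first child in the search direction


def percolate(tree, deps):
    """Return the head word of `tree`, appending (head, dependent) pairs to `deps`."""
    label, children = tree
    if isinstance(children[0], str):  # preterminal such as (NN board)
        return children[0]
    heads = [percolate(child, deps) for child in children]
    h = head_index(label, children)
    for i, word in enumerate(heads):
        if i != h:
            deps.append((heads[h], word))
    return heads[h]


TREE = ("(ROOT (S (NP (NNP Vinken)) (VP (MD will) (VP (VB join) "
        "(NP (DT the) (NN board)) (PP (IN as) (NP (DT a) (JJ nonexecutive) "
        "(NN director))) (NP (NNP Nov) (CD 29))))))")
deps = []
root = percolate(parse_brackets(TREE), deps)
```

Under these assumed rules, `deps` contains the same ten head-dependent pairs as the dependency table on the Parsing II slide, with `join` as the root.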
Parsing IV: Three Parses
• Constituency (phrase-structure) parses: CON, requires a CON parser
• Dependency (head-dependent) parses: DEP, requires a DEP parser
• Percolated (head-dependent) parses: PERC, requires a CON parser + heuristics
PBSMT I: Framework
• argmax_e p(e|f) = argmax_e p(f|e) p(e)
• Decoder, Translation Model, Language Model
• PBSMT framework in Moses (Koehn et al., 2007)
• Phrase Table in Translation Model := Align words + extract phrases + score phrases
• Different methods to extract phrases
• Moses phrase extraction as baseline system…
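As a toy illustration of the noisy-channel decision rule, the sketch below picks the candidate e that maximizes p(f|e) p(e) in log space. The candidate strings and all probabilities are invented for the example, and a real decoder such as Moses searches a far larger hypothesis space with more model components.

```python
import math


def decode(candidates):
    """Return the candidate e maximizing log p(f|e) + log p(e).

    `candidates` maps each candidate translation e to a pair
    (p(f|e), p(e)): translation-model and language-model probabilities.
    """
    return max(
        candidates,
        key=lambda e: math.log(candidates[e][0]) + math.log(candidates[e][1]),
    )


# Hypothetical scores for translating one source phrase f.
candidates = {
    "the board": (0.4, 0.05),
    "board the": (0.4, 0.001),  # same translation score, poor language-model score
    "the plank": (0.1, 0.02),
}
best = decode(candidates)
```

Here the language model separates "the board" from "board the", which the translation model alone scores equally.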
PBSMT II: Non-syntactic Phrase Extraction
• … baseline Moses
• Get word alignments (src2tgt, tgt2src)
• Perform grow-diag-final heuristics (Koehn et al., 2003)
• Extract phrase pairs consistent with the word alignments
• String-based (non-syntactic) phrases: STR
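The consistency criterion in the extraction step can be sketched as follows. This is a simplified version and not the Moses extractor: it assumes the symmetrized alignment is already given as a set of (source index, target index) links, it keeps only the tightest target span for each source span, and it does not extend phrases over unaligned boundary words. The French tokens and the alignment are illustrative assumptions.

```python
def extract_phrases(src, tgt, alignment, max_len=7):
    """Extract phrase pairs consistent with a word alignment.

    A source span [i1, i2] and target span [j1, j2] are consistent when every
    alignment link touching the target span starts inside the source span.
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 >= max_len:
                continue
            # Reject if a target word in the span is linked outside the source span.
            if any(j1 <= j <= j2 and not (i1 <= i <= i2) for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs


# Hypothetical sentence pair with a local reordering.
src = ["a", "nonexecutive", "director"]
tgt = ["un", "administrateur", "non-exécutif"]
alignment = {(0, 0), (1, 2), (2, 1)}
phrases = extract_phrases(src, tgt, alignment)
```

The span "a nonexecutive" is rejected because its target span would have to cover "administrateur", which is aligned to "director" outside the source span.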
PBSMT III: Syntactic Phrase Extraction
• Get word alignments (src2tgt, tgt2src)
• Parse src sentences
• Parse tgt sentences
• Use Tree Aligner to align subtree nodes (Zhechev 2009)
• Extract surface-level chunks from parallel treebanks
• Previously: Tinsley et al., 2007 & Hearne et al., 2008
• Syntactic phrases: CON, DEP, PERC
System I: Tools and Resources
• English-French parallel corpora
• Phrase Structure Parsers (En, Fr)
• Dependency Structure Parsers (En, Fr)
• Head Percolation tables (En, Fr)
• Statistical Tree Aligner
• Giza++ Word Aligner
• SRILM (Language Modeling) Toolkit
• Moses Decoder
System II: # Entries in Phrase Tables: Europarl
• PERC is a unique knowledge source…
• … but is it useful?
System III: Combinations
• Concatenate phrase tables and re-estimate probabilities
• 15 different systems: Σ C(4, r) for 1 ≤ r ≤ 4, i.e. 4 + 6 + 4 + 1 = 15 non-empty subsets of {STR, CON, DEP, PERC}
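The count of 15 systems and the concatenation step can be sketched as below. Enumerating the subsets is exact. The re-estimation shown (summing phrase-pair counts across tables and taking relative frequencies per source phrase) is an assumption about what "re-estimate probabilities" involves, not the authors' documented procedure, and the counts are invented.

```python
from collections import Counter
from itertools import combinations

SYSTEMS = ["STR", "CON", "DEP", "PERC"]


def all_combinations(systems):
    """All non-empty subsets of the phrase-table types."""
    return [
        combo
        for r in range(1, len(systems) + 1)
        for combo in combinations(systems, r)
    ]


def combine(tables):
    """Concatenate count tables and re-estimate p(tgt | src) by relative frequency.

    Each table maps (src_phrase, tgt_phrase) -> count (hypothetical format).
    """
    total = Counter()
    for table in tables:
        total.update(table)  # adds counts for phrase pairs seen in several tables
    src_totals = Counter()
    for (src, _tgt), count in total.items():
        src_totals[src] += count
    return {pair: count / src_totals[pair[0]] for pair, count in total.items()}


# Hypothetical counts from two extraction methods.
str_table = Counter({("the board", "le conseil"): 3, ("the board", "la commission"): 1})
con_table = Counter({("the board", "le conseil"): 1})

systems = all_combinations(SYSTEMS)
probs = combine([str_table, con_table])
```

With these toy counts, a pair found by both extraction methods gains probability mass relative to one found by a single method.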
Numbers III: Uniquely best
• Evaluate MT systems STR, CON, DEP, PERC on a per-sentence level (Translation Error Rate)
• JOC (440 sentences):
• Europarl (2000 sentences):
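A per-sentence "uniquely best" count can be sketched as follows, assuming per-sentence TER scores are already available for each system (lower is better). The scores below are invented placeholders, not results from the talk.

```python
def uniquely_best_counts(scores):
    """Count, per system, the sentences where it alone has the lowest TER.

    `scores` maps a system name to its list of per-sentence TER values;
    all lists must have the same length. Ties credit no system.
    """
    systems = list(scores)
    n_sentences = len(scores[systems[0]])
    counts = {system: 0 for system in systems}
    for k in range(n_sentences):
        best = min(scores[system][k] for system in systems)
        winners = [system for system in systems if scores[system][k] == best]
        if len(winners) == 1:
            counts[winners[0]] += 1
    return counts


# Hypothetical TER scores for three sentences.
toy_scores = {
    "STR": [0.30, 0.50, 0.20],
    "CON": [0.40, 0.45, 0.20],
    "DEP": [0.35, 0.60, 0.25],
    "PERC": [0.32, 0.45, 0.10],
}
counts = uniquely_best_counts(toy_scores)
```

In the toy data the second sentence is a tie between CON and PERC, so it is credited to neither.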
Analysis I: STR
• Using Moses baseline phrases (STR) is essential for coverage. SIZE matters!
• However, adding any system to STR increases the baseline score. Symbiotic!
• Hence, do not replace STR, but augment it.
Analysis II: CON
• Seems to be the best combination with STR (S+C seems to be the best-performing system)
• Has the most chunks in common with PERC
• Does PERC harm a CON system? Needs more analysis
Analysis III: DEP
• PERC chunks are different from DEP chunks, despite being formally equivalent
• PERC can substitute for DEP
Analysis IV: PERC
• Is a unique knowledge source.
• Sometimes, it helps.
• Needs more work on finding the connection with CON / DEP
Conclusion & Future Work
• Extended Hearne et al., 2008 by
  - scaling up data size from 7.7K to 100K
  - introducing percolated dependencies in PBSMT
• Manual evaluation
• More analysis of results
• More combining strategies
• Seek to determine if each chunk type “owns” sentence types
Thanks <asrivastava @ computing.dcu.ie>