640 likes | 847 Views
Semantic Representation and Formal Transformation. kevin knight usc / isi MURI meeting, CMU, Nov 3-4, 2011. Machine Translation. Phrase-based MT Syntax-based MT Meaning-based MT. source string. target string. NIST 2009 c2e. source string. source tree. target tree. target
E N D
Semantic Representation and Formal Transformation kevin knight usc/isi MURI meeting, CMU, Nov 3-4, 2011
Machine Translation Phrase-based MT Syntax-based MT Meaning-based MT source string target string NIST 2009 c2e source string source tree target tree target string source string meaning representation target string source tree target tree
Meaning-based MT • Too big for this MURI: • What content goes into the meaning representation? • linguistics, annotation • How are meaning representations probabilistically generated, transformed, scored, ranked? • automata theory, efficient algorithms • How can a full MT system be built and tested? • engineering, language modeling, features, training
Meaning-based MT • Too big for this MURI: • What content goes into the meaning representation? • linguistics, annotation • How are meaning representations probabilistically generated, transformed, scored, ranked? • automata theory, efficient algorithms • How can a full MT system be built and tested? • engineering, language modeling, features, training
Meaning-based MT • Too big for this MURI: • What content goes into the meaning representation? • linguistics, annotation • How are meaning representations probabilistically generated, transformed, scored, ranked? • automata theory, efficient algorithms • How can a full MT system be built and tested? • engineering, language modeling, features, training Language-independent theory. But driven by practical desires.
Automata Frameworks • How to represent and manipulate linguistic representations? • Linguistics, NLP, and Automata Theory used to be together (1960s, 70s) • Context-free grammars were invented to model human language • Tree transducers were invented to model transformational grammar • They drifted apart • Renewed connections around MT (this century) • Role: greatly simplify systems!
Finite-State Transducer (FST) Original input: Transformation: k q k FST q k q2 *e* n n q i q AY q g q3 *e* i i q2 n q N q3 h q4 *e* q4 t qfinal T g g k : *e* q2 i : AY h h q n : N qfinal g : *e* q3 q4 t : T t t h : *e*
Finite-State (String) Transducer Original input: Transformation: k FST q k q2 *e* n q2 n q i q AY q g q3 *e* i i q2 n q N q3 h q4 *e* q4 t qfinal T g g k : *e* q2 i : AY h h q n : N qfinal g : *e* q3 q4 t : T t t h : *e*
Finite-State (String) Transducer Original input: Transformation: k FST q k q2 *e* n N q i q AY q g q3 *e* i q i q2 n q N q3 h q4 *e* q4 t qfinal T g g k : *e* q2 i : AY h h q n : N qfinal g : *e* q3 q4 t : T t t h : *e*
Finite-State (String) Transducer Original input: Transformation: k FST q k q2 *e* n N q i q AY q g q3 *e* i AY q2 n q N q3 h q4 *e* q4 t qfinal T g q g k : *e* q2 i : AY h h q n : N qfinal g : *e* q3 q4 t : T t t h : *e*
Finite-State (String) Transducer Original input: Transformation: k FST q k q2 *e* n N q i q AY q g q3 *e* i AY q2 n q N q3 h q4 *e* q4 t qfinal T g k : *e* q2 i : AY h q3 h q n : N qfinal g : *e* q3 q4 t : T t t h : *e*
Finite-State (String) Transducer Original input: Transformation: k FST q k q2 *e* n N q i q AY q g q3 *e* i AY q2 n q N q3 h q4 *e* q4 t qfinal T g k : *e* q2 i : AY h q n : N qfinal g : *e* q3 q4 t : T t q4 t h : *e*
Finite-State (String) Transducer Original input: Transformation: k FST q k q2 *e* n N q i q AY q g q3 *e* i AY q2 n q N q3 h q4 *e* q4 t qfinal T g k : *e* q2 i : AY h q n : N qfinal T g : *e* q3 q4 t : T t h : *e* qfinal
Transliteration Angela Knight transliteration a n ji ra na i to Frequently occurring translation problem for languages with different sound systems and character sets. (Japanese, Chinese, Arabic, Russian, English…) Can’t be solved by dictionary lookup.
Transliteration WFST Angela Knight 7 input symbols 13 output symbols
Transliteration WFSA A Angela Knight WFST B AE N J EH L UH N AY T WFST C a n j i r a n a i t o WFST D
DECODE + millions more WFSA A + millions more WFST B AE N J IH R UH N AY T AH N J IH L UH N AY T OH + millions more WFST C a n j i r a n a i t o WFST D
Top-Down Tree Transducer(W. Rounds 1970; J. Thatcher 1970) Original input: Transformation: S q S NP VP NP VP PRO VBZ NP PRO VBZ NP he enjoys SBAR he enjoys SBAR VBG VP VBG VP listening P NP listening P NP to music to music
Top-Down Tree Transducer(W. Rounds 1970; J. Thatcher 1970) Original input: Transformation: q S 0.2 s x0, wa, r x2, ga, q x1 S q S x0:NP VP NP VP NP VP x1:VBZ x2:NP PRO VBZ NP PRO VBZ NP he enjoys SBAR he enjoys SBAR VBG VP VBG VP listening P NP listening P NP to music to music
Top-Down Tree Transducer(W. Rounds 1970; J. Thatcher 1970) Original input: Transformation: S NP VP PRO VBZ NP r NP q VBZ s NP , , , , ga wa he enjoys SBAR SBAR enjoys PRO VBG VP VBG VP he listening P NP listening P NP to music to music
Top-Down Tree Transducer(W. Rounds 1970; J. Thatcher 1970) Original input: Transformation: 0.7 s NP kare S PRO NP VP he PRO VBZ NP r NP q VBZ s NP , , , , ga wa he enjoys SBAR SBAR enjoys PRO VBG VP VBG VP he listening P NP listening P NP to music to music
Top-Down Tree Transducer(W. Rounds 1970; J. Thatcher 1970) Original input: Transformation: S NP VP PRO VBZ NP r NP q VBZ , , , , kare wa ga he enjoys SBAR SBAR enjoys VBG VP VBG VP listening P NP listening P NP to music to music
Top-Down Tree Transducer(W. Rounds 1970; J. Thatcher 1970) Original input: Final output: S NP VP PRO VBZ NP , , , , , , , , kare wa ongaku o kiku no ga daisuki desu he enjoys SBAR VBG VP listening P NP to music
Top-Down Tree Transducer Introduced by Rounds (1970) & Thatcher (1970) “Recent developments in the theory of automata have pointed to an extension of the domain of definition of automata from strings to trees … parts of mathematical linguistics can be formalized easily in a tree-automaton setting …” (Rounds 1970, “Mappings on Grammars and Trees”, Math. Systems Theory 4(3)) Large theory literature e.g., Gécseg & Steinby (1984), Comon et al (1997) Once again re-connecting with NLP practice e.g., Knight & Graehl (2005), Galley et al (2004, 2006), May & Knight (2006, 2010), Maletti et al (2009)
Tree Transducers Can be Extracted from Bilingual Data (Galley et al, 04) TREE TRANSDUCER RULES: VBD(felt) 有 VBN(obliged) 责任 VB(do) 尽 NN(part) 一份 NN(part) 一份 力 VP-C(x0:VBN x1:SG-C) x0 x1 VP(TO(to) x0:VP-C) x0 … S(x0:NP-C x1:VP) x0 x1 S NP-C VP VP-C VBD SG-C VP VBN TO VP-C VB NP-C NPB NPB PRP PRP$ NN i felt obliged to do my part 我 有 责任 尽 一份 力
Syntax-Based Decoding 这 7人 中包括 来自 法国 和 俄罗斯 的 宇航 员 .
Syntax-Based Decoding “these” “include” “France” “and” “Russia” “astronauts” “.” RULE 1: DT(these) 这 RULE 2: VBP(include) 中包括 RULE 4: NNP(France) 法国 RULE 5: CC(and) 和 RULE 6: NNP(Russia) 俄罗斯 RULE 8: NP(NNS(astronauts)) 宇航 , 员 RULE 9: PUNC(.) . 这 7人 中包括 来自 法国 和 俄罗斯 的 宇航 员 .
Syntax-Based Decoding “France and Russia” RULE 13: NP(x0:NNP, x1:CC, x2:NNP) x0 , x1 , x2 “these” “include” “France” “and” “Russia” “astronauts” “.” RULE 1: DT(these) 这 RULE 2: VBP(include) 中包括 RULE 4: NNP(France) 法国 RULE 5: CC(and) 和 RULE 6: NNP(Russia) 俄罗斯 RULE 8: NP(NNS(astronauts)) 宇航 , 员 RULE 9: PUNC(.) . 这 7人 中包括 来自 法国 和 俄罗斯 的 宇航 员 .
Syntax-Based Decoding “coming from France and Russia” RULE 11: VP(VBG(coming), PP(IN(from), x0:NP)) 来自 , x0 “France and Russia” RULE 13: NP(x0:NNP, x1:CC, x2:NNP) x0 , x1 , x2 “these” “include” “France” “&” “Russia” “astronauts” “.” RULE 1: DT(these) 这 RULE 2: VBP(include) 中包括 RULE 4: NNP(France) 法国 RULE 5: CC(and) 和 RULE 6: NNP(Russia) 俄罗斯 RULE 8: NP(NNS(astronauts)) 宇航 , 员 RULE 9: PUNC(.) . 这 7人 中包括 来自 法国 和 俄罗斯 的 宇航 员 .
Syntax-Based Decoding “astronauts coming from France and Russia” RULE 16: NP(x0:NP, x1:VP) x1 , 的 , x0 “coming from France and Russia” RULE 11: VP(VBG(coming), PP(IN(from), x0:NP)) 来自 , x0 “France and Russia” RULE 13: NP(x0:NNP, x1:CC, x2:NNP) x0 , x1 , x2 “these” “include” “France” “&” “Russia” “astronauts” “.” RULE 1: DT(these) 这 RULE 2: VBP(include) 中包括 RULE 4: NNP(France) 法国 RULE 5: CC(and) 和 RULE 6: NNP(Russia) 俄罗斯 RULE 8: NP(NNS(astronauts)) 宇航 , 员 RULE 9: PUNC(.) . 这 7人 中包括 来自 法国 和 俄罗斯 的 宇航 员 .
“include astronauts coming from France and Russia” RULE 14: VP(x0:VBP, x1:NP) x0 , x1 “astronauts coming from France and Russia” RULE 16: NP(x0:NP, x1:VP) x1 , 的 , x0 “coming from France and Russia” RULE 11: VP(VBG(coming), PP(IN(from), x0:NP)) 来自 , x0 “France and Russia” RULE 13: NP(x0:NNP, x1:CC, x2:NNP) x0 , x1 , x2 “these” “include” “France” “&” “Russia” “astronauts” “.” RULE 1: DT(these) 这 RULE 2: VBP(include) 中包括 RULE 4: NNP(France) 法国 RULE 5: CC(and) 和 RULE 6: NNP(Russia) 俄罗斯 RULE 8: NP(NNS(astronauts)) 宇航 , 员 RULE 9: PUNC(.) . 这 7人 中包括 来自 法国 和 俄罗斯 的 宇航 员 .
“These 7 people include astronauts coming from France and Russia” RULE 15: S(x0:NP, x1:VP, x2:PUNC) x0 , x1 , x2 RULE 14: VP(x0:VBP, x1:NP) x0 , x1 “include astronauts coming from France and Russia” “astronauts coming from France and Russia” RULE 16: NP(x0:NP, x1:VP) x1 , 的 , x0 “coming from France and Russia” RULE 11: VP(VBG(coming), PP(IN(from), x0:NP)) 来自 , x0 “France and Russia” “these 7 people” RULE 13: NP(x0:NNP, x1:CC, x2:NNP) x0 , x1 , x2 RULE 10: NP(x0:DT, CD(7), NNS(people) x0 , 7人 “these” “include” “France” “&” “Russia” “astronauts” “.” RULE 1: DT(these) 这 RULE 2: VBP(include) 中包括 RULE 4: NNP(France) 法国 RULE 5: CC(and) 和 RULE 6: NNP(Russia) 俄罗斯 RULE 8: NP(NNS(astronauts)) 宇航 , 员 RULE 9: PUNC(.) . 这 7人 中包括 来自 法国 和 俄罗斯 的 宇航 员 .
Derived English Tree S VP NP VP PP NP NP NP NP NP NNS VBG NNP CC NNP PUNC DT CD VBP NNS IN . These 7 people include astronauts coming from France and Russia
Machine Translation Phrase-based MT Syntax-based MT Meaning-based MT source string target string source string source tree target tree target string source string meaning representation target string source tree target tree
Five Equivalent Meaning Representation Formats “The boy wants to go.” LOGICAL FORM PATH EQUATIONS ((x0 instance) = WANT ((x1 instance) = BOY ((x2 instance) = GO ((x0 agent) = x1 ((x0 patent) = x2 ((x2 agent) = x1 E w, b, g : instance(w, WANT) ^ instance(g, GO) ^ instance(b, BOY) ^ agent(w, b) ^ patient(w, g) ^ agent(g, b) PENMAN (w / WANT :agent (b / BOY) :patient (g / GO :agent b))) DIRECTED ACYCLIC GRAPH FEATURE STRUCTURE patient instance instance: WANT agent instance agent: instance: BOY WANT 1 agent patient: instance: GO agent: instance GO 1 BOY
Example “Government forces closed on rebel outposts on Thursday, showering the western mountain city of Zintan with missiles and attacking insurgents holed up near the Tunisian border, according to rebel sources.” (s / say :agent (s2 / source :mod (r / rebel)) :patient (a / and :op1 (c / close-on :agent (f / force :mod (g / government)) :patient (o / outpost :mod (r2 / rebel)) :temporal-locating (t / thursday)) :op2 (s / shower :agent f :patient (c / city :mod (m / mountain) :mod (w / west) :name "Zintan") :instrument (m2 / missile)) :op3 (a / attack :agent f :patient (i / insurgent :agent-of (h / hole-up :pp-near (b / border :gpi (c2 / country :name "Tunisia"] Slogan: “more logical than a parse tree”
Automata Frameworks • Hyperedge-replacement graph grammars • (Drewes et al) • DAG acceptors • (Hart 75) • DAG-to-tree transducers • (Kamimura & Slutski 82)
Mapping Between Meaning and Text foreign text patient instance agent instance the boy wants to see WANT agent instance SEE BOY
Mapping Between Meaning and Text foreign text patient instance agent instance the boy wants to be seen WANT patient instance SEE BOY
Mapping Between Meaning and Text foreign text patient instance agent instance the boy wants the girl to be seen WANT patient instance SEE instance BOY GIRL
Mapping Between Meaning and Text foreign text patient instance agent instance the boy wants to see the girl agent WANT patient instance SEE instance BOY GIRL
Mapping Between Meaning and Text foreign text patient instance agent instance the boy wants to see himself agent WANT patient instance SEE BOY
Bottom-Up DAG-to-Tree Transduction(Kamimura & Slutski 82) patient instance agent recipient instance PROMISE agent instance GO GIRL instance BOY
Bottom-Up DAG-to-Tree Transduction(Kamimura & Slutski 82) instance patient recipient PROMISE instance instance agent GO GIRL instance BOY
Bottom-Up DAG-to-Tree Transduction(Kamimura & Slutski 82) instance patient recipient PROMISE instance instance agent GO q.NN | girl instance BOY
Bottom-Up DAG-to-Tree Transduction(Kamimura & Slutski 82) instance patient recipient PROMISE instance instance agent GO q.NN | girl instance q.NN | boy
Bottom-Up DAG-to-Tree Transduction(Kamimura & Slutski 82) instance patient recipient PROMISE instance instance agent q.VBZ | goes q.NN | girl instance q.NN | boy