Thai-English MT Project: Transfer Module
Prachya Boonkwan
IPA: /pratʃəˑjaː bunˑkʰʷan/
NECTEC, Thailand

Outline
• Introduction
• Analysis Module
• Transfer Module
• Generation Module
• Conclusion
CERDEC, NJ

Introduction
• Thai-English Machine Translation
[Diagram: Thai Sentence -> Thai Analysis -> Thai-English Transfer -> English Generation -> English Sentence]

Introduction (cont’d)
• Characteristics of Thai
  • Analytic language
  • Subject-Verb-Object pattern
  • Words written consecutively without space
  • Serial verb construction
  • No articles and no mass/concrete classification
  • Use of classifiers
  • Auxiliary words to express number, voice, tense, and aspect

Introduction (cont’d)
• Issues of Thai-English translation (summarized from Monthika’s observation)
  • Different orderings between Thai and English
  • Different verb arguments between Thai and English
  • Implicit relations in Thai serial noun construction
  • Semantic duplication in Thai serial verb construction
  • No plural inflection in Thai
  • No inflection to express voice, tense, and aspect in Thai

Introduction (cont’d)
• Issue 1: Different orderings between Thai and English

Introduction (cont’d)
• Issue 2: Different verb arguments between Thai and English
  chãn/PR pai/V fràngsès/N
  Lit: I/PR go/V France/N
  Trans: I/PR go/V to France/N

  phaanrowng/N1 yùt/V1 sùub/V2 bùrìi/N2
  Lit: janitor/N1 stop/V1 smoke/V2 cigarette/N2
  Trans: The janitor/N1 stopped/V1 smoking/V2 cigarette/N2

Introduction (cont’d)
• Issue 3: Implicit relations in Thai serial noun construction

Introduction (cont’d)
• Issue 4: Semantic duplication in Thai serial verb construction
  raayngaan/V1 hâi/P sâab/V2
  Lit: report/V1 to/P know/V2
  Trans: report/V1 (to know/V2)

  khâa/V1 hâi/P taay/V2
  Lit: kill/V1 to/P die/V2
  Trans: kill/V1 (to die/V2)

  phûut/V1 hâi/P fang/V2
  Lit: tell/V1 to/P listen/V2
  Trans: tell/V1 (to listen/V2)

Introduction (cont’d)
• Issue 5: No plural inflection in Thai

Introduction (cont’d)
• Issue 6: No inflection to express voices, tenses, and aspects in Thai

Analysis Module
• Overview of Analysis Module
[Diagram: Thai Sentence -> Thai Word Segmentor -> list of words and POSes (w1 w2 w3) -> Thai Parser -> Thai Dep. Tree; Thai Grammar is an additional input]

Analysis Module (cont’d)
• Thai Parser: input and output
[Figure: dependency tree over the words ส่ง (send, VT), ทหาร (soldier, N), อิรัก (Iraq, N), ใน (in, P), แบลร์ (Blair, N), ไป (to, P), ปี 2003 (2003, N), อังกฤษ (Britain, N)]

Transfer Module
• Overview of Transfer Module
[Diagram: Thai Dep. Tree -> Thai-Eng Transformation -> English Dep. Tree -> Leaf-Node Collection -> list of lemmas annotated with syntactic attributes (w1 w2 w3); Mapping Tables are an additional input]

Transfer Module (cont’d)
• Attributes of Thai nouns

Transfer Module (cont’d)
• Attributes of Thai verbs

Transfer Module (cont’d)
• Transfer operations

Transfer Module (cont’d)
• Reordering (R)
  • Relocates constituents, resulting in a quasi-English dependency tree
• Attribute assignment (A)
  • Assigns English syntactic attributes to the quasi-English tree
• Insertion (I) & Deletion (D)
  • Inserts constituents into, or deletes them from, the quasi-English dependency tree, resulting in the English tree

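The four operation types can be sketched on a small dependency-tree structure. This is only an illustration under stated assumptions: the `Node` class and the functions `reorder`, `assign`, `insert`, and `delete` are hypothetical names, not the project's implementation, and the nodes carry English glosses because the slides do not give the Thai tree.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # compare nodes by identity, not by content
class Node:
    lemma: str
    pos: str
    attrs: set = field(default_factory=set)
    left: list = field(default_factory=list)    # dependents that precede the head
    right: list = field(default_factory=list)   # dependents that follow the head

def reorder(head, dep):
    """R: move a right dependent to the front of the head's left dependents."""
    head.right.remove(dep)
    head.left.insert(0, dep)

def assign(node, attr):
    """A: mark a node with an English syntactic attribute such as 'plu'."""
    node.attrs.add(attr)

def insert(head, dep, new):
    """I: place a new node between a head and one of its right dependents."""
    head.right[head.right.index(dep)] = new
    new.right.append(dep)

def delete(head, dep):
    """D: drop a dependent that has no English counterpart."""
    side = head.left if dep in head.left else head.right
    side.remove(dep)

# Thai order "dog big" becomes English order "big dog", then the noun is pluralized.
big = Node("big", "ADJ")
dog = Node("dog", "N", right=[big])
reorder(dog, big)
assign(dog, "plu")

# "bark me" gains a preposition between the verb and its pronoun.
me = Node("I", "PR")
bark = Node("bark", "V", right=[me])
insert(bark, me, Node("at", "P"))

# A classifier hanging under a numeral is dropped.
cl = Node("CL", "CL")
three = Node("three", "NUM", right=[cl])
delete(three, cl)
```

Keeping left and right dependents in separate ordered lists makes reordering a plain list move, which is why that representation was chosen for the sketch.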
Transfer Module (cont’d)
• Transfer operations classified into groups

Transfer Module (cont’d)
• Steps of transfer operations
[Diagram: Thai Dep. Tree -> Reordering -> Quasi-English Dep. Tree -> Attribute assignment -> Insertion & Deletion -> English Dep. Tree]

Transfer Module (cont’d)
• Graphical notations: tree pattern
[Figure: tree patterns over nodes W1 to W4; legend: only one depth, any depth]
  W1 < *W2
  (*W3 > (*W4 > *W1)) < *W2

Transfer Module (cont’d)
• Graphical notation: transfer operation
[Figure: reordering operation (R) drawn as a source tree and a target tree over N, V, and ADV]
  (N > V) < *ADV --> N > (*ADV > V) {R}

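The textual form of a transfer operation, such as `(N > V) < *ADV --> N > (*ADV > V) {R}`, is regular enough to be read by a small parser. The sketch below is an assumption about how such strings could be loaded, not the project's code: `tokenize`, `parse_pattern`, and `parse_rule` are hypothetical names, the grammar is inferred from the examples on the slides, and the `*` marker is kept as part of the node label because its exact meaning is not recoverable from the slide text.

```python
import re

TOKEN = re.compile(r"\s*(\(|\)|<|>|\*?\w+(?:\[[^\]]*\])?)")
RULE = re.compile(r"^(.*?)\s*-->\s*(.*?)\s*\{([RAID])\}\s*$")

def tokenize(text):
    """Split a pattern into parentheses, arrows, and node labels like N[+plu]."""
    text, pos, out = text.strip(), 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input at {text[pos:]!r}")
        out.append(m.group(1))
        pos = m.end()
    return out

def _term(toks):
    """A term is a node label or a parenthesized pattern."""
    if not toks or toks[0] in (")", "<", ">"):
        raise ValueError("expected a node label or '('")
    if toks[0] == "(":
        tree, rest = _pattern(toks[1:])
        if not rest or rest[0] != ")":
            raise ValueError("missing ')'")
        return tree, rest[1:]
    return toks[0], toks[1:]

def _pattern(toks):
    """A pattern is one term, optionally joined to a second term by < or >."""
    left, rest = _term(toks)
    if rest and rest[0] in ("<", ">"):
        op = rest[0]
        right, rest = _term(rest[1:])
        return (op, left, right), rest
    return left, rest

def parse_pattern(text):
    tree, rest = _pattern(tokenize(text))
    if rest:
        raise ValueError(f"trailing tokens: {rest}")
    return tree

def parse_rule(text):
    """Return the operation kind (R, A, I, or D) with both sides as nested tuples."""
    m = RULE.match(text.strip())
    if not m:
        raise ValueError(f"not a transfer rule: {text!r}")
    lhs, rhs, kind = m.groups()
    return {"kind": kind, "lhs": parse_pattern(lhs), "rhs": parse_pattern(rhs)}

rule = parse_rule("(N > V) < *ADV --> N > (*ADV > V) {R}")
plural = parse_rule("DET > N[+plu]--> DET[+plu] > N[+plu]{A}")
```

Each side becomes either a bare label or an `(operator, left, right)` tuple, so a matcher written later can walk both sides in the same way.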
Transfer Module (cont’d)
• Demonstration
  N < ADJ --> ADJ > N {R}
  N < NUM --> NUM > N {R}
  N < DET --> DET > N {R}
  NUM > N --> NUM > N[+plu] {A}
  DET > N[+plu] --> DET[+plu] > N[+plu] {A}
  V < PR --> V < PR[+acc] {A}
  PROG > V --> PROG > V[+prog] {A}
  bark < PR --> bark < (at < PR) {I}
  NUM < CL --> NUM {D}
  CL > ADJ --> ADJ {D}
  PROG > V[+prog] --> V[+prog] {D}

Transfer Module (cont’d)
• Demonstration (cont’d)
  Leaf-Node Collection: These three big dogs are barking at me.

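The demonstration can be replayed end to end in a few lines. Everything here is a sketch under assumptions: the slides do not show the Thai input, so the starting tree is a hypothetical one built from English glosses and shaped so that every rule in the demonstration applies; `Node`, `pairs`, and `collect` are illustrative names; and the arrows are read as pointing at the head, so `N < ADJ` is a noun head with an adjective to its right.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # identity comparison keeps the two classifier nodes distinct
class Node:
    lemma: str
    pos: str
    attrs: set = field(default_factory=set)
    left: list = field(default_factory=list)    # dependents before the head
    right: list = field(default_factory=list)   # dependents after the head

def nodes(n):
    """Yield every node in the subtree rooted at n."""
    yield n
    for d in n.left + n.right:
        yield from nodes(d)

def pairs(root, head, dep, side):
    """List (head, dependent) pairs; head matches by POS or lemma, dependent by POS."""
    return [(h, d) for h in nodes(root) if head in (h.pos, h.lemma)
            for d in getattr(h, side) if d.pos == dep]

def collect(n):
    """Leaf-node collection: in-order walk giving (lemma, attributes) per word."""
    out = []
    for d in n.left:
        out += collect(d)
    out.append((n.lemma, sorted(n.attrs)))
    for d in n.right:
        out += collect(d)
    return out

# Hypothetical Thai-ordered tree: dog [CL big] [three CL] this PROG bark I
dog = Node("dog", "N", right=[Node("big", "ADJ", left=[Node("CL", "CL")]),
                              Node("three", "NUM", right=[Node("CL", "CL")]),
                              Node("this", "DET")])
root = Node("bark", "V", left=[dog, Node("PROG", "PROG")], right=[Node("I", "PR")])

# Reordering (R): N < ADJ, N < NUM, N < DET each move the dependent before the noun.
for dep in ("ADJ", "NUM", "DET"):
    for h, d in pairs(root, "N", dep, "right"):
        h.right.remove(d)
        h.left.insert(0, d)

# Attribute assignment (A)
for h, d in pairs(root, "N", "NUM", "left"):
    h.attrs.add("plu")                      # NUM > N --> NUM > N[+plu]
for h, d in pairs(root, "N", "DET", "left"):
    if "plu" in h.attrs:
        d.attrs.add("plu")                  # DET > N[+plu] --> DET[+plu] > N[+plu]
for h, d in pairs(root, "V", "PR", "right"):
    d.attrs.add("acc")                      # V < PR --> V < PR[+acc]
for h, d in pairs(root, "V", "PROG", "left"):
    h.attrs.add("prog")                     # PROG > V --> PROG > V[+prog]

# Insertion (I): bark < PR --> bark < (at < PR)
for h, d in pairs(root, "bark", "PR", "right"):
    h.right[h.right.index(d)] = Node("at", "P", right=[d])

# Deletion (D)
for h, d in pairs(root, "NUM", "CL", "right"):
    h.right.remove(d)                       # NUM < CL --> NUM
for h, d in pairs(root, "ADJ", "CL", "left"):
    h.left.remove(d)                        # CL > ADJ --> ADJ
for h, d in pairs(root, "V", "PROG", "left"):
    if "prog" in h.attrs:
        h.left.remove(d)                    # PROG > V[+prog] --> V[+prog]

leaves = collect(root)
print(leaves)
```

The collected list has the lemma order this, three, big, dog, bark, at, I, with `plu`, `prog`, and `acc` attached where the rules placed them. Turning that into the surface sentence is left to the Generation Module.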
Generation Module
• Overview of generation module
[Diagram: list of lemmas annotated with syntactic attributes (w1 w2 w3) -> Surface Word Generation -> Surface Word List (W’1 W’2 W’3) -> Article Insertion -> English Output Sentence (E1 E2 E3); other labels: Lemma Mapping Tables, Discourse Stack DB]

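Surface word generation can be pictured as a table lookup followed by simple inflection. The sketch below is a toy under loud assumptions: the `IRREGULAR` table and the functions `irregular` and `generate` are hypothetical, the suffix rules cover only the attributes used in the demonstration sentence, the auxiliary is chosen by a naive check for an earlier plural noun, and article insertion and the discourse stack are not modelled at all.

```python
# Hypothetical lemma mapping table; the project's real tables are not shown on the slides.
IRREGULAR = {("this", "plu"): "these", ("I", "acc"): "me"}

def irregular(lemma, attrs):
    """Return an irregular surface form for the lemma if one is listed, else None."""
    for attr in sorted(attrs):
        if (lemma, attr) in IRREGULAR:
            return IRREGULAR[(lemma, attr)]
    return None

def generate(items):
    """Turn (lemma, attributes) pairs into a capitalized, period-terminated sentence."""
    words, plural_seen = [], False
    for lemma, attrs in items:
        form = irregular(lemma, attrs)
        if form is not None:
            words.append(form)
        elif "prog" in attrs:
            # Naive agreement: use "are" once a plural noun has been produced.
            words += ["are" if plural_seen else "is", lemma + "ing"]
        elif "plu" in attrs:
            words.append(lemma + "s")
            plural_seen = True
        else:
            words.append(lemma)
    text = " ".join(words)
    return text[0].upper() + text[1:] + "."

lemmas = [("this", {"plu"}), ("three", set()), ("big", set()), ("dog", {"plu"}),
          ("bark", {"prog"}), ("at", set()), ("I", {"acc"})]
sentence = generate(lemmas)
print(sentence)  # These three big dogs are barking at me.
```

The demonstration sentence needs no article, so it passes through unchanged. A sentence with a bare singular noun would still need the separate article insertion step.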
Conclusion
• Thai-English Machine Translation
[Diagram: Thai Sentence -> Thai Analysis -> Thai Dep. Tree -> Thai-English Transfer -> list of lemmas annotated with syntactic attributes (w1 w2 w3) -> English Generation -> English Sentence]

Conclusion (cont’d)
• Issues of Thai-English translation
• Attributes of Thai lexical units
• Generalized transfer operations
  • Reordering
  • Attribute assignment
  • Insertion
  • Deletion

Upcoming Work
• Prologizing transfer operations
• Generation Module