Inter-set SMT - MOSES and POS. Lene Offersgaard, Claus Povlsen, Center for Sprogteknologi. SDMT-SMV2 workshop, 25 September 2007.
Translation Workflow
• English text
• Preprocessing
• Translation Engine: MOSES Decoder
• SMT statistical resources: language model (srilm 3), phrase table
• Postprocessing
• Danish text
• Proofreading
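The workflow above can be read as a chain of three stages. The following is a minimal sketch of that chain, not the presenters' implementation: the function names, the lowercasing and capitalisation steps, and the toy phrase table are all assumptions, and the decoder is replaced by a word-for-word lookup with no language model.

```python
from typing import Dict, List


def preprocess(text: str) -> List[str]:
    """Assumed preprocessing: lowercase and split on whitespace."""
    return text.lower().split()


def decode(tokens: List[str], phrase_table: Dict[str, str]) -> List[str]:
    """Stand-in for the decoder: word-for-word lookup.

    A real phrase-based decoder scores multi-word phrases with a
    phrase table and a language model; none of that is modelled here.
    Unknown words are passed through unchanged.
    """
    return [phrase_table.get(tok, tok) for tok in tokens]


def postprocess(tokens: List[str]) -> str:
    """Assumed postprocessing: join tokens and capitalise the first letter."""
    sentence = " ".join(tokens)
    return sentence[:1].upper() + sentence[1:]


def run_workflow(text: str, phrase_table: Dict[str, str]) -> str:
    """English text -> preprocessing -> decoding -> postprocessing -> Danish text."""
    return postprocess(decode(preprocess(text), phrase_table))


# Hypothetical toy phrase table, for illustration only.
TOY_PHRASE_TABLE = {"the": "de", "active": "aktive", "ingredients": "bestanddele"}

if __name__ == "__main__":
    print(run_workflow("The active ingredients", TOY_PHRASE_TABLE))
```

The proofreading step is a human task and is deliberately left out of the sketch.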
Adding linguistic information to SMT: MOSES
• MOSES
  • Open-source system replacing Pharaoh (Koehn et al. 2007)
  • State-of-the-art phrase-based approach
  • Uses factored translation models
• Comparison of the Pharaoh and Moses decoders
  • Reuse of statistical resources possible
Adding linguistic information using MOSES
• Using factored translation models
  • Makes it possible to build translation models based on surface forms, part-of-speech, morphology, etc.
• We use:
  • Translation model: word->word, pos->pos
  • Generation model determines the output
Input factors: word, pos+morph
Output factors: word, pos+morph
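To make the factored setup above concrete, here is a toy sketch in Python. It is an illustration under stated assumptions, not the Moses decoder: the tables, the tag names (`DET+pl`, `ADJ+sg`, ...) and the number-agreement filter are invented, and no probabilities are used. The only detail taken from Moses itself is the corpus notation, where the factors of a token are joined with `|`. Words and tags are translated separately (word->word, pos->pos), a generation table keeps only target-side word and tag pairs that belong together, and a crude agreement check stands in for a model over the tag sequence.

```python
from itertools import product
from typing import List, Tuple

Token = Tuple[str, str]  # (surface form, pos+morph tag)

# Hypothetical toy tables; every entry and tag is invented for illustration.
WORD_TABLE = {
    "the": ["den", "det", "de"],
    "active": ["aktiv", "aktive"],
    "ingredients": ["bestanddele"],
}
POS_TABLE = {
    "DET": ["DET+sg", "DET+pl"],
    "ADJ": ["ADJ+sg", "ADJ+pl"],
    "N+pl": ["N+pl"],
}
# Generation table: target-side (word, tag) pairs that may occur together.
GENERATION = {
    ("den", "DET+sg"), ("det", "DET+sg"), ("de", "DET+pl"),
    ("aktiv", "ADJ+sg"), ("aktive", "ADJ+pl"),
    ("bestanddele", "N+pl"),
}


def to_factored(tokens: List[Token], sep: str = "|") -> str:
    """Render tokens in factored notation, e.g. 'de|DET+pl aktive|ADJ+pl'."""
    return " ".join(sep.join(tok) for tok in tokens)


def candidates(word: str, tag: str) -> List[Token]:
    """Translate word and tag independently, then keep consistent pairs."""
    return [
        (w, t)
        for w in WORD_TABLE.get(word, [])
        for t in POS_TABLE.get(tag, [])
        if (w, t) in GENERATION
    ]


def translate(tokens: List[Token]) -> List[str]:
    """Enumerate candidate outputs whose number features all agree.

    The agreement filter is a simplification standing in for a
    sequence model over the tag factor.
    """
    per_token = [candidates(w, t) for w, t in tokens]
    results = []
    for combo in product(*per_token):
        numbers = {t.split("+")[1] for _, t in combo}
        if len(numbers) == 1:
            results.append(to_factored(list(combo)))
    return results


if __name__ == "__main__":
    source = [("the", "DET"), ("active", "ADJ"), ("ingredients", "N+pl")]
    print(translate(source))
```

Because the noun only has a plural reading in the toy tables, the singular determiner and adjective candidates are filtered out, which mirrors the kind of agreement effect the factored model is meant to capture.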
Results of adding POS tags, by inspection
• With inclusion of morpho-syntactic information:
  • ... kontrol af det fulde spektrum (lit.: ... control of the full spectrum): gender agreement
  • ... de aktive bestanddele (lit.: ... the active ingredients): number agreement
  • ... denne konstante erosion (lit.: ... this constant erosion): definiteness agreement