Evaluation of the Statistical Machine Translation Service for Croatian-English
  Marija Brkić, Department of Informatics, University of Rijeka, mbrkic@uniri.hr
  Tomislav Vičić, Freelance teacher of economics and translator, ssimonsays@gmail.com
  Sanja Seljan, Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb, sanja.seljan@ffzg.hr
Outline
  Machine translation
    Syntactic transfer
    Example-based translation
    Statistically-based translation
  Evaluation
    Manual
    Automatic
  Experimental study
    Google Translate Service (Croatian → English)
    Comparison and analysis
    Manual evaluation
  Conclusion
I. Machine translation basics
  Speeding up the translation process
  Limited human component
  Multilingual access to written material
  Limited capabilities
  Help in discovering the general idea behind a text
  For limited use only
Approaches
  Word-for-word
  Syntactic transfer*
  Interlingua
  Controlled language
  Example based*
  Statistically based*
  Various combinations
Syntactic transfer
  Involves some linguistic rules
  Analyzes the source and translates using intermediary linguistic representations
  Usage still limited to particular purposes (e.g. scientific, marketing, etc.)
  Examples: Systran, Eurotra, Metal
Example-based
  Uses blocks of words (example sentences)
  Utilizes the analogy principle
  Needs to be fed with info
  System “learns” during the augmenting stage
  Suitable for structurally completely different languages
  Example: translation memories
Statistically-based (a.k.a. SMT)
  Utilizes statistical models
  Parameters derived from bilingual corpora
  Phrases as n-grams (n is the number of terms in a phrase)
  Requires vast quantities of matched bilingual texts
  Outputs the most likely match for the input
  Does not apply linguistic rules
  Attempts to match language patterns
Statistically based (cont.)
  Problems: Modeling / Learning / Decoding
  Approaches: Word-based / Phrase-based / Syntax-based
  Example: Google Translate Service
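To make "phrases as n-grams" concrete, here is a minimal sketch of n-gram extraction and counting. The function name `ngrams` and the sample sentence are illustrative assumptions, not part of the presentation, and a real SMT system estimates its models from large aligned corpora rather than a single sentence.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical example sentence, tokenized by whitespace.
sentence = "the washing machine is on".split()

bigrams = ngrams(sentence, 2)    # n = 2 terms per phrase
bigram_counts = Counter(bigrams)  # frequency of each bigram

for gram, count in bigram_counts.items():
    print(" ".join(gram), count)
```

Counts like these, gathered over matched bilingual texts, are the raw material from which statistical model parameters are derived.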
II. Evaluation
  Manual
    Human bilingual or monolingual evaluators score outputs according to fluency (grammar) and adequacy (preservation of information)
    Time-consuming, expensive and very subjective
  Automatic (BLEU, METEOR, etc.)
    Reference translations
    Goal: higher degree of correlation with human judgements
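Automatic metrics such as BLEU compare a candidate translation against reference translations by n-gram overlap. The sketch below shows only the clipped ("modified") n-gram precision that BLEU builds on, against a single reference. It is not full BLEU (no brevity penalty, no geometric mean over n-gram orders), and it is not the evaluation used in this study. The function names and example sentences are illustrative assumptions.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against one reference.

    Each candidate n-gram is credited at most as many times as it
    occurs in the reference, so repeating a correct word does not help.
    """
    cand = ngram_counts(candidate, n)
    ref = ngram_counts(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


# Hypothetical candidate and reference, tokenized by whitespace.
candidate = "the the the cat".split()
reference = "the cat sat".split()

print(modified_precision(candidate, reference, 1))  # unigram precision
print(modified_precision(candidate, reference, 2))  # bigram precision
```

Clipping is what keeps a degenerate candidate that repeats one correct word from scoring perfectly.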
III. Experimental study
  Croatian – English
    “Very odd couple”
    A lot of systematic, idiosyncratic and lexical differences
  Three types of texts:
    Corpus linguistics, annotation and research methods
    Enterprises and Government's reform plan
    Washing machine manual
SMT
  Offers Croatian as source and target language
  Statistically based
  Monolingual target language texts
  Aligned texts (human translations)
  Fluency and adequacy highly depend on available corpora
Task
  Reference translations vs. candidate translations
  Levels of analysis:
    Lexical (misuse of words, zerotones)
    Morphological (wrong word forms)
    Syntactic (word order)
    Semantic (preservation of original message)
    Usage of punctuation marks
Manual Evaluation
  Procedure: 6 bilingual evaluators and 21 sentences (machine translation and reference translation)
  1 – 5 scale:

  Score  Fluency              Adequacy
  1      incomprehensible     none
  2      disfluent English    little meaning
  3      non-native English   much meaning
  4      good English         most meaning
  5      flawless English     all meaning
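Scores on the 1 – 5 scale can be summarized as an average per evaluator and criterion. The sketch below assumes a simple nested-dictionary layout, and the scores in it are hypothetical placeholders, not data from the study (which used 6 evaluators and 21 sentences).

```python
from statistics import mean

# Hypothetical scores on the 1-5 scale, NOT data from the study.
scores = {
    "evaluator_1": {"fluency": [3, 4, 2], "adequacy": [4, 4, 3]},
    "evaluator_2": {"fluency": [2, 3, 3], "adequacy": [3, 5, 4]},
}


def average_scores(scores):
    """Return the mean score per evaluator and criterion."""
    return {
        evaluator: {criterion: mean(values) for criterion, values in criteria.items()}
        for evaluator, criteria in scores.items()
    }


averages = average_scores(scores)
for evaluator, criteria in averages.items():
    print(evaluator, criteria)
```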
Hypotheses
  1. There is no significant difference in assigning score 3 according to both criteria (fluency and adequacy).
  2. There are no significant differences in assigning score 3 to fluency and adequacy per each evaluator.
  Chi-square test (χ²)
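A chi-square test of this kind can be sketched on a 2×2 contingency table: rows are the two criteria, columns are "score 3 assigned" versus "any other score". The counts below are hypothetical, not the study's data, and the sketch applies no continuity correction. Whether the authors used one is not stated. It also assumes every expected count is positive.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows).

    Assumes every expected cell count is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts for one evaluator over 21 sentences, NOT study data.
# Rows: fluency, adequacy. Columns: score 3 assigned, other score.
observed = [
    [10, 11],
    [6, 15],
]

statistic = chi_square_statistic(observed)
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05

print(statistic)
print("significant" if statistic > CRITICAL_VALUE_DF1_ALPHA_05 else "not significant")
```

For a 2×2 table there is one degree of freedom, so the null hypothesis of no difference is rejected at the 0.05 level only when the statistic exceeds 3.841.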
Results
  There is no significant difference in assigning score 3 according to both criteria.
  There are no significant differences in assigning score 3 to fluency and adequacy per each evaluator.
  There are significant differences in assigning score 3 to fluency and adequacy for half of the evaluators.
  [Figure: Fluency and adequacy per average judgements; series: Fluency, Adequacy; horizontal axis: 1st to 6th]
IV. Conclusion
  Usage:
    Basic information transfer
    Personal use only
  Improvements:
    Integration with language-dependent modules
    Human post-editing
  Greater number of evaluators needed