What a professional translator should know about Machine Translation • Harold Somers, Professor Emeritus, University of Manchester
Background • Machine Translation (MT): 60-year-old technology, firmly established (esp. free online MT) as viable, though flawed. • Professional translators’ reservations • fears (misplaced as it turns out) that it would take work away from them • disgust at bad image MT gives to the profession • recent developments suggest need for more reconciliatory approach • MT as a “colleague” rather than a rival • Need better understanding of what MT can and, more importantly, cannot do.
Overview • Focusing on full text MT … • How MT works • Strengths and weaknesses • What should translators say about MT? • Assumption: we’re mostly talking about free online MT here
History of MT • 1945-65: Crude early attempts with unsophisticated computers and naïve linguistic approach • mainly word-for-word • 1966-90: Linguistic rule-based programs • some successes, especially with “sublanguage” • requires much effort to build • 1991-…: Statistics-based programs, learning translation patterns from large amounts of data • quick to develop if data is available • surprisingly good quality (but see later)
How does (S)MT work? • Requires huge amounts of parallel (bilingual) data – i.e. texts and their translations • Programs automatically align the texts (sentence-by-sentence where possible), then extract (or “learn”) translation probabilities (“models”) • At run time, probabilities are juggled to get the highest scoring result
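The "learning" step above can be illustrated with a minimal sketch. It assumes the texts are already sentence-aligned and tokenised, and it uses IBM Model 1 style expectation-maximisation (without a NULL word) as a stand-in; the slides do not say which algorithm any particular system uses. The toy corpus and the names `train_translation_probs` and `best_translation` are hypothetical.

```python
from collections import defaultdict


def train_translation_probs(pairs, iterations=10):
    """Estimate t(tl_word | sl_word) from sentence-aligned token lists."""
    sl_vocab = {s for sl, _ in pairs for s in sl}
    tl_vocab = {t for _, tl in pairs for t in tl}
    # Start with no preference: every TL word is equally likely for every SL word.
    t_prob = {(t, s): 1.0 / len(tl_vocab) for s in sl_vocab for t in tl_vocab}
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of s being translated as t
        total = defaultdict(float)  # expected count of s being translated at all
        for sl, tl in pairs:
            for t in tl:
                # Share the evidence for t among the SL words in the same sentence.
                norm = sum(t_prob[(t, s)] for s in sl)
                for s in sl:
                    share = t_prob[(t, s)] / norm
                    count[(t, s)] += share
                    total[s] += share
        # Re-estimate the probabilities from the expected counts.
        t_prob = {(t, s): count[(t, s)] / total[s] for (t, s) in t_prob}
    return t_prob


def best_translation(t_prob, sl_word):
    """Return the TL word with the highest learned probability for sl_word."""
    candidates = {t: p for (t, s), p in t_prob.items() if s == sl_word}
    return max(candidates, key=candidates.get)


# Hypothetical toy German-English corpus, already sentence-aligned.
CORPUS = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]

probs = train_translation_probs(CORPUS)
for word in ["das", "Haus", "Buch", "ein"]:
    print(word, "->", best_translation(probs, word))
```

No dictionary is supplied: the pairing of "das" with "the" emerges only because the two words keep turning up in the same sentence pairs. Real systems work from millions of sentence pairs and also learn multi-word sequences, which this sketch leaves out.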
A little more detail • Actually, two models are learned from the data: • Translation model: given words and word sequences in the source language (SL), what are the most likely corresponding words in the target language (TL)? • Target-language model: given these corresponding TL words, what is the most likely way in which they will be combined?
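How the two models combine at run time can be sketched as follows. Every probability below is invented for illustration, the target-language model is reduced to bigrams, and the search simply tries every word choice in every order, which is only feasible for a three-word toy. The names `TRANSLATION_MODEL`, `BIGRAM_MODEL`, `UNSEEN` and `translate` are hypothetical.

```python
import math
from itertools import permutations, product

# Hypothetical translation model: P(TL word | SL word).
TRANSLATION_MODEL = {
    "la": {"the": 0.9, "it": 0.1},
    "maison": {"house": 0.7, "home": 0.3},
    "bleue": {"blue": 1.0},
}

# Hypothetical target-language model: P(next word | previous word).
BIGRAM_MODEL = {
    ("<s>", "the"): 0.5,
    ("the", "blue"): 0.2,
    ("the", "house"): 0.2,
    ("the", "home"): 0.05,
    ("blue", "house"): 0.3,
    ("blue", "home"): 0.05,
    ("house", "blue"): 0.01,
    ("house", "</s>"): 0.4,
    ("home", "</s>"): 0.4,
    ("blue", "</s>"): 0.1,
}
UNSEEN = 1e-4  # small probability for word pairs the model has never seen


def tm_log_prob(source, choice):
    """Score the word choices: how likely is each TL word given its SL word?"""
    return sum(math.log(TRANSLATION_MODEL[s][t]) for s, t in zip(source, choice))


def lm_log_prob(words):
    """Score the word order: how likely is this sequence in the TL?"""
    padded = ["<s>", *words, "</s>"]
    return sum(math.log(BIGRAM_MODEL.get(pair, UNSEEN)) for pair in zip(padded, padded[1:]))


def translate(source):
    """Try every word choice in every order and keep the highest-scoring one."""
    best_score, best_words = -math.inf, None
    for choice in product(*(TRANSLATION_MODEL[s] for s in source)):
        tm_score = tm_log_prob(source, choice)
        for order in permutations(choice):
            score = tm_score + lm_log_prob(order)
            if score > best_score:
                best_score, best_words = score, order
    return " ".join(best_words)


print(translate("la maison bleue".split()))  # the blue house
```

The translation model favours "house" over "home"; the target-language model is what moves "blue" in front of the noun. Neither model "understands" the sentence: the output is simply the highest-scoring combination.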
So how well does that work? • Let’s look first at what makes translation hard for a computer … • …then see how well SMT handles these difficulties … • … and what we can conclude from that
Why is translation hard for a computer? • Language is highly ambiguous • You may not have realised that, but for a computer it is true • Translation largely requires genuine understanding • Debatable in some cases, but undoubtedly often true • Translation is all about style • Well, sometimes it is!
Language is difficult • Individual words • ambiguous morphology (numb:number, tow:tower) • homonymy (round, bank, last, flush) • polysemy (report, range) • translation divergences (wall = muro/parete) • Sequences of words • local ambiguity (car boot sale, He shot the man with a gun) • global ambiguity (Time flies like an arrow) • “Dependencies” (He left the passage that had taken him so long to compose out) • TL grammar (This bed has been slept in) • Humans use their general understanding of context or plausibility, and so often don’t even notice the ambiguity • “Contrastive” knowledge of languages is a big part of what translation is about
How does MT cope? • Ambiguity • “Translation models” handle not only individual words, but word sequences • If the model has the wrong interpretation, the system is likely to reproduce it • Also, dependencies between words (which can be arbitrarily distant from each other) are more difficult to capture • Target-language models may also help here
How does MT cope? • Style and nuance • Both translation and TL models can only reflect the data on which they have been trained • Probability data is generally not fine-grained enough to capture niceties • Again, anything that depends on long-distance dependencies is unlikely to shine through
What are MT’s strengths? • Impact of training data is paramount: • MT performs best when translating the kind of text it has been trained on • This was also true of rule-based systems • Somewhat true of (specialised) human translators too • Tension between • need to use as much material as possible for training • desire (eg Google) to provide a generic translation service • trade-off between coverage and translation quality
What are MT’s strengths? • MT in general performs well with • simple, grammatical source text, free of ambiguities, colloquialisms, etc., for which style and nuance are not so important • Happily these are the kinds of texts that human translators find least engaging • However well MT manages, it is not as reliable as a human
What you should say about MT • It’s good (even preferable) for some things • Mainly translation into the client’s language (“assimilation”) • Reading a document in a foreign language to see what it’s about and whether they need a proper translation, or which bits need translation • They may feel able (if they know the source language) to tidy it up (“post-editing”, “revision”) themselves, though they should always be aware of the risk involved • Rough and ready translation into a foreign language • eg for informal communication with someone who can tolerate a rough translation • Again, the risks must be emphasised • Possible use (even by translators) of MT as a first draft: post-editing
What you should say about MT • But for other things MT might be quite unsuitable, and human translation (HT) is still a better bet • Certainly any document (eg for publication) where the quality of the translation will reflect on your client • Any document where style and presentation are important • Any document where accuracy is crucial • Translation into a target language that the client does not know at all carries a major risk
A final word of warning • Clients might like to evaluate an MT system for themselves • A common method is back-and-forth (“round trip”) translation (RTT) • This has some major drawbacks: • A bad RTT may be caused by a bad outward trip or a bad return trip … hard to know which • A good RTT may hide a bad translation – eg word-for-word nonsense in the TL, which comes back as the same original source text • So RTT on a single sentence won’t tell you much … so test it with a longer text: if it does OK it may be a fair result; if it does badly you can never be sure why
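The second drawback, a perfect round trip hiding a bad translation, can be reproduced with a deliberately crude sketch. The word-for-word "system" and its five-entry glossary are hypothetical and stand in for no real MT service; `EN_TO_FR`, `word_for_word` and `round_trip` are names invented for this illustration.

```python
# Hypothetical one-to-one glossary; a real system is far less naive than this.
EN_TO_FR = {
    "time": "temps",
    "flies": "mouches",  # the insect, not the verb "to fly"
    "like": "comme",
    "an": "un",          # wrong gender for "flèche"
    "arrow": "flèche",
}
FR_TO_EN = {fr: en for en, fr in EN_TO_FR.items()}


def word_for_word(sentence, glossary):
    """Replace each word independently, keeping the source word order."""
    return " ".join(glossary.get(word, word) for word in sentence.split())


def round_trip(sentence):
    """Translate outward and back again; return both legs for inspection."""
    outward = word_for_word(sentence, EN_TO_FR)
    back = word_for_word(outward, FR_TO_EN)
    return outward, back


source = "time flies like an arrow"
outward, back = round_trip(source)
print(outward)         # temps mouches comme un flèche
print(back == source)  # True
```

The return trip reproduces the source exactly, so a client looking only at the round-trip result would rate the system highly, even though the outward leg is word-for-word nonsense in French.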