Human Language Technology
Machine Translation Architectures
• Direct MT
• Transfer MT
• Interlingual MT
MT Architectures
History – Pre-ALPAC
• 1952 – First MT Conference (MIT)
• 1954 – Georgetown System (word-for-word based) successfully translated 49 Russian sentences
• 1954–1965 – Heavy investment in a brute-force empirical approach: crude word-for-word techniques with limited reshuffling of the output
• 1966 – The ALPAC (Automatic Language Processing Advisory Committee) Report concludes that research funds should be directed into more fundamental linguistic research
History – three eras
• 1965-1970
  • Operational systems approach: SYSTRAN (eventually became the basis for Babel Fish)
  • University centres established in Grenoble (CETA), Montreal and Saarbrücken
• 1970-1990: Systems developed on the basis of linguistic and non-linguistic representations
  • Ariane (Dependency Grammar)
  • TAUM METEO (Metamorphosis Grammars)
  • EUROTRA (multilingual intermediate representations)
  • ROSETTA (Landsbergen) – interlingua based
  • BSO (Witkam) – Esperanto
• 1990 onwards: Data-driven translation systems
MT Methods
MT
• Direct MT
• Rule-Based MT
  • Transfer
  • Interlingua
• Data-Driven MT
  • EBMT (example-based MT)
  • SMT (statistical MT)
Basic Architecture: Direct Translation
source text → target text
• Basic idea
  • language-pair specific
  • no intermediate representation
  • pipeline architecture
Staged Direct MT (En/Jp)
Direct Translation: Advantages
• Exploits the fact that certain potential ambiguities can be left unresolved if the language pair is known, e.g. wall – Wand/Mauer (German) – parete/muro (Italian).
• Designers can concentrate more on special cases where the languages differ.
• Minimal resources necessary: a cheap bilingual dictionary and a rudimentary knowledge of the target language suffice.
• Translation memories are a (successful and much used) development of this approach.
Direct Translation: Disadvantages
• Computationally naive
  • Basic model: word-for-word translation + local reordering (e.g. to handle adj+noun order)
• Linguistically naive
  • no analysis of the internal structure of the input, especially with respect to the grammatical relationships between the main parts of sentences
  • no generalisation; everything is handled on a case-by-case basis
• Generally poor translation
  • except in simple cases where there is a lot of isomorphism between sentences
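The basic model named above (word-for-word lookup followed by local reordering) can be sketched in a few lines. This is only a toy illustration, not any historical system: the English-to-French lexicon, the tag names and the function `direct_translate` are all assumptions made for the example.

```python
# Toy English -> French direct translation: dictionary lookup plus one
# local reordering rule. The lexicon and tag set are illustrative only.
LEXICON = {
    "the": ("le", "DET"),
    "white": ("blanc", "ADJ"),
    "horse": ("cheval", "NOUN"),
    "sleeps": ("dort", "VERB"),
}


def direct_translate(sentence: str) -> str:
    # Stage 1: word-for-word lookup; unknown words pass through untagged.
    tokens = [LEXICON.get(w, (w, "UNK")) for w in sentence.lower().split()]
    # Stage 2: local reordering, swapping each adjacent ADJ+NOUN pair.
    i = 0
    while i < len(tokens) - 1:
        if tokens[i][1] == "ADJ" and tokens[i + 1][1] == "NOUN":
            tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
            i += 2
        else:
            i += 1
    return " ".join(word for word, _ in tokens)


print(direct_translate("the white horse sleeps"))  # le cheval blanc dort
```

Nothing here looks at sentence structure: agreement, word sense and any reordering beyond the single adj+noun swap would each need another hand-written special case, which is the lack of generalisation the slide describes.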
Transfer Model of MT
• To overcome language differences, first build a more abstract representation of the input.
• The translation process as such (called transfer) operates at the level of this representation.
• This architecture assumes
  • analysis via some kind of parsing process
  • synthesis via some kind of generation
Basic Architecture: Transfer Model
source text → analysis → source representation → transfer → target representation → generation → target text
Transfer Rules
In general there are two kinds of transfer rule:
• Structural Transfer Rules: these deal with differences in the syntactic structures.
• Lexical Transfer Rules: these deal with cross-lingual mappings at the level of words and fixed phrases.
Structural Transfer Rule
NP_s(Adj_s, Noun_s) → NP_t(Noun_t, Adj_t)
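A structural transfer rule of this shape can be sketched as a rewrite over a parse tree. The tuple-based tree encoding and the function name `transfer_np` below are assumptions made for illustration; a real transfer component would hold many such rules and would apply lexical transfer as well.

```python
# A tree node is (label, children); a leaf is (label, word).
def transfer_np(tree):
    label, children = tree
    if isinstance(children, str):
        return tree  # leaf: nothing to restructure
    # Apply the rule bottom-up so nested noun phrases are handled too.
    children = [transfer_np(child) for child in children]
    child_labels = [child[0] for child in children]
    # NP(Adj, Noun) on the source side becomes NP(Noun, Adj) on the target side.
    if label == "NP" and child_labels == ["Adj", "Noun"]:
        children = [children[1], children[0]]
    return (label, children)


source = ("S", [("NP", [("Adj", "white"), ("Noun", "horse")]), ("V", "sleeps")])
print(transfer_np(source))
```

Unlike the direct model's swap over a flat token list, the rule is stated once over the representation and applies wherever a matching NP occurs in the tree.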
Lexical Transfer
• Easy cases are based on bilingual dictionary lookup.
• Resolution of ambiguities may require further knowledge:
  • know → savoir
  • know → connaître
• Not necessarily word for word:
  • Schimmel → white horse
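The three cases above can be sketched as a lookup with an optional condition on each candidate. The dictionaries, the condition labels and `lexical_transfer` are hypothetical, and choosing between savoir and connaître purely by complement type (clause versus noun phrase) is a deliberate simplification of the "further knowledge" a real system would need.

```python
# Hypothetical entries: each source word maps to candidate translations,
# optionally restricted by the type of its complement.
EN_FR = {"know": [("savoir", "clause"), ("connaître", "noun_phrase")]}
DE_EN = {"Schimmel": [("white horse", None)]}  # one word -> two words


def lexical_transfer(word, lexicon, complement=None):
    candidates = lexicon.get(word, [])
    for target, required in candidates:
        # An unrestricted candidate always applies; a restricted one
        # applies only if the context supplies the required complement.
        if required is None or required == complement:
            return target
    # No candidate fits: fall back to the first one, or to the source word.
    return candidates[0][0] if candidates else word


print(lexical_transfer("know", EN_FR, "clause"))       # savoir
print(lexical_transfer("know", EN_FR, "noun_phrase"))  # connaître
print(lexical_transfer("Schimmel", DE_EN))             # white horse
```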
Transfer Model
• Degree of generalisation depends upon depth of representation:
  • The deeper the representation, the harder it is to do analysis or generation.
  • The shallower the representation, the larger the transfer component.
• Where does ambiguity get resolved?
• Number of bilingual components can get large.
Interlingual Translation: The Vauquois Triangle
[Figure: triangle with the interlingua at the apex; analysis rises from the source text, generation descends to the target text, and depth of representation increases towards the apex.]
Interlingual Translation
• The transfer model requires different transfer rules for each language pair.
• This means much work for a multilingual system.
• The interlingual approach eliminates transfer altogether by creating a language-independent canonical form known as an interlingua.
• Various logic-based schemes have been used to represent such forms.
• Other approaches include attribute/value matrices called feature structures.
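The cost of needing transfer rules for every language pair can be made concrete with a rough count of components. The sketch below assumes one analyser and one generator per language in both architectures, plus one transfer module per ordered language pair in the transfer architecture; the function names are illustrative.

```python
def transfer_components(n_languages: int) -> int:
    # n analysers + n generators + one transfer module per ordered pair
    return 2 * n_languages + n_languages * (n_languages - 1)


def interlingua_components(n_languages: int) -> int:
    # n analysers into the interlingua + n generators out of it
    return 2 * n_languages


for n in (2, 5, 10):
    print(n, transfer_components(n), interlingua_components(n))
```

For 10 languages this gives 110 components against 20: the transfer count grows quadratically with the number of languages, the interlingual count only linearly.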
Possible Feature Structure for “There was an old man gardening”
event: gardening
agent:
  type: man
  number: sg
  definiteness: indef
aspect: progressive
tense: past
Ontological Issues
• The designer of an interlingua has a very difficult task.
• What is the appropriate inventory of attributes and values?
• Clearly, the choice has radical effects on the ability of the system to translate faithfully.
• For instance, to handle the muro/parete distinction, the internal/external characteristic of the wall would have to be encoded.
Feature Structure for “muro”
word: muro
syntax:
  POS:
    class: noun
    type: count
semantics:
  field: buildings
  type: structural
  position: external
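A feature structure like this maps naturally onto nested dictionaries, and word choice during generation can then be sketched as a subsumption check. The `parete` entry, the helper names and the concept encodings below are assumptions for illustration; only the `muro` entry follows the slide.

```python
MURO = {
    "word": "muro",
    "syntax": {"pos": {"class": "noun", "type": "count"}},
    "semantics": {"field": "buildings", "type": "structural", "position": "external"},
}
# Assumed entry: mirrors MURO except for the position feature.
PARETE = {
    "word": "parete",
    "syntax": {"pos": {"class": "noun", "type": "count"}},
    "semantics": {"field": "buildings", "type": "structural", "position": "internal"},
}
ITALIAN_LEXICON = [MURO, PARETE]


def subsumes(general, specific):
    """True if every feature in `general` occurs in `specific` with a compatible value."""
    for key, value in general.items():
        if key not in specific:
            return False
        if isinstance(value, dict):
            if not isinstance(specific[key], dict) or not subsumes(value, specific[key]):
                return False
        elif specific[key] != value:
            return False
    return True


def candidate_words(concept):
    # Every lexical entry whose semantics is compatible with the concept.
    return [e["word"] for e in ITALIAN_LEXICON if subsumes(concept, e["semantics"])]


print(candidate_words({"field": "buildings", "type": "structural", "position": "external"}))
print(candidate_words({"field": "buildings", "type": "structural"}))
```

The first call returns only `muro`; the second, where the interlingual form lacks the position feature, returns both words, which is the under-specification problem the ontological discussion raises.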
Interlingual Approach: Pros and Cons
• Pros
  • Portable (avoids the N² problem)
  • Because the representation is normalised, structural transformations are simpler to state.
  • Explanatory adequacy
• Cons
  • Difficult to deal with terms on a primitive level: universals?
  • Must decompose and reassemble concepts
  • Useful information lost (paraphrase)
  • In practice, works best in small domains.