Week 9: resources for globalisation • Finish spell checkers • Machine Translation (MT) • The ‘decoding’ paradigm • Ambiguity • Translation models • Interlingua and First Order Predicate Calculus • Human involvement • Historical note
Spelling dictionaries • Implementing spelling identification and correction algorithm • STAGE 1: compare each string in document with a list of legal strings; if no corresponding string in list mark as misspelled • STAGE 2: generate list of candidates • Apply any single transformation to the typo string • Filter the list by checking against a dictionary • STAGE 3: assign probability values to each candidate in the list • STAGE 4: select best candidate
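Stages 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the course's implementation: the tiny word list, the typo 'acress', and the choice of four transformation types (deletion, transposition, substitution, insertion) are all assumptions made for the example.

```python
import string

# Hypothetical toy word list standing in for a real spelling dictionary.
DICTIONARY = {"across", "acres", "actress", "cress", "caress", "access"}


def is_misspelled(word):
    """STAGE 1: a string with no match in the list of legal strings is flagged."""
    return word not in DICTIONARY


def single_edits(typo):
    """Return every string reachable from `typo` by one transformation."""
    letters = string.ascii_lowercase
    splits = [(typo[:i], typo[i:]) for i in range(len(typo) + 1)]
    deletions = [left + right[1:] for left, right in splits if right]
    transpositions = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    substitutions = [
        left + ch + right[1:] for left, right in splits if right for ch in letters
    ]
    insertions = [left + ch + right for left, right in splits for ch in letters]
    return set(deletions + transpositions + substitutions + insertions)


def candidates(typo):
    """STAGE 2: generate all single-edit strings, then keep only dictionary words."""
    return sorted(single_edits(typo) & DICTIONARY)


if __name__ == "__main__":
    print(is_misspelled("acress"))
    print(candidates("acress"))
```

Filtering against the dictionary matters because the raw set of single edits is large and consists mostly of non-words.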
Spelling dictionaries • STAGE 3 • prior probability • given all the words in English, is this candidate more likely to be what the typist meant than that candidate? • P(c) = count(c)/N, where count(c) is the number of times candidate c occurs in a corpus and N is the total number of words in that corpus • likelihood • given the possible errors, or transformations, how likely is it that a particular error has operated on candidate c to produce typo t? • P(t|c), calculated using a corpus of errors, or transformations • Bayes' rule: • get the product of the prior probability and the likelihood • P(c) × P(t|c)
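The ranking in stages 3 and 4 might look like the sketch below. All numbers are invented for illustration (they are not real corpus counts or error statistics), and the names `WORD_COUNTS`, `LIKELIHOOD` and `best_candidate` are hypothetical.

```python
# Hypothetical figures, invented for illustration only.
CORPUS_SIZE = 1_000_000                      # N: total words in the corpus
WORD_COUNTS = {"actress": 1_300, "across": 9_000, "acres": 3_000}

# Hypothetical P(t|c): probability that each candidate c was mistyped as "acress".
# In practice these values would be estimated from a corpus of errors.
LIKELIHOOD = {"actress": 0.0001, "across": 0.00001, "acres": 0.00003}


def prior(candidate):
    """P(c): relative frequency of the candidate in the corpus."""
    return WORD_COUNTS[candidate] / CORPUS_SIZE


def score(candidate):
    """P(c) x P(t|c): the quantity used to rank candidates."""
    return prior(candidate) * LIKELIHOOD[candidate]


def best_candidate(candidate_list):
    """STAGE 4: pick the candidate with the highest score."""
    return max(candidate_list, key=score)


if __name__ == "__main__":
    for c in sorted(WORD_COUNTS, key=score, reverse=True):
        print(c, score(c))
    print("best:", best_candidate(list(WORD_COUNTS)))
```

With these made-up figures a frequent word such as 'across' can still lose to a rarer word whose error is more likely, which is the point of multiplying the two terms.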
Spelling dictionaries • non-word errors • Implementing spelling identification and correction algorithm • STAGE 1: identify misspelled words • STAGE 2: generate list of candidates • STAGE 3a: rank candidates by probability • STAGE 3b: select best candidate • Implement: • noisy channel model • Bayes' rule
Resources for Globalisation: Machine translation • The ‘decoding’ paradigm • Assumes one-to-one relation between source symbol and target symbol • Problems for this assumption: • one-to-many (homonymy) • one-to-many (hypernym → hyponyms) • many-to-one (hyponyms → hypernym)
Machine translation • The ‘decoding’ paradigm • one-to-many (homonymy) • bank → Ufer, Bank (German)
Machine translation • The ‘decoding’ paradigm • one-to-many (homonymy) • one-to-many (hypernym → hyponyms): • brother → otooto, oniisan (Japanese) • blue → синий, голубой (Russian) • many-to-one (hyponyms → hypernym)
Machine translation • The ‘decoding’ paradigm • one-to-many (homonymy) • one-to-many (hypernym → hyponyms) • many-to-one (hyponyms → hypernym): • hill, mountain → Berg (German) • learn, teach → leren (Dutch)
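A word-for-word 'decoder' makes the problem with the one-to-one assumption concrete. The sketch below uses only the English to German pairs from the examples above. The lookup table and the rule of always taking the first listed equivalent are assumptions made for illustration.

```python
# Toy English -> German lexicon built from the examples above.
LEXICON = {
    "bank": ["Ufer", "Bank"],   # one-to-many (homonymy)
    "hill": ["Berg"],           # many-to-one ...
    "mountain": ["Berg"],       # ... both map to the same target word
}


def decode(words):
    """Naive decoding: replace each word by its first listed equivalent."""
    return [LEXICON[w][0] for w in words]


def ambiguous(words):
    """Words for which the one-to-one assumption fails (several equivalents)."""
    return [w for w in words if len(LEXICON[w]) > 1]


if __name__ == "__main__":
    print(decode(["bank", "hill", "mountain"]))
    print(ambiguous(["bank", "hill", "mountain"]))
```

The decoder has no way to choose between 'Ufer' and 'Bank', and translating 'Berg' back into English would face the same choice between 'hill' and 'mountain'.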
Machine translation and globalisation • Ambiguity • “The possibility of interpreting an expression in two or more distinct ways” (Collins English Dictionary) • Example: ‘I made her duck’
Machine translation • Ambiguity • The difficulty of a translation depends on the level of ambiguity that arises • This depends on the closeness of the source and target languages w.r.t. the following: • vocabulary: homonyms • grammar: structural ambiguity • conceptual structure: specificity ambiguity, lexical gaps
Machine translation • Pragmatic approach • aim for a rough translation, ‘gist’ translation • Used for multi-lingual information retrieval • involve human translators in the process: computer-aided translation
Machine translation • Translation models • Transfer model • English: ‘the dog bit my friend’ • Hindi: kutte-ne mere dost-ko kata • Gloss: dog my friend bit
Machine translation • Translation models • Transfer model • Alter grammatical structure of source language to make it adhere to the grammatical structure of target language • Use transformation rules • Analysis process (source) • Transfer process (‘bridge’) • Generation process (target) • Problem: each source-target pair will need its own unique set of transformation rules
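The three processes can be sketched for the single English to Hindi example. Everything here is a toy: the 'analysis' is a hard-coded parse rather than a real parser, the one transfer rule (subject-verb-object to subject-object-verb) is an assumption, and the romanised Hindi forms in the lexicon are simplified from the example and segmented by assumption.

```python
from typing import NamedTuple


class Clause(NamedTuple):
    subject: str
    verb: str
    obj: str


def analyse(sentence):
    """Analysis (source): a hard-coded parse; a real system would run a parser."""
    parses = {"the dog bit my friend": Clause("the dog", "bit", "my friend")}
    return parses[sentence]


def transfer(clause):
    """Transfer ('bridge'): reorder subject-verb-object into subject-object-verb."""
    return [clause.subject, clause.obj, clause.verb]


# Hypothetical phrase lexicon; romanisation simplified for illustration.
LEXICON = {"the dog": "kutte-ne", "my friend": "mere dost-ko", "bit": "kata"}


def generate(constituents):
    """Generation (target): look up each constituent and join in the new order."""
    return " ".join(LEXICON[c] for c in constituents)


def translate(sentence):
    return generate(transfer(analyse(sentence)))


if __name__ == "__main__":
    print(translate("the dog bit my friend"))
```

The `transfer` rule is specific to this language pair, which illustrates the scaling problem: another pair would need a different rule set.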
Machine translation • Translation models • Inter-lingua model • Extract the meaning from the source string • Give it a language independent representation, i.e. an interlingua • Translation process takes the interlingua as its input • Multiple translation processes take the same input for multiple target language outputs
Machine translation • Translation models • What is the inter-lingua? • for words, some sort of semantic analysis, e.g. • (GO, BY-FOOT): Russian идти, English go • (GO, BY-TRANSPORT): Russian ехать, English go
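For single words, the interlingua idea can be sketched as one language-independent concept feeding several generators. The tuple encoding of concepts and the `GENERATORS` table are assumptions made for illustration. Only the Russian and English forms come from the example above.

```python
# Language-independent concepts, encoded here (by assumption) as tuples.
GO_BY_FOOT = ("GO", "BY-FOOT")
GO_BY_TRANSPORT = ("GO", "BY-TRANSPORT")

# One generation table per target language, all reading the same concepts.
GENERATORS = {
    "ru": {GO_BY_FOOT: "идти", GO_BY_TRANSPORT: "ехать"},
    "en": {GO_BY_FOOT: "go", GO_BY_TRANSPORT: "go"},
}


def generate(concept, language):
    """Produce the target-language word for one interlingua concept."""
    return GENERATORS[language][concept]


def generate_all(concept):
    """Feed the same concept to every target-language generator."""
    return {lang: table[concept] for lang, table in GENERATORS.items()}


if __name__ == "__main__":
    print(generate_all(GO_BY_FOOT))
    print(generate_all(GO_BY_TRANSPORT))
```

Adding a target language means adding one more table; no per-pair rules are needed, in contrast with the transfer model.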
Machine translation and globalisation • Translation models • What is the inter-lingua? • for sentences, a logical language e.g. First Order Predicate Calculus
Meaning representation • Goals: • 1. the semantic representation must give you a one-to-one mapping to non-linguistic knowledge of the world • 2. the representation must be expressive, i.e. handle different types of data
Meaning representation • First Order Predicate Calculus • computationally tractable • objects (terms) • properties of objects • relations amongst objects • Predicate argument structure • large composite representations • logical connectives
Meaning representation • First Order Predicate Calculus • Object: referred to uniquely by a term • constant e.g. SurreyUniversity • function e.g. LocationOf(SurreyUniversity) • variable
Meaning representation • First Order Predicate Calculus • Relations amongst objects • Predicates: “symbols that refer to, or name, the relations that hold among some fixed number of objects” (J & M) • Educates(SurreyUniversity, Citizens) • two-place predicate
Meaning representation • First Order Predicate Calculus • Relations amongst objects • Predicates: • Can specify the category of an object • University(SurreyUniversity) • one-place predicate
Meaning representation • First Order Predicate Calculus • properties / parts of objects • functions: • LocationOf(SurreyUniversity)
Meaning representation • First Order Predicate Calculus • Composite representations through predicates and functions: Near(LocationOf(SurreyUniversity), LocationOf(Cathedral))
Meaning representation • First Order Predicate Calculus • Logical connectives • combine basic representations to form larger more complex representations, e.g. the ∧ operator = ‘and’
Meaning representation • First Order Predicate Calculus • Logical connectives • combine basic representations to form larger more complex representations • Educates(SurreyUniversity, Citizens) ∧ ¬Remunerates(SurreyUniversity, Staff)
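The building blocks above (constants, functions, predicates, connectives) can be mirrored in a small data structure. This is only a sketch of how such formulas might be represented and printed. The class names are hypothetical, and no inference or truth evaluation is attempted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    """A constant term that names one object, e.g. SurreyUniversity."""
    name: str

    def __str__(self):
        return self.name


@dataclass(frozen=True)
class Application:
    """A symbol applied to a fixed number of argument terms."""
    name: str
    args: tuple

    def __str__(self):
        return f"{self.name}({', '.join(str(a) for a in self.args)})"


class Func(Application):
    """A function: the result is itself a term (an object)."""


class Pred(Application):
    """A predicate: the result is a statement about its arguments."""


@dataclass(frozen=True)
class Not:
    body: object

    def __str__(self):
        return f"¬{self.body}"


@dataclass(frozen=True)
class And:
    left: object
    right: object

    def __str__(self):
        return f"{self.left} ∧ {self.right}"


surrey = Const("SurreyUniversity")
near = Pred("Near", (Func("LocationOf", (surrey,)),
                     Func("LocationOf", (Const("Cathedral"),))))
formula = And(Pred("Educates", (surrey, Const("Citizens"))),
              Not(Pred("Remunerates", (surrey, Const("Staff")))))

if __name__ == "__main__":
    print(near)
    print(formula)
```

Nesting `Func` terms inside a `Pred` gives the composite representations, and `And` / `Not` give the logical connectives.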
Machine translation and globalisation: change of priorities • 1954: IBM and Georgetown University, first MT demo • goal: ‘perfect’ translation • 1966: Automatic Language Processing Advisory Committee (ALPAC) report: damning of this goal • Post-ALPAC • goal: rough translation, involve human element • Current situation: online translation, e.g. Babel Fish, descendant of SYSTRAN, whose goal was rough translation • Journal of Machine Translation
Next week • Globalisation as an industry • SDL and the SDLX-TRADOS globalisation application