
Week 9: resources for globalisation

Week 9: resources for globalisation. Finish spell checkers; Machine Translation (MT); the ‘decoding’ paradigm; ambiguity; translation models; Interlingua and First Order Predicate Calculus; human involvement; historical note. Spelling dictionaries.


Presentation Transcript


  1. Week 9: resources for globalisation • Finish spell checkers • Machine Translation (MT) • The ‘decoding’ paradigm • Ambiguity • Translation models • Interlingua and First Order Predicate Calculus • Human involvement • Historical note

  2. Spelling dictionaries • Implementing spelling identification and correction algorithm

  3. Spelling dictionaries • Implementing a spelling identification and correction algorithm • STAGE 1: compare each string in the document with a list of legal strings; if the list has no corresponding string, mark it as misspelled • STAGE 2: generate a list of candidates • Apply any single transformation to the typo string • Filter the list by checking against a dictionary • STAGE 3: assign a probability value to each candidate in the list • STAGE 4: select the best candidate
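Stages 1 and 2 can be sketched in Python as follows. This is a minimal illustration, not the lecture's own implementation: the word list `DICTIONARY` is a hypothetical toy stand-in for the list of legal strings, and "single transformation" is assumed here to mean one deletion, transposition, substitution or insertion.

```python
import string

# Hypothetical toy word list standing in for the "list of legal strings".
DICTIONARY = {"the", "cat", "cart", "act", "at", "sat", "on", "mat"}

def find_misspelled(tokens):
    """STAGE 1: flag every token that is not in the list of legal strings."""
    return [t for t in tokens if t not in DICTIONARY]

def single_edits(typo):
    """All strings exactly one transformation away from the typo (assumed edit set)."""
    letters = string.ascii_lowercase
    splits = [(typo[:i], typo[i:]) for i in range(len(typo) + 1)]
    deletions = [a + b[1:] for a, b in splits if b]
    transpositions = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    substitutions = [a + ch + b[1:] for a, b in splits if b for ch in letters]
    insertions = [a + ch + b for a, b in splits for ch in letters]
    return set(deletions + transpositions + substitutions + insertions)

def candidates(typo):
    """STAGE 2: apply every single transformation, then filter by the dictionary."""
    return sorted(single_edits(typo) & DICTIONARY)

print(find_misspelled("the cta sat on the mat".split()))
print(candidates("cta"))
print(candidates("ct"))
```

Generating all edits first and filtering afterwards keeps the two sub-steps of Stage 2 separate, at the cost of building many strings that are thrown away.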

  4. Spelling dictionaries • STAGE 3 • prior probability • given all the words in English, is this candidate more likely to be what the typist meant than that candidate? • P(c) = freq(c)/N, where freq(c) is the number of times candidate c occurs in a corpus and N is the number of words in that corpus • likelihood • given the possible errors, or transformations, how likely is it that a particular error has operated on candidate c to produce the typo t? • P(t|c), calculated using a corpus of errors, or transformations • Bayesian rule: • get the product of the prior probability and the likelihood • P(c) × P(t|c)
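The ranking step can be sketched as below. Everything numeric here is an assumption for illustration: `CORPUS` is a hypothetical toy corpus used to estimate the prior, and the values in `LIKELIHOOD` are invented placeholders for what would really be estimated from a corpus of errors.

```python
from collections import Counter

# Hypothetical toy corpus; a real prior would be counted over a large corpus.
CORPUS = "the cat sat on the mat and the cat ate at the mat".split()
WORD_COUNTS = Counter(CORPUS)
N = len(CORPUS)

# Invented likelihoods P(t|c) for the typo "ct"; real values come from an error corpus.
LIKELIHOOD = {("ct", "cat"): 0.6, ("ct", "at"): 0.3, ("ct", "act"): 0.1}

def prior(candidate):
    """P(c): relative frequency of the candidate in the corpus."""
    return WORD_COUNTS[candidate] / N

def score(typo, candidate):
    """Product of prior and likelihood, P(c) * P(t|c)."""
    return prior(candidate) * LIKELIHOOD.get((typo, candidate), 0.0)

def best_candidate(typo, candidate_list):
    """STAGE 4: select the candidate with the highest score."""
    return max(candidate_list, key=lambda c: score(typo, c))

print(best_candidate("ct", ["act", "at", "cat"]))
```

In this toy setup a candidate that never occurs in the corpus gets a prior of zero and can never win; practical systems usually smooth the counts to avoid that.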

  5. Spelling dictionaries • non-word errors • Implementing spelling identification and correction algorithm • STAGE 1: identify misspelled words • STAGE 2: generate list of candidates • STAGE 3a: rank candidates for probability • STAGE 3b: select best candidate • Implement: • noisy channel model • Bayesian Rule

  6. Resources for Globalisation: Machine translation

  7. Resources for Globalisation: Machine translation • The ‘decoding’ paradigm • Assumes one-to-one relation between source symbol and target symbol

  8. Resources for Globalisation: Machine translation • The ‘decoding’ paradigm • Assumes one-to-one relation between source symbol and target symbol • one-to-many (homonymy)

  9. Resources for Globalisation: Machine translation • The ‘decoding’ paradigm • Assumes one-to-one relation between source symbol and target symbol • one-to-many (homonymy) • one-to-many (hypernym → hyponyms):

  10. Resources for Globalisation: Machine translation • The ‘decoding’ paradigm • Assumes one-to-one relation between source symbol and target symbol • one-to-many (homonymy) • one-to-many (hypernym → hyponyms): • many-to-one (hyponyms → hypernym)

  11. Machine translation • The ‘decoding’ paradigm • one-to-many (homonymy) • bank → Ufer, Bank (German)

  12. Machine translation • The ‘decoding’ paradigm • one-to-many (homonymy) • one-to-many (hypernym → hyponyms): • brother → otooto, oniisan (Japanese) • blue → синий, голубой (Russian) • many-to-one (hyponyms → hypernym)

  13. Machine translation • The ‘decoding’ paradigm • one-to-many (homonymy) • one-to-many (hypernym → hyponyms): • many-to-one (hyponyms → hypernym) • hill, mountain → Berg (German) • learn, teach → leren (Dutch)
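A small sketch of why the one-to-one assumption fails, using only the German examples from the slides. The lexicon layout and the function names `one_to_many` and `many_to_one` are hypothetical, chosen for illustration.

```python
from collections import defaultdict

# Lexicon entries copied from the German examples on the slides.
EN_DE = {
    "bank": ["Ufer", "Bank"],
    "hill": ["Berg"],
    "mountain": ["Berg"],
}

def one_to_many(lexicon):
    """Source words that have more than one target word."""
    return sorted(src for src, targets in lexicon.items() if len(targets) > 1)

def many_to_one(lexicon):
    """Target words that are reached from more than one source word."""
    sources = defaultdict(list)
    for src, targets in lexicon.items():
        for tgt in targets:
            sources[tgt].append(src)
    return {tgt: sorted(srcs) for tgt, srcs in sources.items() if len(srcs) > 1}

print(one_to_many(EN_DE))
print(many_to_one(EN_DE))
```

A pure symbol-for-symbol decoder has no way to choose between the targets of "bank", and translating "Berg" back into English loses the hill/mountain distinction.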

  14. Machine translation and globalisation • Ambiguity • ‘I made her duck’ • “The possibility of interpreting an expression in two or more distinct ways” (Collins English Dictionary)

  15. Machine translation • Ambiguity • Challenge of the translation depends on the level of ambiguity that arises • This depends on the closeness of the source and target languages w.r.t. the following: • vocabulary • homonyms • grammar • structural ambiguity • conceptual structure • specificity ambiguity • lexical gaps

  16. Machine translation • Pragmatic approach

  17. Machine translation • Pragmatic approach • aim for a rough translation, ‘gist’ translation • Used for multi-lingual information retrieval

  18. Machine translation • Pragmatic approach • aim for a rough translation, ‘gist’ translation • Used for multi-lingual information retrieval • involve human translators in the process: computer-aided translation

  19. Machine translation • Translation models • Transfer model • English: ‘the dog bit my friend’ • Hindi: kutte-ne mere dost ko-kata (word-for-word gloss: dog my friend bit)

  20. Machine translation • Translation models • Transfer model • Alter grammatical structure of source language to make it adhere to the grammatical structure of target language • Use transformation rules • Analysis process (source) • Transfer process (‘bridge’) • Generation process (target) • Problem: each source-target pair will need its own unique set of transformation rules
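The three processes of the transfer model can be sketched on the dog example from the earlier slide. This is a toy under stated assumptions: the "parser" simply splits at a known verb, the single transformation rule reorders subject-verb-object into subject-object-verb, and `PHRASE_LEXICON` is a hypothetical lookup table that reuses the Hindi strings exactly as they appear on the slide.

```python
# Hypothetical verb list and phrase lexicon; Hindi strings copied from the slide.
VERBS = {"bit"}
PHRASE_LEXICON = {"the dog": "kutte-ne", "my friend": "mere dost", "bit": "ko-kata"}

def analyse(sentence):
    """Analysis (source): split an English sentence into subject, verb, object."""
    words = sentence.split()
    verb_index = next(i for i, w in enumerate(words) if w in VERBS)
    return {
        "subject": " ".join(words[:verb_index]),
        "verb": words[verb_index],
        "object": " ".join(words[verb_index + 1:]),
    }

def transfer(parse):
    """Transfer ('bridge'): one transformation rule, SVO order becomes SOV order."""
    return [parse["subject"], parse["object"], parse["verb"]]

def generate(constituents):
    """Generation (target): look up each constituent in the phrase lexicon."""
    return " ".join(PHRASE_LEXICON[c] for c in constituents)

def translate(sentence):
    return generate(transfer(analyse(sentence)))

print(translate("the dog bit my friend"))
```

The reordering rule in `transfer` is specific to this language pair, which is the scaling problem the slide points out: a different source-target pair would need its own rule set.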

  21. Machine translation • Translation models • Inter-lingua model • Extract the meaning from the source string • Give it a language independent representation, i.e. an interlingua • Translation process takes the interlingua as its input • Multiple translation processes take the same input for multiple target language outputs

  22. Machine translation • Translation models • What is the inter-lingua? • for words, some sort of semantic analysis, e.g. • (GO, BY-FOOT): Russian идти, English go • (GO, BY-TRANSPORT): Russian ехать, English go
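A word-level interlingua can be sketched as two lookup tables that meet in a shared, language-independent representation. The table names `ANALYSIS` and `GENERATION` and the tuple encoding are assumptions for illustration; the entries themselves are the Russian and English examples from the slide.

```python
# Analysis tables: source word -> interlingua concept (tuple encoding is assumed).
ANALYSIS = {
    "ru": {"идти": ("GO", "BY-FOOT"), "ехать": ("GO", "BY-TRANSPORT")},
}

# Generation tables: interlingua concept -> target word, one table per language.
GENERATION = {
    "en": {("GO", "BY-FOOT"): "go", ("GO", "BY-TRANSPORT"): "go"},
    "ru": {("GO", "BY-FOOT"): "идти", ("GO", "BY-TRANSPORT"): "ехать"},
}

def to_interlingua(word, source_lang):
    """Extract the language-independent meaning of a source word."""
    return ANALYSIS[source_lang][word]

def from_interlingua(concept, target_lang):
    """Generate a target-language word from the interlingua concept."""
    return GENERATION[target_lang][concept]

def translate(word, source_lang, target_lang):
    return from_interlingua(to_interlingua(word, source_lang), target_lang)

print(translate("идти", "ru", "en"))
print(translate("ехать", "ru", "en"))
```

Adding a new target language in this sketch means adding one generation table, rather than one rule set per language pair as in the transfer model.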

  23. Machine translation and globalisation • Translation models • What is the inter-lingua? • for sentences, a logical language e.g. First Order Predicate Calculus

  24. Meaning representation • Goal: 1. the semantic representation must give you a one-to-one mapping to non-linguistic knowledge of the world 2. The representation must be expressive, i.e. handle different types of data

  25. Meaning representation • First Order Predicate Calculus • computationally tractable • objects (terms) • properties of objects • relations amongst objects • Predicate argument structure • large composite representations • logical connectives

  26. Meaning representation • First Order Predicate Calculus • Object: referred to uniquely by a term • constant e.g. SurreyUniversity • function e.g. LocationOf(SurreyUniversity) • variable

  27. Meaning representation • First Order Predicate Calculus • Relations amongst objects • Predicates: “symbols that refer to, or name, the relations that hold among some fixed number of objects” (J & M) • Educates(SurreyUniversity, Citizens) • two-place predicate

  28. Meaning representation • First Order Predicate Calculus • Relations amongst objects • Predicates: • Can specify the category of an object • University(SurreyUniversity) • one-place predicate

  29. Meaning representation • First Order Predicate Calculus • properties / parts of objects • functions: • LocationOf(SurreyUniversity)

  30. Meaning representation • First Order Predicate Calculus • Composite representations through predicates and functions: Near(LocationOf(SurreyUniversity), LocationOf(Cathedral))

  31. Meaning representation • First Order Predicate Calculus • Logical connectives • combine basic representations to form larger more complex representations e.g. the ∧ operator = ‘and’

  32. Meaning representation • First Order Predicate Calculus • Logical connectives • combine basic representations to form larger more complex representations Educates(SurreyUniversity, Citizens) ∧ ¬ Remunerates(SurreyUniversity, Staff)
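The FOPC building blocks from the preceding slides (constants, functions, predicates, connectives) can be sketched against a tiny model of the world. The model is entirely hypothetical: the facts in `FACTS`, the location constants `CampusSite` and `CathedralSite`, and the closed-world reading of negation (anything not listed is treated as false) are assumptions made for illustration.

```python
# Hypothetical model: the set of atomic facts assumed to hold.
FACTS = {
    ("University", "SurreyUniversity"),               # one-place predicate
    ("Educates", "SurreyUniversity", "Citizens"),     # two-place predicate
    ("Near", "CampusSite", "CathedralSite"),
}

# Hypothetical values for the function LocationOf.
LOCATIONS = {"SurreyUniversity": "CampusSite", "Cathedral": "CathedralSite"}

def location_of(obj):
    """Function term: maps an object to another object (its location)."""
    return LOCATIONS[obj]

def holds(predicate, *args):
    """A predicate applied to terms is true if the fact is in the model."""
    return (predicate, *args) in FACTS

# Composite representation: Near(LocationOf(SurreyUniversity), LocationOf(Cathedral))
near = holds("Near", location_of("SurreyUniversity"), location_of("Cathedral"))

# Connectives: Educates(SurreyUniversity, Citizens) AND NOT Remunerates(SurreyUniversity, Staff)
formula = holds("Educates", "SurreyUniversity", "Citizens") and not holds(
    "Remunerates", "SurreyUniversity", "Staff"
)

print(near, formula)
```

Python's `and` and `not` stand in for the logical connectives here; variables and quantifiers are left out of the sketch.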

  33. Machine translation and globalisation • Machine translation and globalisation: change of priorities • 1954: IBM and Georgetown University, first MT demo • goal: ‘perfect’ translation • 1966: Automatic Language Processing Advisory Committee (ALPAC) report: damning of that goal • Post ALPAC • Goal: rough translation, involve human element • Current situation: online translation, e.g. Babel Fish, descendant of SYSTRAN whose goal was rough translation • Journal of Machine Translation

  34. Next week • Globalisation as an industry • SDL and the SDLX-TRADOS globalisation application
