What’s so hard about translation?

Presentation Transcript


  1. Ed Kenschaft, University of Maryland, UMIACS, CLIP Lab. What’s so hard about translation?

  2. Translation Needs (1): Assimilation. News monitoring; intercepts, noisy documents. High recall, low precision.

  3. Translation Needs (2): Dissemination. UN, EU; commercial documentation; Bible translation. High recall & precision.

  4. Translation Needs (3): Emergency. Military; medical; disaster relief. High precision, moderate recall.

  5. SIL International: faith-based Christian organization. Partner with speakers of languages that have never been written down. Purposes: preserve the language and culture; document the language for study; translate the Bible and community development materials. Documented 1400+ languages in 70+ countries.

  6. Challenges [1]: Ultra-low-density languages. Mostly unwritten; no large (or small) parallel corpora; no Bible for bootstrapping.

  7. Challenges [2]: Untrained translators. 6th grade education; one trained linguist for 10 languages.

  8. Challenges [3]: Exceedingly rich domain of discourse; approximates all of natural language. Genres: historical narrative, dialog, poetry, personal letters. Topics: business, politics, sex, relationships, diet …; no controlled vocabulary.

  9. Challenges [4]: Demand for 100% accuracy/fluency. Life-changing lessons; easy to misinterpret.

  10. Challenges [5]: Nearly endless variety of target languages. ~6800 languages: ~1400 written, ~5400 unwritten. ~Half will survive next century; ~2000-3000 remaining.

  11. Linguistic Variation: Phonological variation. Morphological variation: three-boys-shot-arrows-at-the-gazelle. Syntactic variation: grammatical markers (e.g. dual, causative); discourse markers (e.g. topic/focus); honorifics.

  12. Cultural Variation: Cleanse me with hyssop, and I will be clean; wash me, and I will be whiter than snow. (Psalm 51:7, NIV) What is hyssop? What is snow? What does it mean to be white?

  13. Cultural Variation: Cleanse me with a plant indigenous to the lands of the ancient Near East, used in Jewish religious ceremonies, and I will be whiter than the precipitation that falls like rain when the weather is very cold, which indicates a state of moral purity.

  14. Intelligibility ≠ Fidelity (1): Moses had horns.

  15. Intelligibility ≠ Fidelity (2): Where there is no vision, the people perish. (Proverbs 29:18a, KJ21)

  16. Intelligibility ≠ Fidelity (2): Where there is no vision, the people perish. (Proverbs 29:18a, KJ21) When people do not accept divine guidance, they run wild. (Pr 29:18a, NLT)

  17. Waste of Time? Can a computer solve all these problems? Not on your life. Can a computer replace a translator? Limited domains only. What can it do? Word-processing; data storage & analysis; first draft?

  18. General Approach: CAT vs. MT. Linguistically informed systems. Supervised learning. Exploit all available resources: SL resources; existing TL data.

  19. Data Representation: text encoding (Unicode); fonts (Graphite); interlinear text (LinguaLinks, Toolbox, FieldWorks).

  20. Elicitation & Analysis: elicit syntactic & morphological data (AVENUE, EXPEDITION); elicit word lists for language survey (WordSurv).

  21. SL Resources: related language adaptation (CARLA); projection across word alignment (GIZA++, Multi-Align, Parser Projection).

  22. NLG: rich interlingua, TBTA (Tod Allman); statistical fluency enhancement (Sebastian Varges).

  23. Evaluation: Need for automation: multiplying documents; shortage of experts. BLEU: How well does it work? What does it mean? METEOR: stresses recall.

  24. The Limits of NLP: Who knows?
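
The contrast on the Evaluation slide (BLEU versus METEOR, which "stresses recall") can be illustrated with a minimal sketch. This is not an implementation of either metric: it only computes clipped unigram precision and recall of a candidate translation against one reference, which is the distinction both metrics build on. The function name `unigram_overlap` and the lowercase, whitespace-based tokenization are assumptions made for illustration, and the example sentences reuse the Proverbs 29:18a wording from the slides.

```python
from collections import Counter
from typing import Tuple


def unigram_overlap(candidate: str, reference: str) -> Tuple[float, float]:
    """Return (precision, recall) over clipped unigram matches.

    Hypothetical helper for illustration only; real BLEU and METEOR
    add n-grams, brevity or fragmentation penalties, and more.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()

    # Counter intersection keeps the minimum count of each shared word,
    # so a word repeated in the candidate is not credited more often
    # than it occurs in the reference.
    matches = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())

    precision = matches / len(cand_tokens) if cand_tokens else 0.0
    recall = matches / len(ref_tokens) if ref_tokens else 0.0
    return precision, recall


reference = "where there is no vision the people perish"
candidate = "the people perish"  # fluent fragment that drops half the verse

precision, recall = unigram_overlap(candidate, reference)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Here every candidate word appears in the reference, so precision is perfect, but only three of the eight reference words are covered, so recall is low. A precision-oriented score would rate this truncated output highly unless it penalized short output separately, which is one reason a recall-weighted measure can matter when omissions are costly.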