
International Conference on Universal Knowledge and Language (ICUKL2002), Goa, 25-29 November 2002

A roadmap for MT: four « keys » to handle more languages, for all kinds of tasks, while making it possible to improve quality (on demand).



  1. A roadmap for MT: four « keys » to handle more languages, for all kinds of tasks, while making it possible to improve quality (on demand) • International Conference on Universal Knowledge and Language (ICUKL2002), Goa, 25-29 November 2002 • Christian Boitet, GETA, CLIPS, IMAG, 385 av. de la bibliothèque, BP 53, F-38041 Grenoble cedex 9, France • Christian.Boitet@imag.fr, http://clips.imag.fr/geta

  2. Outline • Basic concepts • What is MT ? • Goals: Quality / User • Architectures: Vauquois' triangle • State of the art • MT of texts: examples, problems • MT of spoken dialogs • The future of MT • Goals • 4 keys ICUKL2002, Goa, 25-29/11/2002

  3. What is M(a)T ? • At least 3 types of automation • MT = Machine Translation • MAT = Machine Assisted Translation • MAHT = Machine Aided Human Translation • A scientific technology • Informatics (computer science) • Linguistics • Mathematics

  4. Goals: Quality / User

  5. Architectures: Vauquois' triangle

  6. Architectures: Vauquois' triangle (larger)

  7. Formal intermediate structures

  8. How to produce an MT system • Choose an architecture • Program the "tools" • Specialized languages for linguistic programming (SLLP) • Development environment (MT shell) • Build the "lingware" • Lexical data / rules / weights • Grammatical data / rules / weights • Possible specialization to a typology ("sublanguage") • How? • Human work ± computer help / support • Automatic learning (weights, likelihoods…)

  9. State of affairs • only a small number of language pairs is covered by MT systems designed for information access • Systran EC (2000): 19/110 language pairs, 8 OK for intended use • See also examples by Ronaldo Martins • even fewer are capable of quality translation or speech translation • Now a few examples…

  10. Examples: MT for access, Web (1)

  11. Examples: MT for access, Web (2) • French-English (FE) is quite "easy" compared with English-German (EG) and especially French-German (FG)

  12. Comparison: raw vs rough MT

  13. Examples: MT for revisors…

  14. …with BV-aero/FE (2)

  15. MT of spoken dialogs • Specialized systems are already usable • e.g. ATR/Matsushita, IBM, CSTAR/Nespole!… • Much "noise" and "ungrammaticalities" • But specializing is very helpful! • General systems are also possible • e.g. NEC/Xroad, Linguatec/Talk&Translate • Speech recognition is already good enough • Rough may be good enough (e.g. for chatting) • Interpretation is different from translation… • …and participants are intelligent! • Similarity with access-oriented MT

  16. French-Korean through IF (1)

  17. French-Korean through IF (2)

  18. French-Korean through IF (3)

  19. A road map… to which goals? • MT of adequate quality • Not only for access • For all languages

  20. Four keys • 2 on the technical side • 2 on the organizational side • Compromise: a far wider coverage, a somewhat smaller asymptotic quality • Automatic learning techniques • Using non-textual pivots (intermediate formal descriptors) • Democratization, cooperation • Cooperative development of open source linguistic resources on the Web • Towards systems where quality can be improved "on demand" by users

  21. Learning techniques • Extend the use of hybrid techniques • symbolic, numerical, or mixed • ==> they have demonstrated their potential at the research level • stochastic grammars • weighted (or "neural") dictionaries • or build new tools, intrinsically numerical • inspiration from voice recognition • 2 examples • learning analyzers: text → semantic tree (IBM) • learning implicit, very detailed dependency grammars (DG) from a tree bank (NAIST)

  22. Using non-textual pivots • Semantico-pragmatic (ontological) pivots • task & domain oriented ==> limited applicability • Abstract linguistic descriptors • the most precise, but often too sophisticated • depend on each language • Anglo-semantic pivot: UNL • "the HTML of linguistic content" • in UNL, a hypergraph represents the abstract structure of a (supposedly) equivalent English utterance • less precise but "robust" • symbols constructed from English ==> usable by all developers

  23. A simple UNL graph • Sentence: "Ronaldo has headed the ball into the left corner of the goal" • [Figure: UNL graph] • Entry node: score(icl>event,agt>human,fld>sport).@entry.@past.@complete • Other nodes: Ronaldo(icl>proper noun), head(pof>body).@def, corner(icl>thing).@def, goal(icl>abstract thing), goal(icl>concrete thing), left(aoj<thing) • Relation labels on the arcs: agt, obj, ins, plt, pos, pos, mod
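The node labels on this slide follow a visible pattern: a headword, an optional parenthesized list of restrictions such as icl>event, and optional attributes introduced by ".@". The following is a minimal Python sketch of a parser for that pattern. It is inferred only from the strings shown on the slide and is not the official UNL grammar; the names UniversalWord and parse_uw are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class UniversalWord:
    headword: str
    restrictions: list = field(default_factory=list)  # (relation, direction, value)
    attributes: list = field(default_factory=list)    # e.g. ["entry", "past"]


# Assumed shape, read off the slide: headword(restr,restr,...).@attr.@attr
UW_PATTERN = re.compile(
    r"^(?P<head>[^(.]+)(?:\((?P<restr>[^)]*)\))?(?P<attrs>(?:\.@\w+)*)$"
)
RESTRICTION_PATTERN = re.compile(r"^(\w+)([<>])(.+)$")


def parse_uw(text):
    """Split one node label into headword, restrictions and attributes."""
    match = UW_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not in the assumed node-label format: {text!r}")
    restrictions = []
    if match.group("restr"):
        for item in match.group("restr").split(","):
            parts = RESTRICTION_PATTERN.match(item.strip())
            if parts is None:
                raise ValueError(f"unrecognized restriction: {item!r}")
            restrictions.append(parts.groups())
    # ".@entry.@past" splits into ["", "entry", "past"]; drop the empty head.
    attributes = [a for a in match.group("attrs").split(".@") if a]
    return UniversalWord(match.group("head"), restrictions, attributes)


entry = parse_uw("score(icl>event,agt>human,fld>sport).@entry.@past.@complete")
print(entry.headword, entry.restrictions, entry.attributes)
```

Keeping the direction sign (">" or "<") in each restriction preserves the difference between labels such as icl>thing and aoj<thing, which the slide uses side by side.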

  24. Cooperative development • of open source linguistic resources • on the Web • Mutualization is necessary at least for lexical knowledge • too costly even for the leaders • size (#entries) has to grow for each language (300K, 3M?) • #languages has to increase dramatically (11 → 20 → 180?) • Integration of human- and machine-oriented knowledge is useful • e.g. to produce mixed MT/MAHT systems
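The language counts on this slide can be turned into a rough count of what has to be built. With n languages there are n × (n − 1) directed language pairs, which matches the 110 pairs quoted for 11 languages on slide 9. The sketch below compares that with a pivot architecture under the usual simplifying assumption of one analyzer into the pivot and one generator out of it per language. The function names are hypothetical, and the counts say nothing about the effort per module.

```python
def direct_pairs(n_languages: int) -> int:
    """Directed (source, target) pairs among n languages."""
    return n_languages * (n_languages - 1)


def pivot_modules(n_languages: int) -> int:
    """Assumed pivot setup: one analyzer and one generator per language."""
    return 2 * n_languages


for n in (11, 20, 180):
    print(f"{n} languages: {direct_pairs(n)} directed pairs, "
          f"{pivot_modules(n)} pivot modules")
# 11 languages: 110 directed pairs, 22 pivot modules
# 20 languages: 380 directed pairs, 40 pivot modules
# 180 languages: 32220 directed pairs, 360 pivot modules
```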

  25. A contribution: the Papillon project • Goal: • produce many open source dictionaries from a central lexical database • Means: • build rich (DiCo) monolingual dictionaries of lexies (senses) • interlink lexies by interlingual links (axies) • use XML & associated tools as basis to generate many formats • for humans and for machines • start from (free) digital resources • induce "consumers" to become "producers" (contributors) • Quality control: • private accounts • central validating/integrating group

  26. Papillon database macrostructure • [Figure: macrostructure diagram] • Figure labels: Lexical Database; Resources (integration of existing resources); Human Contributors (interaction with the dictionaries); Dictionaries (extraction of dictionaries); Users

  27. PAPILLON diagram • Interlingual links based on translations = "AXIEs" • Possibility to link 1 lexie with >1 acceptions • References to other semantic systems: AXIE 1 → n UW • [Figure: linked dictionaries] • French DiCo: vocable carte (n.f.), lexie carte.1 "carte à jouer", lexie carte.2 "carte géographique" • Japanese DiCo: カード, 地図 • English DiCo: vocable card (N), lexie card.1 "playing card", lexie card.2 "money card"; map (vocable = lexie) • Thai DiCo • Interlingual links: Acception 343, UNL: card(icl>play), card(icl>thing)… • Acception 345, UNL: map(fld>geography) • Acception 1002, UNL: card(fld>money)
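The lexie/axie structure can be sketched as a small data model in which a bilingual dictionary view is derived by following interlingual links. This is only an illustrative Python sketch, not the Papillon schema: the class names, the language codes, and in particular which lexie is attached to which acception are assumptions read off the entry names on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lexie:
    language: str  # language code (our own choice of codes)
    lexie_id: str  # identifier as printed on the slide, e.g. "carte.1"
    gloss: str     # gloss from the slide, empty if none is given


@dataclass
class Axie:
    axie_id: int   # "Acception" number on the slide
    unl: list      # UNL universal words attached to the axie
    lexies: list   # lexies linked to this axie (assumed links)


def translations(lexie, axies, target_language):
    """Return lexies of target_language that share an axie with lexie."""
    found = []
    for axie in axies:
        if lexie in axie.lexies:
            for other in axie.lexies:
                if other.language == target_language and other not in found:
                    found.append(other)
    return found


carte_1 = Lexie("fra", "carte.1", "carte à jouer")
carte_2 = Lexie("fra", "carte.2", "carte géographique")
kaado = Lexie("jpn", "カード", "")
chizu = Lexie("jpn", "地図", "")
card_1 = Lexie("eng", "card.1", "playing card")
card_2 = Lexie("eng", "card.2", "money card")
map_ = Lexie("eng", "map", "")

# Assumed attachments; the slide text alone does not fix them.
AXIES = [
    Axie(343, ["card(icl>play)", "card(icl>thing)"], [carte_1, kaado, card_1]),
    Axie(345, ["map(fld>geography)"], [carte_2, chizu, map_]),
    Axie(1002, ["card(fld>money)"], [card_2]),
]

print([l.lexie_id for l in translations(carte_1, AXIES, "eng")])  # ['card.1']
```

Because translations are computed through axies rather than stored per language pair, adding a lexie for a new language to an existing axie makes it reachable from every language already linked there.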

  28. Construct systems where quality can be improved "on demand" by users • a priori through interactive disambiguation in the source language • or a posteriori by correcting the pivot representation (UNL or other) through any language (as in MultiMeteo) • ==> In both cases, all versions (in all languages) are improved • possibility to merge • MT • multilingual generation • computer-aided authoring

  29. Conclusion • 4 keys to open the door to MT of adequate quality for all languages • On the technical side, • dramatically increase the use of learning techniques • use pivot architectures, the most universally usable pivot being UNL • On the organizational side, • cooperatively develop open source linguistic resources on the Web • construct systems where quality can be improved "on demand" by users • On the practical side, • seek keys to unlock private investment, public funding, voluntary cooperation • could this conference become a decisive turning point?
