The translation of examples, citations, definitions and glosses in the Papillon project

PAPILLON-02 international seminar, NII, Tokyo, 16-18 July 2002. Christian Boitet, GETA, CLIPS, IMAG, CNRS, INPG & UJF, Grenoble, France. Christian.Boitet@imag.fr

  1. The translation of examples, citations, definitions and glosses in the Papillon project
     PAPILLON-02 international seminar, NII, Tokyo, 16-18 July 2002
     Christian Boitet
     GETA, CLIPS, IMAG, CNRS, INPG & UJF
     Grenoble, France
     Christian.Boitet@imag.fr
     Translation in Papillon (Ch. Boitet)

  2. Outline
     • The problem:
       • given the “pivot” architecture of monolingual dictionaries
       • translate all “free language elements” into all languages
       • & store the results, respecting the overall structure
     • Proposed solutions:
       • Storing: use auxiliary lexies and axies
       • Translating-1: shared tools for human translation
       • Translating-2: partial MT using UNL
     • Perspectives

  3. Internal architecture of the database
     [Diagram]
     • French Dictionary: Vocable carte n.f.
       • Lexie: carte à jouer
       • Lexie: carte géographique
     • Japanese Dictionary: カード, 地図
     • Interlingual Dictionary:
       • Acception 343, UNL: card(icl>play)
       • Acception 345, UNL: map(fld>geography)
     Architecture derived from Gilles Sérasset’s Ph.D. thesis

  4. PAPILLON scenario & diagram
     [Diagram]
     • French DiCo: Vocable carte n.f.
       • lexie carte.1: carte à jouer
       • lexie carte.2: carte géographique
     • Japanese DiCo: カード, 地図
     • English DiCo: Vocable card N
       • lexie card.1: playing card
       • lexie card.2: money card
       • Vocable = lexie: map
     • Thai DiCo
     • Interlingual links:
       • Acception 343, UNL: card(icl>play), card(icl>thing)…
       • Acception 345, UNL: map(fld>geography)
       • Acception 1002, UNL: card(fld>money)
     • Interlingual links motivated by translations = "AXIEs"
     • Possibility to link 1 lexie to >1 acception
     • Links to other representations: AXIE -1:n-> UW
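
The pivot organisation on slides 3 and 4 can be illustrated with a minimal Python sketch. The class names, field names and language codes are assumptions made for illustration, not the Papillon schema, and the grouping of lexies under each acception is only one reading of the flattened diagram.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Lexie:
    lang: str    # language code (hypothetical: "fra", "eng", "jpn")
    ident: str   # lexie identifier, e.g. "carte.1"
    gloss: str   # short human-readable label


@dataclass
class Axie:
    ident: int                                       # acception number
    uws: List[str] = field(default_factory=list)     # links to UNL UWs (1:n)
    lexies: Dict[str, List[Lexie]] = field(default_factory=dict)  # lang -> lexies

    def link(self, lexie: Lexie) -> None:
        self.lexies.setdefault(lexie.lang, []).append(lexie)


def translations(axies, source, target_lang):
    """Lexies of target_lang reachable from `source` through any axie."""
    found = []
    for axie in axies:  # a lexie may be linked to more than one axie
        if source in axie.lexies.get(source.lang, []):
            found.extend(axie.lexies.get(target_lang, []))
    return found


carte1 = Lexie("fra", "carte.1", "carte à jouer")
carte2 = Lexie("fra", "carte.2", "carte géographique")
card1 = Lexie("eng", "card.1", "playing card")
card2 = Lexie("eng", "card.2", "money card")
map_en = Lexie("eng", "map", "map")
kaado = Lexie("jpn", "カード", "カード")
chizu = Lexie("jpn", "地図", "地図")

a343 = Axie(343, ["card(icl>play)", "card(icl>thing)"])
a345 = Axie(345, ["map(fld>geography)"])
a1002 = Axie(1002, ["card(fld>money)"])

# Assumed grouping of lexies under each acception.
for axie, members in [
    (a343, [carte1, card1, kaado]),
    (a345, [carte2, map_en, chizu]),
    (a1002, [card2]),
]:
    for lexie in members:
        axie.link(lexie)

axies = [a343, a345, a1002]
print([l.ident for l in translations(axies, carte1, "jpn")])  # ['カード']
print([l.ident for l in translations(axies, carte2, "eng")])  # ['map']
```

Because the monolingual entries never point at each other directly, adding a language only requires linking its lexies to axies.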

  5. A monolingual DiCo entry (again)
     • Name of the lexical unit: MEURTRE
     • Grammatical properties: nom, masc
     • Semantic formula: action de tuer: ~ PAR L'individu X DE L'individu Y
     • Government pattern:
       • X = I = de N, A-poss
       • Y = II = de N, A-poss
     • (Quasi-)synonyms: {QSyn} assassinat, homicide#1; crime
     • Semantic derivations & collocations:
       • {V0} tuer
       • {A0} meurtrier-adj /*Nom pour X*/
       • {S1} auteur [de ART Ø] //meurtrier-n /*Nom pour Y*/
       • {S2} victime [de ART Ø] /*Très choquant*/
     • Examples: La mésentente pourrait être le mobile du meurtre.
     • Full idioms:
       • appel au meurtre
       • crier au meurtre
     Structure derived from Alain Polguère’s work on DiCo

  6. Fixed and free language elements
     • Fixed
       • Stereotyped definition in semantic formula: action de tuer:
       • Logical argument frame: ~ PAR L'individu X DE L'individu Y
       • Grammatical properties: nom, masc
     • Free
       • Examples: La mésentente pourrait être le mobile du meurtre.
       • Citations (e.g. for SPIRIT): the spirit is strong, but the flesh is weak (Bible, ref. XXX)
       • Free definitions in semantic formula (e.g. for a disease noun such as LEUCOCYTE): sort of cell contained in the blood and attacking infectious agents
       • Glosses (sometimes = quasi-synonyms): character (mood)

  7. The problem (1)
     • Necessity to translate all free language elements
     • The translation into L2 of an example for X (in L1) is not in general a good example for Y, the translation of X in L2
       • Il utilise souvent des cartes IGN
       • *He often uses IGN roadmaps/maps
       • He often uses AA maps
       • IGN = Institut Géographique National
       • AA = Automobile Association
     • Hence, the size of the problem is quadratic!
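
The quadratic growth stated on slide 7 can be made concrete with a small calculation. This is a sketch under assumed figures: the language codes and the count of 1000 free elements per language are hypothetical, not Papillon data.

```python
def translation_directions(n_languages: int) -> int:
    """Ordered (source, target) pairs: each language into every other one."""
    return n_languages * (n_languages - 1)


def translations_needed(elements_per_language: dict) -> int:
    """Each free element of a language must be translated into the n - 1 others."""
    n = len(elements_per_language)
    return sum(count * (n - 1) for count in elements_per_language.values())


# Hypothetical counts of free language elements per language.
counts = {"fra": 1000, "eng": 1000, "jpn": 1000, "tha": 1000}

print(translation_directions(len(counts)))  # 12
print(translations_needed(counts))          # 12000
```

Since examples cannot be shared across languages, each dictionary keeps its own examples, and the number of translation directions grows as n(n - 1).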

  8. The problem (2)
     • Where to store these translations?
       • Not in the lexies, which must remain monolingual
       • Not in the axies, which must remain pure links

  9. Solution for the storing problem
     • Use auxiliary lexies and axies
       • terminology: x-lexie, x-axie
       • x ∈ {def, cit, ex, glo}
     • Each free language element becomes an x-lexie
       • cit-lexies and ex-lexies are simpler than normal lexies
     • x-lexies are linked through x-axies
     • An x-axie contains lists of x-lexies and, in case of an external reference to UNL, a UNL graph (if x ≠ glo) or a UW (glo-axie)

  10. Multilingual links = AXIES
     • Normal axies
       • for each language L, 0:n links to lexies of L
       • for each semantic system S available, 0:n links to entities of S
         • UNL UWs, WordNet synsets, NTT SemCat, Ontos concepts, LexiQuest Lex-concepts…
     • Auxiliary axies for examples, citations…
       • for each language L, 0:n links to lexies of L
       • if UNL-annotated, 1 UNL graph
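
The auxiliary structures of slides 9 and 10 might be modelled as below. This is a minimal sketch: the class and field names are assumptions, the UNL graph is kept as an opaque string, and the English sentence is used only as the stored translation of the French example from slide 7.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

KINDS = {"def", "cit", "ex", "glo"}  # the four values of x on slide 9


@dataclass
class XLexie:
    kind: str   # one of KINDS
    lang: str   # language code (hypothetical)
    text: str   # the free language element itself


@dataclass
class XAxie:
    kind: str
    lexies: Dict[str, List[XLexie]] = field(default_factory=dict)  # 0:n per language
    unl_graph: Optional[str] = None  # allowed only if kind != "glo"
    uw: Optional[str] = None         # allowed only if kind == "glo"

    def __post_init__(self) -> None:
        if self.kind not in KINDS:
            raise ValueError(f"unknown kind: {self.kind}")
        if self.kind == "glo" and self.unl_graph is not None:
            raise ValueError("a glo-axie refers to a UW, not to a UNL graph")
        if self.kind != "glo" and self.uw is not None:
            raise ValueError("only a glo-axie refers to a UW")

    def add(self, x: XLexie) -> None:
        if x.kind != self.kind:
            raise ValueError("an x-axie links x-lexies of its own kind only")
        self.lexies.setdefault(x.lang, []).append(x)


# An ex-axie linking a French example to its stored English translation.
ex = XAxie("ex")
ex.add(XLexie("ex", "fra", "Il utilise souvent des cartes IGN."))
ex.add(XLexie("ex", "eng", "He often uses IGN maps."))
print(sorted(ex.lexies))  # ['eng', 'fra']
```

Keeping the translations in x-lexies leaves ordinary lexies monolingual and ordinary axies as pure links, which is the constraint stated on slide 8.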

  11. A « Montaigne » environment for Human Translation
     • Idea: let users SHARE translation memory & tools on a server
     • Specs in 1995 around Eurolang Optimizer™
       • no funding although « Francophony » interested…
     • Internet version: see www.yakushite.net (OKI)
     • First version built for Lao
       • see www.laosoftware.com (V. Berment)
     • Future:
       • bilingual editor as applet
       • use of Papillon server architecture (private spaces etc.)

  12. Scenario & possible GUI
     [Figure: typical layout of a bilingual editor in a TSS]

  13. Design & implementation issues
     • Peer-to-peer architecture
       • Papillon, Montaigne
     • Possibility to modify input text
       • & segmentation
     • Integrate with private lexicon(s)
     • Open to plug-ins (voice input…)

  14. Automating translation using UNL
     • UNL =
       • a project
       • a language to represent NL utterance meanings
       • a format for multilingual documents (html, xml)
     • Elements of the UNL language
       • UWs: headword(restrictions), e.g. book(icl>do)
       • Attributes: @future, @past, @complete…, @entry
       • Relations: agt, aoj, mod, obj, tim…
       • (Hyper)graphs: each subgraph is connected & has an entry node
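
The elements listed on slide 14 can be sketched as a small graph structure with a well-formedness check. This is an illustration, not a UNL implementation: the node ids, the UWs "traveller" and "room", and the reading of "has an entry node" as "exactly one node carries @entry" are assumptions, and hypergraph scopes are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Node:
    uw: str                                       # e.g. "book(icl>do)"
    attrs: Set[str] = field(default_factory=set)  # e.g. {"@entry", "@past"}


@dataclass
class UNLGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    # Each arc is (relation, source node id, target node id).
    arcs: List[Tuple[str, str, str]] = field(default_factory=list)

    def entry_nodes(self) -> List[str]:
        return [i for i, n in self.nodes.items() if "@entry" in n.attrs]

    def is_connected(self) -> bool:
        if not self.nodes:
            return False
        neighbours = {i: set() for i in self.nodes}
        for _, src, dst in self.arcs:  # connectivity ignores arc direction
            neighbours[src].add(dst)
            neighbours[dst].add(src)
        start = next(iter(self.nodes))
        seen, stack = {start}, [start]
        while stack:
            for nxt in neighbours[stack.pop()] - seen:
                seen.add(nxt)
                stack.append(nxt)
        return len(seen) == len(self.nodes)

    def is_well_formed(self) -> bool:
        return len(self.entry_nodes()) == 1 and self.is_connected()


g = UNLGraph()
g.nodes["n1"] = Node("book(icl>do)", {"@entry", "@past"})
g.nodes["n2"] = Node("traveller")  # hypothetical UW
g.nodes["n3"] = Node("room")       # hypothetical UW
g.arcs += [("agt", "n1", "n2"), ("obj", "n1", "n3")]
print(g.is_well_formed())  # True

g.nodes["n4"] = Node("hotel")      # isolated node breaks connectivity
print(g.is_well_formed())  # False
```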

  15. A simple UNL input graph

  16. Possible interactive disambiguation at analysis time

  17. Interactive disambiguation (2)
     - gives a correct unique multilevel concrete (UMC) tree
     - then a correct unique multilevel abstract (UMA) tree
     - and finally a correct UNL graph

  18. Possible text-graph « coedition » at reading time
     • applicable if there is a UNL graph associated with a segment one wants to modify
     • goal: share the revisions across languages, by reflecting them on the UNL graph
     • Ex: FB2204 (Forum Barcelona 2004)
       • « Une cité retrouvera une zone côtière après un forum »
       • add ".@def" on the nodes for "city", "forum"
       • transform "forum" into "Forum"
       • replace "retrieve" by "recover"
       • add ".@complete" on the node containing it.
       • « La cité récupérera une zone côtière après le Forum »
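
The four revisions in the FB2204 example can be replayed on a simplified graph. This is a sketch only: the node ids, the dictionary layout and the initial @entry and @future attributes are assumptions, arcs and the other nodes of the sentence are omitted, and the real UWs may carry restrictions that the slide does not show.

```python
def headword(uw: str) -> str:
    """Part of a UW before its restrictions, e.g. 'book' in 'book(icl>do)'."""
    return uw.partition("(")[0]


def add_attribute(graph: dict, head: str, attr: str) -> None:
    """Add `attr` to every node whose headword is `head`."""
    for node in graph.values():
        if headword(node["uw"]) == head:
            node["attrs"].add(attr)


def replace_headword(graph: dict, old: str, new: str) -> None:
    """Rename a headword, keeping any restrictions unchanged."""
    for node in graph.values():
        head, sep, rest = node["uw"].partition("(")
        if head == old:
            node["uw"] = new + sep + rest


# Simplified graph for « Une cité retrouvera une zone côtière après un forum ».
graph = {
    "n1": {"uw": "retrieve", "attrs": {"@entry", "@future"}},  # attrs assumed
    "n2": {"uw": "city", "attrs": set()},
    "n3": {"uw": "forum", "attrs": set()},
}

add_attribute(graph, "city", "@def")
add_attribute(graph, "forum", "@def")
replace_headword(graph, "forum", "Forum")
replace_headword(graph, "retrieve", "recover")
add_attribute(graph, "recover", "@complete")

print(graph["n1"]["uw"], sorted(graph["n1"]["attrs"]))
# recover ['@complete', '@entry', '@future']
print(graph["n3"]["uw"], sorted(graph["n3"]["attrs"]))
# Forum ['@def']
```

Deconverting the revised graph into each language is what would propagate the change beyond French.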

  22. Principles of coedition (1)
     • It is impossible in principle to deduce the modification on the graph from a modification on the text
       • For example, replacing "un" ("a") by "le" ("the") does not entail that the following noun is determined (.@def), because it can also be generic
       • "il aime la montagne" = "he likes mountains"
     • Revision is not done by modifying directly the text, but by using a menu system
     • Menu items have a "language side" and a hidden "UNL side"

  23. Principles of coedition (2)
     • when a menu item is chosen,
       • only the graph is transformed,
       • the action to be done on the text is delayed and shown
     • at any time, the new graph may be deconverted
       • If it is satisfactory, that shows that errors were due to the graph and not to the deconverter, and the graph may be sent to deconverters in other languages.
       • Versions in some other languages known by the user may be displayed, so that improvement sharing is visible and encouraging.
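
The menu principle of slides 22 and 23 could be sketched as follows. Everything here is an assumption made for illustration: the class names, the reduction of the graph to a mapping from headwords to attribute sets, and the toy deconverter, which stands in for a real UNL deconverter.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Graph = Dict[str, Set[str]]  # headword -> attributes (deliberately simplified)


@dataclass
class MenuItem:
    label: str                           # "language side", shown to the user
    unl_action: Callable[[Graph], None]  # hidden "UNL side"
    text_hint: str                       # delayed action on the text, only shown


@dataclass
class CoeditionSession:
    graph: Graph
    pending: List[str] = field(default_factory=list)

    def choose(self, item: MenuItem) -> None:
        item.unl_action(self.graph)          # only the graph is transformed
        self.pending.append(item.text_hint)  # the text itself is left untouched

    def deconvert(self, deconverter: Callable[[Graph], str]) -> str:
        text = deconverter(self.graph)  # regenerate the text from the graph
        self.pending.clear()
        return text


def toy_french_deconverter(graph: Graph) -> str:
    """Stand-in deconverter: only chooses the article for 'city'."""
    return "la cité" if "@def" in graph["city"] else "une cité"


make_definite = MenuItem(
    label="a specific, already known city",
    unl_action=lambda g: g["city"].add("@def"),
    text_hint='"une" will become "la"',
)

session = CoeditionSession(graph={"city": set()})
print(toy_french_deconverter(session.graph))  # une cité
session.choose(make_definite)
print(session.pending)                            # ['"une" will become "la"']
print(session.deconvert(toy_french_deconverter))  # la cité
```

Tying each menu label to one graph operation avoids having to guess, from an edited article alone, whether the noun is definite or generic.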

  24. Conclusion
     • Need for a translation task (#7) in Papillon
     • Seamless integration with x-lexies + x-axies
     • Possible combination of TA & MT
       • Mutualization spirit (Papillon, Montaigne) for TA
       • Use of UNL (2 « pivot » architectures) for MT
     • Mutualization again in MT part (humans involved)
       • Interactive disambiguation
       • Coedition text ↔ UNL graph
       • … & of course lexical data contribution through Papillon!
