1 / 32

Kyoto-U: NTCIR-7 Patent Translation System - Overview and Examples

Explore the Kyoto-U syntactical EBMT system for patent translation, including translation examples and alignment techniques. The system handles word correspondences, dependencies, ambiguities, and consistency in translations. Input and output examples showcase its effectiveness in translating complex texts accurately. Find out how this system uses bilingual dictionaries, substring co-occurrence, and numeral normalization to improve translation quality. Discover how it handles word disambiguation, phrase handling, and registration to a translation example database for precise results.

ragsdale
Download Presentation

Kyoto-U: NTCIR-7 Patent Translation System - Overview and Examples

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Kyoto-U: Syntactical EBMT System for NTCIR-7 Patent Translation Task Kyoto University Toshiaki Nakazawa Sadao Kurohashi

  2. Overview of Kyoto-U System Translation Examples J: 図書館で新聞を読む E: I read a newspaper in the library J: 政治の本が売れ残っている E: A book in politics was left on the shelf ・・・・・

  3. Overview of Kyoto-U System Translation Examples I 図書館 で in library read 新聞 を newspaper ACC a newspaper 読む read in the library in politics a book 政治 の book NOM in politics 本 が was left left unsold 売れ残って いる on the shelf ・・・・・           ・・・・・

  4. Overview of Kyoto-U System Translation Examples Input: 図書館で政治の本を読む。 I 図書館 で read 新聞 を I a newspaper 読む 図書館 で read in the library in library 政治 の a book in politics 本 を in politics book ACC 読む in the library read a book 政治 の in politics 本 が was left 売れ残って いる Output: I read a book in politics in the library on the shelf ・・・・・           ・・・・・

  5. Alignment

  6. J: 交差点で、突然あの車が 飛び出して来たのです。 E:The car came at me from the side at the intersection. Alignment

  7. the car came at me 交差 from the side 点 で 、 at the intersection 突然 あの 車 が 飛び出して 来た のです Alignment • Transformation into dependency structure J: JUMAN/KNP E: Charniak’s nlparser → Dependency tree

  8. the car came at me 交差 from the side 点 で 、 at the intersection 突然 あの 車 が 飛び出して 来た のです Alignment • Transformation into dependency structure • Detection of word(s) correspondences

  9. Finding Correspondences • Bilingual dictionaries (500K entries) • Substring co-occurrence (Cromieres 2006) • Numeral normalization 二百十六万 → 2,160,000 ← 2.16 million • Transliteration (Katakana words, NEs) ローズワイン → rosuwain ⇔ rose wine (similarity:0.78) 新宿 → shinjuku ⇔ shinjuku (similarity:1.0)

  10. the car came at me 交差 from the side 点 で 、 at the intersection 突然 あの 車 が 飛び出して 来た のです Alignment • Transformation into dependency structure • Detection of word(s) correspondences • Disambiguation of correspondences

  11. the car came at me 交差 from the side 点 で 、 at the intersection 突然 あの 車 が 飛び出して 来た のです Alignment • Transformation into dependency structure • Detection of word(s) correspondences • Disambiguation of correspondences • Handling of remaining phrases Extension to leaf-nodes

  12. the car came at me 交差 from the side 点 で 、 at the intersection 突然 あの 車 が 飛び出して 来た のです Alignment • Transformation into dependency structure • Detection of word(s) correspondences • Disambiguation of correspondences • Handling of remaining phrases • Registration to translation example database

  13. 日本 で you 保険 will have to file 会社 に 対して insurance 保険 an claim 請求 の insurance 申し立て が with the office 可能です よ in Japan Alignment Ambiguities [in Japan] [insurance] [to the company] [insurance] [ofclaim] [file] [be able to]

  14. Alignment: Consistency Near Far

  15. For each pair of candidates ai and aj calculate the J-side distance dJ and the E-side distance dE • Give a consistency score to the pair based on dJ and dE • Calculate consistency scores for all the pairs in a possible set of alignment candidates

  16. Baseline Distance of Each Branch: 1 Consistency Score: … … … 1/1+1/2=1.5

  17. Consistency Score • The frequency of distance pair in gold-standard alignment data (Mainichi newspaper 40K sentence pairs) [Uchimoto04] Frequency (log) Dist of J-Side Dist of E-Side

  18. 3 デ格 NP 日本 で you 1 文節内 保険 will have to file 3 1 連用 NN 会社 に 対して insurance 3 1 NP 文節内 保険 an claim 1 2 NN ノ格 請求 の insurance 3 3 ガ格 PP 申し立て が with the office 3 PP 可能です よ in Japan Distance based on Dependency Type [in Japan] [insurance] [to the company] [insurance] [ofclaim] [file] [be able to]

  19. 3 デ格 NP 日本 で you 1 文節内 保険 will have to file 3 1 連用 NN 会社 に 対して insurance 3 1 NP 文節内 保険 an claim 1 2 NN ノ格 請求 の insurance 3 3 ガ格 PP 申し立て が with the office 3 PP 可能です よ in Japan Distance based on Dependency Type [in Japan] [insurance] [to the company] [insurance] [ofclaim] [file] [be able to]

  20. デ格 NP 日本 で you 文節内 保険 will have to file 連用 NN 会社 に 対して insurance NP 文節内 保険 an claim NN ノ格 請求 の insurance ガ格 PP 申し立て が with the office PP 可能です よ in Japan Distance based on Dependency Type 3 3 [in Japan] 1 [insurance] 3 1 [to the company] 3 1 [insurance] 1 2 [ofclaim] 3 3 [file] 3 [be able to]

  21. Example of Alignment Improvement Proposed model Word-base alignment

  22. Translation

  23. Translation Translation Examples Input: 図書館で政治の本を読む。 I 図書館 で read 新聞 を I a newspaper 読む 図書館 で read in the library in library 政治 の a book in politics 本 を in politics book ACC 読む in the library read a book 政治 の in politics 本 が was left 売れ残って いる Output: I read a book in politics in the library on the shelf ・・・・・           ・・・・・

  24. Selection of Translation Examples • Score for an example • Size of an example • Similarity of neighboring nodes • Translation probability • Beam search from the root of the input [Sato 91]

  25. I read a newspaper Translation example: in the library Input: I 図書館 で read 新聞 を 図書館 で in library a newspaper 政治 の 読む in politics in the library 本 を book ACC 読む 0.7 I read study a newspaper in the library

  26. Combination of TMs Translation Examples Input: 図書館で政治の本を読む。 I 図書館 で read 新聞 を I a newspaper 読む 図書館 で read in the library in library 政治 の a book in politics 本 を in politics book ACC 読む in the library read a book 政治 の in politics 本 が was left 売れ残って いる on the shelf ・・・・・           ・・・・・

  27. Input:記録領域での変形形状と,記録特性の関係を調べた。Input:記録領域での変形形状と,記録特性の関係を調べた。 Output Dependency Tree Translation Examples Input Dependency Tree ┌ 状況 を 調べた 。 ┌the situation was examined ┌ the relationship ││  ┌ deformation ││┌ shape and │││ │ ┌ recording │││ └ in the region ││├ recording │└ between characteristics was examined ┌ 相互   ┌作用 と   │┌記録   ├特性 の ┌ 関係 を 調べた 。       ┌ 記録     ┌ 領域 で の     ├ 変形   ┌ 形状 と ,   │ ┌ 記録   ├ 特性 の ┌ 関係 を 調べた 。 ┌ the relationship ││┌interaction and ││├ recording │└ between characteristics was investigated   ┌cross-sectional ┌ shape ││  ┌ large ││┌deformation │└ in the region was └ simulated ┌大変     ┌形   ┌領域 で の   ├断面 ┌形状 を 模擬 した Output: The relationship between deformation shape in the recording region and recording characteristics was examined . ┌ 記録 領域 の ┌ recording of the areas ┌ 変形 パターン を ┌ deformation the pattern

  28. Evaluation ResultsandDiscussion

  29. Intrinsic J-EEvaluation Result

  30. Intrinsic E-JEvaluation Result

  31. Critical Defect in EJ Translation • Not caring whether a child node is a pre-child or post-child • Resulting target structure goes wrong • After resolving this defect, BLEU score in EJ translation rose to 24.02 from 22.65 ? ? ? 24.02

  32. Conclusion • Kyoto-U Fully Syntactic EBMT system: • Alignment: Consistency • Alignment: Extension • Translation: Discontinuous example • Translation: Easy combination • By using syntactic information, we could achieve reasonably high quality translation • For patent translation, we may need some pre-processings to handle special expressions which cause parsing errors

More Related