Japanese-Chinese Phrase Alignment Using Common Chinese Characters Information
Chenhui Chu, Toshiaki Nakazawa and Sadao Kurohashi
Graduate School of Informatics, Kyoto University
Introduction
• Common Chinese characters information may be valuable for word/phrase alignment between Japanese and Chinese
• Chinese characters are used in both Japanese (Kanji) and Chinese (Hanzi)
• Kanji and Hanzi share common Chinese characters
• Parallel sentences carry equivalent meanings in each language, so we can assume that common Chinese characters appear in both sentences
• Three categories of Kanji:
  • Category 1: identical to Simplified Chinese
  • Category 2: identical to Traditional Chinese but different from Simplified Chinese
  • Category 3: visual variations

Alignment Model
• Bayesian subtree alignment model on dependency trees (Nakazawa et al. 2011)
  Equations (1) to (5)
• Common Chinese characters information incorporation
  • Base distribution adjustment: Equations (6) and (7)
  • Model modification: Equation (8)

Common Chinese Characters Detection
• To detect common Chinese characters between Japanese and Simplified Chinese, we convert the Japanese text into Chinese
• Freely available resources used for category 2 and 3 Kanji conversion:
  Kanji: 開発 (develop)
  Category 3 Kanji: 発; Category 2 Kanji: 開
  発 → 發 ・・・ (Unihan database); 發 is a Category 2 Kanji
  開 → 开, 發 → 发 ・・・
  Simplified Chinese: 开发
• We also perform Kana-Kanji conversion for common Chinese characters detection

Experiments
• Japanese-Chinese corpus we used
• Coverage of common Chinese characters detection
• Example of common Chinese characters detection
• Alignment
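
The detection step described on the poster (convert Kanji to Simplified Chinese, then look for characters shared with the Chinese sentence) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two mapping tables are hypothetical, hand-written, and cover only the poster's 開発 example, whereas a real system would load such mappings from external resources like the Unihan database. The function names and the example sentences are also assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical toy tables covering only the poster's example (開発 -> 开发).
# A real system would populate these from resources such as the Unihan database.
CATEGORY3_TO_TRADITIONAL = {"発": "發"}               # visual variant -> Traditional
TRADITIONAL_TO_SIMPLIFIED = {"開": "开", "發": "发"}  # Traditional -> Simplified


def is_han(ch: str) -> bool:
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def kanji_to_simplified(text: str) -> str:
    """Map each character to Simplified Chinese where a mapping is known.

    Category 3 Kanji are first mapped to their Traditional form, then
    Traditional forms are mapped to Simplified. Characters without an entry
    (Category 1 Kanji, kana, punctuation) pass through unchanged.
    """
    out = []
    for ch in text:
        ch = CATEGORY3_TO_TRADITIONAL.get(ch, ch)
        ch = TRADITIONAL_TO_SIMPLIFIED.get(ch, ch)
        out.append(ch)
    return "".join(out)


def common_characters(japanese: str, chinese: str) -> List[Tuple[int, str, str]]:
    """Return (position, kanji, simplified form) for each Kanji in the
    Japanese sentence whose converted form occurs in the Chinese sentence."""
    chinese_chars = set(chinese)
    matches = []
    for i, ch in enumerate(japanese):
        if not is_han(ch):
            continue  # skip kana, punctuation, digits
        simplified = kanji_to_simplified(ch)
        if simplified in chinese_chars:
            matches.append((i, ch, simplified))
    return matches


if __name__ == "__main__":
    print(kanji_to_simplified("開発"))
    for match in common_characters("新しい方法を開発する", "开发新方法"):
        print(match)
```

In this sketch, Category 1 Kanji such as 新, 方 and 法 match without any table entry, while 開 and 発 match only after conversion. How the detected pairs feed into the alignment model's base distribution is not shown here, since the poster's equations are not available in this text.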