Complete Folding of Han Ideographs in Internationalized Domain Name System (漢字完全摺疊於國際化網域名稱系統)
Pei-Chi Wu (吳培基)
Department of Information Engineering, National Penghu Institute of Technology (國立澎湖技術學院資訊工程科)
300 Liu-Ho Rd., Makung, Penghu 880, Taiwan
Email: pcwu@npit.edu.tw
http://www.npit.edu.tw/~pcwu
Introduction (1)
• The Domain Name System (DNS) provides a mechanism for naming resources on the Internet.
• The domain name space is a tree in which each node has a label.
• DNS assumes the use of ASCII.
• The Internet Engineering Task Force (IETF) has set up a working group on the Internationalized Domain Name System (IDN).
• Using the Universal Character Set (UCS) in IDN promises to simplify several implementation problems.
• Several experimental implementations of IDN already exist, such as iDNS, i-DNS.net, and c-DN.
Introduction (2)
• Han ideographs originated in China thousands of years ago.
• Their use later spread to other countries, such as Korea, Japan, and Vietnam.
• Discussion on the IDN mailing list has raised the issue of Han folding:
  • Han folding treats Han ideographs that are considered equivalent as the same character during domain name comparison.
  • For example, almost all simplified Chinese characters have equivalent traditional characters.
• There are still no well-established rules for Han folding.
Related Work (1)
• UCS and Unicode include compatibility and equivalent characters.
• The resulting compatibility and equivalence issues in domain names can be partly solved by Unicode's four normalization forms (NFC, NFD, NFKC, NFKD).
• Seng and Huang analyze alternatives for Han folding.
  • They do not recommend Han folding based on mapping traditional Chinese to simplified Chinese, because such folding introduces a "side effect".
  • One alternative uses the ZVariant property in the Unihan database, which describes the variant forms of a character in traditional Chinese.
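As a small illustration of what Unicode normalization does and does not cover, the sketch below uses Python's standard `unicodedata` module. The sample characters and the helper name `equal_under` are chosen for this illustration and do not come from the slides.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def equal_under(form, a, b):
    """Compare two strings after applying one Unicode normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# U+F900 is a CJK compatibility ideograph with canonical equivalent U+8C48,
# so every normalization form makes the two compare equal.
print([equal_under(f, "\uF900", "\u8C48") for f in FORMS])

# Full-width "TW" (U+FF34 U+FF37) matches ASCII "TW" only under the
# compatibility forms NFKC and NFKD.
print([equal_under(f, "\uFF34\uFF37", "TW") for f in FORMS])

# Simplified 国 (U+56FD) and traditional 國 (U+570B) are distinct unified
# ideographs, so no normalization form folds them together.
print([equal_under(f, "\u56FD", "\u570B") for f in FORMS])
```

The last case is why normalization only partly solves the problem: simplified and traditional pairs need a separate folding step.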
Related Work (2)
• The DNS extension called non-terminal DNS name redirection defines a new DNAME resource record (RR).
• The following defines two subdomain aliases, 台灣 and 臺灣, for the domain tw:
    台灣 DNAME tw
    臺灣 DNAME tw
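To make the redirection concrete, here is a minimal sketch of the suffix substitution that a DNAME record implies for names below its owner. The function names `apply_dname` and `resolve`, and the query names such as `www.台灣`, are assumptions for illustration. A real resolver also handles case folding, length limits, and CNAME synthesis, all of which are omitted here.

```python
from typing import Optional

def apply_dname(qname: str, owner: str, target: str) -> Optional[str]:
    """Rewrite qname if it lies strictly below the DNAME owner, else return None."""
    q = qname.rstrip(".").split(".")
    o = owner.rstrip(".").split(".")
    t = target.rstrip(".").split(".")
    # DNAME redirects only names below the owner, not the owner name itself.
    if len(q) <= len(o) or q[-len(o):] != o:
        return None
    return ".".join(q[:-len(o)] + t)

# The two records from the slide, written as (owner, target) pairs.
RECORDS = [("台灣", "tw"), ("臺灣", "tw")]

def resolve(qname: str) -> str:
    """Apply the first matching DNAME record; return qname unchanged if none match."""
    for owner, target in RECORDS:
        rewritten = apply_dname(qname, owner, target)
        if rewritten is not None:
            return rewritten
    return qname

print(resolve("www.台灣"))
print(resolve("www.臺灣"))
print(resolve("www.台湾"))  # no record for this spelling, so it is not redirected
```

Every spelling variant needs its own record under this approach, which is the overhead that folding aims to avoid.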
Simplified-Traditional Mapping: One-to-One
Simplified  Traditional    Simplified  Traditional
厂          廠             对          對
气          氣             寿          壽
宁          寧             将          將
网          網             战          戰
从          從             担          擔
刘          劉             数          數
剧          劇             断          斷
园          園             无          無
国          國             旧          舊
Simplified-Traditional Mapping: One-to-Many
Simplified  Traditional      Example words
了          瞭, 了           了解, 瞭解
卜          蔔, 卜           蘿蔔, 占卜
干          乾, 幹, 干       乾淨, 幹練, 干戈
丰          豐, 丰           豐富, 丰采
云          雲, 云           浮雲, 不知所云
升          昇, 升           昇華, 升斗
斗          鬥, 斗           戰鬥, 升斗
只          衹, 隻, 只       衹有, 隻身, 只有
台          臺, 檯, 颱, 台   臺灣, 檯燈, 颱風, 兄台
Simplified-Traditional Mapping: Many-to-One
Simplified              Traditional  Example words
了 (liao3), 瞭 (liao4)  瞭           明瞭, 瞭望
干 (gan1), 乾 (qian2)   乾           乾淨, 乾坤
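Complete Folding of Han Ideographs
• Locale-independent: Han ideographs that are equivalent in one country can be folded with Han ideographs that are equivalent in other countries.
• The folding includes every equivalence relation that is widely accepted in any one country. During domain name comparison, all of these equivalent ideographs are folded together.
• Folding level:
  • 1: one-to-one
  • 2: one-to-many
  • 3: many-to-one; such cases are rare and are not considered.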
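One way to read a complete, locale-independent folding is as a transitive closure: equivalences accepted in different locales are merged into shared classes. The sketch below does this with a small union-find structure. The pair lists, the locale labels, and the names `build_folding` and `fold` are assumptions for illustration, not data or code from the paper.

```python
def build_folding(pairs):
    """Return a dict mapping each character to one representative of its class."""
    parent = {}

    def find(c):
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller code point as root so the result is deterministic.
            low, high = sorted((ra, rb))
            parent[high] = low

    return {c: find(c) for c in list(parent)}

# Hypothetical per-locale equivalence pairs.
PAIRS_ZH = [("氣", "气"), ("國", "国")]  # traditional / simplified Chinese
PAIRS_JA = [("氣", "気"), ("國", "国")]  # older / newer Japanese forms
TABLE = build_folding(PAIRS_ZH + PAIRS_JA)

def fold(label):
    """Replace every character by its class representative."""
    return "".join(TABLE.get(ch, ch) for ch in label)

print(fold("天気") == fold("天气") == fold("天氣"))
```

Because 气 and 気 are each tied to 氣 in a different locale, the merged table places all three in one class, even though no single locale lists 气 and 気 as a pair.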
Folding Levels Required for Common Words
Word forms                  Folding level
大學, 大学                  1
中國, 中国                  1
氣象, 气象                  1
陰陽, 阴阳                  1
昇平, 升平                  2
後花園, 后花园              2
鬥牛, 斗牛                  2
雲門, 云門                  2
臺灣, 台灣, 臺湾, 台湾      2
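The level a word pair requires follows from its characters: it is the highest level needed by any differing character pair. A minimal sketch under that reading is shown below. The `PAIR_LEVEL` table is an illustrative subset consistent with the examples above, and the function name `required_level` is an assumption.

```python
# Folding level of each (traditional, simplified) character pair; illustrative subset.
PAIR_LEVEL = {
    ("學", "学"): 1, ("國", "国"): 1, ("氣", "气"): 1, ("園", "园"): 1, ("灣", "湾"): 1,
    ("昇", "升"): 2, ("後", "后"): 2, ("鬥", "斗"): 2, ("雲", "云"): 2, ("臺", "台"): 2,
}

def required_level(a, b):
    """Smallest folding level at which a and b compare equal, or None if they never do."""
    if len(a) != len(b):
        return None
    level = 0  # 0 means the two labels are already identical
    for x, y in zip(a, b):
        if x == y:
            continue
        pair_level = PAIR_LEVEL.get((x, y)) or PAIR_LEVEL.get((y, x))
        if pair_level is None:
            return None  # no known equivalence between x and y
        level = max(level, pair_level)
    return level

for a, b in [("大學", "大学"), ("後花園", "后花园"), ("臺灣", "台湾")]:
    print(a, b, required_level(a, b))
```

For 後花園 and 后花园 the pair 園/园 needs only level 1, but 後/后 needs level 2, so the word as a whole needs level 2.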
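Applications and Issues
• With complete folding, there is no need to define subdomain aliases such as the following to map between simplified and traditional forms:
    台湾 DNAME 台灣
    交通大学 DNAME 交通大學
• Semantic ambiguity: folded labels with different meanings become indistinguishable (for example, 臺, 檯, and 颱 all fold to 台).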
Implementation
Table 6: Lookup table for Han folding.
Character  UCS-4 code  Folded code
台         53F0        53F0
檯         6AAF        53F0
湾         6E7E        6E7E
灣         7063        6E7E
臺         81FA        53F0
颱         98B1        53F0
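A lookup table like Table 6 can be applied directly with a per-character translation. The sketch below copies the six entries of Table 6 into a Python dict keyed by code point. The names `HAN_FOLD`, `fold_label`, and `same_name` are assumptions, and a full table would need many more entries.

```python
# Table 6 as a mapping from UCS code point to folded code point.
HAN_FOLD = {
    0x53F0: 0x53F0,  # 台
    0x6AAF: 0x53F0,  # 檯
    0x6E7E: 0x6E7E,  # 湾
    0x7063: 0x6E7E,  # 灣
    0x81FA: 0x53F0,  # 臺
    0x98B1: 0x53F0,  # 颱
}

def fold_label(label):
    """Fold a label; characters missing from the table pass through unchanged."""
    return label.translate(HAN_FOLD)

def same_name(a, b):
    """Compare two labels after folding."""
    return fold_label(a) == fold_label(b)

# The four spellings 臺灣, 台灣, 臺湾, 台湾, written as code points.
variants = ["\u81FA\u7063", "\u53F0\u7063", "\u81FA\u6E7E", "\u53F0\u6E7E"]
print({fold_label(v) for v in variants})
```

All four spellings fold to a single key. A label that starts with 檯 or 颱 instead folds to the same key too, which is the kind of side effect and ambiguity that one-to-many folding introduces.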
Conclusions and Future Work
• We have addressed the complete folding of Han ideographs in IDN.
• The folding is locale-independent.
• The folding is complete: all equivalence relations that are well accepted in any one country are included in the folding.
• We have addressed and analyzed issues such as the side effect and ambiguity in Han folding.
• Future work includes constructing a lookup table for Han folding in IDN.