International Domain Name TWNIC Nai-Wen Hsu snw@twnic.net.tw
Domain name • RFC 1035 • A label cannot be longer than 63 characters • A domain name cannot be longer than 255 characters • Maximum labels: 127 • Only a-z, 0-9, and '-' are accepted in a domain name • Limited to 37 ASCII code points: LDH (Letter-Digit-Hyphen)
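The limits on this slide can be checked mechanically. The following is a minimal sketch, not a full RFC 1035 validator: the function name `check_domain_name` and the constant names are hypothetical, it lowercases the input on the assumption that DNS names compare case-insensitively, and it tests only the rules listed above.

```python
import re

MAX_LABEL_LENGTH = 63    # per label
MAX_NAME_LENGTH = 255    # whole domain name
MAX_LABELS = 127         # labels per name
LDH_LABEL = re.compile(r"[a-z0-9-]+")  # the 37 LDH code points


def check_domain_name(name):
    """Return a list of problems found; an empty list means the name passes."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append("name longer than 255 characters")
    labels = name.lower().split(".")
    if len(labels) > MAX_LABELS:
        problems.append("more than 127 labels")
    for label in labels:
        if not label:
            problems.append("empty label")
        elif len(label) > MAX_LABEL_LENGTH:
            problems.append("label longer than 63 characters: " + label)
        elif not LDH_LABEL.fullmatch(label):
            problems.append("non-LDH character in label: " + label)
    return problems


print(check_domain_name("twnic.net.tw"))
print(check_domain_name("慎昌鐘錶.tw"))
```

The second call reports a non-LDH label, which is the gap that internationalized domain names are meant to close.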
International Domain Name • The IETF IDN WG adopted Unicode 3.2 • Greek, Cyrillic, Armenian, Hebrew, Arabic, Syriac, Thaana, Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Sinhala, Thai, … • 95,156 characters
International Domain Name sample • レコード会社.jp • gwmöbler.com • 慎昌鐘錶.tw • 阿克苏诺贝尔油漆公司.cn • 소프트웨어.kr • ישראל.קום
IETF IDN Standard • IDNA (RFC 3490) • Internationalizing Domain Names in Applications • NAMEPREP (RFC 3491) • A Stringprep Profile for Internationalized Domain Names • PUNYCODE (RFC 3492) • A Bootstring encoding of Unicode for Internationalized Domain Names in Applications • STRINGPREP (RFC 3454) • Preparation of Internationalized Strings
IDNA components and interfaces • User: input and display through local interface methods (pen, keyboard, ...) • IDNA-aware application (end system): splits a host name into labels, sets the appropriate flags, and performs the ToASCII and ToUnicode operations • Application to resolver: call to resolver, ACE (e.g. xn--de-jg4avhby1noc0d) • Resolver to DNS servers: DNS protocol, ACE • Application to application servers: application-specific protocol, ACE unless the protocol is updated to handle other encodings
IDNA Structure • User input (Unicode) • NAMEPREP (a Stringprep profile for internationalized domain names): Mapping, Normalization, Prohibit • ToASCII / ToUnicode • ACE (PUNYCODE) • ACE to resolver
NAMEPREP • A Stringprep Profile for Internationalized Domain Names • Mapping • Stringprep tables B.1, B.2 • Normalization • Form KC • Prohibited Output • Stringprep tables C.1.2, C.2.2, C.3, C.4, C.5, C.6, C.7, C.8, C.9
NAMEPREP -- Mapping • Commonly mapped to nothing: 27 • Ex: • Mapping for case-folding used with NFKC: 1371 • Ex: A → a (U+0041 → U+0061), Ϋ → ϋ (U+03AB → U+03CB), ㍱ → hpa (U+3371 → U+0068 U+0070 U+0061)
NAMEPREP -- Normalization • Unicode normalization with form KC
NAMEPREP -- Normalization • ‘u’ + ‘¨’ → ‘ü’ • ‘a’ → ‘a’
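Form KC normalization can be tried directly with Python's standard `unicodedata` module. This is a small illustration, not part of the original slides: the helper name `nfkc` is hypothetical, and the fullwidth letter is an example chosen here to show a compatibility mapping.

```python
import unicodedata


def nfkc(text):
    """Apply Unicode normalization form KC to a string."""
    return unicodedata.normalize("NFKC", text)


# 'u' followed by COMBINING DIAERESIS (U+0308) composes to U+00FC.
composed = nfkc("u\u0308")
print(composed, [hex(ord(ch)) for ch in composed])

# FULLWIDTH LATIN SMALL LETTER A (U+FF41) is mapped to plain 'a'.
plain = nfkc("\uff41")
print(plain, [hex(ord(ch)) for ch in plain])
```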
NAMEPREP – Prohibited output • Non-ASCII space characters: 17 • Ex: U+00A0 (NO-BREAK SPACE) • Non-ASCII control characters: 54 • Ex: U+0090 (DEVICE CONTROL STRING) • Private use: 133371 • Non-character code points: 49 • Surrogate codes: 2048
NAMEPREP – Prohibited output • Inappropriate for plain text: 4 • Inappropriate for canonical representation: 12 • Change display properties or are deprecated: 13 • Tagging characters: 97
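The three NAMEPREP steps (mapping, normalization, prohibited output) can be sketched with the table lookups in Python's standard `stringprep` module. This is an illustrative sketch under stated assumptions: the function name `nameprep_sketch` is hypothetical, it covers only the tables named on these slides, and it omits the bidirectional and unassigned-code-point checks that RFC 3491 also specifies.

```python
import stringprep
from unicodedata import ucd_3_2_0  # Unicode 3.2 data, the version adopted by the IDN WG

# Prohibited-output tables: C.1.2, C.2.2, C.3 through C.9.
PROHIBITED_TABLES = (
    stringprep.in_table_c12, stringprep.in_table_c22, stringprep.in_table_c3,
    stringprep.in_table_c4, stringprep.in_table_c5, stringprep.in_table_c6,
    stringprep.in_table_c7, stringprep.in_table_c8, stringprep.in_table_c9,
)


def nameprep_sketch(label):
    """Map, normalize, and check one label; raise ValueError on prohibited output."""
    # Step 1, mapping: drop B.1 characters, case-fold the rest with B.2.
    mapped = "".join(
        stringprep.map_table_b2(ch)
        for ch in label
        if not stringprep.in_table_b1(ch)
    )
    # Step 2, normalization: form KC.
    normalized = ucd_3_2_0.normalize("NFKC", mapped)
    # Step 3, prohibited output: reject any character found in a C table.
    for ch in normalized:
        if any(in_table(ch) for in_table in PROHIBITED_TABLES):
            raise ValueError("prohibited code point U+%04X" % ord(ch))
    return normalized


print(nameprep_sketch("A"))        # case-folded by table B.2
print(nameprep_sketch("a\u00adb")) # SOFT HYPHEN is mapped to nothing (table B.1)
```

Passing a label that contains U+0090 (DEVICE CONTROL STRING) raises `ValueError`, since that code point is a non-ASCII control character.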
PUNYCODE • A Bootstring encoding of Unicode for IDNA • One of the ACEs (ASCII-Compatible Encodings) • Translates non-ASCII characters to ASCII characters • Prefix: xn-- • Ex: 慎昌鐘錶.tw → xn--ciun9hb52c2za.tw
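Python's standard library ships an `idna` codec that follows RFC 3490 (NAMEPREP plus PUNYCODE with the xn-- prefix) and a bare `punycode` codec, so the conversion can be tried without extra packages. A minimal sketch follows; the helper names `to_ace` and `to_unicode` are hypothetical and only loosely mirror the ToASCII and ToUnicode operations.

```python
def to_ace(name):
    """Convert a Unicode domain name to its ASCII-compatible form."""
    return name.encode("idna").decode("ascii")


def to_unicode(ace_name):
    """Convert an ASCII-compatible domain name back to Unicode."""
    return ace_name.encode("ascii").decode("idna")


ace = to_ace("慎昌鐘錶.tw")
print(ace)              # each non-ASCII label gets the xn-- prefix
print(to_unicode(ace))  # round-trips to the original name

# The bare Bootstring encoding of one label, without the ACE prefix.
print("bücher".encode("punycode"))
```

Labels that are already plain ASCII, such as `tw`, pass through unchanged; only labels with non-ASCII characters are encoded.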
Insufficiencies in the IDN standard • The current IDN standard (IDNA, NAMEPREP, PUNYCODE) cannot meet Chinese domain name requirements • Traditional/Simplified Chinese mapping • Ex: 台 ↔ 臺 • Writing variant mapping • Ex: 峰 ↔ 峯
Insufficiencies in the IDN standard • The same meaning is written with a different character in different countries • In China: • 劝 (U+529D) • In Japan: • 勧 (U+52E7) • In Taiwan: • 勸 (U+52F8)
IDN administration guideline • A registration policy addresses the problems listed above • Every language has a variant table with 3 fields: • valid code point • recommended variant • character variant
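Variant Table • VCP: valid code point; twRV, cnRV: recommended variant for zh-tw and zh-cn; CV: character variant • Singular-relation characters (VCP = twRV = cnRV = CV): 13888 (66.4%) • VCP = twRV ≠ cnRV: 2783 (13.3%) • VCP = cnRV ≠ twRV: 2453 (11.7%) • VCP ≠ (twRV = cnRV): 333 (1.6%) • VCP ≠ twRV ≠ SCR: 387 (1.9%)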
Variant Table • The table draft was prepared by the CCMT Task Force, organized by TWNIC since January 2002. • The task force has 9 members: linguists, computer experts, and DNS experts. • The table draft has been submitted to the Bureau of Standards, Ministry of Economic Affairs, for final review.
Registration procedure • A registrant should select the language(s) • The registry should activate the requested domain name(s) and reserve the equivalent name(s), within the language-based character set • The registrant can request activation of the reserved equivalent domain name(s) at any time
Registration sample • A user selects the zh-tw and zh-cn languages for the domain name 丁上萬.com • 丁上萬.com (Recommended variant for zh-tw) • 丁上万.com (Recommended variant for zh-cn) • 丁丄万.com (Character Variant) • 丁丄萬.com (Character Variant)
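The sample above can be reproduced with a toy variant table. The sketch below is illustrative only: the table contents are inferred from this one sample and are not taken from the actual TWNIC table, and the names `VARIANT_TABLE` and `variant_bundle` are hypothetical. It derives the recommended variant for each selected language, then enumerates the remaining character-variant combinations.

```python
from itertools import product

# Toy table inferred from the slide sample. Each valid code point has a
# recommended variant per language and a list of character variants
# (the list here includes the code point itself).
VARIANT_TABLE = {
    "丁": {"zh-tw": "丁", "zh-cn": "丁", "cv": ["丁"]},
    "上": {"zh-tw": "上", "zh-cn": "上", "cv": ["上", "丄"]},
    "萬": {"zh-tw": "萬", "zh-cn": "万", "cv": ["萬", "万"]},
}


def variant_bundle(label, languages):
    """Return (recommended variants per language, other character variants)."""
    for ch in label:
        if ch not in VARIANT_TABLE:
            raise ValueError("%r is not a valid code point in this table" % ch)
    recommended = {
        lang: "".join(VARIANT_TABLE[ch][lang] for ch in label)
        for lang in languages
    }
    # Every combination of character variants, one choice per position.
    combinations = {
        "".join(choice)
        for choice in product(*(VARIANT_TABLE[ch]["cv"] for ch in label))
    }
    character_variants = combinations - set(recommended.values())
    return recommended, character_variants


recommended, character_variants = variant_bundle("丁上萬", ["zh-tw", "zh-cn"])
print(recommended)
print(sorted(character_variants))
```

Because the number of combinations is the product of the variant counts per position, a real registry would likely need to bound or filter this enumeration for long labels.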