Standardization of Internationalized Domain Name at IETF 24 Jan 2002 Yoshiro YONEYA <yone@nic.ad.jp> JPNIC
What is IDN? • Internationalized Domain Name. • Current domain names are represented with ASCII alphanumeric and hyphen characters. • IDN is the technical challenge of representing domain names with non-ASCII characters as well as ASCII. APAN2002 Conference
What is Internationalization? • A framework to extend the character repertoire for domain names. • Needs to be a global standard so that global communication is not lost. • The IETF IDN (Internationalized Domain Name) WG is doing the work. • The word ‘Multilingualization’ causes some confusion. • Characters are just one component of a language. • A multilingual domain name is a service-level aspect.
Internationalized Domain Names • 华人.公司.cn • 華人.商業.tw • 高島屋.会社.jp • 삼성.회사.kr • 三星.회사.kr • الاهرام.م • viagénie.qc.ca • ישראל.קום • ทีเอชนิค.พาณิชย์.ไทย • 現代.com • ヤフー.com • http://www.jdna.jp/activities/event/jdn-tutorial/IDNSDK.pdf
Why IDN? • A growing number of Internet users are not familiar with English. • Easy to memorize, type in, etc. • Drastic changes in how domain names are used. • A domain name is now used not only as a host name but also as a signboard. • Creates new business opportunities. • Many ventures have begun services.
Drawback of IDN • Loses global acceptability at the end-user interface. • Hard to type in or display non-ASCII characters without appropriate I/O devices and/or software. • Impacts operation. • Requires software updates and/or additional processing. • Deployment issues.
History of IDN WG • Established in Jan 2000. • Discussion takes place mainly on the mailing list. • Held its 1st meeting at the 47th IETF in Adelaide. • Has met at every IETF since then. • Decided the WG’s solution at the last (52nd) IETF. • IDNA, NAMEPREP and Punycode (formerly known as AMC-ACE-Z). • Waiting for WG last call.
Scope and priority of IDN WG • Provide a standard. • Do not divide the global connectivity and communication of the Internet. • Backward compatibility. • Compatible with current DNS and application protocols, so that it works with the current Internet infrastructure. • No localization. • Independent of particular regions, countries and/or languages. • Refer to existing universal standards. • A common framework essential to internationalization.
IDNA (Internationalizing Domain Names In Applications) draft-ietf-idn-idna-06.txt • An architecture that specifies how to process IDNs. • Uses Unicode, which is upward compatible with ASCII, as the character set. • Normalizes characters that have multiple code point representations, such as upper/lower case, full-width/half-width and composed characters, into a single representation so that matching does not fail. • Represents non-ASCII characters that are input or displayed at the user interface as an ASCII Compatible Encoding (ACE) string on the network. • These processes are performed in application software.
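The conversion between the user-interface form and the network form can be sketched as follows. This is only an illustration: it assumes Python's built-in `idna` codec, which follows IDNA as later published (RFC 3490, ACE prefix `xn--`) rather than the draft described on this slide, and the function names `to_network` and `to_display` are hypothetical.

```python
# Sketch: application-side conversion between the Unicode form shown to the
# user and the ACE form sent on the network.
# Assumption: Python's built-in "idna" codec (RFC 3490, prefix "xn--").

def to_network(name: str) -> str:
    """Convert a user-visible domain name to its ASCII (ACE) form."""
    return name.encode("idna").decode("ascii")

def to_display(name: str) -> str:
    """Convert an ACE domain name back to Unicode for display."""
    return name.encode("ascii").decode("idna")

ace = to_network("文字列例.jp")
print(ace)              # ACE form used on the network
print(to_display(ace))  # Unicode form shown at the user interface
```

An ASCII-only name passes through unchanged, which matches the point that both layers use the same representation for ASCII domain names.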
Important point of IDNA • The representation at the user interface layer differs from the one at the network layer. • For ASCII domain names the two are the same. • An application-level solution. • Least impact on the Internet infrastructure.
Image of the IDNA (diagram) • Inside the end system, in order: Local User, Application UI, To/From Unicode, NAMEPREP (internal representation), To/From ACE, Resolver API. • Outside the end system: Int’l DNS servers and Application servers.
NAMEPREP (Stringprep Profile for Internationalized Host Names) draft-ietf-idn-nameprep-07.txt • A profile of STRINGPREP (Preparation of Internationalized Strings) • draft-hoffman-stringprep-00.txt • Some scripts, such as alphabets, have multiple representations of a character. • Domain names are case insensitive. • A normalization process that unifies strings that are the same in meaning or appearance into a single representation. • Case (upper / lower) • Compatibility characters (full / half width) • Composed characters
Important point of NAMEPREP • Normalizes the representation of an internationalized domain name string so that it matches correctly. • ‘a’ vs ‘A’ • ‘u’+‘¨’ vs ‘ü’ • ‘ｱ’ (half-width) vs ‘ア’ (full-width)
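The three pairs can be checked with a simplified stand-in for NAMEPREP: lower-casing followed by Unicode NFKC normalization. This is a sketch, not the full profile. The helper name `simplified_fold` is hypothetical, and the diaeresis is assumed to be the combining character U+0308.

```python
import unicodedata

def simplified_fold(s: str) -> str:
    """Lower-case, then apply NFKC. A simplification, not full NAMEPREP."""
    return unicodedata.normalize("NFKC", s.lower())

pairs = [
    ("a", "A"),             # upper / lower case
    ("u\u0308", "\u00fc"),  # 'u' + combining diaeresis vs precomposed 'ü'
    ("\uff71", "\u30a2"),   # half-width 'ｱ' vs full-width 'ア'
]

for left, right in pairs:
    # Each pair compares equal once both sides are folded.
    print(simplified_fold(left) == simplified_fold(right))
```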
Processes in NAMEPREP • map • Case folding of upper/lower characters (UTR#21) • normalize • Normalize the representation of the string (UAX#15 NFKC) • prohibit • Check for characters that are inappropriate in a domain name.
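The map, normalize and prohibit steps might be sketched with Python's standard `stringprep` and `unicodedata` modules. Several assumptions apply: the `stringprep` tables follow the later RFC 3454 rather than the draft cited here, the set of prohibited tables below is one plausible choice, bidirectional-text checks are left out, and `nameprep_sketch` is a hypothetical name.

```python
import stringprep
import unicodedata

# Assumption: prohibit non-ASCII spaces, non-ASCII controls and tables C.3-C.9.
PROHIBITED = (
    stringprep.in_table_c12, stringprep.in_table_c22,
    stringprep.in_table_c3, stringprep.in_table_c4, stringprep.in_table_c5,
    stringprep.in_table_c6, stringprep.in_table_c7, stringprep.in_table_c8,
    stringprep.in_table_c9,
)

def nameprep_sketch(label: str) -> str:
    # 1. map: drop characters "mapped to nothing" (B.1), case-fold the rest (B.2)
    mapped = "".join(
        stringprep.map_table_b2(ch)
        for ch in label
        if not stringprep.in_table_b1(ch)
    )
    # 2. normalize: NFKC unifies compatibility and composed forms
    normalized = unicodedata.normalize("NFKC", mapped)
    # 3. prohibit: reject characters that must not appear in a label
    for ch in normalized:
        if any(check(ch) for check in PROHIBITED):
            raise ValueError(f"prohibited character U+{ord(ch):04X}")
    return normalized

print(nameprep_sketch("\uff21bc"))  # full-width 'Ａ' followed by "bc"
```

Keeping the three steps separate mirrors the slide; a production implementation would also need the checks omitted here.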
ACE (ASCII Compatible Encoding) • Represents non-ASCII characters with ASCII characters. • Easy to apply to the current DNS. • Least impact on current applications. • Decreases the maximum number of characters in each label. • The penalty of using only 5 bits to represent 8-bit data. • Requires some sort of compression algorithm.
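A rough calculation shows why the label length shrinks and why compression matters. The figures below are assumptions for illustration: the 63-octet DNS label limit, a 4-character ACE identifier, 5 payload bits per ASCII character, and uncompressed 16-bit characters.

```python
MAX_LABEL_OCTETS = 63   # DNS limit on one label
ACE_PREFIX_LEN = 4      # assumption: a 4-character ACE identifier
BITS_PER_ACE_CHAR = 5   # 5 bits carried by each ASCII character
BITS_PER_CHAR = 16      # assumption: uncompressed 16-bit characters

def max_uncompressed_chars() -> int:
    """Upper bound on characters per label without any compression."""
    payload_chars = MAX_LABEL_OCTETS - ACE_PREFIX_LEN   # 59 ASCII characters
    payload_bits = payload_chars * BITS_PER_ACE_CHAR    # 295 bits
    return payload_bits // BITS_PER_CHAR                # 18 characters

print(max_uncompressed_chars())
```

Under these assumptions a label holds 18 characters instead of 63, which is the motivation for the compression step.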
ACE Identifier • An explicit ACE identifier (ACE-ID) is required. • For reverse conversion. • The choice of ACE-ID is a political issue. • The ACE-ID itself is an ASCII string, so as soon as any ACE-ID is proposed, names beginning with it will be registered as ASCII domain names. • This actually happened in gTLDs. • IANA will assign the ACE-ID.
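The role of the identifier in reverse conversion can be sketched as below. The prefix is a parameter because it had not been assigned when these slides were written; the default `xn--` is the value used by the later RFC 3490, and the examples in this deck use `ZQ--`. The function name `from_ace` is hypothetical, and Python's built-in `punycode` codec is assumed.

```python
ACE_PREFIX = "xn--"  # assumption: the prefix from the later RFC 3490

def from_ace(label: str, prefix: str = ACE_PREFIX) -> str:
    """Decode a label only if it carries the ACE identifier."""
    if label.lower().startswith(prefix):
        encoded = label[len(prefix):]
        return encoded.encode("ascii").decode("punycode")
    # No identifier: an ordinary ASCII label, returned unchanged.
    return label

print(from_ace("xn--fsqw5d78mbsk"))
print(from_ace("example"))
```

Without the identifier a decoder could not tell an encoded label from an ordinary ASCII label that merely looks like one.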
Criteria of ACE selection • Simple algorithm. • For ease of implementation. • Interoperability. • Effective compression for practical IDNs. • To accommodate as many characters as possible. • One-to-one correspondence between encoding and decoding. • To avoid alternative encoded representations of one IDN. • Security considerations.
Comparison of ACE proposals • Encoding sample of ‘日本語ドメイン名試験.JP’ • Evaluation result from existing Japanese JP domain names
Punycode draft-ietf-idn-punycode-00.txt • The ACE selected by the IDN WG. • Compression algorithm. • Extract characters in ascending order of code point. • Encode the difference in code point from the previously processed character, together with the position, into an integer. • Letters, digits and hyphen are extracted and passed through literally (Bootstring). • ASCII conversion algorithm. • Introduces a new concept named ‘generalized variable-length integers’. • Base 36 (A-Z, 0-9).
Compression process of Punycode (simplified for understanding) • “文字列例” • Code points by position: 1:U+6587 2:U+5B57 3:U+5217 4:U+4F8B • Sort in ascending order and take differences: 4:0x4F8B 3:0x28C 2:0x440 1:0xA30 • Convert to integers (diff * number of characters + position): 0x13E30 0xA33 0x1102 0x28C1
Generalized variable-length integers of Punycode • 12345 in decimal is represented as 1*10^4+2*10^3+3*10^2+4*10^1+5*10^0. • Every place uses the digits 0-9, so in the sequence 12345 one cannot tell whether the components are 123 and 45 or 1234 and 5. • Furthermore, 012345 and 12345 are the same value with different representations. • GVLI (generalized variable-length integers) is an idea to solve this problem. • A threshold is defined for each place, and a digit below the threshold is recognized as a delimiter. • The threshold is an appropriate number smaller than the base.
Encoding process of Punycode (simplified for understanding) • Assign A-Z and 0-9 as GVLI digits (this example maps 0-9 to 0..9 and A-Z to 10..35). • Assume base 36 and thresholds 10, 18, 25, 25. • Place weights: 1, 26 = 1*(36-10), 468 = 26*(36-18), 5148 = 468*(36-25). • 0x13E30 = 24*1 + 18*26 + 30*468 + 13*5148 => OIUD • 0xA33 = 11*1 + 28*26 + 4*468 => BS4 • 0x1102 = 12*1 + 23*26 + 8*468 => CN8 • 0x28C1 = 33*1 + 22*26 + 21*468 => XML • “文字列例” => “OIUDBS4CN8XML”. • Real Punycode generates “FSQW5D78MBSK”.
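The worked example can be checked with a small encoder and decoder for generalized variable-length integers. This sketch uses the slide's simplified parameters (base 36, thresholds 10, 18, 25, 25, digits 0-9 then A-Z), not the adaptive thresholds of real Punycode. Reusing the last threshold for any further places is an assumption, and the function names are hypothetical.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # digit value = index
BASE = 36
THRESHOLDS = (10, 18, 25, 25)

def threshold(place: int) -> int:
    # Assumption: places beyond the listed ones reuse the last threshold.
    return THRESHOLDS[min(place, len(THRESHOLDS) - 1)]

def gvli_encode(value: int) -> str:
    """Encode one integer, least significant digit first."""
    out, q, place = [], value, 0
    while True:
        t = threshold(place)
        if q < t:                      # a digit below the threshold ends the integer
            out.append(DIGITS[q])
            return "".join(out)
        out.append(DIGITS[t + (q - t) % (BASE - t)])
        q = (q - t) // (BASE - t)
        place += 1

def gvli_decode_all(text: str) -> list[int]:
    """Split a digit string back into integers using the thresholds."""
    values, value, weight, place = [], 0, 1, 0
    for ch in text:
        digit = DIGITS.index(ch)
        t = threshold(place)
        value += digit * weight
        if digit < t:                  # delimiter digit: integer complete
            values.append(value)
            value, weight, place = 0, 1, 0
        else:
            weight *= BASE - t
            place += 1
    return values

encoded = "".join(gvli_encode(n) for n in (0x13E30, 0xA33, 0x1102, 0x28C1))
print(encoded)                                     # OIUDBS4CN8XML
print([hex(n) for n in gvli_decode_all(encoded)])
```

Because the terminating digit is always below the threshold of its place, the decoder finds the boundaries between integers without any separator character.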
Standardization of IDN is just the starting point of utilization • End users use IDNs through application software. • Web, Mail, etc. • IDNA requires support from applications. • How application protocols deal with IDNs must be defined. • Standardization of IDN does not mean it is ready to use; it is just a starting point for applications to incorporate the new features.
HTTP Request (DNS resolve only) • User enters http://ジェーピーニック.JP/ • User to DNS: ZQ--HCKQZ9BZB1CYRB.JP; DNS to User: Web server’s IP address • User to Web: GET http://ジェーピーニック.JP/ HTTP/1.1 Host: ジェーピーニック.JP Referer: http://ジェーピーニック.JP/ • Web to User: Error!
HTTP Request (ACE in HTTP header) • User enters http://ジェーピーニック.JP/ • User to DNS: ZQ--HCKQZ9BZB1CYRB.JP; DNS to User: Web server’s IP address • User to Web: GET http://ZQ--HCKQZ9BZB1CYRB.JP/ HTTP/1.1 Host: ZQ--HCKQZ9BZB1CYRB.JP Referer: http://ZQ--HCKQZ9BZB1CYRB.JP/ • Web to User: Contents
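A client that puts the ACE form in the HTTP header might build its request as sketched below. Assumptions: Python's built-in `idna` codec (prefix `xn--`, from the later RFC 3490, rather than the `ZQ--` prefix shown on the slide), an origin-form request line instead of the absolute URL on the slide, and the hypothetical function name `build_request`.

```python
from urllib.parse import urlsplit

def build_request(url: str) -> str:
    """Build a minimal HTTP/1.1 request whose Host header is ASCII-only."""
    parts = urlsplit(url)
    # Convert the host name to ACE before it goes on the wire.
    ace_host = parts.hostname.encode("idna").decode("ascii")
    path = parts.path or "/"
    return f"GET {path} HTTP/1.1\r\nHost: {ace_host}\r\n\r\n"

request = build_request("http://ジェーピーニック.JP/")
print(request)
```

The same ACE host name is what the resolver would look up, so DNS and the Web server see a consistent ASCII name.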
References • IETF IDN WG Web page • http://www.i-d-n.net/ • Unicode Consortium • http://www.unicode.org/
Acknowledgement • Telecommunications Advancement Organization of Japan (TAO). • JPNIC’s research on the security of IDN is part of TAO’s research. • http://www.shiba.tao.go.jp/
IDN Compliant clients & implementations • Mozilla http://playground.i-dns.net/mozilla/index.html • Plug-in for Mozilla; resolution using RACE • Opera http://www.opera.com/ • Native; resolution using RACE • Internet Explorer 5 or higher http://www.microsoft.com/windows/ie/default.asp • Uses a keyword search engine as a RACE converter • mDNkit http://www.nic.ad.jp/jp/research/idn/mdnkit/download/ • Open-source toolkit for developing IDN-compliant software