Globalization Features in Whidbey’s CLR



  1. Globalization Features in Whidbey’s CLR
     Michael Kaplan
     Technical Lead, Globalization Infrastructure, Fonts and Tools
     Microsoft Windows International Division
     http://blogs.msdn.com/michkap
     April 25, 2005

  2. Customized Cultures and Regions
     • CultureAndRegionInfoBuilder class
       • Create an override to an existing culture
       • Create based on an existing culture
       • Create from scratch
     • Must be an administrator to register
     • Can register the file on multiple machines

  3. CultureAndRegionInfoBuilder sample

         CultureAndRegionInfoBuilder carib =
             new CultureAndRegionInfoBuilder("de-DE-MineMine", CultureAndRegionModifiers.None);

         // Load all of the existing data for German and for Germany
         carib.LoadDataFromCultureInfo(new CultureInfo("de-DE", false));
         carib.LoadDataFromRegionInfo(new RegionInfo("de"));

         // Change a property
         carib.ThreeLetterISORegionName = "ZZZ";

         // Register the culture on the machine
         carib.Register();

         // Use the new culture
         CultureInfo ci = new CultureInfo("de-DE-MineMine");

  4. CaRIB serialization with LDML
     • Locale Data Markup Language
     • Described in UTS#35 at http://unicode.org/reports/tr35/
     • CaRIB (CultureAndRegionInfoBuilder) objects can be saved as LDML files
     • Data can be loaded from LDML files
     • CaRIB will do its best with files it did not create

  5. LDML Sample

         // Reserve a unique temporary file name, then delete the empty file
         // so that Save() can create it.
         string file1 = Path.GetTempFileName();
         File.Delete(file1);

         CultureInfo ci = new CultureInfo("ar-EG");
         RegionInfo ri = new RegionInfo("de-DE");

         CultureAndRegionInfoBuilder carib =
             new CultureAndRegionInfoBuilder("x-en-US-Pepsi", CultureAndRegionModifiers.None);
         carib.LoadDataFromCultureInfo(ci);
         carib.LoadDataFromRegionInfo(ri);

         // Write the builder out as LDML, read it back, and register the result
         carib.Save(file1);
         carib = CultureAndRegionInfoBuilder.CreateFromLdml(file1);
         carib.Register();

  6. When Windows knows more than .NET
     • As of Windows XP SP2, there are 25 new locales in Windows:
       • Bengali - India
       • Croatian - Bosnia and Herzegovina
       • Bosnian - Bosnia and Herzegovina
       • Serbian - Bosnia and Herzegovina (Latin)
       • Serbian - Bosnia and Herzegovina (Cyrillic)
       • Welsh - United Kingdom
       • Maori - New Zealand
       • Malayalam - India
       • Maltese - Malta
       • Quechua - Bolivia
       • Quechua - Ecuador
       • Quechua - Peru
       • Setswana / Tswana - South Africa
       • isiXhosa / Xhosa - South Africa
       • isiZulu / Zulu - South Africa
       • Sesotho sa Leboa / Northern Sotho - South Africa
       • Northern Sami - Norway
       • Northern Sami - Sweden
       • Northern Sami - Finland
       • Lule Sami - Norway
       • Lule Sami - Sweden
       • Southern Sami - Norway
       • Southern Sami - Sweden
       • Skolt Sami - Finland
       • Inari Sami - Finland
     • There will be more in future service packs
     • In Longhorn, there will be 75 or more new locales

  7. Windows-only Cultures
     • The solution: Windows-only cultures!
     • The CLR synthesizes a CultureInfo object when Windows supports a locale that the .NET Framework does not know how to create itself

  8. Windows-only culture test

         foreach (CultureInfo culture in CultureInfo.GetCultures(CultureTypes.WindowsOnlyCultures))
         {
             Console.WriteLine(culture.Name);
         }

         // New cultures on XP SP2 include:
         // mt-MT, bs-BA-Latn, smn-FI, smj-NO, smj-SE, sms-FI, sma-NO,
         // sma-SE, quz-BO, quz-EC, quz-PE, ml-IN, bn-IN, cy-GB, and more

  9. Special CultureInfo support for SQL Server 2005 (Yukon)
     • SQL Server locale semantics:
       • One setting for UI and formatting
       • Another setting for collation/encoding
     • .NET/Windows semantics:
       • One setting for UI
       • Another setting for formatting/collation
     • Solution:
       • Special GetCultureInfo overload that takes two culture names, one for each of the two SQL Server settings
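The two-name overload described above can be sketched as follows. `CultureInfo.GetCultureInfo(string name, string altName)` returns a read-only culture that takes its formatting data from `name` and its `TextInfo` and `CompareInfo` from `altName`. The class name `SqlStyleCulture`, the helper `Create`, and the culture names are illustrative assumptions, not taken from the slides.

```csharp
using System;
using System.Globalization;

public static class SqlStyleCulture
{
    // Hypothetical helper: pairs a formatting culture with a collation culture.
    public static CultureInfo Create(string formattingName, string collationName)
    {
        return CultureInfo.GetCultureInfo(formattingName, collationName);
    }

    public static void Main()
    {
        CultureInfo ci = Create("en-US", "de-DE");

        // Name and formatting come from the first culture name...
        Console.WriteLine(ci.Name);
        Console.WriteLine(DateTime.Today.ToString("D", ci));

        // ...while comparison and casing come from the second.
        Console.WriteLine(ci.CompareInfo.Name);
        Console.WriteLine(ci.TextInfo.CultureName);
    }
}
```

Because the returned object is cached and read-only, it would need to be cloned before any of its properties could be changed.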

  10. How Yukon uses this support
      • Microsoft.ReportingServices.Diagnostics.Localization
        • CatalogCulture
        • ClientPrimaryCulture
        • DefaultReportServerCulture
        • FallbackUICulture
        • InstalledCultureNames
        • ReportParameterCulture
        • SqlCulture

  11. New locale properties/methods
      • TextInfo
        • CultureName
        • LCID
      • CompareInfo
        • Name
      • DateTimeFormatInfo
        • ShortestDayNames
        • MonthGenitiveNames
        • AbbreviatedMonthGenitiveNames
      • NumberFormatInfo
        • NativeDigits
        • DigitSubstitution
      • CultureInfo
        • IsCustomCulture
        • IetfLanguageTag
        • CultureTypes
        • GetCultureInfo()
        • GetCultureInfoByIetfLanguageTag()
      • RegionInfo
        • GeoId
        • NativeName
        • CurrencyEnglishName
        • (Can now create via full culture names)
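A short sketch that reads several of the properties listed above. It uses only members that shipped publicly; the class name `NewLocaleProperties`, the `Join` helper, and the culture names are assumptions chosen for illustration. The printed values depend on the culture data installed on the machine, so none are shown.

```csharp
using System;
using System.Globalization;

public static class NewLocaleProperties
{
    // Hypothetical helper that flattens a string array for display.
    public static string Join(string[] values)
    {
        return string.Join(", ", values);
    }

    public static void Main()
    {
        CultureInfo russian = CultureInfo.GetCultureInfo("ru-RU");
        DateTimeFormatInfo dtfi = russian.DateTimeFormat;
        Console.WriteLine(Join(dtfi.ShortestDayNames));
        Console.WriteLine(Join(dtfi.MonthGenitiveNames));
        Console.WriteLine(Join(dtfi.AbbreviatedMonthGenitiveNames));

        NumberFormatInfo nfi = CultureInfo.GetCultureInfo("ar-EG").NumberFormat;
        Console.WriteLine(Join(nfi.NativeDigits));
        Console.WriteLine(nfi.DigitSubstitution);

        // RegionInfo can be created from a full culture name.
        RegionInfo germany = new RegionInfo("de-DE");
        Console.WriteLine(germany.NativeName);
        Console.WriteLine(germany.CurrencyEnglishName);
        Console.WriteLine(germany.GeoId);

        Console.WriteLine(russian.TextInfo.CultureName);
        Console.WriteLine(russian.CompareInfo.Name);
    }
}
```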

  12. Updates to encodings
      • Now built into the BCL
        • Improved performance
        • More flexibility
        • Consistent results across supported platforms
      • Encoding enumeration API
      • UTF-32 support (little endian and big endian)
      • UTF-16 big endian support
      • Encoding/decoding fallbacks
        • Exception
        • Replacement
        • “Best fit”
        • Custom
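A minimal sketch of the replacement and exception fallbacks, the big-endian encodings, and the enumeration API mentioned above. The class name `EncodingFallbackDemo` and the helper `RoundTripAscii` are assumptions made for this example.

```csharp
using System;
using System.Text;

public static class EncodingFallbackDemo
{
    // Hypothetical helper: encodes to US-ASCII with the given encoder
    // fallback, then decodes the bytes back to a string.
    public static string RoundTripAscii(string text, EncoderFallback fallback)
    {
        Encoding ascii = Encoding.GetEncoding(
            "us-ascii", fallback, DecoderFallback.ReplacementFallback);
        return ascii.GetString(ascii.GetBytes(text));
    }

    public static void Main()
    {
        // Replacement fallback: the unencodable character becomes "?".
        Console.WriteLine(RoundTripAscii("caf\u00E9", new EncoderReplacementFallback("?")));

        // Exception fallback: the same character raises an exception instead.
        try
        {
            RoundTripAscii("caf\u00E9", EncoderFallback.ExceptionFallback);
        }
        catch (EncoderFallbackException e)
        {
            Console.WriteLine("Cannot encode U+{0:X4}", (int)e.CharUnknown);
        }

        // Big-endian UTF-32 (no byte order mark) and big-endian UTF-16.
        Encoding utf32be = new UTF32Encoding(true, false);
        Console.WriteLine(BitConverter.ToString(utf32be.GetBytes("A")));
        Console.WriteLine(BitConverter.ToString(Encoding.BigEndianUnicode.GetBytes("A")));

        // Enumeration API: list every encoding the runtime knows about.
        foreach (EncodingInfo info in Encoding.GetEncodings())
        {
            Console.WriteLine("{0} ({1})", info.Name, info.CodePage);
        }
    }
}
```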

  13. Custom encoder fallback sample

          using System;
          using System.Text;

          // Replaces characters the encoding cannot represent with numeric
          // character references such as &#233;
          public class NumericEntitiesFallback : EncoderFallback
          {
              public override EncoderFallbackBuffer CreateFallbackBuffer()
              {
                  return new NEFallbackBuffer();
              }

              public override int MaxCharCount
              {
                  // "&#65535;" is the longest entity for a single UTF-16 code unit
                  get { return 8; }
              }
          }

          public class NEFallbackBuffer : EncoderFallbackBuffer
          {
              // Store our default string
              private String strEntity;
              int fallbackCount = -1;
              int fallbackIndex = 0;

              // Fallback methods
              public override bool Fallback(char charUnknown, int index)
              {
                  // If we had a buffer already we're being recursive, throw,
                  // it's probably at the suspect character in our array.
                  if (fallbackCount >= 0)
                      ThrowLastCharRecursive(unchecked((int)charUnknown));

                  // Go ahead and get our fallback
                  strEntity = String.Format("&#{0};", (int)charUnknown);
                  fallbackCount = strEntity.Length;
                  fallbackIndex = 0;
                  return fallbackCount != 0;
              }

              public override bool Fallback(char charUnknownHigh, char charUnknownLow, int index)
              {
                  // Double check input surrogate pair
                  if (!Char.IsHighSurrogate(charUnknownHigh))
                      throw new ArgumentOutOfRangeException("charUnknownHigh",
                          "supposed to be between 0xD800 and 0xDBFF");
                  if (!Char.IsLowSurrogate(charUnknownLow))
                      throw new ArgumentOutOfRangeException("charUnknownLow",
                          "supposed to be between 0xDC00 and 0xDFFF");

                  // If we had a buffer already we're being recursive, throw,
                  // it's probably at the suspect character in our array.
                  if (fallbackCount >= 0)
                      ThrowLastCharRecursive(Char.ConvertToUtf32(charUnknownHigh, charUnknownLow));

                  // Go ahead and get our fallback
                  strEntity = String.Format("&#{0};",
                      Char.ConvertToUtf32(charUnknownHigh, charUnknownLow));
                  fallbackCount = strEntity.Length;
                  fallbackIndex = 0;
                  return fallbackCount != 0;
              }

              public override char GetNextChar()
              {
                  // We want it to get < 0 because == 0 means that the current/last
                  // character is a fallback and we need to detect recursion. We
                  // could have a flag but we already have this counter.
                  fallbackCount--;

                  // Do we have anything left? 0 is now last fallback char,
                  // negative is nothing left.
                  if (fallbackCount < 0)
                      return (char)0;

                  // Need to get it out of the buffer.
                  return strEntity[fallbackIndex++];
              }

              public override bool MovePrevious()
              {
                  fallbackCount++;
                  fallbackIndex--;
                  return true;
              }

              public override int Remaining
              {
                  get { return (fallbackCount < 0) ? 0 : fallbackCount; }
              }

              // Private helper methods
              private void ThrowLastCharRecursive(int charRecursive)
              {
                  // Throw it, using our complete character
                  throw new ArgumentException(
                      String.Format("Last character \\u{0:X4} was a recursive fallback", charRecursive),
                      "chars");
              }
          }

  14. Collation Improvements
      • OrdinalIgnoreCase
        • Same results as ToUpper/Ordinal
        • Matches OS file system results
      • Correct Serbian collation
        • Fixed in Windows XP SP2
        • Customer reported (MSDN Feedback Center)
      • Better handling of ignored/ignorable characters
        • IndexOf/LastIndexOf/IsPrefix/IsSuffix
        • StartsWith/EndsWith, too

  15. OrdinalIgnoreCase sample

          string strTest1 = "IamAString";
          string strTest2 = "STRING";

          // "String" and "STRING" differ only in case, so the test succeeds
          if (strTest1.EndsWith(strTest2, StringComparison.OrdinalIgnoreCase))
          {
              Console.WriteLine("Successful test!");
          }

  16. Unicode normalization
      • Described in UAX#15 at http://www.unicode.org/reports/tr15/
      • String.IsNormalized()
        String.IsNormalized(NormalizationForm normalizationForm)
      • String.Normalize()
        String.Normalize(NormalizationForm normalizationForm)
      • NormalizationForm enumeration
        • FormC, FormD, FormKC, FormKD
      • õĥµ¨ (U+00f5 U+0068 U+0302 U+00b5 U+00a8)
        LATIN SMALL LETTER O WITH TILDE; LATIN SMALL LETTER H; COMBINING CIRCUMFLEX ACCENT; MICRO SIGN; DIAERESIS
        • FormC: õĥµ¨ (U+00f5 U+0125 U+00b5 U+00a8)
        • FormD: õĥµ¨ (U+006f U+0303 U+0068 U+0302 U+00b5 U+00a8)
        • FormKC: õĥμ ̈ (U+00f5 U+0125 U+03bc U+0020 U+0308)
        • FormKD: õĥμ ̈ (U+006f U+0303 U+0068 U+0302 U+03bc U+0020 U+0308)
      • In collation, õĥµ¨ ≅ õĥµ¨ ≅ õĥμ ̈ ≅ õĥμ ̈
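The four code point sequences listed above can be reproduced with `String.Normalize`. This is a small sketch; the class name `NormalizationForms` and the helper `CodePoints` are assumptions, and the helper prints UTF-16 code units, which is sufficient here because every character involved is in the Basic Multilingual Plane.

```csharp
using System;
using System.Text;

public static class NormalizationForms
{
    // Hypothetical helper: formats each UTF-16 code unit as U+xxxx.
    public static string CodePoints(string s)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in s)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.AppendFormat("U+{0:x4}", (int)c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // The sample string from the slide, written with escapes so the
        // source file's own encoding cannot alter it.
        string sample = "\u00f5\u0068\u0302\u00b5\u00a8";

        NormalizationForm[] forms = new NormalizationForm[]
        {
            NormalizationForm.FormC, NormalizationForm.FormD,
            NormalizationForm.FormKC, NormalizationForm.FormKD
        };

        foreach (NormalizationForm form in forms)
        {
            Console.WriteLine("{0}: {1}", form, CodePoints(sample.Normalize(form)));
        }

        // The sample itself is in none of the four forms.
        Console.WriteLine(sample.IsNormalized(NormalizationForm.FormC));
    }
}
```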

  17. Normalization sample

          namespace àáâãäå
          {
              using System;
              using System.Text;
              using System.Globalization;

              class àáâãäå
              {
                  [STAThread]
                  static void Main(string[] args)
                  {
                      àáâãäå();
                      àáâãäå();
                      àáâãäå();
                      àáâãäå();
                      àáâãäå();
                      àáâãäå();
                      àáâãäå();
                  }

                  // Prints the Form C version of the string, then one letter per
                  // text element: C if the element is in Form C, D if it is in
                  // Form D but not Form C, _ otherwise.
                  static void àáâãäå(string àáâãäå)
                  {
                      StringBuilder àáâãäå = new StringBuilder();
                      StringInfo àáâãäå = new StringInfo(àáâãäå);

                      àáâãäå.Append(àáâãäå.Normalize(NormalizationForm.FormC));
                      àáâãäå.Append(": ");

                      for (int àáâãäå = 0; àáâãäå < àáâãäå.LengthInTextElements; àáâãäå++)
                      {
                          string àáâãäå = àáâãäå.SubstringByTextElements(àáâãäå, 1);
                          if (àáâãäå.IsNormalized(NormalizationForm.FormC))
                          {
                              àáâãäå.Append("C");
                          }
                          else if (àáâãäå.IsNormalized(NormalizationForm.FormD))
                          {
                              àáâãäå.Append("D");
                          }
                          else
                          {
                              àáâãäå.Append("_");
                          }
                      }

                      Console.WriteLine(àáâãäå.ToString());
                      return;
                  }

                  static void àáâãäå() { àáâãäå.àáâãäå("àáâãäå"); }
                  static void àáâãäå() { àáâãäå.àáâãäå("àáâãäå"); }
                  static void àáâãäå() { àáâãäå.àáâãäå("àáâãäå"); }
                  static void àáâãäå() { àáâãäå.àáâãäå("àáâãäå"); }
                  static void àáâãäå() { àáâãäå.àáâãäå("àáâãäå"); }
                  static void àáâãäå() { àáâãäå.àáâãäå("àáâãäå"); }
                  static void àáâãäå() { àáâãäå.àáâãäå("àáâãäå"); }
              }
          }
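The per-text-element check in the sample above can also be written with plain ASCII identifiers and escaped string literals, so that the precomposed and decomposed letters stay distinguishable in the source. This is a sketch under that assumption; the names `NormalizationPattern` and `Describe` and the three test strings are invented for illustration and are not from the slide.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class NormalizationPattern
{
    // Returns one letter per text element: C if the element is in Form C,
    // D if it is in Form D but not Form C, _ otherwise.
    public static string Describe(string text)
    {
        StringBuilder pattern = new StringBuilder();
        StringInfo info = new StringInfo(text);

        for (int i = 0; i < info.LengthInTextElements; i++)
        {
            string element = info.SubstringByTextElements(i, 1);
            if (element.IsNormalized(NormalizationForm.FormC))
                pattern.Append('C');
            else if (element.IsNormalized(NormalizationForm.FormD))
                pattern.Append('D');
            else
                pattern.Append('_');
        }
        return pattern.ToString();
    }

    public static void Main()
    {
        // All three strings render as the same two accented letters.
        Console.WriteLine(Describe("\u00E0\u00E1"));   // both precomposed
        Console.WriteLine(Describe("a\u0300a\u0301")); // both decomposed
        Console.WriteLine(Describe("\u00E0a\u0301"));  // one of each
    }
}
```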

  18. IDN Mapping APIs
      • IdnMapping class
      • Based on three RFCs (standard based on Unicode 3.2)
        • 3490 - Internationalizing Domain Names in Applications (IDNA)
        • 3491 - Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)
        • 3492 - Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)
      • \u5B89\u5BA4\u5948\u7F8E\u6075-with-SUPER-MONKEYS becomes xn---with-SUPER-MONKEYS-pc58ag80a8qai00g7n9n
      • Properties
        • AllowUnassigned (allows new Unicode characters)
        • UseStd3AsciiRules (more like DNS rules)
      • Methods
        • GetAscii - Gets ASCII (Punycode) version of the string
        • GetUnicode - Gets Unicode version of the string, normalized and limited to IDNA characters
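A brief sketch of the two methods and two properties listed above, using a well-known example domain rather than the one on the slide. The class name `IdnDemo`, the helpers `ToAscii` and `ToUnicode`, and the domain name are assumptions made for this example.

```csharp
using System;
using System.Globalization;

public static class IdnDemo
{
    // Hypothetical helper: Unicode domain name to its ASCII (Punycode) form.
    public static string ToAscii(string unicodeName)
    {
        IdnMapping idn = new IdnMapping();
        idn.AllowUnassigned = false;   // reject code points unassigned in the IDNA tables
        idn.UseStd3AsciiRules = true;  // apply the stricter, DNS-like ASCII rules
        return idn.GetAscii(unicodeName);
    }

    // Hypothetical helper: ASCII (Punycode) form back to Unicode.
    public static string ToUnicode(string asciiName)
    {
        return new IdnMapping().GetUnicode(asciiName);
    }

    public static void Main()
    {
        string ascii = ToAscii("b\u00FCcher.example");
        Console.WriteLine(ascii);

        // Print whether the round trip restores the original name.
        Console.WriteLine(ToUnicode(ascii) == "b\u00FCcher.example");
    }
}
```

Each label is converted separately, so the all-ASCII label `example` passes through unchanged.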

  19. Unicode property information
      • New CharUnicodeInfo class
      • Extends methods on Char
      • Official data from the Unicode Character Database at http://www.unicode.org/ucd/
      • Methods
        • IsWhiteSpace
        • GetNumericValue
        • GetDigitValue
        • GetDecimalDigitValue
        • GetUnicodeCategory
        • GetBidiCategory
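A small sketch of the property lookups above. It calls only `GetUnicodeCategory`, `GetNumericValue`, `GetDigitValue`, and `GetDecimalDigitValue`, which are the members I am certain are public; whether the other two methods on the slide are publicly callable in a given framework version is not assumed here. The class name `CharProperties` and the helper `Describe` are illustrative.

```csharp
using System;
using System.Globalization;

public static class CharProperties
{
    // Hypothetical helper: summarizes the Unicode properties of one character.
    public static string Describe(char c)
    {
        return string.Format(CultureInfo.InvariantCulture,
            "U+{0:X4}: category={1}, numeric={2}, digit={3}, decimalDigit={4}",
            (int)c,
            CharUnicodeInfo.GetUnicodeCategory(c),
            CharUnicodeInfo.GetNumericValue(c),
            CharUnicodeInfo.GetDigitValue(c),
            CharUnicodeInfo.GetDecimalDigitValue(c));
    }

    public static void Main()
    {
        Console.WriteLine(Describe('\u00BD')); // VULGAR FRACTION ONE HALF
        Console.WriteLine(Describe('\u00B2')); // SUPERSCRIPT TWO
        Console.WriteLine(Describe('\u0663')); // ARABIC-INDIC DIGIT THREE
        Console.WriteLine(Describe('\u0302')); // COMBINING CIRCUMFLEX ACCENT
    }
}
```

The three numeric methods differ in strictness: a fraction has only a numeric value, a superscript digit has a digit value but no decimal digit value, and a true decimal digit has all three. Characters without a value report -1.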

  20. New text element support in the StringInfo class
      • StringInfo constructor that takes a string
      • StringInfo.String
      • StringInfo.LengthInTextElements
      • StringInfo.SubstringByTextElements()
      • Both use ParseCombiningCharacters() to get their results

  21. New StringInfo props/methods sample

          // Two base letters, each followed by three combining accents
          StringInfo si = new StringInfo("A\u0300\u0301\u0300e\u0300\u0301\u0300");

          Console.WriteLine(si.LengthInTextElements); // Length is two

          for (int ich = 0; ich < si.LengthInTextElements; ich++)
          {
              Console.WriteLine(si.SubstringByTextElements(ich, 1));
          }

  22. New supplementary character support in lots of methods
      • New signature -- (String s, int index)
        • IsControl, IsDigit, IsLetter, IsLetterOrDigit, IsLower, IsNumber, IsPunctuation, IsSeparator, IsSurrogate, IsSymbol, IsUpper, IsWhiteSpace, GetUnicodeCategory, GetNumericValue, IsHighSurrogate, IsLowSurrogate, IsSurrogatePair
      • ConvertToUtf32, ConvertFromUtf32 methods
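A short sketch of the `(String s, int index)` signatures and the UTF-32 conversion methods listed above. The class name `SupplementaryChars`, the helper `CountCodePoints`, and the choice of U+10400 (DESERET CAPITAL LETTER LONG I) as the sample character are assumptions made for this example.

```csharp
using System;

public static class SupplementaryChars
{
    // Hypothetical helper: counts code points, treating each surrogate
    // pair as a single character.
    public static int CountCodePoints(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (Char.IsSurrogatePair(s, i))
                i++; // skip the low surrogate of the pair
            count++;
        }
        return count;
    }

    public static void Main()
    {
        // U+10400 lies outside the BMP, so it occupies two UTF-16 code units.
        string s = "a" + Char.ConvertFromUtf32(0x10400) + "b";

        Console.WriteLine(s.Length);            // 4
        Console.WriteLine(CountCodePoints(s));  // 3

        // The (string, index) overloads see the whole pair at index 1.
        Console.WriteLine(Char.IsSurrogatePair(s, 1));
        Console.WriteLine(Char.IsLetter(s, 1));
        Console.WriteLine(Char.IsUpper(s, 1));
        Console.WriteLine("U+{0:X}", Char.ConvertToUtf32(s, 1)); // U+10400
    }
}
```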

  23. References
      • MSDN Magazine article
        • “Make the .NET World a Friendlier Place with the Many Faces of the CultureInfo Class”, March 2005 - http://msdn.microsoft.com/msdnmag/issues/05/03/CultureInfo/
      • SQL Server Books Online
        • “International Considerations for SQL Server” - http://whidbey.msdn.microsoft.com/library/en-us/icsql9/html/50dc4fa8-4772-46a8-a8ef-bc134502b4e0.asp
      • My blog
        • http://blogs.msdn.com/michkap
      • Some other blogs for international support in Whidbey
        • http://blogs.msdn.com/AchimR
        • http://www.dasblonde.net/
        • http://blogs.msdn.com/BCLTeam
      • Other useful sites
        • http://www.microsoft.com/globaldev/
        • http://lab.msdn.microsoft.com/productfeedback/
        • http://www.unicode.org/

  24. Globalization Features in Whidbey’s CLR: Questions
