Globalization Features in Whidbey’s CLR
Michael Kaplan
Technical Lead, Globalization Infrastructure, Fonts and Tools
Microsoft Windows International Division
http://blogs.msdn.com/michkap
April 25, 2005
Customized Cultures and Regions
• CultureAndRegionInfoBuilder class
  • Create an override to an existing culture
  • Create based on an existing culture
  • Create from scratch
• Must be an administrator to register
• Can register the file on multiple machines
CultureAndRegionInfoBuilder sample

    CultureAndRegionInfoBuilder carib =
        new CultureAndRegionInfoBuilder("de-DE-MineMine", CultureAndRegionModifiers.None);

    // Load up all of the existing data for German and for Germany
    carib.LoadDataFromCultureInfo(new CultureInfo("de-DE", false));
    carib.LoadDataFromRegionInfo(new RegionInfo("de"));

    // Change a property
    carib.ThreeLetterISORegionName = "ZZZ";

    // Register the culture on the machine (requires administrator rights)
    carib.Register();

    // Use the new culture
    CultureInfo ci = new CultureInfo("de-DE-MineMine");
CaRIB (CultureAndRegionInfoBuilder) serialization with LDML
• Locale Data Markup Language
• Described in UTS#35 at http://unicode.org/reports/tr35/
• CaRIB objects can be saved as LDML files
• Data can be loaded from LDML files
• CaRIB will do its best with files it did not create
LDML Sample

    // Get a unique temp file name, then delete the empty file so Save() can create it
    string file1 = Path.GetTempFileName();
    File.Delete(file1);

    CultureInfo ci = new CultureInfo("ar-EG");
    RegionInfo ri = new RegionInfo("de-DE");
    CultureAndRegionInfoBuilder carib =
        new CultureAndRegionInfoBuilder("x-en-US-Pepsi", CultureAndRegionModifiers.None);
    carib.LoadDataFromCultureInfo(ci);
    carib.LoadDataFromRegionInfo(ri);

    // Save to LDML, then rebuild the builder from that file and register it
    carib.Save(file1);
    carib = CultureAndRegionInfoBuilder.CreateFromLdml(file1);
    carib.Register();
When Windows knows more than .NET
• As of XP SP2, there are 25 new locales in Windows:
  • Bengali - India
  • Croatian - Bosnia and Herzegovina
  • Bosnian - Bosnia and Herzegovina
  • Serbian - Bosnia and Herzegovina (Latin)
  • Serbian - Bosnia and Herzegovina (Cyrillic)
  • Welsh - United Kingdom
  • Maori - New Zealand
  • Malayalam - India
  • Maltese - Malta
  • Quechua - Bolivia
  • Quechua - Ecuador
  • Quechua - Peru
  • Setswana / Tswana - South Africa
  • isiXhosa / Xhosa - South Africa
  • isiZulu / Zulu - South Africa
  • Sesotho sa Leboa / Northern Sotho - South Africa
  • Northern Sami - Norway
  • Northern Sami - Sweden
  • Northern Sami - Finland
  • Lule Sami - Norway
  • Lule Sami - Sweden
  • Southern Sami - Norway
  • Southern Sami - Sweden
  • Skolt Sami - Finland
  • Inari Sami - Finland
• There will be more in future service packs
• In Longhorn, there will be 75 or more new locales
Windows-only Cultures
• The solution: Windows-only cultures!
  • The CLR synthesizes a CultureInfo object when Windows supports a locale that the .NET Framework does not know how to create itself
Windows-only culture test

    foreach (CultureInfo culture in CultureInfo.GetCultures(CultureTypes.WindowsOnlyCultures))
    {
        Console.WriteLine(culture.Name);
    }

    // New cultures on XP SP2 include:
    // mt-MT, bs-BA-Latn, smn-FI, smj-NO, smj-SE, sms-FI, sma-NO,
    // sma-SE, quz-BO, quz-EC, quz-PE, ml-IN, bn-IN, cy-GB, and more
Special CultureInfo support for SQL Server 2005 (Yukon)
• SQL Server locale semantics:
  • One setting for UI and formatting
  • Another setting for collation/encoding
• .NET/Windows semantics:
  • One setting for UI
  • Another setting for formatting/collation
• Solution
  • Special GetCultureInfo overload that takes two culture names, one for each of the two SQL Server settings
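The two-name overload can be sketched roughly as follows. This is a minimal illustration, assuming the overload in question is CultureInfo.GetCultureInfo(string name, string altName), where the first name supplies UI and formatting data and the second supplies the TextInfo and CompareInfo objects. The class name SqlCultureDemo, the helper Create, and the culture pair en-US/sv-SE are hypothetical choices made for the example.

```csharp
using System;
using System.Globalization;

public static class SqlCultureDemo
{
    // Hypothetical helper: the first name drives UI/formatting,
    // the second drives collation and casing (CompareInfo/TextInfo).
    public static CultureInfo Create(string uiAndFormatting, string collation)
    {
        return CultureInfo.GetCultureInfo(uiAndFormatting, collation);
    }

    public static void Main()
    {
        CultureInfo ci = Create("en-US", "sv-SE");

        // Formatting data comes from the first culture name.
        Console.WriteLine("Culture name:   {0}", ci.Name);
        Console.WriteLine("Currency:       {0}", ci.NumberFormat.CurrencySymbol);

        // Comparison rules come from the second culture name.
        Console.WriteLine("CompareInfo:    {0}", ci.CompareInfo.Name);
        Console.WriteLine("Compare a/ä:    {0}", ci.CompareInfo.Compare("a", "\u00E4"));
    }
}
```

The returned object is cached and read-only, so it suits server code that looks up the same culture pair repeatedly.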
How Yukon uses this support
• Microsoft.ReportingServices.Diagnostics.Localization
  • CatalogCulture
  • ClientPrimaryCulture
  • DefaultReportServerCulture
  • FallbackUICulture
  • InstalledCultureNames
  • ReportParameterCulture
  • SqlCulture
New locale properties/methods
• TextInfo
  • CultureName
  • LCID
• CompareInfo
  • Name
• DateTimeFormatInfo
  • ShortestDayNames
  • MonthGenitiveNames
  • AbbreviatedMonthGenitiveNames
• NumberFormatInfo
  • NativeDigits
  • DigitSubstitution
• CultureInfo
  • IsCustomCulture
  • IetfLanguageTag
  • CultureTypes
  • GetCultureInfo()
  • GetCultureInfoByIetfLanguageTag()
• RegionInfo
  • GeoId
  • NativeName
  • CurrencyEnglishName
  • (Can now create via full culture names)
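A few of these members can be exercised together, as in the rough sketch below. The class name NewPropertiesDemo and the helper Describe are hypothetical, and the strings printed depend on the culture data installed on the machine, so no particular output is assumed for ru-RU or ar-EG.

```csharp
using System;
using System.Globalization;

public static class NewPropertiesDemo
{
    // Hypothetical helper that summarizes some of the new properties for one culture.
    public static string Describe(string cultureName)
    {
        CultureInfo ci = CultureInfo.GetCultureInfo(cultureName); // cached, read-only
        DateTimeFormatInfo dtf = ci.DateTimeFormat;
        return String.Format(
            "{0}: month '{1}', genitive '{2}', shortest day '{3}', native digits '{4}'",
            ci.Name,
            dtf.MonthNames[0],
            dtf.MonthGenitiveNames[0],
            dtf.ShortestDayNames[0],
            String.Join("", ci.NumberFormat.NativeDigits));
    }

    public static void Main()
    {
        Console.WriteLine(Describe("ru-RU")); // a culture with distinct genitive month names
        Console.WriteLine(Describe("ar-EG")); // a culture with its own native digits

        // RegionInfo created from a full culture name rather than a bare region code.
        RegionInfo ri = new RegionInfo("de-DE");
        Console.WriteLine("{0} / {1} / GeoId {2}",
            ri.NativeName, ri.CurrencyEnglishName, ri.GeoId);
    }
}
```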
Updates to encodings
• Now built into the BCL
  • Improved performance
  • More flexibility
  • Consistent results across supported platforms
• Encoding enumeration API
• UTF-32 support (little endian and big endian)
• UTF-16 big endian support
• Encoding/decoding fallbacks
  • Exception
  • Replacement
  • “Best fit”
  • Custom
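The replacement and exception fallbacks, along with the UTF-32 and enumeration support, might be used along these lines. This is a small sketch rather than a reference implementation; the class name FallbackDemo and its helper methods are hypothetical, and the list returned by Encoding.GetEncodings() varies by platform.

```csharp
using System;
using System.Text;

public static class FallbackDemo
{
    // Replacement fallback: unencodable characters become "?".
    public static string EncodeWithReplacement(string text)
    {
        Encoding ascii = Encoding.GetEncoding("us-ascii",
            new EncoderReplacementFallback("?"),
            new DecoderReplacementFallback("?"));
        byte[] bytes = ascii.GetBytes(text);
        return ascii.GetString(bytes);
    }

    // Exception fallback: unencodable characters raise EncoderFallbackException.
    public static bool ThrowsOnUnencodable(string text)
    {
        Encoding strict = Encoding.GetEncoding("us-ascii",
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());
        try
        {
            strict.GetBytes(text);
            return false;
        }
        catch (EncoderFallbackException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(EncodeWithReplacement("caf\u00E9")); // prints "caf?"
        Console.WriteLine(ThrowsOnUnencodable("caf\u00E9"));   // prints "True"

        // UTF-32: four bytes per code point, in either byte order.
        Encoding utf32BigEndian = new UTF32Encoding(true, false);
        Console.WriteLine(Encoding.UTF32.GetBytes("A").Length);
        Console.WriteLine(BitConverter.ToString(utf32BigEndian.GetBytes("A")));

        // Encoding enumeration API.
        foreach (EncodingInfo info in Encoding.GetEncodings())
        {
            Console.WriteLine("{0,6}  {1}", info.CodePage, info.Name);
        }
    }
}
```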
Custom fallback sample

    // A custom encoder fallback that writes unencodable characters
    // as numeric character references such as &#8364;
    using System;
    using System.Text;

    public class NumericEntitiesFallback : EncoderFallback
    {
        public override EncoderFallbackBuffer CreateFallbackBuffer()
        {
            return new NEFallbackBuffer();
        }

        // The longest possible replacement is "&#1114111;" (U+10FFFF), 10 chars
        public override int MaxCharCount
        {
            get { return 10; }
        }
    }

    public class NEFallbackBuffer : EncoderFallbackBuffer
    {
        // Store our fallback string
        private String strEntity;
        int fallbackCount = -1;
        int fallbackIndex = 0;

        // Fallback methods
        public override bool Fallback(char charUnknown, int index)
        {
            // If we had a buffer already we're being recursive, throw,
            // it's probably at the suspect character in our array.
            if (fallbackCount >= 0)
                ThrowLastCharRecursive(unchecked((int)charUnknown));

            // Go ahead and get our fallback
            strEntity = String.Format("&#{0};", (int)charUnknown);
            fallbackCount = strEntity.Length;
            fallbackIndex = 0;
            return fallbackCount != 0;
        }

        public override bool Fallback(char charUnknownHigh, char charUnknownLow, int index)
        {
            // Double check input surrogate pair
            if (!Char.IsHighSurrogate(charUnknownHigh))
                throw new ArgumentOutOfRangeException("charUnknownHigh",
                    "supposed to be between 0xD800 and 0xDBFF");
            if (!Char.IsLowSurrogate(charUnknownLow))
                throw new ArgumentOutOfRangeException("charUnknownLow",
                    "supposed to be between 0xDC00 and 0xDFFF");

            // If we had a buffer already we're being recursive, throw,
            // it's probably at the suspect character in our array.
            if (fallbackCount >= 0)
                ThrowLastCharRecursive(Char.ConvertToUtf32(charUnknownHigh, charUnknownLow));

            // Go ahead and get our fallback
            strEntity = String.Format("&#{0};",
                Char.ConvertToUtf32(charUnknownHigh, charUnknownLow));
            fallbackCount = strEntity.Length;
            fallbackIndex = 0;
            return fallbackCount != 0;
        }

        public override char GetNextChar()
        {
            // We want it to get < 0 because == 0 means that the current/last
            // character is a fallback and we need to detect recursion. We
            // could have a flag but we already have this counter.
            fallbackCount--;

            // Do we have anything left? 0 is now last fallback char,
            // negative is nothing left
            if (fallbackCount < 0)
                return (char)0;

            // Need to get it out of the buffer.
            return strEntity[fallbackIndex++];
        }

        public override bool MovePrevious()
        {
            fallbackCount++;
            fallbackIndex--;
            return true;
        }

        public override int Remaining
        {
            get { return (fallbackCount < 0) ? 0 : fallbackCount; }
        }

        // Private helper methods
        private void ThrowLastCharRecursive(int charRecursive)
        {
            // Throw it, using our complete character
            throw new ArgumentException(
                String.Format("Last character \\u{0:X4} was a recursive fallback", charRecursive),
                "chars");
        }
    }
Collation Improvements
• OrdinalIgnoreCase
  • Same results as ToUpper/Ordinal
  • Matches OS file system results
• Correct Serbian collation
  • Fixed in Windows XP SP2
  • Customer reported (MSDN Feedback Center)
• Better handling of ignored/ignorable characters
  • IndexOf/LastIndexOf/IsPrefix/IsSuffix
  • StartsWith/EndsWith, too
OrdinalIgnoreCase sample

    string strTest1 = "IamAString";
    string strTest2 = "STRING";

    // Ordinal comparison that ignores case: "String" matches "STRING"
    if (strTest1.EndsWith(strTest2, StringComparison.OrdinalIgnoreCase))
    {
        Console.WriteLine("Successful test!");
    }
Unicode normalization
• Described in UAX#15 at http://www.unicode.org/reports/tr15/
• String.IsNormalized()
• String.IsNormalized(NormalizationForm normalizationForm)
• String.Normalize()
• String.Normalize(NormalizationForm normalizationForm)
• NormalizationForm enumeration
  • FormC, FormD, FormKC, FormKD
• Example: õĥµ¨ (U+00f5 U+0068 U+0302 U+00b5 U+00a8)
  LATIN SMALL LETTER O WITH TILDE; LATIN SMALL LETTER H; COMBINING CIRCUMFLEX ACCENT; MICRO SIGN; DIAERESIS
  • FormC: õĥµ¨ (U+00f5 U+0125 U+00b5 U+00a8)
  • FormD: õĥµ¨ (U+006f U+0303 U+0068 U+0302 U+00b5 U+00a8)
  • FormKC: õĥμ ̈ (U+00f5 U+0125 U+03bc U+0020 U+0308)
  • FormKD: õĥμ ̈ (U+006f U+0303 U+0068 U+0302 U+03bc U+0020 U+0308)
• In collation, õĥµ¨ ≅ õĥµ¨ ≅ õĥμ ̈ ≅ õĥμ ̈
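The four forms listed on this slide can be reproduced with a short program. The sketch below normalizes the slide's sample string and prints the code points for each form; the class name NormalizationDemo and the helper ToCodePoints are hypothetical names introduced for the example.

```csharp
using System;
using System.Text;

public static class NormalizationDemo
{
    // Hypothetical helper: formats each UTF-16 code unit as U+XXXX.
    public static string ToCodePoints(string s)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.Length; i++)
        {
            if (i > 0) sb.Append(' ');
            sb.AppendFormat("U+{0:X4}", (int)s[i]);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // o-with-tilde, h, combining circumflex, micro sign, diaeresis
        string source = "\u00F5\u0068\u0302\u00B5\u00A8";

        NormalizationForm[] forms = {
            NormalizationForm.FormC, NormalizationForm.FormD,
            NormalizationForm.FormKC, NormalizationForm.FormKD
        };

        foreach (NormalizationForm form in forms)
        {
            Console.WriteLine("{0,-6} {1}", form, ToCodePoints(source.Normalize(form)));
        }
        // FormC  U+00F5 U+0125 U+00B5 U+00A8
        // FormD  U+006F U+0303 U+0068 U+0302 U+00B5 U+00A8
        // FormKC U+00F5 U+0125 U+03BC U+0020 U+0308
        // FormKD U+006F U+0303 U+0068 U+0302 U+03BC U+0020 U+0308

        // The source mixes composed and decomposed sequences, so it is not in Form C.
        Console.WriteLine(source.IsNormalized(NormalizationForm.FormC)); // False
    }
}
```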
namespace àáâãäå { using System; using System.Text; using System.Globalization; class àáâãäå { [STAThread] static void Main(string[] args) { àáâãäå(); àáâãäå(); àáâãäå(); àáâãäå(); àáâãäå(); àáâãäå(); àáâãäå(); } static void àáâãäå(string àáâãäå) { StringBuilder àáâãäå = new StringBuilder(); StringInfo àáâãäå = new StringInfo(àáâãäå); àáâãäå.Append(àáâãäå.Normalize(NormalizationForm.FormC)); àáâãäå.Append(": "); for(int àáâãäå=0; àáâãäå < àáâãäå.LengthInTextElements; àáâãäå++) { string àáâãäå = àáâãäå.SubstringByTextElements(àáâãäå, 1); if(àáâãäå.IsNormalized(NormalizationForm.FormC)) { àáâãäå.Append("C"); } else if(àáâãäå.IsNormalized(NormalizationForm.FormD)) { àáâãäå.Append("D"); } else { àáâãäå.Append("_"); } } Console.WriteLine(àáâãäå.ToString()); return; } static void àáâãäå() { àáâãäå.àáâãäå("àáâãäå"); } static void àáâãäå() { àáâãäå.àáâãäå("àáâãäå"); } static void àáâãäå() { àáâãäå.àáâãäå("àáâãäå"); } static void àáâãäå() { àáâãäå.àáâãäå("àáâãäå"); } static void àáâãäå() { àáâãäå.àáâãäå("àáâãäå"); } static void àáâãäå() { àáâãäå.àáâãäå("àáâãäå"); } static void àáâãäå() { àáâãäå.àáâãäå("àáâãäå"); } } } April 25, 2005
IDN Mapping APIs
• IdnMapping class
• Based on three RFCs (standard based on Unicode 3.2)
  • 3490 - Internationalizing Domain Names in Applications (IDNA)
  • 3491 - Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)
  • 3492 - Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)
• \u5B89\u5BA4\u5948\u7F8E\u6075-with-SUPER-MONKEYS becomes xn---with-SUPER-MONKEYS-pc58ag80a8qai00g7n9n
• Properties
  • AllowUnassigned (allows new Unicode characters)
  • UseStd3AsciiRules (more like DNS rules)
• Methods
  • GetAscii - Gets ASCII (Punycode) version of the string
  • GetUnicode - Gets Unicode version of the string, normalized and limited to IDNA characters
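A round trip through GetAscii and GetUnicode could look like the sketch below. The class name IdnDemo, its two wrapper methods, and the host name bücher.example are hypothetical choices for illustration; the property values shown are set explicitly only to make the options visible.

```csharp
using System;
using System.Globalization;

public static class IdnDemo
{
    // Hypothetical wrapper: Unicode host name to its ASCII (Punycode) form.
    public static string ToAscii(string unicodeName)
    {
        IdnMapping idn = new IdnMapping();
        idn.AllowUnassigned = false;   // reject code points unassigned in Unicode 3.2
        idn.UseStd3AsciiRules = true;  // apply the stricter DNS-style ASCII rules
        return idn.GetAscii(unicodeName);
    }

    // Hypothetical wrapper: ASCII (Punycode) form back to Unicode.
    public static string ToUnicode(string asciiName)
    {
        return new IdnMapping().GetUnicode(asciiName);
    }

    public static void Main()
    {
        string unicodeName = "b\u00FC" + "cher.example";

        string ascii = ToAscii(unicodeName);
        Console.WriteLine(ascii);                            // xn--bcher-kva.example
        Console.WriteLine(ToUnicode(ascii) == unicodeName);  // True
    }
}
```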
Unicode property information
• New CharUnicodeInfo class
  • Extends methods on Char
  • Official data from the Unicode Character Database at http://www.unicode.org/ucd/
• IsWhiteSpace
• GetNumericValue
• GetDigitValue
• GetDecimalDigitValue
• GetUnicodeCategory
• GetBidiCategory
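The difference between the three numeric lookups is easiest to see on characters that are numeric without being decimal digits. The sketch below uses only the CharUnicodeInfo methods that are publicly callable in the released framework (an assumption relative to this slide, which lists the pre-release surface); the class name CharInfoDemo and the helper Describe are hypothetical.

```csharp
using System;
using System.Globalization;

public static class CharInfoDemo
{
    // Hypothetical helper: one line of Unicode Character Database facts per character.
    public static string Describe(char c)
    {
        return String.Format(CultureInfo.InvariantCulture,
            "U+{0:X4} category={1} numeric={2} digit={3} decimal={4}",
            (int)c,
            CharUnicodeInfo.GetUnicodeCategory(c),
            CharUnicodeInfo.GetNumericValue(c),
            CharUnicodeInfo.GetDigitValue(c),
            CharUnicodeInfo.GetDecimalDigitValue(c));
    }

    public static void Main()
    {
        Console.WriteLine(Describe('\u0663')); // ARABIC-INDIC DIGIT THREE: a decimal digit
        Console.WriteLine(Describe('\u00B2')); // SUPERSCRIPT TWO: a digit, but not a decimal digit
        Console.WriteLine(Describe('\u00BD')); // VULGAR FRACTION ONE HALF: numeric value only
        Console.WriteLine(Describe('\u0302')); // COMBINING CIRCUMFLEX ACCENT: not numeric at all
    }
}
```

Lookups that do not apply to a character return -1, which is how the three methods can be told apart for a given input.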
New text element support in the StringInfo class
• StringInfo ctor that takes a string
• StringInfo.String
• StringInfo.LengthInTextElements
• StringInfo.SubstringByTextElements()
• Both use ParseCombiningCharacters() to get their results
New StringInfo props/methods sample

    // Two text elements: "A" and "e", each followed by three combining marks
    StringInfo si = new StringInfo("A\u0300\u0301\u0300e\u0300\u0301\u0300");

    Console.WriteLine(si.LengthInTextElements); // Length is two

    for (int ich = 0; ich < si.LengthInTextElements; ich++)
    {
        Console.WriteLine(si.SubstringByTextElements(ich, 1));
    }
New supplementary character support in lots of methods
• New signature -- (String s, int index)
  • IsControl, IsDigit, IsLetter, IsLetterOrDigit, IsLower, IsNumber, IsPunctuation, IsSeparator, IsSurrogate, IsSymbol, IsUpper, IsWhiteSpace, GetUnicodeCategory, GetNumericValue, IsHighSurrogate, IsLowSurrogate, IsSurrogatePair
• ConvertToUtf32, ConvertFromUtf32 methods
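The (String s, int index) overloads exist so that a surrogate pair can be examined as one character. A minimal sketch, assuming U+10400 (DESERET CAPITAL LETTER LONG I) as the supplementary test character; the class name SupplementaryDemo and the helper CountCodePoints are hypothetical.

```csharp
using System;

public static class SupplementaryDemo
{
    // Hypothetical helper: counts code points, treating each surrogate pair as one.
    public static int CountCodePoints(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (Char.IsSurrogatePair(s, i))
                i++; // skip the low surrogate of this pair
            count++;
        }
        return count;
    }

    public static void Main()
    {
        // "a", then U+10400 as a surrogate pair, then "b"
        string s = "a" + Char.ConvertFromUtf32(0x10400) + "b";

        Console.WriteLine(s.Length);                          // 4 UTF-16 code units
        Console.WriteLine(CountCodePoints(s));                // 3 code points
        Console.WriteLine(Char.IsSurrogatePair(s, 1));        // True
        Console.WriteLine(Char.ConvertToUtf32(s, 1) == 0x10400); // True

        // The index-based overloads classify the whole pair, not half of it.
        Console.WriteLine(Char.IsLetter(s, 1));               // True
        Console.WriteLine(Char.IsLetter(s[1]));               // False: a lone high surrogate
    }
}
```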
References
• MSDN Magazine Article
  • Make the .NET World a Friendlier Place with the Many Faces of the CultureInfo Class, March 2005 - http://msdn.microsoft.com/msdnmag/issues/05/03/CultureInfo/
• SQL Server Books Online
  • “International Considerations for SQL Server” - http://whidbey.msdn.microsoft.com/library/en-us/icsql9/html/50dc4fa8-4772-46a8-a8ef-bc134502b4e0.asp
• My Blog
  • http://blogs.msdn.com/michkap
• Some other blogs for int’l support in Whidbey
  • http://blogs.msdn.com/AchimR
  • http://www.dasblonde.net/
  • http://blogs.msdn.com/BCLTeam
• Other useful sites
  • http://www.microsoft.com/globaldev/
  • http://lab.msdn.microsoft.com/productfeedback/
  • http://www.unicode.org/
Globalization Features in Whidbey’s CLR
Questions