Knotty problems in date/time parsing and formatting and time zones. Yoshito Umaoka IBM Globalization Center of Competency. 32nd Internationalization and Unicode Conference. Agenda. Challenges for Implementing Date and Time UI Understanding Time Zone Formatting Parsing.
Knotty problems in date/time parsing and formatting and time zones Yoshito Umaoka IBM Globalization Center of Competency 32nd Internationalization and Unicode Conference
Agenda • Challenges for Implementing Date and Time UI • Understanding Time Zone Formatting and Parsing
Challenges for Implementing Date and Time UI • Two examples • Google Calendar • IBM Lotus Notes • Walking through various requirements for displaying date and time • Solutions provided by CLDR • Design/Implementation Tips
Date Format Types Basic: July 27, 2008 Relative: Today Basic: July 28, 2008 Relative: Tomorrow Basic: August 3, 2008 Relative: August 3, 2008 Interval: July 27 - 28, 2008 Duration: 1 day Interval: July 27 – August 3, 2008 Duration: 7 days
Mini Calendar • Month • Some locales use a different form of the month name when it appears without a day • E.g. Polish - lipiec (nominative) vs. lipca (genitive) • lipiec 2008 • 28 lipca 2008 • Day of week • Very short abbreviation • Not always the first letter of the day of week name • E.g. Chinese: 星期日 ⇒ 日 • The first day of week • Sunday is the first day of the week in many regions, but not in all of them.
Month/Day of Week Names in CLDR • 3 different widths - wide / abbreviated / narrow • 2 context types – format / stand-alone Month name example - January Day of week name example - Sunday
Date and Time Interval • When displaying a date interval, duplicated date fields could be stripped off. • 3 possible patterns depending on combination of start date and end date • July 20–26, 2008 • July 20 – August 1, 2008 • July 20, 2008 – July 19, 2009 • Different combination patterns in different locales • 20–26 July 2008 • 20 July – 1 August 2008 • 20 July 2008 – 19 July 2009
Date/Time Interval in CLDR • Each <intervalFormatItem> is associated with a “skeleton” pattern and contains one or more patterns • A <greatestDifference> element contains the pattern that is used when the greatest difference between the two given dates matches its “id” attribute <intervalFormatItem id="yMMMd"> <greatestDifference id="y">MMM d, yyyy – MMM d, yyyy</greatestDifference> <greatestDifference id="M">MMM d – MMM d, yyyy</greatestDifference> <greatestDifference id="d">MMM d–d, yyyy</greatestDifference> </intervalFormatItem>
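The selection rule can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the three patterns are hard-coded from the `yMMMd` example above instead of being read from CLDR, the English month abbreviations are a local list, and the helper names `greatest_difference` and `format_interval` are hypothetical. It does not special-case a start and end on the same day.

```python
from datetime import date

# English abbreviated month names, hard-coded for this sketch only.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def greatest_difference(start, end):
    """Return the id of the largest calendar field that differs: y, M or d."""
    if start.year != end.year:
        return "y"
    if start.month != end.month:
        return "M"
    return "d"

def format_interval(start, end):
    """Format a date interval using the three yMMMd patterns from the slide."""
    s_mon = MONTHS[start.month - 1]
    e_mon = MONTHS[end.month - 1]
    key = greatest_difference(start, end)
    if key == "y":   # MMM d, yyyy – MMM d, yyyy
        return f"{s_mon} {start.day}, {start.year} – {e_mon} {end.day}, {end.year}"
    if key == "M":   # MMM d – MMM d, yyyy
        return f"{s_mon} {start.day} – {e_mon} {end.day}, {end.year}"
    return f"{s_mon} {start.day}–{end.day}, {end.year}"   # MMM d–d, yyyy

print(format_interval(date(2008, 7, 20), date(2008, 7, 26)))
print(format_interval(date(2008, 7, 20), date(2008, 8, 1)))
print(format_interval(date(2008, 7, 20), date(2009, 7, 19)))
```

A real implementation would load the patterns for the requested locale and skeleton, which is what lets en_GB produce "20–26 July 2008" from the same logic.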
Other Challenges • Various combinations of date fields and widths • “Sat 7/26” • The UI needs a short format that includes month, day of month and day of week, but not year • The pattern may change depending on the locale • “Sat 26/7” for en_GB • “7/26(土)” for ja_JP • Week number • Week numbers are commonly used in European countries • The way week numbers are calculated within a year may vary depending on local conventions
Flexible Date Format Support in CLDR (1) • <availableFormats> contains various <dateFormatItem> • Each <dateFormatItem> has id attribute representing “skeleton” • “skeleton” contains only field information in a canonical order • A CLDR consumer provides a “skeleton” – When the matching “skeleton” is available in the locale, the associated pattern is returned. If not, closest match which contains all requested fields is returned. <availableFormats> <dateFormatItem id="MMMEd" draft="provisional">E d MMM</dateFormatItem> <dateFormatItem id="MMMMd" draft="provisional">d MMMM</dateFormatItem> <dateFormatItem id="MMdd" draft="provisional">dd/MM</dateFormatItem> <dateFormatItem id="Md" draft="provisional">d/M</dateFormatItem> <dateFormatItem id="yyMMM" draft="provisional">MMM yy</dateFormatItem> <dateFormatItem id="yyyyMM" draft="provisional">MM/yyyy</dateFormatItem> <dateFormatItem id="yyyyMMMM" draft="provisional">MMMM yyyy</dateFormatItem> </availableFormats>
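A rough sketch of the lookup, using the sample `<availableFormats>` data above. The scoring rule here (prefer candidates with no extra fields, then the smallest difference in field widths) is an assumption made for illustration; the matching actually specified by LDML and implemented in ICU is more elaborate. The function name `best_pattern` is hypothetical.

```python
from collections import Counter

# Skeleton -> pattern, copied from the <availableFormats> sample above.
AVAILABLE = {
    "MMMEd": "E d MMM",
    "MMMMd": "d MMMM",
    "MMdd": "dd/MM",
    "Md": "d/M",
    "yyMMM": "MMM yy",
    "yyyyMM": "MM/yyyy",
    "yyyyMMMM": "MMMM yyyy",
}

def best_pattern(skeleton):
    """Return the pattern for an exact skeleton match, else the closest
    candidate that contains every requested field, else None."""
    if skeleton in AVAILABLE:
        return AVAILABLE[skeleton]
    want = Counter(skeleton)          # field letter -> requested width
    best = None
    for candidate, pattern in AVAILABLE.items():
        have = Counter(candidate)
        if not set(want) <= set(have):
            continue                  # candidate lacks a requested field
        extra_fields = len(set(have) - set(want))
        width_gap = sum(abs(have[f] - want[f]) for f in want)
        score = (extra_fields, width_gap)
        if best is None or score < best[0]:
            best = (score, pattern)
    return best[1] if best else None

print(best_pattern("Md"))     # exact match
print(best_pattern("MMMd"))   # no exact match, closest candidate is used
print(best_pattern("Hm"))     # no candidate has hour and minute fields
```

Returning `None` marks the case where no available format covers the request, so the caller has to build a pattern some other way.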
Flexible Date Format Support in CLDR (2) • When no <dateFormatItem> element satisfies the matching criteria, use the rules defined by <appendItems> to append the missing fields to one of the existing formats. <appendItems> <appendItem request="Day">{0} ({2}: {1})</appendItem> <appendItem request="Day-Of-Week">{0} {1}</appendItem> <appendItem request="Era">{0} {1}</appendItem> <appendItem request="Hour">{0} ({2}: {1})</appendItem> <appendItem request="Minute">{0} ({2}: {1})</appendItem> <appendItem request="Month">{0} ({2}: {1})</appendItem> <appendItem request="Quarter">{0} ({2}: {1})</appendItem> <appendItem request="Second">{0} ({2}: {1})</appendItem> <appendItem request="Timezone">{0} {1}</appendItem> <appendItem request="Week">{0} ({2}: {1})</appendItem> <appendItem request="Year">{0} {1}</appendItem> </appendItems>
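The placeholders can be illustrated with plain string substitution. This sketch assumes `{0}` is the existing format, `{1}` the appended field and `{2}` the display name of that field, which is how the samples above read. For simplicity it substitutes already-formatted strings, whereas a real implementation combines patterns before formatting. `append_field` and the sample values are hypothetical.

```python
# A few <appendItem> rules copied from the sample above.
APPEND_ITEMS = {
    "Week": "{0} ({2}: {1})",
    "Year": "{0} {1}",
    "Day-Of-Week": "{0} {1}",
}

def append_field(base, request, value, display_name):
    """Append a missing field to an existing formatted string.

    base:         text produced by the closest available format ({0})
    value:        text for the missing field ({1})
    display_name: localized name of the missing field ({2}); rules that
                  do not mention {2} simply ignore it
    """
    return APPEND_ITEMS[request].format(base, value, display_name)

print(append_field("7/26", "Week", "30", "Week"))
print(append_field("7/26", "Year", "2008", "Year"))
```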
Week Data in CLDR • <weekData> • minDays: minimum days in the first week • firstDay: first day in a week • weekendStart/weekendEnd: start/end day of weekend <weekData> <minDays count="1" territories="001" /> <minDays count="4" territories="AT BE CA CH DE DK FI FR IT LI LT LU MC MT NL NO SE SK" /> <minDays count="4" territories="CD" draft="true" /> <firstDay day="mon" territories="001" /> <firstDay day="fri" territories="MV" /> <firstDay day="sat" territories="AE AF BH DJ DZ EG ER ET IQ IR JO KE KW LB LY MA OM QA SA SD SO TN YE" /> <firstDay day="sun" territories="AS AU AZ BW CA CN FO GE GL GU HK IE IL IS JM JP KG KR LA MH MN MO MP MT NZ PH PK SG TH TT TW UM US UZ VI ZA ZW" /> <firstDay day="sun" territories="ET MW NG TJ" draft="true" /> <firstDay day="sun" territories="GB" draft="true" alt="variant" references="Shorter Oxford Dictionary (5th edition, 2002)"/> <firstDay day="thu" territories="SY" /> <weekendStart day="sat" territories="001"/> <weekendStart day="fri" territories="EG IL SY"/> <weekendStart day="sun" territories="IN"/> <weekendStart day="thu" territories="AE BH DZ IQ JO KW LB LY MA OM QA SA SD TN YE AF IR"/> <weekendEnd day="sun" territories="001"/> <weekendEnd day="fri" territories="AE BH DZ IQ JO KW LB LY MA OM QA SA SD TN YE AF IR"/> <weekendEnd day="sat" territories="EG IL SY"/> </weekData>
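To show how `minDays` and `firstDay` interact, here is a small week-number calculation in Python. It is a sketch, not the ICU algorithm: the function names are hypothetical, and `first_day` uses Python's weekday numbering (Monday = 0 … Sunday = 6) instead of the CLDR day names. With `first_day=0, min_days=4` the rule is the same as ISO 8601 week numbering.

```python
from datetime import date, timedelta

def week1_start(year, first_day, min_days):
    """Return the date on which week 1 of the given year starts."""
    jan1 = date(year, 1, 1)
    # Days between the start of the week containing Jan 1 and Jan 1 itself.
    offset = (jan1.weekday() - first_day) % 7
    start = jan1 - timedelta(days=offset)
    # That week has (7 - offset) days in the new year; if that is fewer
    # than min_days, week 1 is the following week.
    if 7 - offset < min_days:
        start += timedelta(days=7)
    return start

def week_of_year(d, first_day=0, min_days=4):
    """Return (week_year, week_number) for date d."""
    year = d.year
    if d >= week1_start(year + 1, first_day, min_days):
        year += 1        # late December already belongs to next year's week 1
    elif d < week1_start(year, first_day, min_days):
        year -= 1        # early January still belongs to last year's final week
    start = week1_start(year, first_day, min_days)
    return year, (d - start).days // 7 + 1

d = date(2010, 1, 1)
print(week_of_year(d, first_day=0, min_days=4))   # Monday start, minDays=4
print(week_of_year(d, first_day=6, min_days=1))   # Sunday start, minDays=1
```

The same date lands in week 53 of 2009 under the first convention and in week 1 of 2010 under the second, which is why week numbers should not be computed with a hard-coded rule.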
Design/Implementation Tips • Keep internal date/time representation locale-independent • Localized format may vary depending on implementation • Use a standard format such as ISO 8601 for data exchange • Do not hardcode format patterns in your source code • Do not put format patterns in resource bundles with other localizable messages! • Locale support is more than UI translation • Translation vendors are usually not able to handle regional variants • You should be able to find solutions in CLDR/ICU – if one is not available, file a bug to request the new feature • Avoid date/time data entry by text • Formatting date/time is complicated, and so is parsing • Use a UI widget to eliminate ambiguous data entry • Understand regional conventions of the calendar system • Rules for calculating some calendar fields may vary • Be prepared to support non-Gregorian calendar systems • For example, • The Buddhist calendar is the preferred calendar system in Thailand • Japanese calendar support may be required depending on target sectors
Understanding Time Zone Formatting and Parsing • CLDR’s approach for supporting time zone formatting • Choosing a right time zone format type for your needs • Tips for processing date/time with time zone http://www.time.gov/images/worldzones.gif
Time Zone Implementations • The tz database (a.k.a Olson database) • 568 zones (436 unique zones / 132 aliases) (2008d) • Support historic time transitions since late 19th century • At least 1 zone per country/region • Time zone abbreviations for display (3 or 4 letter ASCII alphabet), such as “EST”, “JST”… • Used by *nix systems (Solaris, Linux, AIX, Mac OS X…) and Java • MS Windows time zone • 84 zones (Windows Vista), some are obsolete • Support historic rules (2005 and beyond) in Vista/2008 Server (Dynamic DST) • A zone is shared by multiple cities/countries • Time zone display names including the standard offset and common name or exemplar cities, such as “(GMT-05:00) Eastern Time (US & Canada)”, “(GMT+09:00) Osaka, Sapporo, Tokyo”…
Time Zone Format Types in CLDR (1) • Generic location format • Designed for populating choice lists for time zones • Uniquely mapped to “canonical” zone IDs • Examples • Europe/Rome ⇔ Italy Time [en] • America/New_York ⇔ United States (New York) Time [en] • America/New_York ⇔ Hora de Estados Unidos (New York) [es] • Generic non-location format • Designed for recurring events, meetings, or anywhere people do not want to be overly specific • Two widths – long/short • Examples • America/New_York ⇒ ET [en/short] • America/New_York ⇒ Eastern Time [en/long] • America/Montreal ⇒ Eastern Time [en/long]
Time Zone Format Types in CLDR (2) • Generic partial location format • A variant of generic non-location format – used as a fallback name when the generic non-location format is not specific enough • Two widths – long/short • Examples • America/Mexico_City ⇒ Hora central (Ciudad de México) [es_US/short/Mar 9 – April 6, 2008] • America/Chicago ⇒ Hora central (Chicago) [es_MX/short/Mar 9 – April 6, 2008] • Specific (non-location) format • Designed to distinguish between standard time and daylight time • Two widths – long/short • Examples • America/New_York ⇒ EST [en/short/standard time] • America/New_York ⇒ EDT [en/short/daylight time] • America/New_York ⇒ Eastern Standard Time [en/long/standard time] • America/Montreal ⇒ Eastern Standard Time [en/long/standard time]
Time Zone Format Types in CLDR (3) • Localized GMT format • Designed for representing the offset from GMT • Local decimal digits are used • Examples • America/New_York ⇒ GMT-05:00 [en/standard time] • America/New_York ⇒ GMT-04:00 [en/daylight time] • America/New_York ⇒ Гриинуич-0500 [bg/standard time] • RFC 822 format • Locale-insensitive “fixed” format, defined by RFC 822, representing the offset from GMT • ASCII decimal digits are always used • Examples • America/New_York ⇒ -0500 [standard time] • America/New_York ⇒ -0400 [daylight time]
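Both offset formats are easy to produce once the UTC offset is known. The following Python sketch covers only the English form of the localized GMT format and always emits ASCII digits; localized prefixes and local digits would come from locale data. The function names are hypothetical.

```python
from datetime import timedelta

def _split(offset):
    """Split a UTC offset into sign, whole hours and whole minutes.
    Any seconds are dropped, since neither format has a seconds field."""
    seconds = int(offset.total_seconds())
    sign = "-" if seconds < 0 else "+"
    hours, minutes = divmod(abs(seconds) // 60, 60)
    return sign, hours, minutes

def rfc822_offset(offset):
    """Locale-insensitive form, e.g. -0500."""
    sign, hours, minutes = _split(offset)
    return f"{sign}{hours:02d}{minutes:02d}"

def gmt_offset_en(offset):
    """English localized GMT form, e.g. GMT-05:00."""
    sign, hours, minutes = _split(offset)
    return f"GMT{sign}{hours:02d}:{minutes:02d}"

eastern_standard = timedelta(hours=-5)
eastern_daylight = timedelta(hours=-4)
print(rfc822_offset(eastern_standard), rfc822_offset(eastern_daylight))
print(gmt_offset_en(eastern_standard), gmt_offset_en(eastern_daylight))
```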
CLDR Metazone • A metazone is a grouping of one or more internal zones that share common non-location display names • The following zones are currently associated with the metazone “America_Eastern” (CLDR 1.6.1): America/Nassau, America/Resolute, America/Coral_Harbour, America/Thunder_Bay, America/Nipigon, America/Toronto, America/Montreal, America/Iqaluit, America/Pangnirtung, America/Port-au-Prince, America/Jamaica, America/Cayman, America/Panama, America/Grand_Turk, America/Indiana/Vincennes, America/Indiana/Petersburg, America/Indiana/Marengo, America/Indiana/Winamac, America/Indianapolis, America/Louisville, America/Indiana/Vevay, America/Kentucky/Monticello, America/Detroit, America/New_York • Each metazone has a set of localizable names • The following names are used for the metazone “America_Eastern” (CLDR 1.6.1)
Time Zone Short Abbreviation Problem • 2- to 4-letter ASCII abbreviations are used for short names, such as ET, EST, PDT… • The extent to which time zone abbreviations are understood varies heavily by region • For example, how many people in the US recognize EAT (East Africa Time)? • CLDR’s solution - a boolean value “commonlyUsed” associated with a zone/metazone to enable/disable short abbreviations • Metazone “Africa_Eastern” has a short standard name “EAT” for English locales • For metazone “Africa_Eastern” • commonlyUsed = true in en_ZA [English (South Africa)] • commonlyUsed = false in en_US [English (United States)]
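A minimal sketch of how a formatter might consult the flag. The lookup tables are hard-coded from the example above, `short_zone_name` is a hypothetical helper, and falling back to a GMT offset string when the abbreviation is not commonly used is an assumption of this sketch, not something stated on the slide.

```python
# commonlyUsed flags from the slide, keyed by (locale, metazone).
COMMONLY_USED = {
    ("en_ZA", "Africa_Eastern"): True,
    ("en_US", "Africa_Eastern"): False,
}

# Short standard name for English locales, from the slide.
SHORT_STANDARD = {"Africa_Eastern": "EAT"}

# Assumed fallback text: East Africa Time is three hours ahead of GMT.
GMT_FALLBACK = {"Africa_Eastern": "GMT+03:00"}

def short_zone_name(locale, metazone):
    """Use the abbreviation only where it is flagged as commonly used."""
    if COMMONLY_USED.get((locale, metazone), False):
        return SHORT_STANDARD[metazone]
    return GMT_FALLBACK[metazone]

print(short_zone_name("en_ZA", "Africa_Eastern"))
print(short_zone_name("en_US", "Africa_Eastern"))
```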
Ambiguous Time with Generic format • Daylight ⇒ Standard transition • Sunday, November 2, 2008 01:30:00 Pacific Time? • Valid, happens twice • Generic format cannot distinguish between 1:30 PST and 1:30 PDT • Standard ⇒ Daylight transition • Sunday, March 9, 2008 02:30:00 Pacific Time? • Invalid! • 30 minutes 1 second after 01:59:59? or 30 minutes before 03:00:00?
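Both cases can be detected by testing a wall-clock time against each possible offset. To stay self-contained, the Python sketch below hard-codes the two 2008 US Pacific transitions in UTC instead of reading the tz database, so it is valid for that year only; the names `offset_at` and `classify` are hypothetical.

```python
from datetime import datetime, timedelta

PST = timedelta(hours=-8)
PDT = timedelta(hours=-7)

# 2008 US Pacific transitions, expressed in UTC (hard-coded for this sketch).
DST_START_UTC = datetime(2008, 3, 9, 10, 0)   # 02:00 PST jumps to 03:00 PDT
DST_END_UTC = datetime(2008, 11, 2, 9, 0)     # 02:00 PDT falls back to 01:00 PST

def offset_at(utc):
    """UTC offset in effect at a given UTC instant during 2008."""
    return PDT if DST_START_UTC <= utc < DST_END_UTC else PST

def classify(local):
    """Classify a 2008 Pacific wall-clock time as unique, ambiguous or invalid."""
    valid = 0
    for offset in (PDT, PST):
        utc = local - offset              # instant this reading would denote
        if offset_at(utc) == offset:      # is that offset really in effect then?
            valid += 1
    return {0: "invalid", 1: "unique", 2: "ambiguous"}[valid]

print(classify(datetime(2008, 11, 2, 1, 30)))   # happens twice
print(classify(datetime(2008, 3, 9, 2, 30)))    # skipped by the transition
print(classify(datetime(2008, 7, 1, 12, 0)))
```

An application that accepts generic-format input needs a policy for the first two results, for example choosing the earlier instant or asking the user.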
Tips for Processing Date/Time with Time Zone • For serializing future date/time data in text format, use RFC 822 format with zone ID • Time zone rules could be changed • GMT offset information along with the zone ID is sufficient to fix up data • The result of java.util.Date#toString() might be ambiguous • “CST” is used for both “America/Chicago” and “Asia/Shanghai” in Java • CLDR does not use the same name for multiple time zones or metazones • Many zones in the tz database use LMT (Local Mean Time) as the initial offset • LMT is calculated from the longitude, so the GMT offset includes a fraction of a minute • ISO 8601 / RFC 822 / Java GMT formats do not have a seconds field, so such offsets may not round-trip • Minimize dependencies on Windows time zones in multi-platform applications • Some Windows time zones are not well maintained • No historic time zone rule support before Vista/2008 Server • Mapping between Windows time zones and the tz database is 1-to-n
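The LMT round-trip problem is easy to reproduce. The Python sketch below uses the LMT offset that the tz database records for America/New_York (-4:56:02) and shows that formatting it as an RFC 822 offset and parsing it back loses two seconds. The helper names are hypothetical.

```python
from datetime import timedelta

def format_rfc822(offset):
    """Format a UTC offset as +HHMM / -HHMM; seconds cannot be represented."""
    seconds = int(offset.total_seconds())
    sign = "-" if seconds < 0 else "+"
    hours, minutes = divmod(abs(seconds) // 60, 60)
    return f"{sign}{hours:02d}{minutes:02d}"

def parse_rfc822(text):
    """Parse +HHMM / -HHMM back into a timedelta."""
    sign = -1 if text[0] == "-" else 1
    return sign * timedelta(hours=int(text[1:3]), minutes=int(text[3:5]))

# Local Mean Time for America/New_York in the tz database: -4:56:02.
lmt = timedelta(hours=-4, minutes=-56, seconds=-2)

text = format_rfc822(lmt)
restored = parse_rfc822(text)
print(text)
print(restored == lmt)
print(restored - lmt)   # the part of the offset that was lost
```

Storing the zone ID next to the offset avoids relying on the offset text alone for very old timestamps.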
Links • Unicode CLDR project - http://www.unicode.org/cldr/ • UTS#35 UNICODE LOCALE DATA MARKUP LANGUAGE (LDML) - http://www.unicode.org/reports/tr35/ • ICU Project - http://icu-project.org/ • tz database - http://www.twinsun.com/tz/tz-link.htm