
Software Localization (L10N) and Internationalization (I18N)


gedelstein

Presentation Transcript


  1. Software Localization (L10N) and Internationalization (I18N) • Localization: customizing software for a particular language or market • Class discussion: What needs to be customized when Microsoft Word is changed from English to Chinese?

  2. Example: Good Morning
       public class GoodMorning {
           public static void main(String[] args) {
               System.out.println("Good morning!");
           }
       }
     • What if you want to do this for Hong Kong? • What if you want to do this for China and other places? • Think of a way to write it without needing to change the source code

  3. Revised: Good Morning
       import java.util.*;
       public class GoodMorning {
           public static void main(String[] args) {
               try {
                   // Load MyData.properties (or a locale-specific variant) from the classpath
                   ResourceBundle resources = ResourceBundle.getBundle("MyData");
                   System.out.println(resources.getString("Hi"));
               } catch (MissingResourceException mre) {
                   System.err.println("MyData.properties not found");
               }
           }
       }
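The revised program can also select locale-specific text through the same lookup. The following is a minimal sketch, not the slide author's code: to stay self-contained it swaps the MyData.properties file for ListResourceBundle subclasses, and the class name GoodMorningBundles, the nested bundle classes, and the greet helper are all hypothetical names chosen for illustration.

```java
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

public class GoodMorningBundles {
    // Base bundle: used when no more specific bundle matches (illustrative data).
    public static class MyData extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] { { "Hi", "Good morning!" } };
        }
    }

    // Hong Kong Chinese variant; getBundle looks for the base name plus "_zh_HK".
    public static class MyData_zh_HK extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] { { "Hi", "早上好!" } };
        }
    }

    // Returns the greeting for a locale; the program logic contains no language test.
    static String greet(Locale locale) {
        ResourceBundle resources = ResourceBundle.getBundle(MyData.class.getName(), locale);
        return resources.getString("Hi");
    }

    public static void main(String[] args) {
        System.out.println(greet(Locale.ROOT));
        System.out.println(greet(Locale.forLanguageTag("zh-HK")));
    }
}
```

Supporting another market would then mean adding one more bundle (or one more properties file) rather than editing greet.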

  4. Internationalization (I18N) • I18N: a software methodology that avoids writing separate application software for different language/cultural environments • The language environment can change without any change to the programming logic (no need to modify source code) • Why I18N: • Software design and implementation become more complicated • But development cost for the global market is saved • Localization effort is minimized • Exposure of source code is minimized

  5. Principles of I18N: • Do not hard-code any language-related data/elements (language data) in a program • Design a well-defined interface to access language data from external sources (files, databases, or even programs) • Provide clear instructions for localization

  6. How to write an I18N program • Analyze the language-related elements in the application program and make sure they are not hard-coded • Design or use a language interface specification (routines that access the language data in a well-defined way) • Prepare localization instructions (or follow a standard); they must be precise so that data can be prepared by following them

  7. Example • Bank ATM machines in Hong Kong • Traditional program: • display alternate screens • Insert card and input password • get preferred display • If English, execute English program • else execute Chinese program • What do the English and the Chinese programs have in common? • What if we need to add Simplified Chinese?

  8. I18N conscious program: • display alternate screens • Insert card and input password • get preferred display • open preferred display file • execute ATM program
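The I18N-conscious flow above can be sketched as a single program that is driven by whichever display data it opens. This is only an illustration under stated assumptions: the class AtmSketch, the keys insert_card and enter_pin, and the in-memory strings standing in for external display files are hypothetical and do not come from the slides.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class AtmSketch {
    // Stand-ins for external display files, keyed by preferred display (hypothetical content).
    static final Map<String, String> DISPLAY_FILES = new HashMap<>();
    static {
        DISPLAY_FILES.put("en", "insert_card=Please insert your card\nenter_pin=Enter your password\n");
        DISPLAY_FILES.put("zh_HK", "insert_card=請插入提款卡\nenter_pin=請輸入密碼\n");
    }

    // "Open preferred display file": load the language data for the chosen display.
    static Properties openDisplay(String preferred) {
        String content = DISPLAY_FILES.get(preferred);
        if (content == null) {
            content = DISPLAY_FILES.get("en"); // assumed fallback when a display is missing
        }
        Properties display = new Properties();
        try {
            display.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return display;
    }

    // "Execute ATM program": one program, with no per-language branch.
    static String runAtm(Properties display) {
        return display.getProperty("insert_card") + " / " + display.getProperty("enter_pin");
    }

    public static void main(String[] args) {
        System.out.println(runAtm(openDisplay("en")));
        System.out.println(runAtm(openDisplay("zh_HK")));
    }
}
```

Under this design, adding Simplified Chinese amounts to supplying one more display entry; runAtm itself does not change.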

  9. Discussion on this example:
       public class GoodMorning {
           public static void main(String[] args) {
               // The language is taken from the first command-line argument
               String language = args.length > 0 ? args[0] : "";
               int country = 0;
               if (language.equals("English")) {
                   country = 1;
               } else if (language.equals("Chinese_HK")) {
                   country = 2;
               }
               switch (country) {
                   case 1: System.out.println("Good Morning!"); break;
                   case 2: System.out.println("早上好!"); break;
                   default: System.out.println("Good Morning");
               }
           }
       }

  10. Data for User Interface vs. Data for manipulation • Data for user interface: resource files • Data for manipulation may not be in the same language/script as the data displayed in the user interface • Example: using the English UI of Microsoft Word to compose a Chinese article, or vice versa • Data for manipulation is not necessarily in resource files

  11. Language/culture-related issues: display & processing (basic to all applications) • Internal representation: codeset • Different classes of the subgroups in a codeset • Input: encoding of input strings to internal code • Output: internal code to glyph association (display) • Date expression • Currency symbols • Fractions and large numbers • etc.
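As an illustration of the "fractions and large numbers" item, the sketch below formats one internal value under two locale conventions using the standard java.text.NumberFormat class. The class name NumberDisplay and the helper show are hypothetical names used only for this example.

```java
import java.text.NumberFormat;
import java.util.Locale;

public class NumberDisplay {
    // Formats the same internal value using the conventions of the given locale.
    static String show(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        double amount = 1234567.5;
        System.out.println(show(amount, Locale.US));      // comma groups, dot as decimal separator
        System.out.println(show(amount, Locale.GERMANY)); // dot groups, comma as decimal separator
    }
}
```

The internal representation of the number is identical in both calls; only the output step depends on the locale.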

  12. I18N Issues on Language-Related Applications • Handling of messages in applications (not system messages): • Write the menu items and messages in resource files • Provide a language parameter in the application, or use the locale value, to open the appropriate file (stored either in different directories or under different file names) • Certain language-specific applications (e.g. spell checking): • Expose the function as an API so that different algorithms can be (dynamically) linked to the application • Data format: • Example: address formats vary by location. USA: Flat No. (incl. bldg), Street, City, ZipCode (incl. State); HK: Flat, Floor, Bldg, Estate, Street (may be optional), District • Database table design is not straightforward.

  13. Measurement scales: • Imperial system vs. metric system: conversion can cause rounding problems • Paper sizes • Chinese language specific: • Segmentation • Lack of morphological rules to indicate tense (time), active/passive voice, etc. • No need for morphological rules in searching • More complicated sorting algorithms due to the multiple features of Chinese characters
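The rounding problem between imperial and metric units can be made concrete with a small worked example. This is a sketch under the assumption that values are stored as whole units after each conversion; the class UnitRounding and its helpers are hypothetical.

```java
public class UnitRounding {
    static final double CM_PER_INCH = 2.54; // exact by definition of the inch

    static long cmToWholeInches(long cm) {
        return Math.round(cm / CM_PER_INCH);
    }

    static long inchesToWholeCm(long inches) {
        return Math.round(inches * CM_PER_INCH);
    }

    // Converts centimetres to inches and back, rounding to whole units at each step.
    static long roundTripCm(long cm) {
        return inchesToWholeCm(cmToWholeInches(cm));
    }

    public static void main(String[] args) {
        // 7 cm is about 2.76 in, stored as 3 in; 3 in is 7.62 cm, stored as 8 cm.
        System.out.println(roundTripCm(7));
    }
}
```

A value that passes through both systems may therefore not come back unchanged, which is why the unit a quantity is stored in needs to be decided deliberately.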

  14. Internationalization Facilities: POSIX • POSIX: Portable Operating System Interface • NLS: National Language Support • Locale: a particular localization setting (C locale, zh_TW, etc.)
        /home/staff/csluqin:> date
        Thur Feb 24 15:38:25 CST 2005
        :> setenv LANG zh_TW
        :> echo $LANG
        zh_TW
        :>
        /usr/openwin/lib/locale:> env LANG=zh_TW.BIG5 date
        中華民國 94年 02月26日 15時38分 27秒 CST
        :>
        /usr/openwin/lib/locale:> env LANG=fr date
        mercredi, 3 avril 2002, 14:30:51 HKT (not available now)

  15. POSIX locale categories • LC_CTYPE: controls the behavior of character handling functions, such as isalpha() • LC_TIME: date and time formats and functions • LC_MONETARY: currency symbol and related functions • LC_NUMERIC: decimal separator and thousands separator • LC_COLLATE: controls sorting order and string conversion/comparison • LC_MESSAGES: controls the choice of message catalogs (user message translation)
        :> env LANG=zh_TW LC_MESSAGES=C
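The idea of setting one category independently of the others has a rough counterpart in Java, which is used here instead of the POSIX C interface as a deliberate substitution. Java distinguishes only two default-locale categories, Locale.Category.FORMAT and Locale.Category.DISPLAY, so the sketch below is an analogy rather than an equivalent of the six LC_* categories. The class CategoryDemo and the helper formatWith are hypothetical names.

```java
import java.util.Locale;

public class CategoryDemo {
    // Formats a number with only the FORMAT default switched, then restores it.
    static String formatWith(Locale formatLocale, double value) {
        Locale saved = Locale.getDefault(Locale.Category.FORMAT);
        try {
            Locale.setDefault(Locale.Category.FORMAT, formatLocale);
            return String.format("%,.2f", value); // String.format uses the FORMAT default
        } finally {
            Locale.setDefault(Locale.Category.FORMAT, saved);
        }
    }

    public static void main(String[] args) {
        System.out.println(formatWith(Locale.GERMANY, 1234.5));
        // The DISPLAY default (used for user-visible names and messages) is left untouched.
        System.out.println(Locale.getDefault(Locale.Category.DISPLAY));
    }
}
```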

  16. • Character class related test functions: isalpha(c), isupper(c), islower(c), isdigit(c), isxdigit(c), isalnum(c), isspace(c), ispunct(c), isprint(c), iscntrl(c), isascii(c), isgraph(c) • Character conversion functions: toupper(c), tolower(c) • Wide characters vs. multibyte characters • Wide character handling functions: mblen(), mbtowc(), wctomb(), mbstowcs(), wcstombs() • National Profile: data prepared for POSIX functions in a particular locale. Example of NP.GB

  17. NLS and Symbolic Names • A National Profile is written using symbolic names • Each locale has a separate file called charmap, which maps the symbolic name of each character to the actual code in that locale
        Symbolic Name    Encoding
        <A>              \x41
        <two>            \x32
        <semicolon>      \x3b
        <GB16-01>        \xb0\xa1    /* 啊 */
      • Why symbolic names: • Less error prone • Flexibility • Language/cultural conventions may differ while the codeset is the same • The same language/cultural convention may be used with different codesets

  18. Making portable software for different encodings (codeset independent)
        char s[100];
        char *p;
        fgets(s, sizeof(s), stdin);   /* get a line of input */
        p = strchr(s, 'A');           /* find letter A */
        if (p != NULL)                /* if found, */
            *p = '\0';                /* replace with null byte */
      • What is the problem with this program? • 'A' in EUC encoding is fine: 0x41 (ASCII code) never occurs inside a multibyte character. But if this program is ported to a PC Big5 system, 0x41 can be the second byte of an ideographic character • 乙丕再你杗呸服隹括耍唧涉… all have 0x41 as the second byte (xx41) in Big5! • C language standard guarantee: • 0x00 is not part of any multibyte character, so it can safely mark the end of a string • Solution: use wide characters
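The strchr problem described above can be reproduced by comparing a byte-oriented search with a character-oriented one. The sketch is written in Java rather than C so that it does not depend on which C locales are installed; it assumes the runtime provides the Big5 charset (present in standard JDK builds, but possibly absent from trimmed-down runtime images). The class Big5TrailByte and its helpers are hypothetical.

```java
import java.nio.charset.Charset;

public class Big5TrailByte {
    // Assumption: the Big5 charset is available in this Java runtime.
    static final Charset BIG5 = Charset.forName("Big5");

    // Byte-oriented search, comparable to strchr on a char array: index or -1.
    static int indexOfByte(byte[] data, byte target) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == target) {
                return i;
            }
        }
        return -1;
    }

    // Looks for the byte 0x41 in the Big5 encoding of the text.
    static int byteSearch(String text) {
        return indexOfByte(text.getBytes(BIG5), (byte) 0x41);
    }

    // Looks for the character 'A' in the decoded text.
    static int charSearch(String text) {
        return text.indexOf('A');
    }

    public static void main(String[] args) {
        String text = "乙"; // contains no letter A; encoded as 0xA4 0x41 in Big5
        System.out.println(byteSearch(text)); // byte search reports a false match
        System.out.println(charSearch(text)); // character search finds nothing
    }
}
```

The byte search matches the trailing byte of the ideograph, which is the same mistake the C fragment would make on Big5 input.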

  19. Wide characters vs. multibyte characters • The terms may refer either to the nature of codesets or to data types in programming languages • Multibyte characters: character lengths vary from character to character; the term can refer to characters in a single codeset (Taiwan's CNS) or to characters in multiple codesets (Big5 with ASCII), stored for example as char in the C language • Wide characters: fixed-length character encoding, such as wchar_t in the C language and characters in Java, which are all Unicode (wide characters)

  20. Multibyte examples (Big5): 學習ABC => total of 7 bytes; 學習普通話 => total of 10 bytes • Problems: • String length (in characters) cannot be calculated directly from byte length (context sensitive); detection of character boundaries is needed • Given an arbitrary position in a string, it is difficult to know whether it is the first byte of a character • Conversion between MBC and WC is needed
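The byte counts quoted above can be checked with a short sketch that encodes the two strings to Big5 and decodes them again. It is written in Java and assumes the Big5 charset is available in the runtime; the class Big5Lengths and its helpers are hypothetical. Decoding the bytes into a String plays the role that multibyte-to-wide conversion plays in C.

```java
import java.nio.charset.Charset;

public class Big5Lengths {
    // Assumption: the Big5 charset is available in this Java runtime.
    static final Charset BIG5 = Charset.forName("Big5");

    // Number of bytes in the multibyte (Big5) form.
    static int byteLength(String text) {
        return text.getBytes(BIG5).length;
    }

    // Number of characters once decoded (all characters here are single Java chars).
    static int charLength(String text) {
        return text.length();
    }

    // Encodes to multibyte form and decodes back to characters.
    static String roundTrip(String text) {
        return new String(text.getBytes(BIG5), BIG5);
    }

    public static void main(String[] args) {
        for (String text : new String[] { "學習ABC", "學習普通話" }) {
            System.out.println(charLength(text) + " characters, " + byteLength(text) + " bytes");
        }
    }
}
```

Both strings hold five characters, yet their byte lengths differ, so the byte length alone does not determine the string length.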

  21. Conversion of MBC to WC • Note: Unicode is a WC, but WC is not necessarily Unicode

  22. When to use MBC • Copy data only • Comparing for equality • Searching for control characters • Single byte data only: if MB_CUR_MAX = 1 • When to use WC • Collation: sorting • Parsing characters: searching and processing • String editing

  23. MB_LEN_MAX, <limits.h>: LC independent • MB_CUR_MAX, <stdlib.h>: LC dependent • Use wide characters:
        char s[100]; wchar_t ws[100]; size_t n;
        char *p; wchar_t *wcp;
        /* mbstowcs() follows LC_CTYPE, so setlocale(LC_ALL, "") must have been called earlier */
        fgets(s, sizeof(s), stdin);   /* get a line of input */
        n = mbstowcs(ws, s, 100);     /* convert s to ws */
        wcp = wcschr(ws, L'A');       /* find "A" as a wide character */
        ……..
