Internationalisation
GLOBALISATION • I bet it is quite natural to dream about writing software that is sold around the world… • However, there may be some small obstacles on the way to selling your software worldwide. • Today we study potential problems and solutions. • Terms: - Localisation = adjusting software to a local market. - Globalisation, internationalisation = creating software in such a way that it is easy to localise it for different countries.
History • As a little warm-up, consider the situation about 20 years ago. • At that time there was an increasing interest in creating software products that could be sold to different customers within the home market. • For customising the software, it was important to put all user interface constants into a place where they are easy to change. • Program code is not such a place -> parameter files or a database are a much better choice. • Now interest is growing towards the international market.
Simple example • http://java.sun.com/docs/books/tutorial/i18n/index.html deals with internationalisation issues. • In the examples, multilingual texts are managed using - a locale, identified by a (country, language) pair, - resource bundles, one per locale, - property files, where strings are identified by keys. • Strings are identified by keywords within a locale.
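The locale, resource bundle and key idea can be sketched as follows. This is a minimal sketch, not the tutorial's code: the class name BundleSketch, the keys greeting and farewell, and the texts are made up for illustration. To stay self-contained it reads the property text from memory; a real application would more likely keep one .properties file per locale and load it with ResourceBundle.getBundle(baseName, locale).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class BundleSketch {
    // Hypothetical property-file contents, one per language.
    private static final String ENGLISH_TEXT = "greeting=Hello\nfarewell=Goodbye\n";
    private static final String FINNISH_TEXT = "greeting=Hei\nfarewell=Hei hei\n";

    // Picks the property text by language and wraps it in a resource bundle.
    static ResourceBundle bundleFor(Locale locale) {
        String text = "fi".equals(locale.getLanguage()) ? FINNISH_TEXT : ENGLISH_TEXT;
        try {
            return new PropertyResourceBundle(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("Could not read in-memory properties", e);
        }
    }

    public static void main(String[] args) {
        Locale[] locales = { Locale.forLanguageTag("en-US"), Locale.forLanguageTag("fi-FI") };
        for (Locale locale : locales) {
            ResourceBundle messages = bundleFor(locale);
            // The program code only knows the keys, never the translated strings.
            System.out.println(locale + ": " + messages.getString("greeting")
                    + " / " + messages.getString("farewell"));
        }
    }
}
```

The point of the design is that adding a language means adding a property text, not touching the code that uses the keys.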
Locales • Globalisation in Java is based on the use of Locales. • A locale is identified by a language (compulsory), country (optional) and variant (optional). • A class whose behaviour is based on the use of a locale is called locale-sensitive. • You can find the locales available to a locale-sensitive class by using the getAvailableLocales() method. • There is also a default locale for a Java Virtual Machine, and it can be accessed by Locale.getDefault(). • Different objects may use different locales.
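A minimal sketch of the points above, assuming Finnish as the example language (the class name LocaleSketch and its helper methods are made up for illustration). It builds a locale with and without a country, reads the JVM default, and asks one locale-sensitive class, NumberFormat, which locales it supports.

```java
import java.text.NumberFormat;
import java.util.Locale;

class LocaleSketch {
    // Language only: country and variant are optional.
    static Locale finnish() {
        return new Locale.Builder().setLanguage("fi").build();
    }

    // Language plus country.
    static Locale finnishInFinland() {
        return new Locale.Builder().setLanguage("fi").setRegion("FI").build();
    }

    public static void main(String[] args) {
        Locale locale = finnishInFinland();
        System.out.println("language=" + locale.getLanguage() + ", country=" + locale.getCountry());
        System.out.println("display name: " + locale.getDisplayName(Locale.ENGLISH));

        // The default locale depends on the machine the JVM runs on.
        System.out.println("JVM default: " + Locale.getDefault());

        // NumberFormat is locale-sensitive, so it can list the locales it supports.
        Locale[] available = NumberFormat.getAvailableLocales();
        System.out.println(available.length + " locales available to NumberFormat");
    }
}
```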
What data items should be globalised? • Messages, labels on GUI components, online help, sounds, colors, graphics, icons, dates, times, numbers, currencies, measurements, phone numbers, honorifics and personal titles, postal addresses, page layouts. • Our examples in the previous slides only dealt with some of these! • Labels can be managed in a fairly straightforward manner, if enough space is reserved for them. • Now let’s have a look at the rest…
Identify what needs to be managed through locales • As you think about locales, you will find out that you have - data items such as messages and sounds, which change altogether with the locale, - data items which remain the same but whose formatting changes, e.g. dates and numbers, and - possibly data items not to be localised (internal use, interface to another application, …). • Design the globalisation: identify which is which. • Arrange your data items into resource bundles (e.g. items for the same form in the same bundle, so that you will not need to load unnecessary objects).
Formats - numbers • Numbers are formatted differently in different countries, e.g.: - 345 987,246 (France) - 345.987,246 (Germany) - 345,987.246 (US) • Java includes a NumberFormat class that can be used to format numbers, currencies (no exchange rates, though :) and percentages. • You can use the NumberFormat class both to create formatted strings and to parse strings. • You can also provide your own patterns, if this is not enough for you…
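The formatting and parsing described above might look like this. A minimal sketch: the class name NumberFormatSketch and its helpers are made up for illustration. The French line carries no expected output, because the grouping character is a kind of no-break space that differs between Java versions.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

class NumberFormatSketch {
    // Formats a number using the conventions of the given locale.
    static String format(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Parses locale-formatted text back into a number.
    static double parse(String text, Locale locale) {
        try {
            return NumberFormat.getNumberInstance(locale).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number in " + locale + ": " + text, e);
        }
    }

    public static void main(String[] args) {
        double value = 345987.246;
        System.out.println(format(value, Locale.US));      // 345,987.246
        System.out.println(format(value, Locale.GERMANY)); // 345.987,246
        System.out.println(format(value, Locale.FRANCE));  // grouped with a no-break space

        System.out.println(parse("345.987,246", Locale.GERMANY)); // 345987.246

        // Currencies and percentages have their own factory methods.
        System.out.println(NumberFormat.getCurrencyInstance(Locale.US).format(9.99));
        System.out.println(NumberFormat.getPercentInstance(Locale.US).format(0.25));
    }
}
```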
Formatters • E.g. DateFormat extends Format, and you get an instance of DateFormat by calling a factory method such as getDateInstance or getDateTimeInstance, giving the style and the locale as parameters. (In a way, the DateFormat class makes specialised date formatters.) • See example code.
21.4. at 12:12:49 in a fairly long and complete format • Finnish: 21. huhtikuuta 2004 12:12:49 EEST • French: mercredi 21 avril 2004 12 h 12 EEST • German: Mittwoch, 21. April 2004 12.12 Uhr EEST • US English: Wednesday, April 21, 2004 12:12:49 PM EEST • Dutch: woensdag 21 april 2004 12:12:49 uur EEST • This is called the “FULL” format; notice the differences.
Different formats • For each language, there are five predefined format styles in Java internationalisation (DEFAULT is the same as MEDIUM). They are called (Dutch example): • DEFAULT 21-apr-2004 12:12:49 • SHORT 21-4-04 12:12 • MEDIUM 21-apr-2004 12:12:49 • LONG 21 april 2004 12:12:49 EEST • FULL woensdag 21 april 2004 12:12:49 uur EEST • You may also define your own format, but I guess it is generally best not to do that. • To format, use the DateFormat class.
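A minimal sketch of the predefined styles, assuming the Dutch locale and the date used on the slides (the class name DateStyleSketch is made up for illustration). No expected output is given: the exact strings depend on the Java version, its locale data and the default time zone, so they may differ from the slide.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

class DateStyleSketch {
    // Uses the same style constant for the date part and the time part.
    static String format(Date when, int style, Locale locale) {
        return DateFormat.getDateTimeInstance(style, style, locale).format(when);
    }

    public static void main(String[] args) {
        // 21 April 2004, 12:12:49 in the JVM's default time zone.
        Date when = new GregorianCalendar(2004, Calendar.APRIL, 21, 12, 12, 49).getTime();
        Locale dutch = Locale.forLanguageTag("nl-NL");

        String[] names = { "DEFAULT", "SHORT", "MEDIUM", "LONG", "FULL" };
        int[] styles = { DateFormat.DEFAULT, DateFormat.SHORT, DateFormat.MEDIUM,
                         DateFormat.LONG, DateFormat.FULL };
        for (int i = 0; i < styles.length; i++) {
            System.out.println(names[i] + ": " + format(when, styles[i], dutch));
        }
    }
}
```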
Messages containing variable parts • Examples: - 405,390 people have visited your website since January 1, 1998. (1) - The <devicename> number <devicenumber> has been activated. (2) • Word order may change between languages, so message (1) may be impossible to translate correctly if only the text between the number and the date is given to the translator. • In message (2) the word “activated” may require a different translation in some languages (e.g. French) depending on the gender of the word for the device name. • Basic rule of thumb: If you can avoid messages containing these variable parts, then do so!
MessageFormat and ChoiceFormat • With the MessageFormat class you can define a message template, which gives the message text and shows where to insert the changing data and how to format it. • With ChoiceFormat, you can choose between strings based on a number you give as a parameter (this is particularly handy for managing plurals). • The code example is probably more instructive than any really short explanation.
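A minimal sketch of both classes, assuming English templates (the class name MessageSketch, the method names and the sample arguments are made up for illustration). In a real application the templates themselves would come from a resource bundle, so that each translation can reorder the placeholders.

```java
import java.text.MessageFormat;
import java.util.Locale;

class MessageSketch {
    // Template with two placeholders; a translation may put them in another order.
    static String activated(String deviceName, int deviceNumber) {
        MessageFormat template =
                new MessageFormat("The {0} number {1} has been activated.", Locale.US);
        return template.format(new Object[] { deviceName, deviceNumber });
    }

    // The choice part selects the wording from the value of argument 0.
    static String fileMessage(int count) {
        MessageFormat template = new MessageFormat(
                "There {0,choice,0#are no files|1#is one file|1<are {0,number,integer} files}.",
                Locale.US);
        return template.format(new Object[] { count });
    }

    public static void main(String[] args) {
        System.out.println(activated("printer", 7)); // The printer number 7 has been activated.
        System.out.println(fileMessage(0));          // There are no files.
        System.out.println(fileMessage(1));          // There is one file.
        System.out.println(fileMessage(2));          // There are 2 files.
    }
}
```

Note that this only handles plurals; it does not solve the gender problem of message (2).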
Characters • US-ASCII: 7 bits. • ISO 8859-X, where X is a part number: an 8-bit system. If the 8th bit is 0, then the first 7 bits represent a US-ASCII character. • Windows 125x codepages: similar to ISO 8859-X, but not the same of course. A typical Windows interoperability nightmare… • Unicode: meant to represent all characters from all languages. Needs more bits (often handled in 16-bit units, as in Java), and there are several encoding schemes. UTF-8, for instance, uses one byte (8 bits) for ASCII characters and two to four bytes for all others. • http://www.unicode.org/index.html
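The difference between the encoding schemes can be made visible by counting bytes. A minimal sketch: the class name EncodingSketch and the sample characters are chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingSketch {
    // Number of bytes the text occupies in the given encoding.
    static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String a = "A";          // plain ASCII letter
        String eAcute = "\u00e9"; // e with acute accent, present in ISO 8859-1
        String euro = "\u20ac";   // euro sign, not present in ISO 8859-1

        System.out.println(byteCount(a, StandardCharsets.US_ASCII));        // 1
        System.out.println(byteCount(a, StandardCharsets.UTF_8));           // 1
        System.out.println(byteCount(eAcute, StandardCharsets.ISO_8859_1)); // 1
        System.out.println(byteCount(eAcute, StandardCharsets.UTF_8));      // 2
        System.out.println(byteCount(euro, StandardCharsets.UTF_8));        // 3
        System.out.println(byteCount(euro, StandardCharsets.UTF_16BE));     // 2
    }
}
```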
Chinese and Japanese • Thousands of symbols. • Unicode can represent them, but you need more pixels on the screen as well. • In Japanese there are several writing systems. • Text input can be done as follows: 1. The user types in the word in some phonetic writing system based on Latin characters. 2. The system shows the characters (there may be many) matching the phonetic writing. 3. The user picks the right character. • See http://www.china.com
Korean • In the Korean writing system (hangul), each written syllable block is composed of letters, and the shape of the block depends on which letters follow each other. • There is a limited number of building blocks, i.e. letters: 24 basic ones (14 consonants and 10 vowels).
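Unicode reflects this composition: a precomposed hangul syllable can be split into its letters and put back together. A minimal sketch using java.text.Normalizer; the class name HangulSketch and the sample syllable (U+D55C) are chosen for illustration.

```java
import java.text.Normalizer;

class HangulSketch {
    // Splits precomposed syllables into their component letters (NFD).
    static String decompose(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD);
    }

    // Joins component letters back into precomposed syllables (NFC).
    static String compose(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String syllable = "\uD55C"; // one hangul syllable block
        String letters = decompose(syllable);

        StringBuilder codes = new StringBuilder();
        for (char c : letters.toCharArray()) {
            codes.append(String.format("U+%04X ", (int) c));
        }
        System.out.println(letters.length() + " letters: " + codes.toString().trim());
        // 3 letters: U+1112 U+1161 U+11AB

        System.out.println(compose(letters).equals(syllable)); // true
    }
}
```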
Writing order • Latin: left to right. • In Chinese and Japanese, the traditional writing order is top-down, with columns running right-to-left. • Nowadays ordinary left-to-right writing is also used. • In Arabic and Hebrew, the text itself is written right-to-left, but all Latin names (like yours, probably) are written left-to-right in the middle of the right-to-left text.
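Java can tell whether a piece of text mixes writing directions. A minimal sketch using java.text.Bidi; the class name BidiSketch and the sample text (a Hebrew word followed by a Latin name) are chosen for illustration.

```java
import java.text.Bidi;

class BidiSketch {
    // Analyses the text; the base direction is taken from its first strongly
    // directional character, falling back to left-to-right if there is none.
    static Bidi analyse(String text) {
        return new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
    }

    public static void main(String[] args) {
        String hebrewThenLatin = "\u05E9\u05DC\u05D5\u05DD Alice";
        Bidi bidi = analyse(hebrewThenLatin);

        System.out.println("base is left-to-right: " + bidi.baseIsLeftToRight()); // false
        System.out.println("mixed directions: " + bidi.isMixed());                // true
        System.out.println("directional runs: " + bidi.getRunCount());

        System.out.println("plain Latin is left-to-right: "
                + analyse("Hello").isLeftToRight()); // true
    }
}
```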
Character properties • Don’t do: if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')) // ch is a letter • In Java, a char holds a 16-bit Unicode (UTF-16) code unit, which covers most characters in everyday use. • You can use the class Character to check for things such as white space, digits, upper and lower case. • E.g.: Character.isDigit(ch), Character.isLetter(ch), Character.isLowerCase(ch) • You can also use Character.getType() and predefined constants to check things like: if (Character.getType('a') == Character.LOWERCASE_LETTER)
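The difference between the range check and the Character methods shows up as soon as a non-ASCII letter appears. A minimal sketch: the class name CharacterSketch and the sample word are chosen for illustration.

```java
class CharacterSketch {
    // The check the slide warns against: only recognises unaccented Latin letters.
    static boolean asciiRangeIsLetter(char ch) {
        return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
    }

    // Counts letters by Unicode property, one code point at a time.
    static int countLetters(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (Character.isLetter(codePoint)) {
                count++;
            }
            i += Character.charCount(codePoint);
        }
        return count;
    }

    public static void main(String[] args) {
        char aUmlaut = '\u00e4'; // a with diaeresis, as in Finnish
        System.out.println(asciiRangeIsLetter(aUmlaut)); // false
        System.out.println(Character.isLetter(aUmlaut)); // true
        System.out.println(Character.getType(aUmlaut) == Character.LOWERCASE_LETTER); // true
        System.out.println(countLetters("J\u00e4rvi 42")); // 5
    }
}
```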
Comparing characters and strings • You can use the Collator class, e.g.: Collator myCollator = Collator.getInstance(); if( myCollator.compare("abc", "ABC") < 0 ) System.out.println("abc is less than ABC"); else System.out.println("abc is greater than or equal to ABC"); • getInstance() also accepts a locale as a parameter. • You can customise the rules used in the comparisons.
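A minimal sketch of locale-aware sorting and of one simple customisation, the comparison strength (the class name CollatorSketch and the word list are made up for illustration). Fully custom rules would go through RuleBasedCollator, which is not shown here.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class CollatorSketch {
    // Returns a copy of the words sorted by the rules of the given locale.
    static List<String> sorted(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    // With PRIMARY strength only base letters matter, so case is ignored.
    static boolean sameBaseLetters(String a, String b, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cherry", "Banana", "apple");

        List<String> byCodeUnits = new ArrayList<>(words);
        byCodeUnits.sort(null); // natural String order: uppercase before lowercase
        System.out.println(byCodeUnits);                    // [Banana, apple, cherry]
        System.out.println(sorted(words, Locale.ENGLISH));  // [apple, Banana, cherry]

        System.out.println(sameBaseLetters("abc", "ABC", Locale.ENGLISH)); // true
    }
}
```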
Finding boundaries of words, sentences, etc. • The boundaries may, of course, be defined differently in different languages. • Initialise a BreakIterator with one of these methods: - getCharacterInstance - getWordInstance - getSentenceInstance - getLineInstance • E.g. BreakIterator sentenceIterator = BreakIterator.getSentenceInstance(currentLocale); • One BreakIterator works with only one type of break.
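A minimal sketch of walking through the boundaries, assuming English sample text (the class name BreakSketch and its helper methods are made up for illustration). The word iterator also reports spaces and punctuation as segments, so the sketch keeps only segments that start with a letter or digit.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BreakSketch {
    // Cuts the text at every boundary the iterator reports.
    static List<String> segments(BreakIterator iterator, String text) {
        iterator.setText(text);
        List<String> parts = new ArrayList<>();
        int start = iterator.first();
        for (int end = iterator.next(); end != BreakIterator.DONE; end = iterator.next()) {
            parts.add(text.substring(start, end));
            start = end;
        }
        return parts;
    }

    static List<String> sentences(String text, Locale locale) {
        return segments(BreakIterator.getSentenceInstance(locale), text);
    }

    // Keeps only the segments that begin with a letter or digit.
    static List<String> words(String text, Locale locale) {
        List<String> words = new ArrayList<>();
        for (String part : segments(BreakIterator.getWordInstance(locale), text)) {
            if (Character.isLetterOrDigit(part.codePointAt(0))) {
                words.add(part);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "Hello, world. How are you?";
        System.out.println(sentences(text, Locale.US).size() + " sentences");
        System.out.println(words(text, Locale.US)); // [Hello, world, How, are, you]
    }
}
```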
Colors, gestures, other symbols • E.g. in the Far East there is a lot of symbolism in colors, names, numbers, etc. (e.g. red is a good color, 4 is a bad number). • Hand gestures, for instance, also vary from one place to another: what is good here may be bad elsewhere. • Even in Europe there is variance. Consider tick marks: x (good in Finland, bad in UK), √ (not exactly this shape; good in UK, bad in Finland).
Higher cultural issues • General customs • How to do business • How to be polite • How to say no • How to avoid ”losing face” in the Far East. • What to avoid in particular. • These issues may have an impact on software as well. • E.g. there are examples of software with built-in knowledge of business processes which do not really work in some cultures.
Example problem • We need software in which settling the price of a product with the customer is one function, and one that can be handled fairly separately from the rest. • In different countries, different policies may need to be followed. • In some countries you haggle over the price for a long time, both parties starting from something completely unreasonable. • In some countries a fixed-price policy may be the only possibility in practice. • In some countries a small reduction from the initial price tag may be possible.
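One possible way to keep the price-settling function separate is to put each policy behind a common interface and choose it by country. This is only a sketch under loud assumptions: the class PricingSketch, the interface PricingPolicy, the three rules and their numbers are all invented for illustration, and the country codes XA, XB and XC are placeholders, not real countries.

```java
import java.util.HashMap;
import java.util.Map;

class PricingSketch {
    // Hypothetical policy interface; prices are in cents to avoid rounding issues.
    interface PricingPolicy {
        long agreedPrice(long listPrice, long customerOffer);
    }

    // Fixed price: the offer is ignored.
    static final PricingPolicy FIXED = (list, offer) -> list;

    // Small reduction: at most 5 % off the price tag (assumed figure).
    static final PricingPolicy SMALL_REDUCTION =
            (list, offer) -> Math.min(list, Math.max(offer, list - list / 20));

    // Haggling, reduced here to meeting halfway (a deliberately crude stand-in).
    static final PricingPolicy HAGGLING = (list, offer) -> (list + offer) / 2;

    private static final Map<String, PricingPolicy> BY_COUNTRY = new HashMap<>();
    static {
        BY_COUNTRY.put("XA", FIXED);           // placeholder country codes
        BY_COUNTRY.put("XB", SMALL_REDUCTION);
        BY_COUNTRY.put("XC", HAGGLING);
    }

    // Falls back to the fixed price when no policy is configured.
    static PricingPolicy policyFor(String countryCode) {
        return BY_COUNTRY.getOrDefault(countryCode, FIXED);
    }

    public static void main(String[] args) {
        long listPrice = 10_000; // 100.00
        long offer = 5_000;      // 50.00
        for (String country : new String[] { "XA", "XB", "XC" }) {
            System.out.println(country + ": "
                    + policyFor(country).agreedPrice(listPrice, offer));
        }
        // XA: 10000, XB: 9500, XC: 7500
    }
}
```

The rest of the software only calls policyFor(...).agreedPrice(...), so a new market means a new policy object rather than changes scattered through the code.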
Conclusions • The final conclusion is: ”This is all quite complicated, and if you have to get deeper into these things, find someone who really knows.” • When you start writing your software, think a bit about the need for globalisation. • If you know that English (or Finnish) is sufficient, then it makes life easier. • If you know that globalisation is needed, you should start globalising when you start writing your software! Doing it afterwards is hard. • Java offers lots of resources. If you want to re-invent the wheel, this may not be the best place.