Internationalization
Contents • Internationalization • Language rendering • Rendering objects • Cultural issues
Globalization • The process of worldwide economic, political, technological, and social integration • The process of making the necessary technical, managerial, personnel, marketing and other enterprise decisions to support localization
Internationalization • The adaptation of products for use throughout the globe • Supporting multiple languages • Supporting multiple character sets • Supporting different formats for numbers, dates, currency, etc. • Printing on different paper sizes
Localization • The addition of special features to allow a product to be used in a specific locale • Local language support • Local currency support • Local cultural concerns • Local symbols • Order of sorting
The Need for I18N • 8-10% of the world's population uses English as its primary language • Even in the USA, large fractions of the population use other languages • Miami – 78% • Los Angeles – 45% • San Francisco – 42% • New York City – 25% of subway riders speak no English
The Need for I18N • US-based Palm Computing has a 68% share of the Latin American market • Personal computer suppliers • USA – 38.8% • Europe – 25% • Asia – 12% • A large share of sales now comes from international markets
Language Rendering • The rendering of language is one of the most visible aspects of software • One common misconception is that each country has one language • Often one country will have several languages and one language will be spoken in several countries
Language Rendering • For example • Canada – French & English • Belgium – Dutch & French • Switzerland – Italian, French, & German • In addition, there are regional differences in the languages • e.g. U.S. English differs from British English in spelling and in some vocabulary
Language Rendering • As a result, language is usually specified by both the country and the language • EN.US – American English • FR.CA – Canadian French • Sometimes flags are used to indicate which language to select • This is a poor choice due to the lack of a one-to-one relationship between countries and languages
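A minimal Python sketch of the language-plus-country idea. The helper name parse_locale_tag is hypothetical, and accepting ".", "-", or "_" as the separator is an assumption made here so that spellings such as EN.US, en-US, and en_US all normalize to the same pair.

```python
import re


def parse_locale_tag(tag):
    """Split a tag such as 'EN.US' into a (language, country) pair.

    Hypothetical helper for illustration only; production code would
    normally rely on a locale library instead of hand parsing.
    """
    parts = re.split(r"[.\-_]", tag.strip())
    if len(parts) != 2 or not all(p.isalpha() for p in parts):
        raise ValueError(f"expected LANGUAGE<sep>COUNTRY, got {tag!r}")
    language, country = parts
    return language.lower(), country.upper()


# One language paired with different countries yields different locales,
# which is why a flag alone cannot identify a language.
american_english = parse_locale_tag("EN.US")
canadian_french = parse_locale_tag("fr-CA")
swiss_french = parse_locale_tag("fr_CH")
```

Keeping the two parts separate lets an application fall back from the full pair to the language alone when no exact match is available.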
Rendering Japanese • Japanese illustrates most of the problems of rendering languages • Japanese text is a mixture of 3 scripts which can be combined in one sentence • This results in a huge number of characters to render • This is still simpler than countries with multiple dialects and scripts
Japanese Scripts • Kanji • Characters of Chinese origin, about 50,000 in all • Kana • Symbols representing sounds, broken into two groups • Hiragana – native Japanese sounds • Katakana – represents foreign words other than Chinese and Korean • Romaji – Letters from the Roman alphabet to use for untranslated foreign words
Character Sets • Different character sets are used to render the characters in all the scripts • A character set is the encoding of a series of letters and other symbols into numeric codes • Since there are so many scripts, there are a number of character sets as well
ASCII Character Set • American Standard Code for Information Interchange • Represents 128 characters in 7 bits with a parity bit • Extended ASCII uses all 8 bits for 256 chars • Has control characters, Roman alphabet, punctuation and some special characters • Supports English, Swahili, and Hawaiian
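The "7 bits with a parity bit" layout can be sketched as follows. This is an illustration only: the function name with_even_parity is hypothetical, and even parity is assumed here although odd parity was also used in practice.

```python
def with_even_parity(ch):
    """Return the 8 bit value for a 7 bit ASCII character plus a parity bit.

    The top bit is set so that the total number of 1 bits is even.
    Even parity is an assumption of this sketch.
    """
    code = ord(ch)
    if code > 0x7F:
        raise ValueError(f"{ch!r} is not a 7 bit ASCII character")
    ones = bin(code).count("1")
    parity = ones % 2          # 1 when the 7 data bits hold an odd number of 1s
    return (parity << 7) | code


a_byte = with_even_parity("A")  # 0x41 has two 1 bits, so the top bit stays 0
c_byte = with_even_parity("C")  # 0x43 has three 1 bits, so the top bit is set
```

Extended ASCII gives up this error check in exchange for 128 additional character codes.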
ISO 8859 Character Sets • This is a family of 8 bit character sets used to represent European and some other languages • ISO 8859-1 was long the default character set on the web, a role since taken over by UTF-8 • It supports several language specific groupings • ISO 8859-1 – western Europe (French, Italian, German, Spanish…)
ISO 8859 Character Sets • ISO 8859-2 – Central/Eastern Europe (Hungarian, Polish,…) • ISO 8859-3 – Southern Europe (Esperanto, Maltese) • ISO 8859-4 – Northern Europe (Estonian, Latvian,…) • ISO 8859-5 – Cyrillic • ISO 8859-6 – Arabic • ISO 8859-7 – Greek • ISO 8859-8 – Hebrew • ISO 8859-9 – Turkish • ISO 8859-10 – Nordic • ISO 8859-11 – Thai • ISO 8859-12 – unused • ISO 8859-13 – Baltic Rim • ISO 8859-14 – Celtic
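Because every member of the family reuses the same 8 bit code range, a byte only has meaning once the part is known. The short Python sketch below decodes one byte under three parts; the choice of byte 0xE9 and of these three parts is arbitrary.

```python
# One byte from the upper half of the 8 bit range.
raw = bytes([0xE9])

# Decode the same byte under three members of the ISO 8859 family.
decoded = {
    name: raw.decode(name)
    for name in ("iso8859_1", "iso8859_5", "iso8859_7")
}

for name, text in decoded.items():
    print(f"{name}: {text!r} (U+{ord(text):04X})")
```

The Latin-1 reading is "é", while the Cyrillic and Greek parts each assign the byte to a different letter, so text stored without a record of its character set cannot be rendered reliably.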
EBCDIC • Extended Binary Coded Decimal Interchange Code • An 8 bit encoding • Used mainly on IBM machines and some Fujitsu, Unisys, and HP • EBCDIC is incompatible with ASCII and must be translated • Some characters are not translated exactly!!
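A small Python sketch of the incompatibility. The slide does not name a specific EBCDIC code page, so the use of cp037 and cp500 (two EBCDIC code pages that ship as Python codecs) is an assumption of this example.

```python
# The same letter has different numeric codes in ASCII and EBCDIC.
ascii_a = "A".encode("ascii")    # b'\x41'
ebcdic_a = "A".encode("cp037")   # b'\xc1'

# Letters are contiguous in ASCII but not in EBCDIC, so code that
# does arithmetic on character codes breaks after translation.
gap_ascii = ord("J") - ord("I")
gap_ebcdic = "J".encode("cp037")[0] - "I".encode("cp037")[0]

# Even two EBCDIC code pages disagree on some punctuation, which is
# one way characters end up "not translated exactly".
bracket_037 = "[".encode("cp037")
bracket_500 = "[".encode("cp500")

# Plain letters survive a round trip when the same code page is used
# in both directions.
round_trip = "Hello".encode("cp037").decode("cp037")
```

Translation is therefore only safe when both sides agree on exactly which code page is in use.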
Code Page • A code page is simply the mapping of characters to numeric values • This is the same as a character set • The EBCDIC code page is shown here
Windows Character Sets • Windows has its own character sets which are slightly different from the ISO character sets • Windows 1250 – Central European • Windows 1251 – Cyrillic • Windows 1252 – Western Languages • Windows 1253 – Greek • Windows 1254 – Turkish • Windows 1255 – Hebrew • Windows 1256 – Arabic • Windows 1257 – Baltic • Windows 1258 – Vietnamese
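One place where the "slightly different" shows up is the range 0x80 to 0x9F. As a sketch, the Python below decodes three bytes from that range as Windows 1252 and as ISO 8859-1; the three bytes were picked for illustration.

```python
import unicodedata

# Three bytes from the 0x80-0x9F range.
raw = bytes([0x80, 0x93, 0x94])

# Windows 1252 assigns printable characters here:
# the euro sign and a pair of curly quotation marks.
as_windows = raw.decode("cp1252")

# ISO 8859-1 leaves the same positions to control codes.
as_iso = raw.decode("iso8859_1")
iso_categories = [unicodedata.category(ch) for ch in as_iso]  # 'Cc' = control
```

Text written on Windows and labelled as ISO 8859-1 is a common source of missing or garbled quotation marks for this reason.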
Unicode • This is a modern character set which incorporates the characters of virtually all earlier character sets • So far, it represents more than 100,000 characters • It has various encodings which can be used depending on the amount of storage available • The 16 bit encoding (UTF-16) covers 65,536 characters in a single 16 bit unit and uses a pair of units for each of the rest
Unicode • UTF-8 • Represents all Unicode characters in 1 – 4 bytes • Backwards compatible with ASCII, which is represented as 1 byte • Prefix bits in the first byte indicate how many bytes the character occupies • More compact than 16 bit or 32 bit encodings for text that is mostly ASCII
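The variable length of UTF-8 can be checked directly. In this Python sketch the four sample characters are arbitrary choices, and leading_byte_length is a hypothetical helper that reads the prefix bits of the first byte.

```python
# Sample characters: Latin letter, accented letter, kanji, emoji.
samples = ["A", "\u00e9", "\u65e5", "\U0001F600"]

# Bytes needed per character in UTF-8 and in UTF-16.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
utf16_lengths = {ch: len(ch.encode("utf-16-le")) for ch in samples}


def leading_byte_length(first_byte):
    """Return the sequence length announced by a UTF-8 leading byte."""
    if first_byte < 0x80:            # 0xxxxxxx: plain ASCII
        return 1
    if first_byte >> 5 == 0b110:     # 110xxxxx
        return 2
    if first_byte >> 4 == 0b1110:    # 1110xxxx
        return 3
    if first_byte >> 3 == 0b11110:   # 11110xxx
        return 4
    raise ValueError("continuation byte or invalid leading byte")


announced = [leading_byte_length(ch.encode("utf-8")[0]) for ch in samples]
```

The ASCII letter takes one byte in UTF-8 against two in UTF-16, while the kanji takes three against two, so which encoding is more compact depends on the text being stored.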
Unicode • The use of Unicode simplifies many problems with internationalization • Unicode is supported by many modern software platforms • Java • XML • Modern Operating Systems
Fonts • A character set connects an abstract description of a character with a code • A font connects a glyph with a code • There can be several fonts for each character set allowing for different shapes and styles of symbols • Different languages have fonts with different shapes and requirements
Font Recommendations • Provide enough space between lines for different ascenders and descenders • Do not use ornate fonts that will obscure accents used by some languages • Choose fonts that support required accents • Do not assume that any particular font will be available on the target platform • Many non-Latin languages require proportional spacing (e.g. Arabic)
Text Direction • There are 3 types of directionality • Left-to-right • European languages, Thai • Left-to-right/vertical • Chinese, Japanese, Korean • Bidirectional • Hebrew & Arabic
Bidirectional Text • Although the text is right-to-left, embedded numbers and other languages are left-to-right • If the number “123-4567” is embedded • As a phone number it is read left-to-right • As a subtraction it means 4567 minus 123 • Bidirectional text is right justified
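Unicode records a direction class for every character, and rendering engines use these classes to lay out mixed text. The Python sketch below only inspects the classes; it does not reorder anything. The helper name bidi_classes is hypothetical and the sample strings were chosen for illustration.

```python
import unicodedata


def bidi_classes(text):
    """Return the Unicode bidirectional class of each character."""
    return [unicodedata.bidirectional(ch) for ch in text]


hebrew_word = "\u05e9\u05dc\u05d5\u05dd"  # the Hebrew word "shalom"

hebrew_classes = bidi_classes(hebrew_word)      # 'R'  = right-to-left
digit_classes = bidi_classes("123")             # 'EN' = European number
latin_classes = bidi_classes("abc")             # 'L'  = left-to-right
arabic_class = unicodedata.bidirectional("\u0639")  # 'AL' = Arabic letter
```

Digits carry their own class rather than inheriting the direction of the surrounding letters, which is why an embedded number keeps its left-to-right order inside right-to-left text.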
Paper Size • Most of the world uses metric paper sizes based on the ISO 216 standard
Hardware Concerns • Hardware around the world is different • Make sure the keyboard can enter the needed character sets • Make sure the printer can handle the paper sizes required • Make sure the printer and display can handle the needed character sets
Translation Issues • Accurate translation is critical to internationalization • Hire good translators • Provide them with the text to translate • Provide a glossary defining each of the words to translate so that they understand the intended meaning
Sorting • Different countries have different rules for sorting • Be able to handle accents • Watch for letter sequences • In traditional Spanish sorting, “ch” is treated as a single letter after “c”, so “cho” comes after “co” • In many languages upper case is sorted after lower case • ISO/IEC 14651:2000 provides guidelines on international collation
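The letter-sequence rule can be sketched with a custom sort key. This Python example handles only the "ch" digraph of traditional Spanish ordering and ignores accents and other digraphs; the function name and the word list are made up for illustration, and real applications would normally use a collation library.

```python
def traditional_spanish_key(word):
    """Sort key that treats 'ch' as one letter placed after 'c'.

    Each letter becomes a (base, rank) pair: plain letters get rank 0,
    and 'ch' becomes ('c', 1), which sorts after any plain 'c' and
    before 'd'. Only this single rule is modelled.
    """
    lowered = word.lower()
    key = []
    i = 0
    while i < len(lowered):
        if lowered.startswith("ch", i):
            key.append(("c", 1))
            i += 2
        else:
            key.append((lowered[i], 0))
            i += 1
    return key


words = ["dama", "chico", "cuna", "casa", "cola"]

by_code_point = sorted(words)
by_traditional_rule = sorted(words, key=traditional_spanish_key)
```

Plain code-point sorting puts "chico" between "casa" and "cola", while the traditional rule moves it after every other word that begins with "c".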
Date and Time Formats • ISO 8601 specifies a format for international representation of time and date • Most locales prefer to use their own format • Your software should be able to format in the local manner • This information is usually available as a locale in your programming language (Java, C, .NET)
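As a rough sketch of the difference between the ISO 8601 form and local forms, the Python below formats one date several ways. The pattern table is an assumption of this example, filled in by hand with common short-date conventions; in real software these patterns would come from the platform's locale data rather than being hard-coded.

```python
from datetime import date

# Illustrative short-date patterns, hard-coded for this sketch only.
SHORT_DATE_PATTERNS = {
    "en-US": "%m/%d/%Y",
    "en-GB": "%d/%m/%Y",
    "de-DE": "%d.%m.%Y",
    "ja-JP": "%Y/%m/%d",
}


def format_local(d, locale_tag):
    """Format a date with the pattern registered for a locale tag."""
    return d.strftime(SHORT_DATE_PATTERNS[locale_tag])


release = date(2024, 3, 5)

iso_form = release.isoformat()                 # unambiguous interchange form
us_form = format_local(release, "en-US")
uk_form = format_local(release, "en-GB")
```

The US and UK forms of this date are 03/05/2024 and 05/03/2024, which read as different days to the other audience, so storing and exchanging the ISO form and formatting only for display is the safer arrangement.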
Addresses • Addresses have different formats throughout the world • In Mexico: • Name (paternal, maternal, first) • Street & number • Building, floor, suite • Colony • City, state • Postal code
Telephone Numbers • Telephone Numbers also differ throughout the world • Different numbers of digits • Different separators • Different groupings
Currency & Numbers • Currency & numbers are formatted differently in each country • Different currency symbols • Different location of +, - signs • Different thousands separators • Different definition of billion • Most of this information will come from the locale
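A minimal Python sketch of separator differences. The separator table is an assumption written by hand for three locales (including the choice of a no-break space for French grouping); a real program would take these values from its locale data.

```python
# (grouping separator, decimal separator), hard-coded for illustration.
SEPARATORS = {
    "en-US": (",", "."),
    "de-DE": (".", ","),
    "fr-FR": ("\u00a0", ","),   # no-break space between groups
}


def format_number(value, locale_tag):
    """Format a number with two decimals using a locale's separators."""
    group, decimal = SEPARATORS[locale_tag]
    us_style = f"{value:,.2f}"  # e.g. 1,234,567.89
    # translate() swaps both separators in one pass, so the two
    # replacements cannot interfere with each other.
    return us_style.translate(str.maketrans({",": group, ".": decimal}))


amount = 1234567.891

us_text = format_number(amount, "en-US")
german_text = format_number(amount, "de-DE")
french_text = format_number(amount, "fr-FR")
```

The German result, 1.234.567,89, would be misread by a US reader, which is why numbers should be stored in a neutral form and formatted only when displayed.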
Cultural User Interface Design • Culture has an influence on what people expect from an interface and how they interpret the interface • Geert Hofstede studied IBM employees in 53 countries and developed 5 dimensions on which cultures could be measured • We will now look at these 5 dimensions
Power Distance • Measures how well a society accepts large or small differences in power in a social hierarchy • For example, whether an employee of a large organization has easy, informal access to the boss • Cultures with easy access to powerful figures are assigned a low power distance index
Individualism vs. Collectivism • This measures whether a culture favours individual achievement or whether it favours group efforts • A high collectivist rating is assigned to cultures which favour group efforts over individual ones
Masculinity vs. Femininity • Measures the degree to which a culture separates traditional gender roles • Traditional male role • Tough, task-oriented warriors • Traditional female role • Tender, gentle homemakers • More male-oriented cultures score higher
Uncertainty Avoidance • Measures the degree to which a culture is uncomfortable with and tries to reduce uncertainty • Cultures which emphasize punctuality, formality, and explicit communication rate high in uncertainty avoidance
Long Term Orientation • This is prevalent in cultures with Confucian thought, which emphasizes patience • Usually found in Asian cultures • Higher values indicate more long term orientation
Designing for Culture • Studies were done to see what type of UI components were used in different cultures • The results can be used as a guideline for how to design user interfaces that meet cultural expectations
Power Distance • Metaphors • High • Images of government or corporate institutions and buildings, schools, monuments, etc. • Low • Informal or popular institutions or buildings, Montessori schools, public parks, etc.