CIT3611 Software i18n Wk 4: Code sets, Online Help, Prototyping David Tuffley School of Computing & IT Griffith University
Internationalisation - Basic Rules • Never hard-code translatable text • Do not reuse the same string in different contexts • 1 byte ≠ 1 character ≠ 1 glyph • Watch for strings with several parameters CIT3611 Week 5: Code sets
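The first and last rules can be illustrated with a minimal sketch. The catalog layout, the `translate` helper and the message key below are hypothetical, invented for this example; a real project would load its strings from resource files rather than a dictionary in the source. Named parameters let each translation place its arguments in whatever order its grammar requires.

```python
# Hypothetical in-memory catalogs, keyed by locale and then by message key.
# A real application would load these from external resource files.
CATALOGS = {
    "en": {"copy_status": "Copied {count} files to {folder}."},
    "de": {"copy_status": "In den Ordner {folder} wurden {count} Dateien kopiert."},
}


def translate(locale: str, key: str, **params: object) -> str:
    """Look up a message by key and fill in its named parameters.

    Falls back to the English catalog when the locale is unknown.
    """
    catalog = CATALOGS.get(locale, CATALOGS["en"])
    template = catalog.get(key, key)
    return template.format(**params)


# The calling code never contains translatable text, only the key.
print(translate("en", "copy_status", count=3, folder="Backup"))
print(translate("de", "copy_status", count=3, folder="Backup"))
```

Because the parameters are named rather than positional, the German string can mention the folder before the count without any change to the calling code.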
Internationalisation - Goals Making sure: • Your application is able to process text from any locale • The interface can be localised without changes in the source code • The documents or data created by your application are easy to localise
Internationalisation - Code Sets • A character set is like a "bag" of characters. Example: A, B, d, ñ • A code set (also called coded character set or code-page) contains the same characters as the character set, but a specific value, the code (or code-point), is assigned to each character. Example: A=65, B=66, d=100, ñ=241
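The example values on this slide can be checked with a short sketch. It assumes the codes quoted are those of ISO-8859-1 (Latin-1), whose values coincide with the first 256 Unicode code points that Python's `ord` returns.

```python
# The "bag" of characters from the slide.
characters = ["A", "B", "d", "ñ"]

# ord() returns the Unicode code point of each character.
codes = {ch: ord(ch) for ch in characters}
print(codes)

# In Latin-1 every character is stored as one byte holding that same value.
latin1_bytes = {ch: ch.encode("latin-1") for ch in characters}
print(latin1_bytes)
```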
Code Sets - Get Your Facts Straight • The vocabulary pertaining to code sets is often used incorrectly. • The terms code set and code page are interchangeable. • Microsoft documentation is confusing regarding code sets. • Nadine Kano's book helps
Windows "ANSI" is not the real ANSI • The first version of Windows used ISO-8859-1 (Latin-1) as its code set. Microsoft then introduced 24 extra characters (codes from 0x80 to 0x9F) that are not part of Latin-1. • This is noticeable in some of the fonts still shipped with Windows: MS Sans Serif has no glyph defined for these code-points. • The code set for US Windows should be called Windows Latin-1, or code-page 1252.
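The difference between Latin-1 and code-page 1252 can be sketched by decoding the range 0x80 to 0x9F with both of Python's codecs. The helper name is invented for this example. Python's `cp1252` table reflects the current code page, including later additions such as the euro sign, so the number of characters it reports need not match the figure quoted on the slide.

```python
def defined_in_cp1252(byte_value: int) -> bool:
    """Return True if Python's cp1252 codec maps this byte to a character."""
    try:
        bytes([byte_value]).decode("cp1252")
        return True
    except UnicodeDecodeError:
        return False


# Characters that code-page 1252 places in the 0x80-0x9F range.
extras = {
    b: bytes([b]).decode("cp1252")
    for b in range(0x80, 0xA0)
    if defined_in_cp1252(b)
}

for byte_value, char in extras.items():
    # Latin-1 decodes the same byte to an invisible C1 control character.
    control = bytes([byte_value]).decode("latin-1")
    print(f"0x{byte_value:02X}: cp1252 {char!r}, latin-1 {control!r}")
```

Text containing curly quotes or the euro sign is therefore damaged when a file saved as code-page 1252 is read as Latin-1.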
"ANSI" not "Windows code set" • Some documents name the Windows code set "ANSI" even if when you use it in a different localised version of Windows, it is actually the Windows Cyrillic, or Windows Greek or Windows Turkish code set. • Same way the document uses "OEM" to refer to the DOS code-page, it should use a generic term for the Windows code set, rather than "ANSI." CIT3611 Week 5: Code sets
Don’t use ‘character sets’ or ‘charsets’ when you mean code sets • A code set is an implementation of a character set • Several code sets can implement the same character set. In this case, the list of characters supported is the same, but the codes are different. E.g. UCS-2 and UTF-8 are two different code sets, but they both implement the Unicode character set.
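A small sketch of one character stored under two code sets follows. Python has no codec named UCS-2, so `utf-16-be` stands in for it here; that substitution is an assumption of this example, and the two agree only for characters in the Basic Multilingual Plane.

```python
text = "ñ"  # U+00F1 in the Unicode character set

# Same character, two different byte representations.
utf8_bytes = text.encode("utf-8")      # two bytes: 0xC3 0xB1
ucs2_bytes = text.encode("utf-16-be")  # two bytes: 0x00 0xF1

print(utf8_bytes.hex(), ucs2_bytes.hex())

# Decoding each with its own code set recovers the same character.
print(utf8_bytes.decode("utf-8") == ucs2_bytes.decode("utf-16-be"))
```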
Don’t mix up file format and file code set • People mix up the content and the container: the format of the file and its code set. They will say: "I saved this file in ASCII" when they really mean "I saved this file in Plain text." A plain text file could be in ASCII, but can also contain extended characters.
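The distinction can be sketched with two plain text payloads, only one of which is ASCII. The helper name `is_ascii_only` and the sample strings are invented for this example, and code-page 1252 is just one possible choice of extended code set.

```python
def is_ascii_only(data: bytes) -> bool:
    """Return True if every byte lies in the 7-bit ASCII range."""
    return all(byte < 0x80 for byte in data)


# Both payloads are plain text; they differ only in code set.
plain_ascii = "Read me first".encode("ascii")
plain_extended = "Déjà vu".encode("cp1252")

print(is_ascii_only(plain_ascii))     # plain text that is also ASCII
print(is_ascii_only(plain_extended))  # plain text with extended characters
```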
Code Set - Families • DOS • ISO • Macintosh • Windows • IBM mainframe
Code Sets - Unicode • Unicode is an international character set • It covers the principal scripts of the world • The Unicode standard is a foundation for the internationalisation and localisation of software • There are three levels of support for Unicode: Level 1: combining characters are not allowed; Level 2: duplicate coded representations are avoided; Level 3: all combining characters are allowed
Han unification • To fit the tens of thousands of Chinese, Japanese and Korean ideograms into a 16-bit space of 65,536 codes, Unicode uses Han unification: an ideogram shared by the three languages gets a single code, because the Japanese and Korean characters are derived from the Chinese characters. • In many cases the same symbol will mean the same thing.
Character Composition • To support complex characters with diacritics, Unicode defines a generic way to encode a complex character. Instead of coding the character in whole (precomposed) form, you can code any character with diacritics as a base character followed by non-spacing marks. • Character composition is used, for example, to encode the Vietnamese characters.
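Character composition can be sketched with Python's standard `unicodedata` module. The Vietnamese letter chosen here is only an illustration: its whole form is a single code, and normalisation form NFD splits it into a base letter plus two non-spacing marks.

```python
import unicodedata

# Whole (precomposed) form: LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW.
precomposed = "\u1ec7"

# NFD decomposes it into a base letter followed by non-spacing marks.
decomposed = unicodedata.normalize("NFD", precomposed)
for char in decomposed:
    print(f"U+{ord(char):04X} {unicodedata.name(char)}")

# NFC composes the sequence back into the single whole-form character.
recomposed = unicodedata.normalize("NFC", decomposed)
print(len(precomposed), len(decomposed), recomposed == precomposed)
```

Both sequences display as the same glyph, so comparing strings reliably usually means normalising both sides to the same form first.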
Surrogates • Hopefully you will not have to deal with surrogates. They are the mechanism put in place in Unicode to access the additional planes of ISO 10646. • You can see them as "double-bytes," except that they are double wide chars: two 16-bit codes that together represent one character.
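The surrogate mechanism can be sketched as the standard UTF-16 arithmetic below. The function name is invented for this example, and the sample character (U+1D11E, the musical G clef) is just one code point from a plane beyond the first.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into its high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points beyond the first plane need surrogates")
    offset = code_point - 0x10000      # a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1D11E)
print(f"{high:04X} {low:04X}")

# Python's UTF-16 encoder produces the same two 16-bit units.
print(chr(0x1D11E).encode("utf-16-be").hex())
```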
Code Sets - Conversion • Converting from one code set to another is easy when you are only dealing with single-byte code sets.
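A single-byte conversion can be sketched as a decode followed by an encode. The `convert` helper is invented for this example, and the choice of DOS code-page 437 and Windows code-page 1252 as source and target is an assumption. The one complication, even in the single-byte case, is a character that the target code set does not contain; this sketch substitutes a question mark.

```python
def convert(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one code set to another.

    Characters missing from the target code set become '?'.
    """
    return data.decode(source).encode(target, errors="replace")


# "café" as stored by DOS: é is 0x82 in code-page 437.
dos_bytes = b"caf\x82"
windows_bytes = convert(dos_bytes, "cp437", "cp1252")
print(windows_bytes)  # é is 0xE9 in code-page 1252

# A DOS box-drawing character has no equivalent in code-page 1252.
print(convert(b"\xc4", "cp437", "cp1252"))
```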
Screen-based help • Plain text "Read Me" files • Tutorial files • Custom integrated help • Sample files • Stand-alone hypertext help
General Guidelines • Text Expansion • Jargon, Humor, Use of Gender- or Culture-Related Roles, Characteristics, or Issues • Consistency with Software, Hardware, and Documentation • Hypertext Links • Text Styles and Formatting
General Guidelines cont. • On-Screen Controls • File Format
Windows Online Help • "Title" Footnote Text • "Keyword List" Footnote Text • Definitions (Pop-up Topics)
Prototyping: the key to success • Effective prototyping may be the most valuable core competence an innovative organisation can hope to have (Michael Schrage) • ‘Spec-driven’ organisations put much effort into developing a specification before proceeding with production • ‘Prototype-driven’ organisations begin with an early prototype, then proceed through many iterations
Prototyping the essential medium of: • Information transmission • Interaction • Integration • Collaboration
Work as play, play as work • You can ‘play your way’ to successful, innovative product development • At odds with traditional management models that champion predictability and control
Supported by research • Research by Tabrizi & Eisenhardt (Stanford) looked at 72 product development projects in 36 companies in Asia, North America and Europe • The most effective were those that iterated constantly • The least effective were the hyper-organised "plan, plan, plan" planners • Strong prototyping cultures therefore produce strong products