Creating Interfaces: Localization • Language & other issues • Character codes • Homework: preparation for future topics
Finish presentations • Everyone post constructive comments on at least 2 other projects. • (Note: catch up on other postings.)
Many interconnected issues • Create a web site for use in several specific 'local' places. • Create multiple web sites, each for use in a specific place. • Do either in an efficient, effective manner, so that any underlying common content does not need to be duplicated (and its commonality diluted). • Develop tools (networking software, standards, etc.) that promote the Web as a "global, interoperable tool of communication" • www.w3c.org
Localization • not just language • language is not just character code • UCS (Universal Character Set), Unicode, and many, many related standards address encoding issues. • dates: the local date format, and also the way to express the 'western' date • time • money • position on and flow across the page • acceptable images, photography, icons • ?
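To make the 'dates' point concrete, here is a minimal Python sketch that renders one date under a few numeric conventions. The pattern table and its labels are illustrative assumptions of this sketch; real software would take them from a locale database rather than hard-coding them.

```python
from datetime import date

# Illustrative (hypothetical) table of numeric date conventions.
# Real software should get these from a locale database, not hard-code them.
DATE_PATTERNS = {
    "US": "%m/%d/%Y",        # month/day/year
    "Germany": "%d.%m.%Y",   # day.month.year
    "ISO 8601": "%Y-%m-%d",  # year-month-day, unambiguous across regions
}

def render_all(d: date) -> dict:
    """Format one date under each pattern in DATE_PATTERNS."""
    return {label: d.strftime(fmt) for label, fmt in DATE_PATTERNS.items()}

if __name__ == "__main__":
    for label, text in render_all(date(2003, 4, 5)).items():
        print(f"{label:10} {text}")
```

The US string 04/05/2003 means April 5, while a reader used to day-first dates may read it as 4 May, which is why a site may need to state a 'western' date unambiguously.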
Character code • Note: European languages plus several other 'small' alphabets are easily handled. • We/I (typical monolingual Americans) can hardly appreciate the challenge: • two Chinese (hanzi, known as kanji in Japanese) character sets: simplified (mainland China) and traditional (Taiwan + most of the Chinese diaspora) • 'ruby': small annotation symbols placed 'over' ideographs, e.g. to show pronunciation
Terms (from http://www.cs.tut.fi/~jkorpela/chars.html#code) • character repertoire: A set of distinct characters. • character code: A mapping, often presented in tabular form, which defines a one-to-one correspondence between characters in a character repertoire and a set of nonnegative integers. • character encoding: A method (algorithm) for presenting characters in digital form by mapping sequences of code numbers of characters into sequences of octets. In the simplest case, each character is mapped to an integer in the range 0 - 255 according to a character code, and these integers are used directly as octets. Naturally, this only works for character repertoires with at most 256 characters. For larger sets, more complicated encodings are needed. Encodings have names, which can be registered.
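The difference between a character code and a character encoding can be seen with Python's built-in codecs. This sketch takes one character, looks up its code number, and then encodes it two ways: Latin-1 (the simplest case, code number used directly as one octet) and UTF-8 (a more complicated encoding that can cover large repertoires).

```python
# Character code vs. character encoding, using Python's built-in codecs.
text = "é"  # one character from the repertoire

code_point = ord(text)                 # character code: character -> nonnegative integer
latin1_bytes = text.encode("latin-1")  # simplest case: code number used directly as one octet
utf8_bytes = text.encode("utf-8")      # more complicated encoding: same code number, two octets

print(code_point)    # 233
print(latin1_bytes)  # b'\xe9'
print(utf8_bytes)    # b'\xc3\xa9'

# A character whose code number is above 255 cannot be encoded one octet per character in Latin-1.
try:
    "€".encode("latin-1")
except UnicodeEncodeError as err:
    print("Latin-1 cannot encode U+20AC:", err.reason)
```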
charset • Using the terms just defined, the charset parameter in an HTML meta tag names an encoding: <meta http-equiv="Content-Type" content= "text/html;charset=utf-8" /> <meta http-equiv="Content-Type" content= "text/html;charset=ISO-8859-1" />
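A tool reading such a page has to pull the encoding name out of the content value and decode the bytes with it. The helper `charset_from_content_type` below is hypothetical, written for this sketch; its ISO-8859-1 fallback is also an assumption (older HTTP/1.1 treated ISO-8859-1 as the default for text types). Decoding with the wrong charset shows what goes wrong.

```python
def charset_from_content_type(value: str, default: str = "iso-8859-1") -> str:
    """Return the charset parameter of a Content-Type value such as 'text/html;charset=utf-8'.

    Hypothetical helper; the fallback default is an assumption of this sketch.
    """
    for param in value.split(";")[1:]:
        name, _, arg = param.strip().partition("=")
        if name.strip().lower() == "charset":
            return arg.strip().strip('"').lower()
    return default

raw = "café".encode("utf-8")  # bytes as a UTF-8 server would send them
for content in ("text/html;charset=utf-8", "text/html;charset=ISO-8859-1"):
    enc = charset_from_content_type(content)
    # The second declaration is wrong for these bytes, so the text comes out garbled.
    print(content, "->", raw.decode(enc))
```

The mislabelled case prints `cafÃ©`: the two UTF-8 octets for é are read as two separate Latin-1 characters.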
Language • The lang attribute of the html tag, e.g. <html lang="en-us">, MAY be used by browsers (spell-check, hyphenation, speech synthesizers), search engines and other tools. See the ISO 639 two-letter language codes: www.w3c.org/WAI/ER/IG/ert/iso639.htm
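As an illustration of how a tool might pick up the lang attribute, this sketch uses Python's standard `html.parser` to read it from the `<html>` element. `LangFinder` and `split_language_tag` are hypothetical names; the tag split is deliberately simplified (real language tags can carry more subtags than language and region).

```python
from html.parser import HTMLParser

class LangFinder(HTMLParser):
    """Record the lang attribute of the <html> element, as a spell-checker might (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")

def split_language_tag(tag: str):
    """Split a tag like 'en-us' into (language, region); region is None if absent. Simplified."""
    primary, _, region = tag.partition("-")
    return primary.lower(), (region.upper() or None)

finder = LangFinder()
finder.feed('<html lang="en-us"><head><title>t</title></head><body></body></html>')
print(finder.lang)                      # en-us
print(split_language_tag(finder.lang))  # ('en', 'US')
```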
… more • A glyph is a presentation of a particular shape which a character may have when rendered or displayed. • e.g., we speak of the same character in italic, bold, etc., each presented by a different glyph. • A repertoire of glyphs comprises a font. In a more technical sense, as the implementation of a font, a font is a numbered set of glyphs. The numbers correspond to code positions of the characters (presented by the glyphs). Thus, a font in that sense is character code dependent. An expression like "Unicode font" refers to such issues and does not imply that the font contains glyphs for all Unicode characters.
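Characters and glyphs do not map one-to-one in either direction: one glyph on screen can stand for more than one sequence of characters. A short sketch with Python's `unicodedata` module shows "é" stored two ways. Unicode normalization, used here to relate the two forms, is a Unicode feature these slides do not cover.

```python
import unicodedata

precomposed = "\u00e9"  # é as a single code point
decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT

# Both normally render as the same glyph, but they are different character sequences.
print(precomposed == decomposed)                                # False
print(len(precomposed), len(decomposed))                        # 1 2
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.name(decomposed[1]))                          # COMBINING ACUTE ACCENT
```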
Examples • ASCII is a character repertoire, code and encoding. Note: confusion about 7- vs 8-bit ASCII (ASCII proper is a 7-bit code; "8-bit ASCII" usually means one of many different extensions). • ISO Latin 1 (alias ISO 8859-1) defines a repertoire, code and encoding of which ASCII is a subset. ISO 8859 is a family of many encodings, distinguished by the -n suffix; for example, ISO 8859-5 handles Cyrillic.
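The subset relationship and the divergence within the ISO 8859 family can both be checked with Python's built-in codecs. Octets below 0x80 (the 7-bit ASCII range) decode the same way in every family member; above that, one octet means different characters.

```python
ascii_bytes = "Hello".encode("ascii")

# ASCII uses only 7 bits, so every octet is below 0x80.
print(all(b < 0x80 for b in ascii_bytes))  # True

# ASCII is a subset: the same bytes decode to the same text in ISO 8859-1 and ISO 8859-5.
print(ascii_bytes.decode("iso-8859-1") == ascii_bytes.decode("iso-8859-5") == "Hello")  # True

# Above 0x7F the family members diverge: one octet, different characters.
octet = bytes([0xE4])
print(octet.decode("iso-8859-1"))  # ä (Latin)
print(octet.decode("iso-8859-5"))  # ф (Cyrillic)
```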
Unicode … provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. This is the goal. The Unicode Standard has been adopted by such industry leaders as Apple, HP, IBM, JustSystem, Microsoft, Oracle, SAP, Sun, Sybase, Unisys and many others. Unicode is required by modern standards such as XML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, etc., and is the official way to implement ISO/IEC 10646. It is supported in many operating systems, all modern browsers, and many other products. The emergence of the Unicode Standard, and the availability of tools supporting it, are among the most significant recent global software technology trends.
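The "unique number for every character" can be inspected directly. This sketch prints the Unicode code point of characters from several scripts, together with how many octets each takes in UTF-8. It includes the simplified and traditional forms of the Chinese character for "country", which receive distinct numbers. `code_point` is a hypothetical helper for this sketch.

```python
def code_point(ch: str) -> str:
    """Return the Unicode code point of a single character in U+XXXX notation."""
    return f"U+{ord(ch):04X}"

# Latin, accented Latin, Cyrillic, Hebrew, simplified and traditional Chinese.
samples = ["A", "é", "Ж", "א", "国", "國"]
for ch in samples:
    print(f"{ch}  {code_point(ch)}  {len(ch.encode('utf-8'))} UTF-8 octet(s)")
```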
Note • The Unicode goal is universal coverage… • Unicode is the product of a consortium of 'mostly US companies'. • Some controversy over its treatment of certain scripts • e.g., combining (unifying) certain kanji characters
Unicode consortium • Go to http://www.unicode.org/unicode/standard/WhatIsUnicode.html • Examine the Translations listed on the left. See which languages' characters do not display on your computer. • Select one and go to Display Problems to see if you can fix it.
XML progress • XML 1.0 to XML 1.1 • Issue: complaint that the new standard had features to suit IBM • The IBM-specific problem that XML 1.1 aims to fix has to do with a special character (NEL, "next line", U+0085) that designates the end of a line of text to IBM mainframe systems. XML 1.0 chokes on that character, but version 1.1 would recognize it. • ZDNet News: http://zdnet.com.com/2100-1104-962392.html
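The line-end change can be sketched as a normalization step. XML 1.1's end-of-line rules translate CR LF, CR NEL, a lone NEL, LINE SEPARATOR (U+2028) and a lone CR to a single LF, while XML 1.0 recognizes only CR LF and lone CR. This is a minimal sketch of that one step, not an XML parser; the function names are hypothetical.

```python
import re

# XML 1.1 end-of-line handling: CR LF, CR NEL, NEL, LS and lone CR all become LF.
# Alternation order matters: two-character sequences are tried before lone CR.
_XML11_LINE_END = re.compile("\r\n|\r\u0085|\u0085|\u2028|\r")
# XML 1.0 end-of-line handling: only CR LF and lone CR.
_XML10_LINE_END = re.compile("\r\n|\r")

def normalize_xml11(text: str) -> str:
    return _XML11_LINE_END.sub("\n", text)

def normalize_xml10(text: str) -> str:
    return _XML10_LINE_END.sub("\n", text)

mainframe_text = "line one\u0085line two"  # NEL as the line end, as on IBM mainframes
print(repr(normalize_xml10(mainframe_text)))  # NEL is left in place, not treated as a line end
print(repr(normalize_xml11(mainframe_text)))  # 'line one\nline two'
```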
Techniques • One web site/screen provides options to go to different pages • use symbols/icons that are meaningful to the audience • tricky: flags may not be appropriate. • use images containing text in the specific language • risky choice: use text and hope that the computer/platform/browser has the character encoding and font to display the language • poor choice: use the English word for the other language. • http://www.lionbridge.com/: example of a company/site supporting 'global reach'.
Quiz • What is the name of each language in that language itself: • Spanish • Chinese (Mandarin? Hainanese?) • Korean • Japanese • Hebrew • Russian • French • Finnish • Arabic (Classical?, ?) • Hindi (Urdu?, ?) • What is the direction of text? What is the format for dates? Time? Money? Relevant cultural issues?
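For the "direction of text" question, each Unicode character carries a bidirectional class that software can query. This sketch guesses a direction from the first strongly directional character, a simplified heuristic in the spirit of the Unicode bidirectional algorithm, not the full algorithm. `dominant_direction` is a hypothetical helper, and the sample words ("hello" in English, Hebrew, Arabic and Russian) are illustrative only.

```python
import unicodedata

def dominant_direction(text: str) -> str:
    """Guess writing direction from the first strong bidi character (simplified heuristic)."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":            # strong left-to-right (Latin, Cyrillic, ...)
            return "ltr"
        if bidi in ("R", "AL"):    # strong right-to-left (Hebrew) or Arabic letter
            return "rtl"
    return "unknown"               # e.g. digits or punctuation only

for word in ("hello", "שלום", "مرحبا", "Привет", "123"):
    print(word, dominant_direction(word))
```

An HTML page would then declare the result with `dir="rtl"` or `dir="ltr"` rather than relying on the browser to guess.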
Homework • Next: Accessibility discussion, exercises • Prepare: • download Instant Saxon: a standalone processor for XML and XSLT. • download Nokia Mobile Internet Toolkit. Registration required (no cost). • register with studio.tellme.com