
Web Site Internationalization or Furansugo no dekiru mono wa imasen ka?

Instructor: Joseph DiVerdi, Ph.D., M.B.A.


Presentation Transcript


  1. Web Site Internationalization or Furansugo no dekiru mono wa imasen ka? Instructor: Joseph DiVerdi, Ph.D., M.B.A.

  2. Core Issue • "If the WWW is to reach a truly worldwide audience, it needs to be able to support the display of all the languages of the world, with all their unique alphabets and symbols, directionality, and specialized punctuation. This poses a big challenge to HTML constructs as we know them." • Web Design in a Nutshell

  3. W3C's Efforts • "i18n" project • Spoken: eye-eighteen-en (the 18 letters between the "i" and the "n" of internationalization) • Two Primary Issues Addressed: • Alternate Character Sets • Take into account all writing systems in the world • How to Specify Languages & Their Unique Presentation Requirements • Within an HTML document • Current state of the art in • RFC 2070 • HTML v4.0

  4. Character Sets • Character • A Unit of a Written Language System ay, bee, see, dee, eff, gee, aych, eye • Glyph • An Actual Printed or Displayed Character = a b c 5 , $ ó

  5. Character Sets • A Character May Associate With Several Glyphs • Close Quote - " or » • A Glyph May Correspond to Several Characters • Comma - pause in sentence or decimal indicator • In Certain Languages

  6. Character Sets • Each Character is Assigned • A Specific Numeric Value • Number of Characters in a Character Set • Limited by the Bit-Depth of its Encoding • 8-Bit Encoded Character Set - 256 characters • 16-Bit Encoded Character Set - 65,536 characters • HTML v2.0 & v3.2 are based on ISO 8859-1 • 8-Bit Character Set • AKA Latin-1
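The bit-depth limit on slide 6 is a power of two. A minimal Python sketch of the arithmetic follows; the helper name `max_characters` is ours for illustration and does not come from the slides.

```python
def max_characters(bits: int) -> int:
    """Number of distinct values a fixed-width code of `bits` bits can hold."""
    return 2 ** bits

# The two bit depths named on the slide.
for bits in (8, 16):
    print(f"{bits}-bit encoding: {max_characters(bits):,} characters")
```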

  7. Character Sets • ISO-8859-1 Character Set • 8-Bit Depth • First 128 Values From US-ASCII

       Numeric Value   Glyph   Description
       13              CR      carriage return
       48              0       digit zero
       65              A       uppercase aye
       94              ^       caret
       177             ±       plus-or-minus
       191             ¿       inverted question mark
       255             ÿ       lowercase wye w/umlaut
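The value-to-glyph mapping can be checked by decoding single bytes as ISO 8859-1. This is a small sketch using Python's built-in codec; the variable names are illustrative. Note that uppercase A sits at value 65 (value 64 is the at sign).

```python
# Decode a few single-byte values as ISO 8859-1 (Latin-1).
values = [48, 65, 94, 177, 191, 255]
table = {value: bytes([value]).decode("iso-8859-1") for value in values}

# Values below 128 agree with US-ASCII; the rest are Latin-1 additions.
ascii_part = {v: ch for v, ch in table.items() if v < 128}
latin1_part = {v: ch for v, ch in table.items() if v >= 128}
```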

  8. Character Sets (continued) • Common character sets built on 8-bit bytes

       ISO 8859-1   Latin-1
       ISO 8859-5   Cyrillic
       ISO 8859-6   Arabic
       ISO 8859-7   Greek
       ISO 8859-8   Hebrew
       SHIFT_JIS    Japanese (multi-byte)
       EUC_JP       Japanese (multi-byte)

  9. Uses of Character Sets

       Language    Language Code   Character Sets
       French      fr              iso-8859-1
       Greek       el              iso-8859-7
       Hebrew      iw (now he)     iso-8859-8
       Hungarian   hu              iso-8859-2
       Icelandic   is              iso-8859-1
       Italian     it              iso-8859-1
       Japanese    ja              shift_jis, iso-2022-jp, euc-jp
       Romanian    ro              iso-8859-2
       Russian     ru              koi8-r, iso-8859-5
       Serbian     sr              iso-8859-5
       Slovak      sk              iso-8859-2
       Spanish     es              iso-8859-1
       Turkish     tr              iso-8859-9
       Ukrainian   uk              iso-8859-5

  10. Character Sets (continued) • 256 Characters are Sufficient • For Certain Languages • Insufficient for Others • Japanese (kanji) • Chinese • Korean • Vietnamese • Hence the Need For • 16-Bit Encoded Character Sets
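One way to see the 256-character limit in practice is to try encoding text as ISO 8859-1. The sketch below assumes Python's built-in codecs; `fits_in_latin1` is a hypothetical helper name.

```python
def fits_in_latin1(text: str) -> bool:
    """Return True if every character of `text` exists in ISO 8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False

french_ok = fits_in_latin1("fran\u00e7ais")          # "français"
japanese_ok = fits_in_latin1("\u65e5\u672c\u8a9e")   # "Japanese" written in kanji
```

French fits in the 8-bit set; the kanji do not, which is the motivation for larger character sets.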

  11. Character Sets • 16-Bit Encoded Character Sets • Two Contiguous Bytes Represent One Character • 65,536 Possible Characters in One Set • Unicode Was Designed as a 16-bit Character Set (it has since grown beyond 65,536 code points) • Developed by the Unicode Consortium • Practically Identical to ISO 10646-1 • First 256 Slots Allocated to ISO 8859-1 • Backwards Compatible (woo-hoo!)
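The backwards-compatibility claim can be checked directly: every ISO 8859-1 byte value should decode to the Unicode code point with the same number. The sketch below also shows a two-byte encoding of one character, using UTF-16 (big-endian) as a stand-in for the "16-bit encoding" the slide describes; that choice is an assumption on our part.

```python
# Every Latin-1 byte value maps to the Unicode code point of the same number.
latin1_matches_unicode = all(
    ord(bytes([value]).decode("iso-8859-1")) == value for value in range(256)
)

# Two contiguous bytes represent one character (UTF-16, big-endian, no BOM).
two_bytes = "A".encode("utf-16-be")
```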

  12. Specify Character Encoding • Document Character Encoding • Communicated Between Server & Client • Set With <META> tag & http-equiv attribute <META HTTP-EQUIV=CONTENT-TYPE CONTENT="text/html; charset=ISO-8859-1"> • Treated as Equivalent to the HTTP header Content-type: text/html; charset=ISO-8859-1 • Required for Successful Validation • Browser Must Support Chosen Character Set • To Display Page Correctly
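On the receiving side, a client has to pull the charset out of the Content-Type value before it can decode the page. A minimal sketch using Python's standard `email.message.Message` parser follows; `charset_from_content_type` is a hypothetical helper, and the sample bytes are made up for illustration.

```python
from email.message import Message

def charset_from_content_type(header_value: str):
    """Extract the charset parameter from a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()  # returned in lowercase

charset = charset_from_content_type("text/html; charset=ISO-8859-1")

# Decode raw page bytes with the declared charset.
raw_page = b"\xbfQu\xe9?"
decoded_page = raw_page.decode(charset)
```

If the header carries no charset parameter the helper returns `None`, and the client has to fall back on a default or a guess.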

  13. Character Sets • HTML v4.0 adopts Unicode as its Document Character Set • v4.0 Browser Behavior: • Regardless of Document Creation Encoding • Browser converts characters to internal format • Interprets characters with HTML meaning, e.g., <> • Converts character entities, e.g., &#169; • Where character entity points outside Latin-1 character set • e.g., &#982; for the pi symbol (ϖ) • Uses Unicode character to display correct character
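The entity conversion step can be imitated with Python's standard `html.unescape`, which resolves numeric character references against Unicode. This is only a sketch of the lookup, not of a browser's full parsing pipeline.

```python
import html

# A numeric reference inside Latin-1: the copyright sign.
copyright_sign = html.unescape("&#169;")

# A numeric reference outside Latin-1: code point 982 (U+03D6, the pi symbol).
pi_symbol = html.unescape("&#982;")

# The second character has no slot in the 8-bit Latin-1 set.
try:
    pi_symbol.encode("iso-8859-1")
    pi_in_latin1 = True
except UnicodeEncodeError:
    pi_in_latin1 = False
```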

  14. Character Sets (continued) • Issues • Larger Data Transfers • Slower Processing • If it's Necessary • Just Do It
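The "larger data transfers" cost is easy to quantify for Latin-1 text: a fixed two-byte encoding doubles its size. The sketch below uses UTF-16 (big-endian, no byte-order mark) as the 16-bit encoding, which is our assumption; the slides do not name one.

```python
text = "Internationalization"

size_8bit = len(text.encode("iso-8859-1"))   # one byte per character
size_16bit = len(text.encode("utf-16-be"))   # two bytes per character

growth_factor = size_16bit / size_8bit
```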

  15. v4.0 Language Tags • LANG attribute • Used Within Text Elements • To Switch to Other Languages Within a Document • Add to <HTML> Tag to Specify Language • For Entire Document <HTML LANG=fr> • To Turn On Norwegian • For Just One Element <P LANG=no> Something about Lutefisk </P>
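A tool that wants to act on LANG (a search indexer, say) first has to collect the attribute from the markup. Below is a minimal sketch with Python's standard `html.parser`; the class name `LangCollector` is hypothetical, and combining the two slide examples into one document is our assumption.

```python
from html.parser import HTMLParser

class LangCollector(HTMLParser):
    """Record (tag, language code) for every element carrying a LANG attribute."""

    def __init__(self):
        super().__init__()
        self.langs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        for name, value in attrs:
            if name == "lang":
                self.langs.append((tag, value))

collector = LangCollector()
collector.feed("<HTML LANG=fr><P LANG=no>Something about Lutefisk</P></HTML>")
found = collector.langs
```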

  16. Language Codes • Representation of Language Names • See: • Table 27-1 of Web Design in a Nutshell • p 461 http://www.oclc.org/oclc/man/code/lang.htm

  17. Language Codes • What Happens When Language is Specified? • Two General Answers: • Not Much • It Depends • An Individual viewer might configure his or her browser to respond differently to different language specifications • Search engines might respond to language specification • Consider LANG to be Structural Markup • Describe the Structure of the Document

  18. Directionality • Many Languages Read from Right to Left • An International HTML Standard Needs to Take This Into Account • DIR attribute <P DIR=rtl> Left to Right from Read Languages Many </P>

  19. Directionality • Tag in HTML v4.0 to deal with documents containing combinations of left- and right-reading text • Aka bi-directional text or Bidi • <BDO> is used for bi-directional override • Specify a span of text that overrides the intrinsic directions of the text it contains <BDO DIR=ltr>An English phrase in an otherwise Hebrew text</BDO> • More Structured Markup
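The "intrinsic direction" that <BDO> overrides is a property Unicode assigns to each character. Python's standard `unicodedata` module exposes it, as this sketch shows; `intrinsic_direction` is a hypothetical wrapper name.

```python
import unicodedata

def intrinsic_direction(ch: str) -> str:
    """Return the Unicode bidirectional class of a single character."""
    return unicodedata.bidirectional(ch)

latin_a = intrinsic_direction("A")            # "L": left-to-right
hebrew_alef = intrinsic_direction("\u05d0")   # "R": right-to-left
arabic_alef = intrinsic_direction("\u0627")   # "AL": right-to-left Arabic letter
```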

  20. Cursive Joining • A Character's Shape Can Vary • Depending on its Position in a Word • In Some Writing Systems • In Arabic • Certain Characters Look Completely Different • When Used at the Beginning of a Word or • When Used as the Last Character of a Word • Also True of Many Other Languages
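Unicode carries compatibility code points for the positional shapes of Arabic letters, which makes the idea easy to inspect. The sketch below uses the letter beh (U+0628) and its four presentation forms as an example of our choosing; normal text stores only the base letter and leaves shaping to the renderer.

```python
import unicodedata

BEH = "\u0628"  # the abstract character: ARABIC LETTER BEH

# Positional shapes of the same character (Arabic Presentation Forms-B).
forms = {
    "isolated": "\ufe8f",
    "final": "\ufe90",
    "initial": "\ufe91",
    "medial": "\ufe92",
}

# Compatibility normalization folds every shape back to the base letter.
all_fold_to_beh = all(
    unicodedata.normalize("NFKC", shape) == BEH for shape in forms.values()
)
initial_name = unicodedata.name(forms["initial"])
```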

  21. Cursive Joining • Unicode Characters Exist • Which Have Zero Width • Are Placed Between Characters • They Don't Appear in the Browser's Window • Act Purely as Instructions • To Specify Joining of the Neighboring Characters • &zwnj; • zero-width non-joiner • Prevents Joining of Characters Which Normally are Joined • &zwj; • zero-width joiner • Joins Characters Which Normally are Not Joined
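The two entities resolve to ordinary Unicode code points, which can be confirmed with Python's standard `html` and `unicodedata` modules. A short sketch:

```python
import html
import unicodedata

zwnj = html.unescape("&zwnj;")  # zero-width non-joiner
zwj = html.unescape("&zwj;")    # zero-width joiner

zwnj_info = (f"U+{ord(zwnj):04X}", unicodedata.name(zwnj))
zwj_info = (f"U+{ord(zwj):04X}", unicodedata.name(zwj))

# Both are format characters (category "Cf"): instructions, not visible glyphs.
categories = {unicodedata.category(zwnj), unicodedata.category(zwj)}
```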
