Unicode A short introduction
Outline • Writing systems • Computers and Alphabets • ASCII versus Unicode • Examples of Unicode • For more information
Writing systems • Each letter of the alphabet has • A name (sometimes several) • E.g., ‘a’ is called ‘eh?’ • E.g., ‘a’ is also called ‘lower-case a’ • A pronunciation • E.g., in Cayuga, ‘a’ usually sounds like ‘aaah’
Computers and Alphabets • Computers also have names for letters • The letter ‘a’ is called ‘61’ in one ‘language’ (a hexadecimal number; 97 in decimal) • The letter ‘a’ is called ‘U+0061’ in another ‘language’
Computers and Alphabets • Why do computers use numbers for the names for letters? • Because they store all information in number form. • Technical detail: they store information as ‘bytes’; each ‘byte’ consists of 8 ‘bits’; each ‘bit’ is either the number ‘0’ or ‘1’
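To make the bytes-and-bits detail concrete, here is a minimal Python sketch (Python is not used in the original slides; it is only a convenient way to look inside). It shows the number stored for the letter ‘a’ and the same number written out as the 8 bits of one byte.

```python
# Look up the number a computer stores for a letter, then show it as 8 bits.
letter = "a"
number = ord(letter)           # the number that stands for the letter
bits = format(number, "08b")   # the same number written as one 8-bit byte

print(letter, number, bits)
```

The variable names are illustrative only; any letter from the basic English alphabet can be substituted for ‘a’.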
Computers and Alphabets • Computer programs ‘translate’ the names into letters on the screen or in print • Result: you see a, a, a, a, a, a, etc. (different fonts, but the same letter)
ASCII versus Unicode • ASCII (American Standard Code for Information Interchange) • ASCII and Unicode are two computer ‘languages’ for naming letters • The ASCII name for ‘a’ is ‘61’ (hexadecimal; 97 in decimal) • The Unicode name for ‘a’ is ‘U+0061’
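The two ‘names’ for ‘a’ can be checked with a short Python sketch. The helper `code_point_label` is a hypothetical name introduced here for illustration; it simply formats a letter's number in the ‘U+’ style that Unicode uses.

```python
def code_point_label(ch: str) -> str:
    """Return the U+XXXX label for a single character (illustrative helper)."""
    return f"U+{ord(ch):04X}"

ascii_hex = hex(ord("a"))             # the ASCII number for 'a', in hexadecimal
unicode_label = code_point_label("a") # the Unicode label for the same letter

print(ascii_hex, unicode_label)
```

Both names contain the same hexadecimal number, 61, which reflects the fact that Unicode kept the ASCII numbering for the letters ASCII already covered.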
ASCII • Computer systems can represent up to 256 letters with one byte • Technical detail: one 8-bit byte has 2^8 = 256 possible values • Another technical detail: ASCII only uses 7 bits (2^7 = 128) • The codes 32-126 are the printable ASCII letters (characters)
ASCII • On all computers, the ASCII letters named 32-126 look the same. • E.g., ‘65’ looks like upper-case ‘A’ on all modern computers • Technical detail: why only up to 127? Counting from 0, 127 is the highest of the 2^7 = 128 numbers that 7 bits can represent. • Another technical detail: what happened to 0-31? Those are reserved for control characters such as tab and line feed (127 is also a control character, ‘delete’).
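The arithmetic behind these ASCII limits, and the letter behind one of the numbers, can be checked with a few lines of Python. This is only a sketch; the variable names are made up for the example.

```python
byte_values = 2 ** 8    # how many different values fit in one 8-bit byte
ascii_values = 2 ** 7   # how many values fit in 7 bits (numbered 0 to 127)

upper_a = chr(65)       # the character stored under decimal code 65

# Collect every character with a code from 32 up to and including 126.
printable = "".join(chr(n) for n in range(32, 127))

print(byte_values, ascii_values, upper_a)
print(printable)
```

The last line prints the space character, the digits, the upper-case and lower-case English letters, and common punctuation, which is the whole visible ASCII repertoire.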
ASCII Problem • While one byte can represent 256 names for letters, no one agrees on what letters the numbers 128-255 stand for. • That’s why your Mac Cayuga font shows up as gibberish on a Windows computer. • The number ‘250’ doesn’t mean the same thing on a Mac as it does on a Windows PC.
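The disagreement over the number 250 can be reproduced in Python. This sketch assumes the two systems use the standard Western encodings, Mac Roman on the Mac and Windows-1252 on the PC; a custom Cayuga font may well have used its own private mapping, which Python does not know about.

```python
raw = bytes([250])  # a single byte holding the number 250

as_mac = raw.decode("mac_roman")   # how a classic Mac encoding reads the byte
as_windows = raw.decode("cp1252")  # how a Windows encoding reads the same byte

print(repr(as_mac), repr(as_windows))
print(as_mac == as_windows)        # False when the two systems disagree
```

The byte never changes; only the table used to look it up does, which is why the same file can look right on one machine and wrong on another.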
Unicode: fixing the ASCII problem • Unicode aims to provide a unique name for every letter ever used…on the planet. • It has room for more than 1,000,000 names (1,114,112, numbered U+0000 to U+10FFFF). • Everyone agrees on what letters the names stand for. • Technical details not discussed here: getting from names like ‘U+0061’ to the letter ‘a’ on your computer.
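How much room Unicode has can be read off from Python itself, which (in version 3.3 and later) reports the highest Unicode number it accepts. A minimal sketch:

```python
import sys

highest = sys.maxunicode   # the largest Unicode number Python accepts
total = highest + 1        # names are numbered starting from 0

print(f"U+{highest:X}", total)
```

Not all of these numbers are available for letters, since some ranges are reserved for technical purposes, but the figure gives a sense of the scale.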
Unicode • Many letters have already been given a Unicode name. • Modern computers can display any letter that has a Unicode name, as long as a font containing that letter is installed.
Unicode and Syllabics • The Cherokee syllabary is represented in the Unicode character block U+13A0 - U+13FF. • Cherokee letter representing the syllable ‘tay’ Ꮦ
Unicode and Syllabics • Unified Canadian Aboriginal Syllabics are represented in the Unicode character block U+1400 - U+167F. • The ‘ee’ sound in most Canadian syllabics systems: ᐃ
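Whether a letter falls inside the Cherokee or the Canadian syllabics block can be checked by comparing its number with the block boundaries given on these slides. The table `BLOCKS` and the function `block_of` are hypothetical names for this sketch; they cover only these two blocks, not the whole of Unicode.

```python
# Block boundaries as given on the slides (inclusive ranges).
BLOCKS = {
    "Cherokee": (0x13A0, 0x13FF),
    "Unified Canadian Aboriginal Syllabics": (0x1400, 0x167F),
}

def block_of(ch: str) -> str:
    """Return the name of the block containing ch, or 'other'."""
    number = ord(ch)
    for name, (start, end) in BLOCKS.items():
        if start <= number <= end:
            return name
    return "other"

for ch in "Ꮦᐃa":
    print(ch, f"U+{ord(ch):04X}", block_of(ch))
```

The letter ‘a’ is included only as a contrast: it belongs to neither block, so the sketch reports it as ‘other’.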
Unicode and Cayuga • There’s no special character block for Cayuga • That’s because all the Cayuga characters can be made up from already existing Unicode characters • The Unicode Consortium won’t let you duplicate already existing characters.
Unicode and Cayuga • Sgę:nǫ:⁷ swagwé:gǫh
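The idea that Cayuga letters are built from existing Unicode characters can be seen by splitting a few of them into their parts. This sketch assumes the letters in the greeting above are e with ogonek, o with ogonek, and e with acute; it uses Python's standard `unicodedata` module, and `pieces` is a hypothetical helper name.

```python
import unicodedata

def pieces(ch: str) -> list[str]:
    """Split a letter into its base letter and combining marks (NFD form)."""
    decomposed = unicodedata.normalize("NFD", ch)
    return [f"U+{ord(part):04X}" for part in decomposed]

# Letters written as escapes so the result does not depend on how this
# text was saved: e with ogonek, o with ogonek, e with acute.
for ch in ["\u0119", "\u01EB", "\u00E9"]:
    print(ch, pieces(ch))
```

Each letter comes apart into a plain ‘e’ or ‘o’ plus a separate mark, so no Cayuga-specific characters are needed.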
Advantages of Unicode • (Not quite yet, but in the near future) when you type in Cayuga, it will appear as Cayuga on any other computer. • The same goes for web pages…
For More Information • Lots of technical details not discussed here. • Take one of the CDs provided if you want a more extensive introduction.