CIS 234: Character Codes
Dr. Ralph D. Westfall
April, 2011
Problem 1 (other PowerPoint)
• computers only understand binary coded data (zeros and ones)
  • 00000000, 11111111, 01010101
• people like to count in decimals
  • 00000000=0, 11111111=255, 01010101=85
• 1st problem: it is extremely hard for people to work with binary data
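The binary and decimal values on this slide can be checked with a few lines of Java. This is a minimal sketch; the class name BinaryDemo and its helper methods are made up for illustration and are not part of the course materials.

```java
// Sketch: converting between 8-digit binary strings and decimal values.
// BinaryDemo, toDecimal and toBinary8 are hypothetical names.
class BinaryDemo {
    // Parse a string of 0s and 1s as a base-2 number.
    static int toDecimal(String bits) {
        return Integer.parseInt(bits, 2);
    }

    // Format a value from 0 to 255 as exactly 8 binary digits.
    // Assumes the value fits in 8 bits; larger values are not handled.
    static String toBinary8(int value) {
        String bits = Integer.toBinaryString(value);
        return "00000000".substring(bits.length()) + bits;
    }

    public static void main(String[] args) {
        System.out.println(toDecimal("00000000")); // 0
        System.out.println(toDecimal("11111111")); // 255
        System.out.println(toDecimal("01010101")); // 85
        System.out.println(toBinary8(85));         // 01010101
    }
}
```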
Problems 2a and 2b
• since computers only work with numbers, they need to use numbers to identify letters to print or show on screen
  • e.g., 01000001=65=A
• people who don't read English also use computers
• next problem: what kind of numbering should be used for different languages?
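Problem 2 Solution
• using binary data to display characters
• make up a "coding scheme" that assigns a number to each character
• ASCII code: 7 bits, normally stored in 1 byte (8-bit extensions add 128 more characters)
• Unicode: originally 16 bits (2 bytes); Java's char type is still 16 bits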
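One way to see a coding scheme at work is to ask Java for the bytes behind a string. The sketch below assumes the standard US-ASCII and UTF-16 (big-endian, no byte-order mark) charsets; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Sketch: the same text stored under two coding schemes.
// EncodingDemo and its methods are hypothetical names.
class EncodingDemo {
    // Bytes needed in US-ASCII: 1 byte per character.
    static int asciiLength(String text) {
        return text.getBytes(StandardCharsets.US_ASCII).length;
    }

    // Bytes needed in UTF-16 big-endian (no byte-order mark):
    // 2 bytes for each of these characters.
    static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        byte[] ascii = "A".getBytes(StandardCharsets.US_ASCII);
        System.out.println(ascii[0]);           // 65
        System.out.println(asciiLength("CIS")); // 3
        System.out.println(utf16Length("CIS")); // 6
    }
}
```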
ASCII Code
• used for teletypes before computers
• 128 characters in original ASCII
• 0 to 31 (decimal) control the machine
  7 (BEL) rings bell
  8 (BS) backspace
  10 (LF) line feed (go down 1 line)
  13 (CR) carriage return (to left of page)
• Java: '\n' = 10 (LF) and '\r' = 13 (CR); Windows ends a line with both together, CR then LF (2 bytes)
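The control codes listed above can be printed from Java's own escape sequences. This is an illustrative sketch only; ControlCodes and isAsciiControl are hypothetical names.

```java
// Sketch: numeric codes of some ASCII control characters.
// ControlCodes and isAsciiControl are hypothetical names.
class ControlCodes {
    // True for the ASCII control codes 0 to 31, plus 127 (DEL).
    static boolean isAsciiControl(char c) {
        return c < 32 || c == 127;
    }

    public static void main(String[] args) {
        System.out.println((int) '\u0007');  // 7  (BEL)
        System.out.println((int) '\b');      // 8  (BS)
        System.out.println((int) '\n');      // 10 (LF)
        System.out.println((int) '\r');      // 13 (CR)
        System.out.println("\r\n".length()); // 2  (CR then LF)
        System.out.println(isAsciiControl('\n')); // true
        System.out.println(isAsciiControl('A'));  // false
    }
}
```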
ASCII Characters
• A = 41 hex (65 decimal), Z = 5A hex (90 decimal)
• a = 61 hex (97 decimal), z = 7A hex (122 decimal)
  • see calculator (String or ASCII choices)
• space character = 20 hex (32 decimal)
  • see how space character code is used in browser Address textbox
• ; (semicolon) = 3B hex (59 decimal)
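The hex and decimal codes on this slide can be reproduced in Java, including the %20 form in which a browser address box shows a space. A minimal sketch follows; LetterCodes, hexCode and percentEncoded are hypothetical names chosen for this example.

```java
// Sketch: decimal and hexadecimal codes of some ASCII characters.
// LetterCodes, hexCode and percentEncoded are hypothetical names.
class LetterCodes {
    // Hexadecimal form of a character's code, in upper case.
    static String hexCode(char c) {
        return Integer.toHexString(c).toUpperCase();
    }

    // Percent sign followed by the two-digit hex code, as seen in URLs.
    static String percentEncoded(char c) {
        return String.format("%%%02X", (int) c);
    }

    public static void main(String[] args) {
        System.out.println((int) 'A' + " " + hexCode('A')); // 65 41
        System.out.println((int) 'z' + " " + hexCode('z')); // 122 7A
        System.out.println(hexCode(';'));                   // 3B
        System.out.println('a' - 'A');                      // 32
        System.out.println(percentEncoded(' '));            // %20
    }
}
```

Upper and lower case letters differ by exactly 32 (20 hex), which is why the last-but-one line prints 32.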
Printable ASCII Characters
[image: table of printable ASCII characters, beginning with the space character; ASCII image is from Wikipedia]
ASCII Numbers
• the codes for digit characters say what to show on screen and do NOT equal the numeric values of the digits
• code values can NOT be used in calculations without adjustments
  0 = 30 hex (ASCII 0 is really 48 decimal)
  9 = 39 hex (57 decimal)
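The gap between a digit's code and its value is easy to demonstrate. The sketch below is illustrative only; DigitCodes and digitValue are hypothetical names.

```java
// Sketch: a digit character's code versus its numeric value.
// DigitCodes and digitValue are hypothetical names.
class DigitCodes {
    // Convert '0'..'9' to 0..9 by subtracting the code of '0' (48).
    static int digitValue(char digit) {
        if (digit < '0' || digit > '9') {
            throw new IllegalArgumentException("not a digit: " + digit);
        }
        return digit - '0';
    }

    public static void main(String[] args) {
        System.out.println((int) '0');  // 48
        System.out.println((int) '9');  // 57
        System.out.println('1' + '2');  // 99, the codes 49 + 50
        System.out.println(digitValue('1') + digitValue('2')); // 3
    }
}
```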
Unicode
• ASCII is a 7-8 bit encoding scheme
  • 128-256 character limit
• Unicode started as a 16-bit scheme
  • Uni comes from the word universal
  • 16 bits can code 65,536 characters; Unicode now defines more than that, so some characters need two 16-bit units in Java
• Java uses Unicode encoding so that it can be used for many different languages
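The "actually more" part can be seen from Java's Character class, which distinguishes a 16-bit char from a full Unicode code point. This is a rough sketch; CodePointDemo and charsNeeded are hypothetical names, and the code point 1F600 hex is just one example of a character beyond the 16-bit range.

```java
// Sketch: 16-bit chars versus Unicode code points in Java.
// CodePointDemo and charsNeeded are hypothetical names.
class CodePointDemo {
    // Number of 16-bit char values Java needs for one code point.
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(Character.MAX_VALUE + 1);    // 65536
        System.out.println(Character.MAX_CODE_POINT);   // 1114111
        System.out.println(charsNeeded(0x41));          // 1
        System.out.println(charsNeeded(0x1F600));       // 2

        // One character beyond the 16-bit range, stored as two chars.
        String beyond = new String(Character.toChars(0x1F600));
        System.out.println(beyond.length());                           // 2
        System.out.println(beyond.codePointCount(0, beyond.length())); // 1
    }
}
```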
Unicode - 2
• Unicode characters for many languages
• Western alphabets: Latin (English), Greek, Cyrillic (Russian), etc.
  • Unicode uses 00000000 + ASCII for English
  • 00000000 01000001 = A (65 decimal)
• Asian characters: CJK (Chinese, Japanese, Korean) has over 20,000 characters
• many character systems require installing special fonts onto user's computer
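The 16-bit pattern for A shown above can be generated from a Java char. A minimal sketch, with SixteenBits and bits16 as hypothetical names; the code 4E00 hex is used as one example of a CJK character.

```java
// Sketch: the 16-bit pattern stored in a Java char.
// SixteenBits and bits16 are hypothetical names.
class SixteenBits {
    // Format a char as exactly 16 binary digits.
    static String bits16(char c) {
        String bits = Integer.toBinaryString(c);
        return "0000000000000000".substring(bits.length()) + bits;
    }

    public static void main(String[] args) {
        // English letters: eight zero bits in front of the ASCII byte.
        System.out.println(bits16('A'));      // 0000000001000001
        // A CJK character uses the upper byte as well.
        System.out.println(bits16('\u4E00')); // 0100111000000000
    }
}
```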
Using Unicode in Java
    char letter = 'A';              // easiest way
    char sameLetter = '\u0041';     // also = 'A'
    char one = '\u4E00';            // Chinese character for 1
    char oneInParens = '\u3220';    // the same character in parentheses; '\u3280' is the circled form
• \ (backslash) = escape character
• \u means Unicode (#s are in hexadecimal)
    char sound = '\u0007';          // BEL
• may sound the speaker when "printed" to a console window
Review Questions
• How many bits are there in ASCII code?
• How many bits are there in Unicode?
• True or False: All ASCII codes can be seen as characters on the screen
• How many characters can be printed using ASCII? Using Unicode? (match 2)
  • around 90, around 12,000, over 50,000
Review Questions - 2
• Why was Unicode created to handle over 50,000 characters?
• Give an example of what some non-printable ASCII character does on a computer or screen
• How does Java code need to handle calculations on numeric characters entered on the screen by the user?
Review Questions - 3
• Is a space a character?
• What is the Chinese character for the number 1? 2? 3?
  • this will NOT be on a test!
  • see answers on next slide
Appendix
• the following slides show how ASCII characters can be read from the keyboard and converted to values that can be used for mathematical calculations
Reading Characters in DOS
    int iInit = System.in.read();
• gets numeric value of character it reads
  • if character is A, iInit = 65 (decimal)
    char cInit = (char) System.in.read();
• (char) "casts" (converts) numeric value to character type
    System.out.println(iInit);   // number
    System.out.println(cInit);   // character
• System.in.read() can throw IOException, so the enclosing method must catch it or declare it
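The two reads above can be put into a complete program. The sketch below is one possible arrangement, not the course's own code: ReadOneChar and its methods are hypothetical names, and a ByteArrayInputStream stands in for the keyboard so the output is predictable. Passing System.in instead would read real keystrokes.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Sketch: reading a character as a number and as a char.
// ReadOneChar, readCode and readChar are hypothetical names.
class ReadOneChar {
    // Read one byte and return its numeric value, or -1 at end of input.
    static int readCode(InputStream in) throws IOException {
        return in.read();
    }

    // Read one byte and cast it to a character.
    static char readChar(InputStream in) throws IOException {
        int code = in.read();
        if (code == -1) {
            throw new EOFException("no more input");
        }
        return (char) code;
    }

    public static void main(String[] args) throws IOException {
        // Simulated keyboard input: the letters A and B.
        InputStream in = new ByteArrayInputStream(new byte[] {'A', 'B'});
        System.out.println(readCode(in)); // 65
        System.out.println(readChar(in)); // B
    }
}
```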
Reading Characters in Java - 2
• on Windows, 2 characters are sent when the Enter key is hit
  CR (13 decimal) and then LF (10 decimal)
• when accepting keyboard input from DOS window in Java, need to "absorb" both characters from Enter keystroke
    System.in.read();
    System.in.read();
• reads characters, doesn't store (=) them
• program is now ready to read next input
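A variation on the two fixed reads is to keep reading until the line feed arrives, which also works where Enter sends LF alone. This is an alternative sketch rather than the slide's method; SkipLineEnd and readFirstOfLine are hypothetical names, and the input is simulated.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Sketch: read one character, then absorb the rest of the line.
// SkipLineEnd and readFirstOfLine are hypothetical names.
class SkipLineEnd {
    // Return the first character code of a line and discard everything
    // up to and including the LF (10) that ends the line.
    static int readFirstOfLine(InputStream in) throws IOException {
        int first = in.read();
        int c = first;
        while (c != -1 && c != '\n') {
            c = in.read(); // absorbs CR (13) and finally LF (10)
        }
        return first;
    }

    public static void main(String[] args) throws IOException {
        // Simulated input: A, Enter, B, Enter (Windows-style CR LF).
        byte[] typed = "A\r\nB\r\n".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new ByteArrayInputStream(typed);
        System.out.println(readFirstOfLine(in)); // 65
        System.out.println(readFirstOfLine(in)); // 66
    }
}
```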
Using Characters for Math
• digits typed on the keyboard are read as character codes, not as the numbers they show
• need to convert character's decimal value to its mathematical value
  • 0 = 30 hex (48 decimal), 9 = 39 hex (57 decimal)
  • math value = decimal value - 48
    int quantity = System.in.read() - 48;
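The subtraction above can be wrapped in a small method that also rejects input that is not a digit. This is a sketch under the assumption that a single digit is typed; ReadDigit and readDigit are hypothetical names, and the keyboard is simulated with a ByteArrayInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch: turning one typed digit into a number usable in math.
// ReadDigit and readDigit are hypothetical names.
class ReadDigit {
    // Read one character and convert '0'..'9' (codes 48..57) to 0..9.
    static int readDigit(InputStream in) throws IOException {
        int code = in.read();
        if (code < '0' || code > '9') {
            throw new IllegalArgumentException("not a digit, code " + code);
        }
        return code - '0'; // same as code - 48
    }

    public static void main(String[] args) throws IOException {
        // Simulated keyboard input: the user types 7.
        InputStream in = new ByteArrayInputStream(new byte[] {'7'});
        int quantity = readDigit(in);
        System.out.println(quantity);     // 7
        System.out.println(quantity * 3); // 21
    }
}
```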