Lecture 01: Data Representation 1 (Bits, Bytes, and Characters)
Data Representation Overview • What is data? • Bits, bytes, words, characters and integers. • Number systems — denary, binary, octal, hex. • Conversions between number systems. • Representation of negative numbers. • Binary addition and subtraction. • Representation of real numbers.
What Is Data? • Corresponds to discrete facts about phenomena from which we gain information about the world. • The concept of value is fundamental to any kind of data. • Some values are: 25, $255.50, -50, April, 'This is a sentence.’, red, A#.
What Is Data (2)? • BUT, values are abstract. They exist only in the mind. • Examples: This is the month of February. This is the 58th day of the year. • The same value may be represented in many ways: • 12 twelve XII 1100 • • – – – – – – –
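To make the "one value, many representations" point concrete, here is a minimal Java sketch. The class name `ValueRepresentations` and the helper `inBase` are made up for illustration; only `Integer.toString` and `Integer.parseInt` come from the standard library.

```java
class ValueRepresentations {
    // Writes the value n as a string of digits in the given base (radix).
    static String inBase(int n, int radix) {
        return Integer.toString(n, radix);
    }

    public static void main(String[] args) {
        int twelve = 12; // one abstract value, shown in four notations
        System.out.println("denary: " + inBase(twelve, 10));
        System.out.println("binary: " + inBase(twelve, 2));
        System.out.println("octal:  " + inBase(twelve, 8));
        System.out.println("hex:    " + inBase(twelve, 16));

        // Reading the binary representation back recovers the same value.
        System.out.println(Integer.parseInt("1100", 2) == twelve);
    }
}
```

The variable holds the same value throughout; only the printed notation changes.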
How Is Data Represented In a Computer? • The digital computer is binary. • Everything is represented by one of two states: • 0, 1; on, off; true, false; voltage, no voltage. • Values are represented by sequences of binary digits or bits.
Computer Memory • In general, data being worked on is stored in the computer's memory. • Think of memory as a bunch of slots where we can store stuff. • Each slot (data item) has: • an address (for the computer) • a name (for us).
Computer Memory • [Figure: memory drawn as a column of slots with addresses 0 through 1,048,575; some slots carry names such as X, CH, A[1], and A[2].]
Bits, Bytes, Words • The smallest unit of storage is the bit. • The bit has a binary value, 0 or 1. • Each (addressable) memory location is made up of 8 bits called a byte. • A computer with 1 MB of memory has 1,048,576 bytes (2^20). • A computer with 1 GB of memory has 1,073,741,824 bytes (2^30).
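The byte counts on this slide are powers of two, which can be checked with a short Java sketch. The class `MemorySizes` and its two helper methods are hypothetical names used only for this illustration.

```java
class MemorySizes {
    // Returns 2 raised to the given exponent, computed with a left shift.
    static long powerOfTwo(int exponent) {
        return 1L << exponent;
    }

    // Converts a number of bytes to a number of bits (8 bits per byte).
    static long bitsIn(long bytes) {
        return bytes * Byte.SIZE;
    }

    public static void main(String[] args) {
        long oneMB = powerOfTwo(20);
        long oneGB = powerOfTwo(30);
        System.out.println("1 MB = " + oneMB + " bytes");
        System.out.println("1 GB = " + oneGB + " bytes");
        System.out.println("1 MB = " + bitsIn(oneMB) + " bits");
        // Addresses start at 0, so the last address is one less than the size.
        System.out.println("last address in 1 MB: " + (oneMB - 1));
    }
}
```

Because addresses start at 0, a 1 MB memory has a highest address of 1,048,575.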
Bits, Bytes, Words (continued) • [Figure: the same memory diagram, with the bits of one byte numbered from 7 (most significant bit, MSB) down to 0 (least significant bit, LSB).]
A Word of Memory • A byte (8 bits) is typically the smallest addressable unit. • The size of a word is fixed by the manufacturer of the machine. • In the past, word sizes have ranged over many values (6, 8, 12, 16, 18, 24, 32, 36, 48, and 64 bits, and probably others). • Nowadays a word is typically 32 or 64 bits.
Characters • A character is the smallest unit of data that people normally handle. • Characters include: • UPPER CASE LETTERS: A C X . . . • lower case letters: b g k . . . • Single digits: 3 1 9 . . . • Special characters: ! < = . . . • Others: <tab>, <eol>, <eof>, …
Characters (continued) • Consider the declaration: private char ch; • ch is an identifier used to identify a location in the computer's memory. • During compilation, the name is translated into the actual address of the memory location where the data item is stored.
ASCII character code • How does the computer know that 'A' is 'A' and not 'a' or '3' or anything else? • Actually, it doesn’t — there is no inherent difference between the patterns for 65 and ‘A’ (or even ‘A’, but that is another story). • As with any other symbols, it depends on interpretation and context.
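The idea that one bit pattern can be read either as a number or as a character can be sketched in Java with casts. The class `SameBitsTwoMeanings` and its helpers are invented for this example; the casts themselves are ordinary Java.

```java
class SameBitsTwoMeanings {
    // Interprets a numeric code as a character.
    static char asCharacter(int code) {
        return (char) code;
    }

    // Interprets a character as its numeric code.
    static int asNumber(char c) {
        return c;
    }

    public static void main(String[] args) {
        int pattern = 65;
        // The stored bits are the same; only the interpretation differs.
        System.out.println(pattern);              // printed as a number
        System.out.println(asCharacter(pattern)); // printed as a character
        System.out.println(asNumber('a'));        // code of lower case a
        System.out.println(asNumber('3'));        // code of the digit 3
    }
}
```

The declared type (`int` or `char`) supplies the context that tells the program how to interpret the pattern.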
ASCII character code • American Standard Code for Information Interchange. • Based on 7 bits (Why?). • For example: • 'A' = 100 0001 • '0' = 011 0000 • 'a' = 110 0001 • '!' = 010 0001 • Also defines non-printing characters such as line feed, form feed, horizontal tab, etc.
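The 7-bit patterns in the examples above can be reproduced with a small Java sketch. The class `AsciiBits` and the method `sevenBits` are hypothetical names; the sketch assumes its input is a 7-bit ASCII character and rejects anything else.

```java
class AsciiBits {
    // Returns the 7-bit ASCII pattern of c as a string of 0s and 1s.
    // Assumption: c is in the ASCII range 0..127.
    static String sevenBits(char c) {
        if (c > 127) {
            throw new IllegalArgumentException("not a 7-bit ASCII character");
        }
        String bits = Integer.toBinaryString(c);
        // Pad on the left with zeros up to a width of 7.
        return String.format("%7s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        char[] samples = {'A', '0', 'a', '!', '\n'};
        for (char c : samples) {
            // Print the numeric code alongside its bit pattern.
            System.out.println((int) c + " -> " + sevenBits(c));
        }
    }
}
```

Upper and lower case letters differ in a single bit: compare the patterns for 'A' and 'a'.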
ASCII character code (continued) • There are 128 (2^7) possible characters, stored in memory as a byte with the 8th bit usually 0. • Characters can be compared for sorting purposes based on the numeric value of the code. Often called the collating sequence.
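A rough Java illustration of the collating sequence: sorting characters by their numeric codes puts punctuation and digits before upper case letters, and upper case before lower case. The class `CollatingSequence` and its methods are made-up names for this sketch.

```java
import java.util.Arrays;

class CollatingSequence {
    // Returns the characters of s sorted by their numeric codes.
    static String sortByCode(String s) {
        char[] chars = s.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // True if a comes before b in the collating sequence.
    static boolean precedes(char a, char b) {
        return a < b;
    }

    public static void main(String[] args) {
        System.out.println(sortByCode("bA3a!"));
        System.out.println(precedes('A', 'a'));
        // String comparison uses the same character codes, so an upper case
        // initial sorts before any lower case initial.
        System.out.println("Zebra".compareTo("apple") < 0);
    }
}
```

This is why a plain code-based sort can place "Zebra" before "apple", which is not dictionary order.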
Other Character Codes • BCD (Binary Coded Decimal): 6 bits plus a check bit; used for some magnetic tapes. • EBCDIC (Extended Binary Coded Decimal Interchange Code): used with large IBM computers. • Baudot: paper tape code. • 80-column cards: 12-bit code; many patterns were unused.
But what about …? • Other writing systems? • Greek, Arabic, Chinese, Hebrew, . . . • Graphics? • astrology, Dingbats, international . . . • Characters that represent ideas, concepts, and phrases?
Unicode • Unicode was developed by The Unicode Consortium and is now coordinated with ISO/IEC 10646. • 16-bit code, so 65,536 (2^16) possible characters. • Some of these are reserved.
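Java's `char` type happens to be a 16-bit Unicode code unit, so it gives a convenient way to look at a 16-bit code. A 16-bit unit can take the values 0 through 65,535, which is 65,536 distinct patterns. The class `Unicode16Bit` and its helpers are hypothetical names used only for this sketch.

```java
class Unicode16Bit {
    // Number of distinct values a Java char can hold.
    static int distinctCharValues() {
        return Character.MAX_VALUE + 1;
    }

    // Name of the Unicode script that a character belongs to.
    static String scriptOf(char c) {
        return Character.UnicodeScript.of(c).name();
    }

    public static void main(String[] args) {
        System.out.println("bits per char: " + Character.SIZE);
        System.out.println("distinct values: " + distinctCharValues());

        char alpha = '\u03B1'; // Greek small letter alpha, U+03B1
        char alef = '\u05D0';  // Hebrew letter alef, U+05D0
        System.out.println((int) alpha + " " + scriptOf(alpha));
        System.out.println((int) alef + " " + scriptOf(alef));
        System.out.println((int) 'A' + " " + scriptOf('A'));
    }
}
```

The ASCII codes keep their values in Unicode, so 'A' is still 65.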
Unicode (continued) • Defines codes for characters used in the major languages written today. • Scripts include: • Latin, Greek, Cyrillic, Armenian, Hebrew, Arabic, Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Thai, Lao, Georgian, Tibetan, Japanese Kana.
Unicode (continued) • Also • The complete set of modern Korean Hangul, • a unified set of Chinese/Japanese/Korean (CJK) ideographs. • Scripts and characters added recently or to be added shortly include: • Ethiopic, Canadian Syllabics, Cherokee, additional rare ideographs, Sinhala, Syriac, Burmese, Khmer, and Braille.
Unicode (continued) • Includes extensions to allow for millions of characters. • Can handle composite characters such as those with accent marks, dots, double dots, macrons, etc.
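Both points can be observed from Java, whose strings use the UTF-16 encoding. In UTF-16, a character outside the 16-bit range occupies a pair of 16-bit units, and an accented letter may be stored either as one precomposed character or as a base letter followed by a combining mark. This is a sketch of one such mechanism, not necessarily the one the slide has in mind; the class `BeyondSixteenBits` and its helpers are invented names.

```java
import java.text.Normalizer;

class BeyondSixteenBits {
    // Number of 16-bit units needed to store the given code point.
    static int codeUnits(int codePoint) {
        return Character.charCount(codePoint);
    }

    // Combines base letters and combining marks into precomposed
    // characters where Unicode defines them (normalization form NFC).
    static String compose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        int beyond = 0x1F600; // a code point above the 16-bit range
        String s = new String(Character.toChars(beyond));
        System.out.println("16-bit units: " + s.length());
        System.out.println("characters:   " + s.codePointCount(0, s.length()));

        // The letter e followed by a combining acute accent (U+0301).
        String decomposed = "e\u0301";
        String composed = compose(decomposed);
        System.out.println(decomposed.length() + " -> " + composed.length());
    }
}
```

A string's length in 16-bit units is therefore not always the number of characters a reader would count.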
Something to think about • How are character strings stored in the computer?
Next Lecture Data Representation 2 Numbers and Number Systems