Data Representation

Winter 2013
COMP 2130 Intro Computer Systems
Computing Science
Thompson Rivers University

Course Objectives
• The better your knowledge of computer systems, the better your programming.

Course Contents
• Introduction to computer systems: B&O 1
• Introduction to C programming: K&R 1 – 4
• Data representations: B&O 2.1 – 2.4
• C: advanced topics: K&R 5.1 – 5.10, 6 – 7
• Introduction to IA32 (Intel Architecture 32): B&O 3.1 – 3.8, 3.13
• Compiling, linking, loading, and executing: B&O 7 (except 7.12)
• Dynamic memory management – Heap: B&O 9.9.1 – 9.9.2, 9.9.4 – 9.9.5, 9.11
• Code optimization: B&O 5.1 – 5.6, 5.13
• Memory hierarchy, locality, caching: B&O 5.12, 6.1 – 6.3, 6.4.1 – 6.4.2, 6.5, 6.6.2 – 6.6.3, 6.7
• Virtual memory (if time permits): B&O 9.4 – 9.5

Unit Learning Objectives
• Convert a decimal number to a binary number.
• Convert a decimal number to a hexadecimal number.
• Convert a binary number to a decimal number.
• Convert a binary number to a hexadecimal number.
• Convert a hexadecimal number to a binary number.
• Distinguish little endian byte order and big endian byte order.
• Compute binary addition.
• Compute binary subtraction using 2's complement.
• Determine the 2's complement representation of a signed integer.
• Understand the overflow of unsigned integers and signed integers.
• Trace and fix faulty code.

• Add two binary numbers.
• Compute the 1's complement of a binary number.
• Compute the 2's complement of a binary number.
• Understand the 2's complement representation for negative integers.
• Subtract a binary number by using 2's complement addition.
• Multiply two binary numbers.
• Use left shift and right shift.
• Perform binary division.

Unit Contents
• Information Storage
• Integer Representations
• Integer Arithmetic
• Floating Point

1. Information Storage
• Virtual memory, address, and virtual address space
  • The virtual address space is a conceptual image presented to the machine-level program.
  • It is partitioned into more manageable units to store the different program objects, i.e., instructions, program data, and control information.
  • The actual machine-level program simply treats each program object as a block of bytes, and the program itself as a sequence of bytes.
• Example
  • int number = 28;
  • 4 bytes, i.e., 32 bits, will be allocated to the variable number.
  • How will the decimal number 28 be stored in the 32 bits?
  • The binary number 00000000 00000000 00000000 00011100, not the decimal, will be stored in the 32 bits.
  • Do programmers have to convert 28 to its binary number?

Number Systems

The Decimal System
• Uses 10 digits: 0, 1, 2, ..., 9.
• Decimal expansion:
  • 83 = 8×10 + 3
  • 4728 = 4×10^3 + 7×10^2 + 2×10^1 + 8×10^0
  • 84037 = ???
  • 43.087 = ???
• Do you know addition, subtraction, multiplication and division?
  • 1234 + 435.78
  • 1234 - 435.78
  • 1234 × 435.78
  • 1234 / 435.78

The Binary System
• In computer systems, the most basic memory unit is a bit, which contains 0 or 1.
• A data unit of 8 bits is referred to as a byte, which is the basic memory unit used in main memory and hard disks.
• All data are represented by using binary numbers. Data types such as text, voice, image and video have no meaning at the level of data representation.
• 8 bits are usually used to represent a character of the English alphabet.
• A collection of n bits has 2^n possible states. Is it true?
• E.g.,
  • How many different numbers can you express using 2 bits?
  • How many different numbers can you express using 4 bits?
  • How many different numbers can you express using 8 bits?
  • How many different numbers can you express using 32 bits?

• How can we store integers (i.e., non-negative numbers only, for now) in a computer?
• E.g.,
  • A decimal number 329?
  • Is it okay to store 3 characters '3', '2', and '9' for 329?
  • How are characters stored?
  • 329_10 = ???_2

• Uses two digits, 0 and 1.
• How to expand binary numbers?
  • 0_2 = 0×2^0 = 0_10
  • 1_2 = 1×2^0 = 1_10
  • 10_2 = 1×2^1 + 0×2^0 = 2_10
  • 11_2 = 1×2^1 + 1×2^0 = 3_10
  • 100_2 = 1×2^2 + 0×2^1 + 0×2^0 = 4_10
  • 101_2 = 1×2^2 + 0×2^1 + 1×2^0 = 5_10
  • 110_2 = 1×2^2 + 1×2^1 + 0×2^0 = 6_10
  • 111_2 = 1×2^2 + 1×2^1 + 1×2^0 = 7_10
  • 1000_2 = 1×2^3 + 0×2^2 + 0×2^1 + 0×2^0 = 8_10
  • 1001_2 = 1×2^3 + 0×2^2 + 0×2^1 + 1×2^0 = 9_10
  • ...

Powers of 2
• 1_2 = 1×2^0 = 1_10
• 10_2 = 1×2^1 + 0×2^0 = 2_10
• 100_2 = 1×2^2 + 0×2^1 + 0×2^0 = 4_10
• 1000_2 = 1×2^3 + 0×2^2 + 0×2^1 + 0×2^0 = 8_10
• 1 0000_2 = 16_10
• 10 0000_2 = 32_10
• 100 0000_2 = 64_10

• 1000 0000_2 = ???_10
• 1 0000 0000_2 = ???_10
• 10 0000 0000_2 = ???_10
• 100 0000 0000_2 = ???_10
• 1000 0000 0000_2 = ???_10
• 1 0000 0000 0000_2 = ???_10
• Can you memorize the above powers of 2?
• Converting to decimals
  • 1101_2 = ???_10
  • 1011 0010_2 = ???_10
  • 1011.0010_2 = ???_10
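The positional expansion of a binary number can be sketched in C as follows. This is only an illustration, not part of the course material: the function name bin_to_ulong is hypothetical, the input is assumed to contain only '0', '1' and spaces (no validation is done), and fractional inputs such as 1011.0010 are not handled.

```c
#include <stddef.h>

/* Hypothetical helper: converts a string of '0'/'1' characters to its
 * unsigned value. Spaces are skipped so that grouped input such as
 * "1 0101" is accepted. Assumes the input holds no other characters. */
unsigned long bin_to_ulong(const char *s)
{
    unsigned long value = 0;

    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == ' ')
            continue;                   /* ignore digit-group separators */
        /* Shifting the running value one place left (times 2) and adding
         * the next digit is the same as summing digit × 2^position. */
        value = value * 2 + (unsigned long)(s[i] - '0');
    }
    return value;
}
```

For example, bin_to_ulong("1 0101") processes the digits 1, 0, 1, 0, 1 and yields 16 + 4 + 1 = 21.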

Converting Decimal to Binary
             Quotient  Remainder
  21 / 2        10         1
  10 / 2         5         0
   5 / 2         2         1
   2 / 2         1         0
   1 / 2         0         1
  => 21_10 = 1 0101_2 (read the remainders from bottom to top)
• 271_10 = ???_2
• 6071_10 = ???_2

Another similar idea
• 271_10 = ???_2
    256 < 271 < 512  ->  271 = 256 + 15 = 1 0000 0000_2 + 15
      8 <  15 <  16  ->   15 =   8 +  7 = 1000_2 + 7
    => 271 = 1 0000 0000_2 + 15
           = 1 0000 0000_2 + 1000_2 + 7
           = 1 0000 0000_2 + 1000_2 + 111_2
           = 1 0000 1111_2
• 1271_10 = ???_2

Hexadecimal Number System
•  0_10 = 0000_2 = 0_16 = 0x0
•  1_10 = 0001_2 = 1_16 = 0x1
•  2_10 = 0010_2 = 2_16 = 0x2
•  3_10 = 0011_2 = 3_16 = 0x3
•  4_10 = 0100_2 = 4_16 = 0x4
•  5_10 = 0101_2 = 5_16 = 0x5
•  6_10 = 0110_2 = 6_16 = 0x6
•  7_10 = 0111_2 = 7_16 = 0x7
•  8_10 = 1000_2 = 8_16 = 0x8
•  9_10 = 1001_2 = 9_16 = 0x9
• 10_10 = 1010_2 = A_16 = 0xA
• 11_10 = 1011_2 = B_16 = 0xB
• 12_10 = 1100_2 = C_16 = 0xC
• 13_10 = 1101_2 = D_16 = 0xD
• 14_10 = 1110_2 = E_16 = 0xE
• 15_10 = 1111_2 = F_16 = 0xF
• 148_16 = ???_10, 148_16 = ???_2
• 23c9d6ef_16 = ???_10, 23c9d6ef_16 = ???_2
• 4 bits can be used for one hexadecimal digit, 0, ..., F. Please memorize it!

Converting Decimal to Hexadecimal
              Quotient  Remainder
  328 / 16       20         8
   20 / 16        1         4
    1 / 16        0         1
  => 328_10 = 148_16 = ???_2
• 148_16 = (1×16^2 + 4×16^1 + 8×16^0)_10
• 192_10 = ???_16
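The repeated-division method works the same way for base 2 and base 16, so one routine can cover both. A minimal sketch, assuming a base between 2 and 16 and a caller-supplied buffer that is large enough; the name to_base and its signature are hypothetical, not part of the slides.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: writes n in the given base (2..16) into buf and
 * returns buf. buf must hold at least sizeof(unsigned int) * CHAR_BIT + 1
 * bytes, which is enough for the longest case (base 2). */
char *to_base(unsigned int n, unsigned int base, char *buf)
{
    static const char digits[] = "0123456789ABCDEF";
    char tmp[sizeof(unsigned int) * CHAR_BIT];
    size_t len = 0;

    do {
        tmp[len++] = digits[n % base];  /* remainder: next digit, least significant first */
        n /= base;                      /* quotient feeds the next division */
    } while (n != 0);

    for (size_t i = 0; i < len; i++)    /* remainders are read in reverse order */
        buf[i] = tmp[len - 1 - i];
    buf[len] = '\0';
    return buf;
}
```

With a buffer of 33 bytes on a machine with 32-bit unsigned int, to_base(328, 16, buf) produces "148" and to_base(21, 2, buf) produces "10101", matching the two worked tables.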

Converting Binary to Hexadecimal
• 4DA9_16 = ???_2
• 100110110101001_2 = ???_16
    Group the bits in fours from the right: 100 1101 1010 1001 = 4DA9_16
• 10 1110_2 = 0x??? = ???_10
• 0100 1110 1011 1001 0100_2 = 0x??? = ???_10

• Format specifiers in printf() for hexadecimal and decimal numbers?
    for (i = 0; i < num; i++)
        printf("%d = 0x%x\n", data[i], data[i]);
• But there is no printf format specifier to print an integer in binary form.
• Write a function to make a string of 0's and 1's for a given integer.
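One possible approach to the exercise above is to test each bit with a mask, starting from the most significant bit. This is a sketch only, not the expected solution: the name to_bit_string is hypothetical and the caller is assumed to supply a buffer with room for every bit plus the terminating '\0'.

```c
#include <limits.h>

/* Hypothetical helper: fills buf with the bits of value, most significant
 * bit first, and returns buf. buf must have room for
 * sizeof(unsigned int) * CHAR_BIT + 1 bytes. */
char *to_bit_string(unsigned int value, char *buf)
{
    const int nbits = (int)(sizeof(unsigned int) * CHAR_BIT);

    for (int i = 0; i < nbits; i++) {
        unsigned int mask = 1u << (nbits - 1 - i);  /* selects exactly one bit */
        buf[i] = (value & mask) ? '1' : '0';
    }
    buf[nbits] = '\0';
    return buf;
}
```

The result can be printed with printf("%s\n", buf). On a machine with 32-bit int, the value 28 gives 00000000000000000000000000011100.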

Words
• Word size: the nominal size of integer and pointer data. A pointer variable contains an address in the virtual address space. We will discuss pointer variables in the next unit.
• The word size determines the maximum size of the virtual address space.
• 32-bit operating systems?
• 64-bit operating systems?

Data Sizes
    printf("%lu\n", (unsigned long)sizeof(long));  // %lu: unsigned long integer
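The sizes of the basic types, and the word size discussed earlier, can be checked on a given machine with sizeof. A small sketch under stated assumptions: the function names are hypothetical, the %zu specifier requires a C99 or later library, and the pointer size is taken here as an indication of the word size.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: number of bits in a pointer, used here as a
 * stand-in for the machine's word size. */
size_t word_size_bits(void)
{
    return sizeof(void *) * CHAR_BIT;
}

/* Prints the size in bytes of each basic type on the current machine. */
void print_sizes(void)
{
    printf("char:   %zu\n", sizeof(char));
    printf("short:  %zu\n", sizeof(short));
    printf("int:    %zu\n", sizeof(int));
    printf("long:   %zu\n", sizeof(long));
    printf("float:  %zu\n", sizeof(float));
    printf("double: %zu\n", sizeof(double));
    printf("void *: %zu\n", sizeof(void *));
}
```

The printed values depend on the compiler and operating system; for instance, long and pointers are commonly 4 bytes on 32-bit systems and 8 bytes on 64-bit Linux.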

Addressing and Byte Ordering
• A variable x of type int
• Address of x: 0x100
  • This means the 4 bytes of x would be stored in memory locations 0x100, 0x101, 0x102, and 0x103.
• How to interpret the bytes in memory locations 0x100, 0x101, 0x102, and 0x103?
  • Let's assume x has the value 0x01234567. There are two conventions for storing the value in the 4 consecutive byte locations: 0x01, 0x23, 0x45, and 0x67, or 0x67, 0x45, 0x23, and 0x01, depending on the CPU architecture.
• Little endian byte order: Intel-compatible machines
    address   0x103  0x102  0x101  0x100
    value     0x01   0x23   0x45   0x67
• Big endian byte order: machines from IBM and Sun Microsystems
    address   0x103  0x102  0x101  0x100
    value     0x67   0x45   0x23   0x01

• Byte ordering is totally invisible to most application programmers.
• Why is byte ordering important?
  • Think of data exchange between two machines through a network.
  • Assembly programming
  • When type casting is used
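Both the type-casting case and the network case can be illustrated in a few lines of C. This is a sketch, not course-provided code: the function names are hypothetical, and the second function assumes the 4 bytes arrive in big endian order, most significant byte first.

```c
#include <stdint.h>

/* Returns 1 on a little endian machine and 0 on a big endian one, by
 * viewing the bytes of an integer through an unsigned char pointer. */
int is_little_endian(void)
{
    uint32_t x = 0x01234567;
    const unsigned char *p = (const unsigned char *)&x;

    return p[0] == 0x67;    /* does the lowest address hold the least significant byte? */
}

/* Rebuilds a 32-bit value from 4 bytes received in big endian order.
 * Shifts work on values, not on memory layout, so the result is the
 * same whatever the byte order of the host. */
uint32_t from_big_endian(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}
```

On an Intel-compatible machine is_little_endian() returns 1. Given the bytes 0x01, 0x23, 0x45, 0x67, from_big_endian() returns 0x01234567 on any machine.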

Representing Strings
• A string is encoded by an array of characters terminated by the null character '\0' (whose value is 0).
• The ASCII character set
• Unicode: some libraries are available for C.

Representing Code
• Different machine types use different and incompatible instructions and encodings.
• Even identical processors running different OSes have differences in their coding conventions and hence are not binary compatible.

Comparison and Logical Operations
• Comparison operators ???
• Logical operators ???

Bit-Level Operations
• ???
• &, |, ~, ^, >>, <<
• >>: Logical right shift and arithmetic right shift.
  • Logical right shift for unsigned integers: filled with 0
  • Arithmetic right shift for signed integers: filled with the MSB
• Examples
    char x = -128;
    unsigned char y = 128;
    x = x >> 1;
    y = y >> 1;
    printf("%d, %d\n", x, y);
  • We will discuss this example again later.
• Some examples
    a ^ a = ???
    a ^ 0 = ???
    x = 10; y = 20;
    y = x ^ y;
    x = x ^ y;
    y = x ^ y;
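The last three XOR assignments above exchange the values of x and y without a temporary variable. A sketch wrapping them in a function follows; the name xor_swap is hypothetical, and the guard for identical pointers is an addition, because XOR-ing an object with itself would clear it.

```c
/* Swaps *x and *y with three XOR operations, in the same order as the
 * slide example. Does nothing if both pointers refer to the same
 * object, since XOR-ing a value with itself would zero it. */
void xor_swap(int *x, int *y)
{
    if (x == y)
        return;
    *y = *x ^ *y;   /* *y now holds (old x) ^ (old y) */
    *x = *x ^ *y;   /* (old x) ^ (old x) ^ (old y) = old y */
    *y = *x ^ *y;   /* (old y) ^ (old x) ^ (old y) = old x */
}
```

Starting from x = 10 and y = 20, the call xor_swap(&x, &y) leaves x = 20 and y = 10. In practice a temporary variable is usually clearer; the XOR version is mainly useful for understanding the operator.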

• Some more examples
    11110000 & 11001100 = ???
    11110000 | 11001100 = ???
    11110000 ^ 11001100 = ???
    0x3B & 0x33 = ???
    0x3B | 0x33 = ???
    0x3B ^ 0x33 = ???
    0x3B >> 2 = ???
    0x33 << 2 = ???
• Consult the programming assignments.

2. Integer Representations
  C                        Java      Size
  char, unsigned char      byte      1 byte
  short, unsigned short    short     2 bytes
  int, unsigned int        int       4 bytes
  long, unsigned long      long      8 bytes
  float                    float     4 bytes
  double                   double    8 bytes
• char in Java uses 2 bytes.
• There is no unsigned in Java.

Unsigned Encodings
• unsigned char: all 8 bits are used. No sign bit.
  • The smallest number is 0.
  • The maximum number is 0xff.
• unsigned short: 16 bits
  • The smallest number is ???
  • The maximum number is ???
• unsigned int: 32 bits
  • The smallest number is ???
  • The maximum number is ???
• unsigned long: 64 bits
  • The smallest number is ???
  • The maximum number is ???

Representation of Unsigned Integers
• 8-bit representation of unsigned char
    255  11111111
    254  11111110
    ...  ...
    128  10000000
    127  01111111
    126  01111110
    ...  ...
      2  00000010
      1  00000001
      0  00000000
  (Figure annotations: +1 arrows between successive rows; overflow marked at the top of the table.)
• The maximum number is ?
• The minimum number is ?
• What if we add 1 to the maximum number ???
• What if we subtract 1 from the minimum number ???

    unsigned char x, y;
    x = 128; y = 128;
    printf("x = %d, y = %d\n", x, y);
    printf("x + y = %d\n", x + y);    // int (not char) addition
    x = x + y;                         // 256 -> 100000000 -> truncation
    printf("x = %d, y = %d\n", x, y);
    x = 128;
    x = x >> 1;                        // logical right shift for unsigned
    printf("x = %d\n", x);
• The output is ???
    x = 128, y = 128
    x + y = 256
    x = 0, y = 128
    x = 64
• 16-bit, 32-bit, and 64-bit representations have the same overflow problem.
• How to represent signed integers?

Binary Addition
• We will discuss binary addition and binary subtraction before we discuss the representation of signed integers.
• How to add two binary numbers? Let's consider only unsigned integers (i.e., non-negative numbers only) for a while.
• Just like the addition of two decimal numbers: a carry moves into the next column to the left.
• E.g.,
      10010       10010        1111       10111
    +  1001     +  1011     +     1     +   111
    -------     -------     -------     -------
      11011       11101         ???         ???

Binary Subtraction
• How to subtract a binary number?
• Just like the subtraction of decimal numbers, borrowing from the next column when needed.
• E.g.,
      10010
    -    11
    -------
       1111
  (The slide works this out column by column, showing each borrow.)
• Try:
      101010
    -    101
• How to do?
         1
    -   10

• In the previous slide, 10010 - 11 = 1111.
• What if we add
        00010010
      + 11111100
      ----------
      1 00001110
      +        1     (the carry out of the 8 bits is added back in)
      ----------
        00001111
• Is there any relationship between 11_2 and 11100_2?
• The 1's complement of 11_2 is ??? (switching 0 <-> 1)
• This type of addition is called 1's complement addition.
• Find the 8-bit one's complements of the following.
  • 11011 -> 00011011 ->
  • 10 -> 00000010 ->
  • 101 -> 00000101 ->

• In the previous slide, 10010 - 11 = 1111.
• What if we add
        00010010
      + 11111101
      ----------
      1 00001111     (the carry out of the 8 bits is discarded)
• Is there any relationship between 11 and 11101?
• The 2's complement of 11 is ???
• 2's complement ≡ 1's complement + 1 -> 11100 + 1 = 11101
• This type of addition is called 2's complement addition.
• Find the 16-bit two's complements of the following.
  • 11011 -> 0000000000011011 ->
  • 10
  • 101
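The two complements map directly onto C's bitwise operators. A minimal sketch for the 8-bit case, with hypothetical function names; uint8_t from <stdint.h> is used so that the results are truncated to exactly 8 bits.

```c
#include <stdint.h>

/* 8-bit 1's complement: every bit is flipped (0 <-> 1). */
uint8_t ones_complement8(uint8_t x)
{
    return (uint8_t)~x;
}

/* 8-bit 2's complement: the 1's complement plus 1. The cast keeps only
 * the low 8 bits, discarding any carry out of bit 7. */
uint8_t twos_complement8(uint8_t x)
{
    return (uint8_t)(~x + 1u);
}
```

For the value 11_2 (decimal 3), ones_complement8 returns 11111100 (0xFC) and twos_complement8 returns 11111101 (0xFD), the two addends used on the slides.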

• Another example
        101010
      -    101
      --------
           ???
• What if we use 1's complement addition or 2's complement addition instead, as follows? Let's use the 8-bit representation.
      1's complement          2's complement
        00101010                00101010
      + 11111010              + 11111011
      ----------              ----------
      1 00100100              1 00100101
      +        1
      ----------
        00100101
• What does this mean?
  • A - B = A + (-B), where A and B are positive
  • Is the 1's complement or the 2's complement of B sort of equal to -B?
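Both ways of computing A - B by addition can be written out in C for the 8-bit case. This is a sketch with hypothetical function names; in the 1's complement version a carry out of bit 7 is added back in (end-around carry), and in the 2's complement version it is simply dropped.

```c
#include <stdint.h>

/* a - b by 8-bit 2's complement addition: add the 2's complement of b,
 * then truncate to 8 bits, which discards the carry out of bit 7. */
uint8_t sub_twos8(uint8_t a, uint8_t b)
{
    unsigned int sum = a + (uint8_t)(~b + 1u);

    return (uint8_t)sum;
}

/* a - b by 8-bit 1's complement addition: add the 1's complement of b;
 * if there is a carry out of bit 7, add it back into the low 8 bits. */
uint8_t sub_ones8(uint8_t a, uint8_t b)
{
    unsigned int sum = a + (uint8_t)~b;

    if (sum > 0xFF)
        sum = (sum & 0xFF) + 1;     /* end-around carry */
    return (uint8_t)sum;
}
```

With a = 00101010 and b = 00000101, both functions return 00100101, the result shown in the two columns above. When a is smaller than b there is no carry, and the result is the bit pattern of a negative number in the corresponding representation.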

• Can we use 8-bit 1's complement addition for 1_2 - 10_2 = -1_2?
          1           00000001
      -  10         + 11111101   <- 8-bit 1's complement of 10
                    ----------
                      11111110   <- Is this correct? (1's complement of 1?)
• Let's use 8-bit 2's complement addition for 1_2 - 10_2.
        00000001
      + 11111110   <- 2's complement of 10
      ----------
        11111111   <- Correct? (2's complement of 1?)
• 1_2 - 10_2 = 1_2 + (-10_2) = -1_2
• How to represent negative binary numbers, i.e., signed integers?

Representation of Negative Binaries
• Representation of signed integers
  • 8, 16, or 32 bits are usually used for integers.
  • Let's use 8 bits for examples.
  • The leftmost bit (called the most significant bit, MSB) is used as the sign.
    • When the MSB is 0, the integer is non-negative.
    • When the MSB is 1, the integer is negative.
  • The other 7 bits are used for the magnitude.
• How to represent the positive integer 9?
  • 00001001
• How about -9?
  • Is 10001001 really okay?
  • 00001001 (9) + 10001001 (-9) = 10010010 (-18). It is wrong!
  • We need a different representation for negative integers.

• What is the 8-bit 1's complement of 9?
  • 11110110 <- 8-bit 1's complement of 9
  •   00001001
    + 11110110   <- 9 + 8-bit 1's complement of 9
    = 11111111   <- Is it zero? (1's complement of 0?)
• What is the 2's complement of 9?
  • 11110111 <- 8-bit 2's complement of 9
  •   00001001
    + 11110111   <- 9 + 8-bit 2's complement of 9
    = 1 00000000 <- It looks like zero.
• The 2's complement representation is used for negative integers.

• 1_2 - 10_2 = 1_2 + (-10_2) ??? What is the result in decimal?
        00000001
      + 11111110   <- 2's complement of 10, i.e., -10_2
      ----------
        11111111   <- 2's complement of 1, i.e., -1 (= 1 - 2)
• 101010_2 - 101_2 = 101010_2 + (-101_2) ???
• 10010_2 - 11_2 ???
• 10_2 - 1_2 ???
• -10_2 - 1_2 ???
• Is the two's complement of the two's complement of an integer the same integer?
• What is x when the 8-bit 2's complement of x is
  • 11111111
  • 11110011
  • 10000001

• 8-bit representation of signed char with 2's complement
     127  01111111
     126  01111110
     ...  ...
       2  00000010
       1  00000001
       0  00000000
      -1  11111111
      -2  11111110
      -3  11111101
     ...  ...
    -127  10000001
    -128  10000000
  (Figure annotations: +1 and -1 arrows between neighbouring rows; overflow marked past either end of the table.)
• The maximum number is ?
• The minimum number is ?
• What if we add 1 to the maximum number ???
• What if we subtract 1 from the minimum number ???
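The table above can be reproduced by giving the most significant bit a weight of -128 instead of +128. A short sketch, with a hypothetical function name, that reads an 8-bit pattern as a 2's complement value:

```c
#include <stdint.h>

/* Interprets an 8-bit pattern as a 2's complement signed value.
 * Bits 0..6 carry the usual weights 1..64; bit 7 carries -128. */
int signed_value8(uint8_t bits)
{
    int value = bits & 0x7F;    /* contribution of bits 0..6 */

    if (bits & 0x80)
        value -= 128;           /* bit 7 set: add its weight of -128 */
    return value;
}
```

For instance, 11111101 gives 125 - 128 = -3 and 10000001 gives 1 - 128 = -127, in agreement with the table.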

• 16-bit representation of signed short with 2's complement
    ...  01111111 11111111
    ...  01111111 11111110
    ...  ...
      3  00000000 00000011
      2  00000000 00000010
      1  00000000 00000001
      0  00000000 00000000
     -1  11111111 11111111
     -2  11111111 11111110
     -3  11111111 11111101
    ...  ...
    ...  10000000 00000001
    ...  10000000 00000000
  (Figure annotations: +1 and -1 arrows between neighbouring rows; overflow marked past either end of the table.)
• The maximum number is ? What if we add 1 to the maximum number ???
• The minimum number is ? What if we subtract 1 from the minimum number ???

• Note that computers use the 8-bit, 16-bit, 32-bit and 64-bit representations with 2's complement for negative integers.
• In programming languages
  • char, unsigned char: 8 bits
  • short, unsigned short: 16 bits
  • int, unsigned int: 32 bits
  • long, unsigned long: 64 bits
• When we use the 32-bit representation with 2's complement,
  • The maximum number is ?
  • What if we add 1 to the maximum number ???
  • The minimum number is ?
  • What if we subtract 1 from the minimum number ???

• Now we know how to represent negative integers.
• 2's complement addition A + (-B) is computed for the subtraction A - B.
• Let's suppose B is negative. Then is -B really a positive integer? For example, let's consider a 1-byte signed integer.
     127  01111111
     126  01111110
     ...  ...
       2  00000010
       1  00000001
       0  00000000
      -1  11111111
      -2  11111110
      -3  11111101    2's complement of -3 is 00000011, i.e., 3.
     ...  ...
    -127  10000001
    -128  10000000    2's complement of -128 is 10000000 again.
• For any -127 <= x <= 127, x - x = 0. But (-128) - (-128) = ???

    char x, y;    // assuming char is signed on this machine
    x = 128;      // 128 -> 10000000. This is -128.
    y = 128;
    printf("x = %d, y = %d\n", x, y);
    printf("x - y = %d\n", x - y);
    x = x - y;
    printf("x = %d, y = %d\n", x, y);
    x = 128;
    x = x >> 1;   // arithmetic shift for signed
    printf("x = %d\n", x);
• The output is ???
    x = -128, y = -128
    x - y = 0
    x = 0, y = -128
    x = -64
• -128 - (-128) = -128 + (-(-128)) = -128 + (-128) = 10000000 + 10000000 = 1 00000000 -> 00000000 (carry discarded) = 0

    char a = 127, b = 127, c;
    unsigned char d;
    c = a + b;
    d = a + b;
    printf("a=%d, b=%d, c=%d, d=%d\n", a, b, c, d);
• The output is ???
    a=127, b=127, c=-2, d=254
    01111111 + 01111111 = 11111110 ???
• We have to be very careful with overflow for integer values.
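One way to be careful is to test whether an addition would overflow before performing it. The following is a sketch for int, with a hypothetical function name; the check is done with comparisons against INT_MAX and INT_MIN because signed overflow itself is undefined behavior in C.

```c
#include <limits.h>

/* Returns 1 if x + y fits in an int, 0 if it would overflow.
 * The test never performs the overflowing addition itself. */
int add_ok(int x, int y)
{
    if (y > 0 && x > INT_MAX - y)
        return 0;   /* result would exceed INT_MAX */
    if (y < 0 && x < INT_MIN - y)
        return 0;   /* result would fall below INT_MIN */
    return 1;
}
```

A caller might write if (add_ok(a, b)) sum = a + b; and handle the other case separately. The same idea applies to char or short by replacing the limits with SCHAR_MAX and SCHAR_MIN, or SHRT_MAX and SHRT_MIN.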

Advice on Signed vs. Unsigned
• Practice Problem 2.25
    float sum_elements(float a[], unsigned int length) {
        int i;
        float result = 0;
        for (i = 0; i <= length-1; i++)
            result += a[i];
        return result;
    }
• When run with length equal to 0, this code should return 0.0. Instead it encounters a memory error. Why???
  • Look at the declaration unsigned int length and the expression length-1.
• How to fix this code?
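One possible fix, offered as a sketch rather than the official answer: because length is unsigned, length-1 wraps around to the largest unsigned value when length is 0, so the comparison is rewritten to avoid the subtraction. Making the index unsigned and the array parameter const are additional choices made here, not part of the original problem.

```c
/* Sums the first length elements of a. Comparing i < length avoids
 * computing length - 1, which wraps around to UINT_MAX when length
 * is 0 and sends the loop far past the end of the array. */
float sum_elements(const float a[], unsigned int length)
{
    float result = 0;

    for (unsigned int i = 0; i < length; i++)
        result += a[i];
    return result;
}
```

With length equal to 0 the loop body never runs and the function returns 0.0.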

    short x = -5;
    unsigned short y = 128;
    printf("%x, %x\n", x, y);
    printf("%x, %x, %x, %x\n", x<<3, x>>3, y<<3, y>>3);
• The output is ???
    fffffffb, 80
    ffffffd8, ffffffff, 400, 10
• Signed integers:
  • Arithmetic right shift: the left part is filled with the most significant bit of the original value.
• Unsigned integers:
  • Logical right shift: the left part is filled with 0.

Practice Problem 2.26
    size_t strlen(const char* s);    // declared in string.h
    int strlonger(char s[], char t[]) {
        return strlen(s) - strlen(t) > 0;
    }
• Note that size_t is an unsigned type, defined in stdio.h (among other headers) to be unsigned int on 32-bit machines.
• For what cases will this function produce an incorrect result?
  • 0 - 1 > 0 -> 1 (true value)
• Explain how this incorrect result comes about.
  • Unsigned int of 0 - 1 is 0xffffffff, which is greater than 0.
• Show how to fix the code so that it will work reliably.
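One possible fix, given as a sketch rather than the official answer: compare the two lengths directly so that no unsigned subtraction takes place. The const qualifiers on the parameters are an addition made here.

```c
#include <string.h>

/* Returns 1 if s is strictly longer than t, 0 otherwise. Comparing the
 * two unsigned lengths directly avoids the wrap-around that occurs when
 * a larger unsigned value is subtracted from a smaller one. */
int strlonger(const char s[], const char t[])
{
    return strlen(s) > strlen(t);
}
```

With this version strlonger("", "a") returns 0, whereas the original returns 1 because 0 - 1 wraps around to a large positive value.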
