Data Representation in Memory CSCI 224 / ECE 317: Computer Architecture

Data Representation in MemoryCSCI 224 / ECE 317: Computer Architecture Instructor: Prof. Jason Fritts Slides adapted from Bryant & O’Hallaron’s slides

Data Representation in Memory • Basic memory organization • Bits & Bytes – basic units of Storage in computers • Representing information in binary and hexadecimal • Representing Integers • Unsigned integers • Signed integers • Representing Text • Representing Pointers

0x00•••0 0xFF•••F • • • Byte-Oriented Memory Organization • Byte-Addressable Memory • Conceptually a very large array of bytes • Each byte has a unique address • Processor width determines address range: • 32-bit processor has 232 unique addresses: 4GB max • 0x00000000 to 0xffffffff • 64-bit processor has 264unique addresses: ~ 1.8x1019 bytes max • 0x0000000000000000 to 0xffffffffffffffff • Memory implemented with hierarchy of different memory types • OS provides virtual address space private to particular “process” • virtual memory to be discussed later…

0 1 0 3.3V 2.8V 0.5V 0.0V Representing Voltage Levels as Binary Values • Digital transistors operate in high and low voltage ranges • Voltage Range dictates Binary Value on wire • high voltage range (e.g. 2.8V to 3.3V) is a logic 1 • low voltage range (e.g. 0.0V to 0.5V) is a logic 0 • voltages in between are indefinite values

Bits & Bytes • Transistors have two states, so computers use bits • “bit” is a base-2 digit • {L, H} => {0, 1} • Single bit offers limited range, so grouped in bytes • 1 byte = 8 bits • a single datum may use multiple bytes • Data representation 101: • Given N bits, can represent 2N unique values

Encoding Byte Values • Processors generally use multiples of Bytes • common sizes: 1, 2, 4, 8, or 16 bytes • Intel data names: • Byte 1 byte (8 bits) • Word 2 bytes(16 bits) • Double word 4 bytes (32 bits) • Quad word 8 bytes(64 bits) Unfortunately, most processor architectures call 2 bytes a ‘halfword’, 4 bytes a ‘word’, etc., so we’ll often use C data names instead (but these vary in size too… /sigh)

C Data Types 64-bit 32-bit key differences

Encoding Byte Values • 1 Byte = 8 bits • Binary: 000000002 to 111111112 • A byte value can be interpreted in many ways! • depends upon how it’s used • For example, consider byte with: 010101012 • as text: ‘U’ • as integer: 11510 • as IA32 instruction: pushl %ebp • part of an address or real number • a medium gray pixel in a gray-scale image • could be interpreted MANY other ways…

Encoding Byte Values • Different syntaxes for a byte • Binary: 000000002 to 111111112 • Decimal: 010to 25510 • Hexadecimal: 0016to FF16 • Base-16 number representation • Use characters ‘0’ to ‘9’ and ‘A’ to ‘F’ • in C/C++ programming languages, D316 written as either • 0xD3 • 0xd3

Decimal vs Binary vs Hexadecimal

Binary and Hexadecimal Equivalence • Problem with binary – Hard to gauge size of binary numbers • e.g. approx. how big is: 10100111010100010111010112 ? • Would be nice if native computer base was decimal: 21,930,731 • but a decimal digit requires 3.322 bits… won’t work • Need a larger base equivalent to binary, such that • for equivalence, R and x must be integers – then 1 digit in R equals x bits • equivalence allows direct conversion between representations • two options closest to decimal: • octal: • hexadecimal:

Binary and Hexadecimal Equivalence • Octal or Hexadecimal? • binary : 10100111010100010111010112 • octal: 1235213538 • hexadecimal number: 14EA2EB16 • decimal: 21930731 • Octal a little closer in size to decimal, BUT… • How many base-R digits per byte? • Octal: 8/3 = 2.67 octal digits per byte -- BAD • Hex: 8/4 = 2 hex digits per byte -- GOOD Hexadecimal wins: 1 hex digit  4 bits

Convert Between Binary and Hex • Convert Hexadecimal to Binary • Simply replace each hex digit with its equivalent 4-bit binary sequence • Example: 6 D 1 9 F 3 C16 • Convert Binary to Hexadecimal • Starting from the radix point, replace each sequence of 4 bits with the equivalent hexadecimal digit • Example: 1011001000110101110101100010100112 0110 1101 0001 1001 1111 0011 11002 1 6 4 D B A D 5 316

Unsigned Integers – Binary • Computers store Unsigned Integer numbers in Binary (base-2) • Binary numbers use place valuation notation, just like decimal • Decimal value of n-bit unsigned binary number: 27 25 26 24 23 22 21 20

Unsigned Integers – Base-R • Convert Base-R to Decimal • Place value notation can similarly determine decimal value of any base, R • Decimal value of n-digit base r number: • Example:

Unsigned Integers – Hexadecimal • Commonly used for converting hexadecimal numbers • Hexadecimal number is an “equivalent” representation to binary, so often need to determine decimal value of a hex number • Decimal value for n-digit hexadecimal (base 16) number: • Example:

Unsigned Integers – Convert Decimal to Base-R • Also need to convert decimal numbers to desired base • Algorithm for converting unsigned Decimal to Base-R • Assign decimal number to NUM • Divide NUM by R • Save remainder REM as next least significant digit • Assign quotient Q as new NUM • Repeat step b) until quotient Q is zero • Example: least significant digit NUM R Q REM most significant digit

Unsigned Integers – Convert Decimal to Binary • Example with Unsigned Binary: least significant digit NUM R Q REM most significant digit

Unsigned Integers – Convert Decimal to Hexadecimal • Example with Unsigned Hexadecimal: least significant digit NUM R Q REM most significant digit

Unsigned Integers – Ranges • Range of Unsigned binary numbers based on number of bits • Given representation with n bits, min value is always sequence • 0....0000 = 0 • Given representation with n bits, max value is always sequence • 1....1111 = 2n – 1 • So, ranges are: • unsigned char: • unsigned short: • unsigned int: 2n-1 2n-2 23 22 21 20

Signed Integers – Binary • Signed Binary Integers converts half of range as negative • Signed representation identical, except for most significant bit • For signed binary, most significant bit indicates sign • 0 for nonnegative • 1 for negative • Must know number of bits for signed representation Unsigned Integer representation: 27 25 26 24 23 22 21 20 Place value of most significant bit is negative for signed binary Signed Integer representation: -27 25 26 24 23 22 21 20

Signed Integers – Binary • Decimal value of n-bit signed binary number: • Positive (in-range) numbers have same representation: Unsigned Integer representation: 27 25 26 24 23 22 21 20 Signed Integer representation: -27 25 26 24 23 22 21 20

Signed Integers – Binary • Only when most significant bit set does value change • Difference between unsigned and signed integer values is 2N Unsigned Integer representation: 27 25 26 24 23 22 21 20 Signed Integer representation: -27 25 26 24 23 22 21 20

Signed Integers – Ranges • Range of Signed binary numbers: • Given representation with n bits, min value is always sequence • 100....0000 = – 2n-1 • Given representation with n bits, max value is always sequence • 011....1111 = 2n-1 – 1 • So, ranges are:

Signed Integers – Convert to/from Decimal • Convert Signed Binary Integer to Decimal • Easy – just use place value notation • two examples given on last two slides • Convert Decimal to Signed Binary Integer • MUST know number of bits in signed representation • Algorithm: • Convert magnitude (abs val) of decimal number to unsigned binary • Decimal number originally negative? • If positive, conversion is done • If negative, perform negation on answer from part a) • zero extend answer from a) to N bits (size of signed repr) • negate: flip bits and add 1

Signed Integers – Convert Decimal to Base-R • Example: • A) least significant bit NUM R Q REM most significant bit

Signed Integers – Convert Decimal to Base-R • Example: • B) -3710 was negative, so perform negation • zero extend 100101 to 8 bits • negation • flip bits: • add 1: Can validate answer using place value notation

Signed Integers – Convert Decimal to Base-R • Example: • A) least significant bit NUM R Q REM most significant bit

Signed Integers – Convert Decimal to Base-R • Example: • B) 6710 was positive, so done Can validate answer using place value notation

Signed Integers – Convert Decimal to Base-R • Be careful of range! • Example: • A) • B) -18310was negative, so perform negation • zero extend 10110111 to 8 bits // already done • negation • flip bits: • add 1: not -18310… WRONG! -18310 is not in valid range for 8-bit signed

Representing Strings char S[6] = "18243"; • Strings in C • Represented by array of characters • Each character encoded in ASCII format • Standard 7-bit encoding of character set • Character “0” has code 0x30 • String should be null-terminated • Final character = 0 • ASCII characters organized such that: • Numeric characters sequentially increase from 0x30 • Digit i has code 0x30+i • Alphabetic characters sequentially increase in order • Uppercase chars ‘A’ to ‘Z’ are 0x41 to 0x5A • Lowercase chars ‘A’ to ‘Z’ are 0x61 to 0x7A • Control characters, like <RET>, <TAB>, <BKSPC>, are 0x00 to 0x1A Intel / Linux ‘1’ ‘8’ ‘2’ ‘4’ ‘3’ null term

Representing Strings UTF-16 on Intel • Limitations of ASCII • 7-bit encoding limits set of characters to 27 = 128 • 8-bit extended ASCII exists, but still only 28= 256 chars • Unable to represent most other languages in ASCII • Answer: Unicode • first 128 characters are ASCII • i.e. 2-byte Unicode for ‘4’: 0x34 -> 0x0034 • i.e. 4-byte Unicode for ‘T’: 0x54 -> 0x00000054 • UTF-8: 1-byte version // commonly used • UTF-16: 2-byte version // commonly used • allows 216= 65,536 unique chars • UTF-32: 4-byte version • allows 232 = ~4 billion unique characters • Unicode used in many more recent languages, like Java and Python ‘1’ ‘8’ ‘2’ ‘4’ ‘3’ null term

Representing Pointers int B = -15213; int *P = &B; Sun IA32 x86-64 Different compilers & machines assign different locations to objects

Data Representation in Memory CSCI 224 / ECE 317: Computer Architecture