Computer architecture

Computer architecture Lecture 4: Processor instruction list Piotr Bilski

Program execution • The processor executes machine instructions after decoding them • The programmer writes a program in a symbolic language, either low level (assembly) or high level • During compilation the symbolic program is translated into machine language instructions

Elements of a machine instruction • Operation code • Argument references (operation input data) • Result reference (if needed) • Reference to the next instruction • Example 16-bit format: bits 0-3 hold the operation code, bits 4-15 hold the argument references
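
The 16-bit layout above (operation code in bits 0-3, argument reference in bits 4-15) can be sketched in Python. This is only an illustration: it assumes that bit 0 is the leftmost (most significant) bit, as the slide's numbering suggests, and the function name `decode` is made up for this example. The sample word is the LDA 201 encoding from the assembly example at the end of the lecture.

```python
def decode(word):
    """Split a 16-bit instruction word into (opcode, address)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction word must fit in 16 bits")
    opcode = word >> 12       # top 4 bits (bits 0-3 in the slide's numbering)
    address = word & 0x0FFF   # remaining 12 bits (bits 4-15)
    return opcode, address


opcode, address = decode(0b0010_0010_0000_0001)
print(opcode, hex(address))  # prints: 2 0x201
```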

Arguments and results are stored in: • Memory (main, cache, virtual) • Processor registers (accumulator, general purpose registers) • Input/output devices (hard drive, printer)

Instruction types • Data processing (logical and arithmetic operations) • Data storage (instructions related to memory access) • Data transmission (input/output operations) • Control (result testing, non-sequential code execution: jumps, branches)

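Relation between the symbolic and machine instructions • Symbolic instruction: x = x + c; • Machine instructions: LOAD 1001; ADD 1002; STORE 1001 • x is stored at address 1001 and c at address 1002; the addition is performed by the ALU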
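
A minimal sketch of how the three machine instructions carry out x = x + c on a single-accumulator machine. The `run` function, the dictionary used as memory, and the initial values 5 and 3 are assumptions made for illustration; only the instruction sequence and the addresses 1001 and 1002 come from the slide.

```python
def run(program, memory):
    """Execute (operation, address) pairs on a one-accumulator machine."""
    acc = 0
    for op, addr in program:
        if op == "LOAD":        # memory -> accumulator
            acc = memory[addr]
        elif op == "ADD":       # accumulator + memory -> accumulator
            acc += memory[addr]
        elif op == "STORE":     # accumulator -> memory
            memory[addr] = acc
        else:
            raise ValueError(f"unknown operation: {op}")
    return memory


memory = {1001: 5, 1002: 3}     # x = 5, c = 3 (made-up values)
run([("LOAD", 1001), ("ADD", 1002), ("STORE", 1001)], memory)
print(memory[1001])             # prints: 8
```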

Number of addresses in the instruction • Figure: programs computing Y = (A-B)/(C+D*E) with three-address, two-address, and one-address instructions

Number of addresses in the instruction (cont.) • Example: a = b + c • Three addresses: ADD a,b,c • Two addresses: MOVE a,b; ADD a,c • One address: LOAD b; ADD c; STOR a

Instruction list design problems • How many (and which) operations should the processor execute? • What data types (arguments, results)? • What instruction format (length, number of addresses)? • How many (and which) registers? • Which addressing modes?

Operands • Addresses (unsigned integers) • Numbers (numerical data): fixed point, floating point, decimal • Characters (ASCII / IRA, EBCDIC codes etc.) • Logical data (single bits)

Data storage in the computer • Multiple-byte data can be written to memory in little endian or big endian order; bi-endian processors support both • The models differ in the sequence of the bytes stored in memory. For example, the hexadecimal number 76859432 can be written at addresses 263 to 266 in two ways: • Big endian: 263: 76, 264: 85, 265: 94, 266: 32 • Little endian: 263: 32, 264: 94, 265: 85, 266: 76
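
The two byte orders for the number 76859432 (hexadecimal) can be reproduced with a short Python sketch. It relies on the standard `int.to_bytes` and `int.from_bytes` methods; the variable names are illustrative only.

```python
value = 0x76859432

# Byte sequences as they would appear at increasing memory addresses.
big = value.to_bytes(4, byteorder="big")
little = value.to_bytes(4, byteorder="little")

print(big.hex(" "))     # prints: 76 85 94 32
print(little.hex(" "))  # prints: 32 94 85 76

# Reading the bytes back with the matching byte order restores the number.
restored = int.from_bytes(little, byteorder="little")
print(hex(restored))    # prints: 0x76859432
```

Reading the little endian bytes with the wrong byte order would give 0x32948576 instead, which is why the order has to be agreed on when data move between machines or file formats.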

Little and big endian • Little endian: easy to convert a longer number to a shorter one; arithmetic operations are easier to execute; used in Intel 80x86, Pentium, Alpha • Big endian: easy to sort character sequences (strings); allows printing ASCII characters without any conversion; integers and characters are stored in the same order; used in Sun SPARC, many RISC processors, Motorola 680x0 • Bi-endian: supports both byte orders; used in PowerPC

Examples of little and big endian in the file types • Big endian: Adobe Photoshop, IMG (GEM Raster), JPEG, MacPaint, SGI (Silicon Graphics), Sun Raster • Bi-endian: Microsoft RIFF (.WAV & .AVI), TIFF, XWD (X Window Dump) • Little endian: BMP (Windows, OS/2 Bitmaps), GIF, PCX (PC Paintbrush), TGA (Targa), Microsoft RTF (Rich Text Format)

Pentium data types • Data sizes are multiples of a byte (byte: 1 B, word: 2 B, double word: 4 B etc.) • Floating point formats comply with the IEEE 754 standard • Data do not have to be stored at aligned addresses • Unsigned integers (8, 16, 32, 64 bits), also used as addresses • Signed integers (8, 16, 32, 64 bits), two’s complement representation • Floating point numbers (single, double, and extended double precision)

Pentium data types (cont.) • Generic (any content 16, 32 or 64 bits long) • Unpacked binary-coded decimal (one digit per byte) • Packed binary-coded decimal (two digits per byte) • Pointer (32-bit address) • Bit field • Byte string

PowerPC data types • Data 8, 16, 32, 64 bits long • Aligning data to even byte addresses is not required (though sometimes used) • PowerPC is bi-endian • Stored: unsigned and signed numbers (byte (8b), half-word (16b), word (32b), double word (64b)), floating point numbers (IEEE 754), byte strings (up to 128 B)

Operation classification • Data transfer (STORE, LOAD, SET, PUSH, POP) • Arithmetic (ADD, SUB, NEG, INC, MULT) • Logical (AND, OR, NOT, TEST, SHIFT, ROTATE) • Transfer of control (JUMP, HALT, EXEC) • Input/output (READ, WRITE) • Conversion (TRANS, CONV)

Data transfer • Aim: to move data from one location to another • Requires: determining the memory location (translating a virtual address if needed), checking whether the data are in the cache, and issuing the read/write command • Example instructions: LOAD, STORE (in short, long, half-word versions etc.)

Logical operations • Operands are treated as bit strings • The most popular operations: AND, OR, XOR, NOT • Bit strings can be used as masks: • A1 = 10100101, A2 = 11111111: A1 XOR A2 = 01011010 • A1 = 10100101, A2 = 11110000: A1 AND A2 = 10100000
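
The two mask examples can be checked directly with Python's bitwise operators. The variable names below are illustrative; the bit patterns are the ones from the slide.

```python
A1 = 0b10100101

# XOR with an all-ones mask inverts every bit.
inverted = A1 ^ 0b11111111
print(f"{inverted:08b}")    # prints: 01011010

# AND with 11110000 keeps the upper four bits and clears the lower four.
upper_half = A1 & 0b11110000
print(f"{upper_half:08b}")  # prints: 10100000
```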

Logical operations (cont.) • Logical shifting (vacated bit positions are filled with zeros) • Arithmetic shifting (a right shift preserves the sign bit)
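
The difference between the two kinds of shifting shows up only when shifting right. The sketch below assumes 8-bit operands held in Python integers; the function names and the sample value are chosen for this example.

```python
BITS = 8
MASK = (1 << BITS) - 1  # 0xFF


def logical_shift_right(x, n):
    """Shift right, filling the vacated high bits with zeros."""
    if not 0 <= n <= BITS:
        raise ValueError("shift count out of range")
    return (x & MASK) >> n


def arithmetic_shift_right(x, n):
    """Shift right, filling the vacated high bits with copies of the sign bit."""
    if not 0 <= n <= BITS:
        raise ValueError("shift count out of range")
    x &= MASK
    shifted = x >> n
    if x >> (BITS - 1):                          # sign bit is set
        shifted |= (MASK << (BITS - n)) & MASK   # set the top n bits
    return shifted


def logical_shift_left(x, n):
    """Shift left, filling with zeros and dropping bits that leave the word."""
    return (x << n) & MASK


x = 0b10010110
print(f"{logical_shift_right(x, 2):08b}")     # prints: 00100101
print(f"{arithmetic_shift_right(x, 2):08b}")  # prints: 11100101
print(f"{logical_shift_left(x, 1):08b}")      # prints: 00101100
```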

Changing execution order • Control instructions change the order in which instructions are executed • They include jumps, procedure calls, and repeated execution of operations in a loop • Transfer of control can be conditional or unconditional

Conditional branches • First method: a multiple-bit condition code stores properties of the result of an operation that serve as the condition for the jump, for example the sign of the result, overflow, or a zero result • Second method: the jump condition is embedded in the jump instruction itself • A jump can go in both directions (forward or backward)
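
A minimal sketch of the condition-code method, assuming an 8-bit machine with three hypothetical condition bits: Z (zero result), N (negative result, i.e. the sign bit) and V (signed overflow). The function name and the flag letters are illustrative and do not describe any particular processor.

```python
def sub_with_flags(x, y, bits=8):
    """Return (x - y) truncated to `bits` bits, plus the condition code."""
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    x &= mask
    y &= mask
    result = (x - y) & mask
    flags = {
        "Z": result == 0,
        "N": bool(result & sign_bit),
        # Signed overflow: the operands have different signs and the
        # sign of the result differs from the sign of x.
        "V": bool((x ^ y) & (x ^ result) & sign_bit),
    }
    return result, flags


result, flags = sub_with_flags(5, 5)
if flags["Z"]:                    # what a "branch if zero" instruction tests
    print("branch taken")         # prints: branch taken

print(sub_with_flags(0x80, 0x01))
# prints: (127, {'Z': False, 'N': False, 'V': True})
```

The second call subtracts 1 from -128 (0x80 in two's complement); the true result does not fit in 8 bits, so the overflow bit is set.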

Branch example • Program (address: instruction): 351: ...; 352: ...; 353: SUB X, Y; 354: BRZ 373; ...; 372: BR 353; 373: ...; 395, 396: rest of the code • BRZ: make a jump if the result is zero • BR: make a jump unconditionally • The condition code set by the SUB operation determines whether the BRZ operation jumps

Procedures • They are self-contained modules in the source code • Using them increases the flexibility of the code • Require two instructions: call and return • The same procedure can be called many times from different locations • Procedures can be nested

Procedure and return location • A procedure can be called from multiple locations in the program • Nesting of calls is possible • Calling a procedure requires storing the return address: • In a register • At the beginning of the called procedure • On the stack (the best option; it supports nested and recursive procedures)
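
The stack option can be sketched as follows. The class `ReturnStackMachine`, its methods, and the addresses are hypothetical and serve only to show that nested calls return in the right order when return addresses are pushed on a stack.

```python
class ReturnStackMachine:
    """Toy model: a program counter plus a stack of return addresses."""

    def __init__(self, pc=0):
        self.pc = pc
        self.stack = []

    def call(self, target):
        self.stack.append(self.pc + 1)  # address of the instruction after CALL
        self.pc = target

    def ret(self):
        self.pc = self.stack.pop()      # resume after the most recent CALL


m = ReturnStackMachine(pc=100)
m.call(500)      # main program calls procedure A
m.pc = 520       # A executes up to address 520 ...
m.call(800)      # ... and calls procedure B (nested call)
m.ret()
print(m.pc)      # prints: 521 (back in A)
m.ret()
print(m.pc)      # prints: 101 (back in the main program)
```

With a single register or a fixed location inside the procedure there is room for only one return address per procedure, so a recursive call would overwrite the address saved by the previous one.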

Procedure call

Stack • A dedicated memory area used to store data, organized as a LIFO (last in, first out) structure • Many processors have a register working as the stack pointer (for example, Motorola 68000) • Main stack operations: PUSH, POP

Example of the stack implementation • Figure: a stack in memory with the stack pointer, the end of the stack, and the PUSH and POP operations

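Working with stack • Operation: a+b-(c/d) • The same operation in Reverse Polish notation: ab+cd/- • Stack contents after each step (top of stack last): push a: [a]; push b: [a, b]; +: [a+b]; push c: [a+b, c]; push d: [a+b, c, d]; /: [a+b, c/d]; -: [a+b-c/d]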
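
The evaluation of ab+cd/- on a stack can be sketched in a few lines of Python. The function name `eval_rpn`, the space-separated token format, and the sample values of a, b, c, d are assumptions made for this example.

```python
import operator

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_rpn(tokens, variables):
    """Evaluate a Reverse Polish expression using an explicit stack."""
    stack = []
    for token in tokens:
        if token in OPS:
            right = stack.pop()   # the top of the stack is the right operand
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(variables[token])   # PUSH the operand's value
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


values = {"a": 7, "b": 5, "c": 8, "d": 4}   # made-up values
print(eval_rpn("a b + c d / -".split(), values))  # prints: 10.0
```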

Stack frame • The set of parameters stored for a procedure call, including the return address • Allows nested procedures to be called, with input and output parameters stored on the stack

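Stack frame illustration • Figure: stack contents while procedure A is active and after procedure A calls B, showing the stack pointer (SP), the frame pointer (FP), the saved previous frame pointer, and the return point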

Stack frame in Pentium processor • Used by the ENTER and CALL instructions • The ENTER instruction supports compilers in implementing nested procedures • The LEAVE instruction restores the previous stack state • The frame pointer is stored in the EBP register, the stack pointer in the ESP register • Example of the frame set-up sequence on procedure entry (the equivalent of ENTER without nesting): PUSH EBP; MOV EBP, ESP; SUB ESP, space_in_memory

MMX instructions • Introduced in 1996 to the Pentium processors • The first version comprised 57 SIMD instructions • Used to execute operations on integer numbers • Purpose: multimedia applications (computer games, graphics and sound processing) • MMX uses four new data types: packed byte, packed word, packed double word, packed quadword

MMX instruction examples • Arithmetic: PADD, PMUL, PMADD • Logical: PAND, PANDN, POR, PXOR • Comparison: PCMPEQ, PCMPGT • Conversion: PUNPCKH, PUNPCKL • Instructions carry a suffix that determines which type of data is used in the operation: B, W, D, Q

Additional MMX registers • Eight 64-bit registers from MM0 to MM7 • For backward compatibility, the MMX registers are mapped onto the floating point registers, so older software sees them as floating point registers • Figure: layout of a 64-bit register, from the first byte (bits 7-0) to the eighth byte (bits 63-56), with the fourth word in the highest 16 bits

Exemplary MMX operation

MMX arithmetics • Saturation instead of overflow • Overflow (wraparound): 1111 0000 0000 0000 + 0011 0000 0000 0000 = 1 0010 0000 0000 0000; the carry out of the 16-bit word is lost, leaving 0010 0000 0000 0000 • Saturation: the same sum is replaced by the largest representable value, 1111 1111 1111 1111
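
The two behaviours for the 16-bit example above can be sketched in Python. This models a single unsigned 16-bit element only, not a real packed MMX register, and the function names are made up for the example.

```python
def add_wraparound(a, b, bits=16):
    """Ordinary addition: the carry out of the word is discarded."""
    return (a + b) & ((1 << bits) - 1)


def add_saturating_unsigned(a, b, bits=16):
    """Saturating addition: the result is clamped to the largest value."""
    return min(a + b, (1 << bits) - 1)


a = 0b1111_0000_0000_0000
b = 0b0011_0000_0000_0000

print(f"{add_wraparound(a, b):016b}")           # prints: 0010000000000000
print(f"{add_saturating_unsigned(a, b):016b}")  # prints: 1111111111111111
```

In image processing, for instance, saturation keeps a bright pixel bright instead of letting it wrap around to a dark value.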

Why should we use MMX? • Table: results compared to C code using the traditional architecture

SSE instructions • Introduced in 1999 (Pentium III) • 70 new instructions for floating point operations • Eight additional 128-bit registers, addressed directly: XMM0 to XMM7 (plus the control register MXCSR) • Each register stores four 32-bit floating point numbers

SSE (cont.) • New data type: 4-element vector of single precision floating point numbers • Operations can be packed (PS: on all elements of the vector) or scalar (SS: only on the first element) • Example: xmm0 = [X1 X2 X3 X4], xmm1 = [Y1 Y2 Y3 Y4], ADDPS(xmm0, xmm1) = [X1+Y1 X2+Y2 X3+Y3 X4+Y4]
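
The difference between packed and scalar operations can be modelled with plain Python lists. This is a sketch of the semantics only: the functions `addps` and `addss` are named after the instructions but are ordinary Python, they use Python floats (double precision) rather than 32-bit values, and the sample numbers are made up.

```python
def addps(x, y):
    """Packed add: all four element pairs are added."""
    return [a + b for a, b in zip(x, y)]


def addss(x, y):
    """Scalar add: only the first elements are added; the remaining
    elements are copied from the first operand unchanged."""
    return [x[0] + y[0]] + list(x[1:])


xmm0 = [1.0, 2.0, 3.0, 4.0]
xmm1 = [10.0, 20.0, 30.0, 40.0]

print(addps(xmm0, xmm1))  # prints: [11.0, 22.0, 33.0, 44.0]
print(addss(xmm0, xmm1))  # prints: [11.0, 2.0, 3.0, 4.0]
```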

3DNow! Instructions • Introduced in 1998 by the AMD corporation • Provide a set of 21 new SIMD instructions for floating point calculations • Used in multimedia applications (high resolution graphics, computer games, CAD/CAM) • Extensions exist: Enhanced 3DNow!, 3DNow! Professional

SSE2 instructions • Introduced with the Intel Pentium 4 (launched in late 2000); later also supported by Athlon 64, Sempron 754, Transmeta Efficeon • Set of 144 additional instructions using the 128-bit XMM registers (XMM0 to XMM7; 16 registers, XMM0 to XMM15, in 64-bit mode) • Operates on 64-bit floating point numbers (the x87 coprocessor works with 80-bit numbers) and on integers packed into 128-bit registers

Next Sets of Instructions • SSE3 (Prescott New Instructions): 13 new instructions, including support for complex number arithmetic (since 2004, Pentium 4 Prescott, Athlon 64 E) • SSSE3 (Supplemental Streaming SIMD Extensions 3): 16 new instructions operating on integers (since 2006, Xeon, Intel Core 2) • SSE4: 54 new instructions in two groups, SSE4.1 (47) and SSE4.2 (7), including integer instructions that modify the EFLAGS register (new!); SSE4.1 first appeared in Penryn-based Intel Core 2 processors, SSE4.2 in Nehalem

Next Sets of Instructions (cont.) • SSE5: planned by AMD for 2009 as a competitor to Intel’s SSE4. Eventually replaced by three groups: XOP, FMA4, CVT16 (AVX compatible), implemented in Bulldozer processors in 2011. Instructions can take as many as 4 operands • AVX (Advanced Vector Extensions): implemented by Intel in 2011; 16 new 256-bit registers (YMM0-YMM15) plus 19 instructions working exclusively on these registers

Assembler • Low level programming language (assembly language) • Uses both instructions and symbolic references to data • Every processor family has its own assembly language

Example of the assembly program • Task: L = I + J + K • Machine language: 0010 0010 0000 0001; 0001 0010 0000 0010; 0001 0010 0000 0011; 0011 0010 0000 0100; 201: 0000 0000 0000 0010; 202: 0000 0000 0000 0011; 203: 0000 0000 0000 0100; 204: 0000 0000 0000 0000 • Symbolic program: LDA 201; ADD 202; ADD 203; STA 204; 201: DAT 2; 202: DAT 3; 203: DAT 4; 204: DAT 0 • Assembler program: FORMUL LDA I; ADD J; ADD K; STA L; I DATA 2; J DATA 3; K DATA 4; L DATA 0
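
The translation from the symbolic program to machine language can be sketched as a toy assembler. The opcode values (LDA = 0010, ADD = 0001, STA = 0011) and the reading of the addresses 201 to 204 as hexadecimal are inferred from the bit patterns on the slide; the function names are made up, and data lines (DAT) are not handled.

```python
OPCODES = {"ADD": 0b0001, "LDA": 0b0010, "STA": 0b0011}


def assemble(line):
    """Translate one 'MNEMONIC ADDRESS' line into a 16-bit word."""
    mnemonic, operand = line.split()
    address = int(operand, 16)            # addresses are taken as hexadecimal
    if address > 0xFFF:
        raise ValueError("address does not fit in 12 bits")
    return (OPCODES[mnemonic] << 12) | address


def as_bits(word):
    """Format a 16-bit word as four groups of four bits."""
    bits = f"{word:016b}"
    return " ".join(bits[i:i + 4] for i in range(0, 16, 4))


for line in ["LDA 201", "ADD 202", "ADD 203", "STA 204"]:
    print(line, "->", as_bits(assemble(line)))
# prints:
# LDA 201 -> 0010 0010 0000 0001
# ADD 202 -> 0001 0010 0000 0010
# ADD 203 -> 0001 0010 0000 0011
# STA 204 -> 0011 0010 0000 0100
```

A real assembler would also keep a symbol table, so that labels such as I, J, K and L in the assembler program are replaced by the addresses assigned to them.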
