Computer architecture
Lecture 4: Processor instruction list
Piotr Bilski
Execution of program
• The processor executes machine instructions after decoding them
• The programmer writes a program in a symbolic low-level or high-level language
• During compilation, the symbolic language is translated into machine language instructions
Elements of the machine instructions
• Operation code
• Argument references (operation input data)
• Result reference (if needed)
• Reference to the next instruction
Example 16-bit instruction format: bits 0 to 3 hold the operation code, bits 4 to 15 hold the argument references
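The 16-bit format above (operation code in bits 0 to 3, argument reference in bits 4 to 15) can be illustrated with a minimal sketch. It assumes that bit 0 is the leftmost (most significant) bit, as the slide's field order suggests; the function name `decode` is hypothetical.

```python
def decode(word: int) -> tuple[int, int]:
    """Split a 16-bit instruction word into (opcode, address)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction word must fit in 16 bits")
    opcode = word >> 12       # bits 0-3, assumed to be the leftmost four bits
    address = word & 0x0FFF   # bits 4-15, the remaining twelve bits
    return opcode, address


opcode, address = decode(0b0010_0010_0000_0001)
print(opcode, hex(address))
```

With this field split, a 4-bit operation code allows at most 16 distinct operations and a 12-bit reference can address 4096 locations.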
Arguments and results are stored in:
• Memory (main, cache, virtual)
• Processor registers (accumulator, general purpose registers)
• Input/output devices (hard drive, printer)
Instruction types
• Data processing (logical and arithmetic operations)
• Data storage (instructions related to memory access)
• Data transmission (input/output operations)
• Control (result testing, non-sequential code execution: jumps, branches)
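Relation between the symbolic and machine instructions
• Symbolic instruction: x = x + c;
• Machine instructions: LOAD 1001, ADD 1002, STORE 1001
• x is stored at address 1001 and c at address 1002; the ALU performs the addition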
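The translation of x = x + c into LOAD, ADD and STORE can be traced with a small simulation of a single-accumulator machine. This is only a sketch: the function `run`, the dictionary used as memory, and the initial values 5 and 3 are assumptions made for illustration.

```python
def run(program, memory):
    """Execute LOAD/ADD/STORE instructions on a one-accumulator machine."""
    acc = 0
    for op, addr in program:
        if op == "LOAD":
            acc = memory[addr]        # copy the memory cell into the accumulator
        elif op == "ADD":
            acc += memory[addr]       # the ALU adds the memory cell to the accumulator
        elif op == "STORE":
            memory[addr] = acc        # write the accumulator back to memory
        else:
            raise ValueError(f"unknown operation: {op}")
    return memory


memory = {1001: 5, 1002: 3}           # x at 1001, c at 1002 (example values)
program = [("LOAD", 1001), ("ADD", 1002), ("STORE", 1001)]
run(program, memory)
print(memory[1001])
```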
Number of addresses in the instruction
[Figure: programs computing Y = (A - B) / (C + D * E) with three-address, two-address, and one-address instructions]
Number of addresses in the instruction (cont.)
Example: a = b + c
• Three addresses: ADD a,b,c
• Two addresses: MOVE a,b followed by ADD a,c
• One address: LOAD b, ADD c, STOR a
Instruction list design problems
• How many (and which) operations should the processor execute?
• What data types (arguments, results)?
• What instruction format (length, number of addresses)?
• How many (and which) registers?
• Which addressing modes?
Operands
• Addresses (unsigned integers)
• Numbers (numerical data): fixed point, floating point, decimal
• Characters (ASCII / IRA, EBCDIC codes etc.)
• Logical data (single bits)
Computer as the data storage
• Multi-byte data can be stored in memory in little-endian, big-endian, or bi-endian order
• The models differ in the sequence of the bytes stored in memory; for example, the hexadecimal number 76859432 can be written in two ways:
  Address:        263  264  265  266
  Big endian:      76   85   94   32
  Little endian:   32   94   85   76
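The two byte orders for the hexadecimal number 76859432 can be reproduced with Python's built-in `int.to_bytes`. This is a small illustrative sketch; the variable names are arbitrary.

```python
value = 0x76859432

# Most significant byte at the lowest address.
big = value.to_bytes(4, byteorder="big")
# Least significant byte at the lowest address.
little = value.to_bytes(4, byteorder="little")

print("big endian:   ", big.hex(" "))
print("little endian:", little.hex(" "))

# Reading the bytes back with the matching order restores the value.
restored = int.from_bytes(little, byteorder="little")
```

Reading little-endian bytes with the big-endian convention (or the reverse) yields a different number, which is why file formats and network protocols have to fix the byte order.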
Little and big endian
• Little endian
  • Easy to convert a longer number to a shorter one
  • Arithmetic operations are easier to execute
  • Used in: Intel 80x86, Pentium, Alpha
• Big endian
  • Easy to sort character sequences (strings)
  • Allows printing ASCII characters without any conversion
  • Integers and characters are stored in the same order
  • Used in: Sun SPARC, RISC processors, Motorola 680x0
• Bi-endian
  • Supports both byte orders
  • Used in: PowerPC
Examples of little and big endian in the file types
• Big endian:
  • Adobe Photoshop
  • IMG (GEM Raster)
  • JPEG
  • MacPaint
  • SGI (Silicon Graphics)
  • Sun Raster
• Bi-endian:
  • Microsoft RIFF (.WAV & .AVI)
  • TIFF
  • XWD (X Window Dump)
• Little endian:
  • BMP (Windows, OS/2 Bitmaps)
  • GIF
  • PCX (PC Paintbrush)
  • TGA (Targa)
  • Microsoft RTF (Rich Text Format)
Pentium data types
• Data are organized in multiples of a byte (byte: 1 B, word: 2 B, double word: 4 B etc.)
• Floating point formats comply with the IEEE 754 standard
• Data do not have to be stored at aligned addresses
• Unsigned integers (8, 16, 32, 64 bits), also used as addresses
• Signed integers (8, 16, 32, 64 bits) in two's complement representation
• Floating point numbers (single, double, and extended double precision)
Pentium data types (cont.)
• Generic (any content 16, 32 or 64 bits long)
• Unpacked binary-coded decimal (one digit per byte)
• Packed binary-coded decimal (two digits per byte)
• Pointer (32-bit address)
• Bit field
• Byte string
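The difference between the unpacked (one digit in a byte) and packed (two digits in a byte) decimal representations can be shown with a short sketch. The function names are hypothetical, and the digits are emitted most significant first purely for readability; this is an assumption and not a statement about the in-memory layout on a real processor.

```python
def to_unpacked_bcd(n: int) -> bytes:
    """One decimal digit per byte."""
    if n < 0:
        raise ValueError("only non-negative numbers are handled in this sketch")
    return bytes(int(d) for d in str(n))


def to_packed_bcd(n: int) -> bytes:
    """Two decimal digits per byte: high nibble, then low nibble."""
    if n < 0:
        raise ValueError("only non-negative numbers are handled in this sketch")
    digits = str(n)
    if len(digits) % 2:
        digits = "0" + digits          # pad to an even number of digits
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )


print(to_unpacked_bcd(1234).hex(" "))
print(to_packed_bcd(1234).hex(" "))
```

The packed form halves the storage, at the cost of having to split each byte into nibbles before the digits can be processed.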
PowerPC data types
• Data are 8, 16, 32, or 64 bits long
• Aligning data to even byte addresses is not required (though sometimes used)
• PowerPC is bi-endian
• Stored: unsigned and signed numbers (byte (8 b), half-word (16 b), word (32 b), double word (64 b)), floating point numbers (IEEE 754), byte strings (up to 128 B)
Operation classification
• Data transfer (STORE, LOAD, SET, PUSH, POP)
• Arithmetic (ADD, SUB, NEG, INC, MULT)
• Logical (AND, OR, NOT, TEST, SHIFT, ROTATE)
• Control passing (JUMP, HALT, EXEC)
• Input/output (READ, WRITE)
• Conversion (TRANS, CONV)
Data transfer
• Aim: to move data from one location to another
• Requires: determining the memory location (possibly translating a virtual address), checking the cache, and issuing the read/write operation
• Example instructions: LOAD, STORE (in short, long, half-word versions etc.)
Logical operations
• Operands are treated as bit strings
• The most popular operations: AND, OR, XOR, NOT
• Bit strings can be used as masks:
  A1        = 10100101
  A2        = 11111111
  A1 XOR A2 = 01011010

  A1        = 10100101
  A2        = 11110000
  A1 AND A2 = 10100000
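The two mask examples can be checked directly with Python's bitwise operators. In this sketch XOR with an all-ones mask inverts every bit, and AND with 11110000 keeps the upper four bits and clears the lower four; the variable names are arbitrary.

```python
a1 = 0b10100101

inverted = a1 ^ 0b11111111     # XOR with all ones flips every bit
upper_half = a1 & 0b11110000   # AND keeps only the bits selected by the mask

print(f"{inverted:08b}")
print(f"{upper_half:08b}")
```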
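Logical operations (cont.)
• Logical shifting: vacated bit positions are filled with 0
• Arithmetic shifting: a right shift preserves the sign bit, a left shift fills with 0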
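The difference between the two kinds of right shift can be sketched on 8-bit values. Python integers are unbounded, so the word width `BITS` and the helper names below are assumptions introduced to imitate a fixed-width register.

```python
BITS = 8
MASK = (1 << BITS) - 1


def logical_shift_right(x: int, n: int) -> int:
    """Shift right by n (0 <= n <= BITS), filling vacated positions with 0."""
    return (x & MASK) >> n


def arithmetic_shift_right(x: int, n: int) -> int:
    """Shift right by n (0 <= n <= BITS), replicating the sign bit."""
    x &= MASK
    shifted = x >> n
    if x >> (BITS - 1):                         # sign bit set: fill with ones
        shifted |= (MASK << (BITS - n)) & MASK
    return shifted


value = 0b10010000                              # negative in two's complement
print(f"{logical_shift_right(value, 2):08b}")
print(f"{arithmetic_shift_right(value, 2):08b}")
```

For a two's complement number, the arithmetic right shift by n divides by 2^n (rounding toward minus infinity), whereas the logical shift treats the operand as unsigned.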
Changing execution order
• These instructions change the order in which instructions are executed
• They include jumps, procedure calls, and repeated execution of an operation in a loop
• Control passing can be conditional or unconditional
Conditional branches
• First method: a multi-bit condition code stores properties of the result of the last operation (for example its sign, overflow, or whether it is zero), and the jump is taken depending on these bits
• Second method: the jump condition is embedded in the jump instruction itself
• A jump can go in either direction (forward or backward)
Branch example
  351
  352
  353  SUB X, Y
  354  BRZ 373
  ...
  372  BR 353
  373
  ...
  395  Rest of the code
  396
• BRZ: jump if the result is zero
• BR: jump unconditionally
• The condition code set by the SUB operation determines whether BRZ jumps
Procedures
• They are isolated modules in the source code
• Using them makes the code more flexible
• They require two instructions: call and return
• The same procedure can be called many times from different locations
• Procedures can be nested
Procedure and return location
• A procedure can be called from multiple locations in the program
• Nesting of calls is possible
• Calling the procedure requires storing the return address:
  • In a register
  • At the beginning of the called procedure
  • On the stack (the best option, as it supports nested and recursive procedures)
Stack
• A dedicated memory area for storing data, organized as a LIFO (last in, first out) structure
• Many processors have a register that works as the stack pointer (for example, Motorola 68000)
• Main stack operations: PUSH, POP
Example of the stack implementation
[Figure: memory block with the stack pointer, the end of the stack, and the PUSH and POP operations marked]
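Working with stack
• Operation: a+b-(c/d)
• The same operation in reverse Polish notation: ab+cd/-
• Stack contents after each step (top of the stack first):
  push a:  a
  push b:  b, a
  +:       a+b
  push c:  c, a+b
  push d:  d, c, a+b
  /:       c/d, a+b
  -:       a+b-c/d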
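The evaluation of the reverse Polish expression ab+cd/- with a stack can be sketched as follows. The function `eval_rpn`, the single-letter tokens, and the sample values are assumptions made for illustration only.

```python
import operator

OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_rpn(expression, values):
    """Evaluate a reverse Polish expression made of one-character tokens."""
    stack = []
    for token in expression:
        if token in OPERATORS:
            right = stack.pop()            # the operand pushed last is the right one
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        else:
            stack.append(values[token])    # PUSH the operand's value
    return stack.pop()                     # the final result is the only item left


result = eval_rpn("ab+cd/-", {"a": 5, "b": 3, "c": 8, "d": 4})
print(result)
```

No parentheses or precedence rules are needed: the order of the tokens alone determines the order of the operations.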
Stack frame
• The set of parameters of a procedure call, including the return address
• Makes nested procedure calls possible by storing input and output parameters on the stack
Stack frame illustration
[Figure: stack contents with the stack pointer (SP) and frame pointer (FP) for procedure A, and after procedure A calls B; the frame holds the previous frame pointer and the return point]
Stack frame in Pentium processor
• Used by the ENTER and CALL instructions
• The ENTER instruction helps compilers implement nested procedures
• The LEAVE instruction restores the previous stack state
• The frame pointer is stored in the EBP register, the stack pointer in the ESP register
• Example of setting up the frame on entry to a called procedure (the sequence that ENTER replaces):
  PUSH EBP
  MOV EBP, ESP
  SUB ESP, space_in_memory
MMX instructions
• Introduced to the Pentium processors in 1996
• The first version comprised 57 SIMD instructions
• Used to execute operations on integer numbers
• Purpose: multimedia applications (computer games, graphics and sound processing)
• MMX uses four new data types: packed byte, packed word, packed double word, packed quadword
MMX instruction examples
• Arithmetic: PADD, PMUL, PMADD
• Logical: PAND, PANDN, POR, PXOR
• Comparison: PCMPEQ, PCMPGT
• Conversion: PUNPCKH, PUNPCKL
• Instructions carry a suffix that determines which data type is used in the operation: B, W, D, Q
Additional MMX registers
• Eight 64-bit registers, MM0 to MM7
• For backward compatibility, older software sees the MMX registers as the floating point registers
[Figure: layout of a 64-bit MMX register, bits 63 to 0, with the eighth byte in bits 63 to 56 and the first byte in bits 7 to 0; the register can also be viewed as four words]
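MMX arithmetics
• Saturation instead of overflow
  Overflow (the carry out of the 16-bit word is lost):
      1111 0000 0000 0000
    + 0011 0000 0000 0000
    = 1 0010 0000 0000 0000
  Saturation (the result is clamped to the largest representable value):
      1111 0000 0000 0000
    + 0011 0000 0000 0000
    = 1 0010 0000 0000 0000, stored as 1111 1111 1111 1111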
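The contrast between overflow and saturation can be reproduced with a sketch of unsigned 16-bit addition. The function names and the `bits` parameter are hypothetical; real MMX instructions apply this to several packed values at once.

```python
def add_wraparound(a: int, b: int, bits: int = 16) -> int:
    """Unsigned addition where the carry out of the word is discarded."""
    return (a + b) & ((1 << bits) - 1)


def add_saturating(a: int, b: int, bits: int = 16) -> int:
    """Unsigned addition clamped to the largest representable value."""
    return min(a + b, (1 << bits) - 1)


a = 0b1111_0000_0000_0000
b = 0b0011_0000_0000_0000
print(f"{add_wraparound(a, b):016b}")
print(f"{add_saturating(a, b):016b}")
```

Saturation suits multimedia data: a pixel that becomes "too bright" stays at maximum brightness instead of wrapping around to a dark value.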
Why should we use MMX?
[Table not recovered. Footnote: * compared to C code using the traditional architecture]
SSE instructions
• Introduced in 1999 (Pentium III)
• 70 new instructions for floating point operations
• Eight additional 128-bit registers, addressed directly: XMM0 to XMM7 (plus the control register MXCSR)
• Every register stores four 32-bit floating point numbers
SSE (cont.)
• New data type: 4-element vector of single precision floating point numbers
• Operations can be packed (PS: on all elements of the vector) or scalar (SS: only on the first element)
• Example:
  xmm0 = [X1 X2 X3 X4]
  xmm1 = [Y1 Y2 Y3 Y4]
  ADDPS(xmm0,xmm1) = [X1+Y1 X2+Y2 X3+Y3 X4+Y4]
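The packed and scalar variants can be contrasted with a sketch that models a register as a list of four numbers. The functions `addps` and `addss` only imitate the semantics of the corresponding instructions; Python floats are double precision, so the single precision rounding of real SSE is not modelled.

```python
def addps(x, y):
    """Packed add: every element of x is added to the matching element of y."""
    return [a + b for a, b in zip(x, y)]


def addss(x, y):
    """Scalar add: only the first elements are added, the rest of x is kept."""
    return [x[0] + y[0]] + list(x[1:])


xmm0 = [1.0, 2.0, 3.0, 4.0]
xmm1 = [10.0, 20.0, 30.0, 40.0]
print(addps(xmm0, xmm1))
print(addss(xmm0, xmm1))
```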
3DNow! instructions
• Introduced in 1998 by AMD
• Provide a set of 21 new SIMD instructions for floating point calculations
• Used in multimedia applications (high resolution graphics, computer games, CAD/CAM)
• Extensions exist: Enhanced 3DNow!, 3DNow! Professional
SSE2 instructions
• Introduced in 2000 with the Intel Pentium 4; later also supported by Athlon 64, Sempron 754, and Transmeta Efficeon
• A set of 144 additional instructions operating on the 128-bit XMM registers (XMM0 to XMM7, extended to XMM0 to XMM15 in 64-bit mode)
• Operate on 64-bit floating point numbers (the x87 coprocessor works with 80-bit numbers) and on integers held in the 128-bit registers
Next Sets of Instructions
• SSE3 (Prescott New Instructions): 13 new instructions, including support for complex number arithmetic (since 2004: Pentium 4 Prescott, later Athlon 64 revision E)
• SSSE3 (Supplemental Streaming SIMD Extensions 3): 16 new instructions operating on integers (since 2006: Xeon, Intel Core 2)
• SSE4: 54 new instructions in two groups, SSE4.1 (47) and SSE4.2 (7), including integer instructions that modify the EFLAGS register (new); SSE4.1 was first implemented in Penryn-based Intel Core 2 processors
Next Sets of Instructions (cont.)
• SSE5: planned by AMD for 2009 as a competitor to Intel's SSE4; eventually replaced by three AVX-compatible groups (XOP, FMA4, CVT16) and implemented in Bulldozer processors in 2011. Instructions can take up to four operands
• AVX (Advanced Vector Extensions): implemented by Intel in 2011; 16 new 256-bit registers (YMM0 to YMM15) plus 19 instructions working exclusively on these registers
Assembler
• Low-level programming language
• Uses both instructions and symbolic references to data
• Every processor architecture has its own assembly language
Example of the assembly program
Computes L = I + J + K

  MACHINE LANGUAGE              SYMBOLIC PROGRAM    ASSEMBLER PROGRAM
      0010 0010 0000 0001       LDA 201             FORMUL LDA I
      0001 0010 0000 0010       ADD 202                    ADD J
      0001 0010 0000 0011       ADD 203                    ADD K
      0011 0010 0000 0100       STA 204                    STA L
  201 0000 0000 0000 0010       201 DAT 2           I      DATA 2
  202 0000 0000 0000 0011       202 DAT 3           J      DATA 3
  203 0000 0000 0000 0100       203 DAT 4           K      DATA 4
  204 0000 0000 0000 0000       204 DAT 0           L      DATA 0
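The step from the symbolic program to the machine language can be sketched as a tiny assembler. The operation codes (ADD = 0001, LDA = 0010, STA = 0011) are read off the binary words in the example; treating the operands 201 to 204 as hexadecimal is an inference from the same words (0010 0000 0001 is 201 in hexadecimal). The names `OPCODES` and `assemble` are hypothetical.

```python
# Operation codes inferred from the machine language column of the example.
OPCODES = {"ADD": 0b0001, "LDA": 0b0010, "STA": 0b0011}


def assemble(line: str) -> int:
    """Translate one symbolic instruction such as 'LDA 201' into a 16-bit word."""
    mnemonic, operand = line.split()
    address = int(operand, 16)             # operand assumed to be hexadecimal
    if address > 0x0FFF:
        raise ValueError("address does not fit in 12 bits")
    return (OPCODES[mnemonic] << 12) | address


program = ["LDA 201", "ADD 202", "ADD 203", "STA 204"]
words = [assemble(line) for line in program]
for word in words:
    print(f"{word:016b}")
```

A full assembler would additionally resolve labels such as I, J, K and L to addresses, which is the difference between the symbolic program and the assembler program in the example.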