Lecture 4 Instruction Set Architecture

vesna

Presentation Transcript


  1. Lecture 4: Instruction Set Architecture. CS510 Computer Architectures

  2. Instruction Set Architecture • 1950s to 1960s: Computer Architecture Course: Computer Arithmetic • 1970 to mid 1980s: Computer Architecture Course: Instruction Set Design, especially ISA appropriate for compilers • 1990s: Computer Architecture Course: Design of CPU, memory system, I/O system, Multiprocessors

  3. Languages of Computers • Machine Language (ML) • Programs consist of machine instructions • Directly executable without preprocessing • Direct manipulation of machine registers • Efficient in view of machine resource utilization • Difficult to program • Assembly Language (AL) • Improved version of machine language with emphasis on user-friendliness • Symbolic machine language (symbols for operations and addresses) • Assembler is needed to translate into a machine language program • High-Level Language (HLL) • Programs consist of statements, each of which can be translated into several machine language instructions • Need a compiler to translate into a machine language program • Relatively easy to program compared to ML or AL • Hardware resource utilization may be inefficient

  4. Semantic Gap Between ML and HLL [Figure: system cost vs. year, with SW and HW curves] • As Hardware cost goes down, Software cost goes up • Shortage of programmers • Unreliable Software => Unreliable Computers • Response: Keep the programming cost down • Develop powerful, complex user-friendly HLL • HLL programmers are easy to train • Greater Semantic Gap between HLL and Machine Language • Execution inefficiency • Software complexity • Compiler complexity • To offset the semantic gap • Large instruction set • Variety of addressing modes • Hardware/Firmware implementation of HLL primitives

  5. Instruction Set • Language of a machine • Characterizes the machine’s capability and behavior • Boundary between designers (architects) and programmers • For designers: Specification of the function of CPU • For programmers: A pool of functions from which they choose to use in the program • Performance Issues • Memory Bandwidth (MB) is used 1/2 for Instructions and 1/2 for Data • For efficient utilization of MB, instruction representation must be as compact as possible whilst still being compatible with data • von Neumann Bottleneck exists in MB • "One would expect that human language should directly reflect the characteristics of human intellectual capabilities, that language should be a direct mirror of mind in ways which other systems of knowledge and belief cannot." - Noam Chomsky

  6. Memory Bandwidth Issue [Figure: Memory Bandwidth shared by instruction execution and I/O; the memory bandwidth given to CPU covers the IF, OF, D/IP, E phases] • Memory Bandwidth is used by CPU and I/O • Memory Bandwidth given to CPU is used for Instruction Fetches and Operand Fetches or Operand Stores • Consider an AC-machine: ADD X, or LDA X

  7. Machine Language • Vocabulary • Operations • Addressing Modes for operands’ addresses and the next instruction address • Syntax • Methods of representing operation (OP-code), operands, addresses in an instruction • Instruction format • Encoding of Instruction fields • Grammar • Rules of using instructions to make a program

  8. Components of an Instruction • Operation Code (OP-code) • Format specifier • Long / Short • Field definition • Operation • Types of operands • Operand Address(es) • Operand itself • Addresses themselves (including abbreviated) • Address modification specification • Automatic indexing • Relative address • Sequencing

  9. Instruction Set and Computer Architecture [Figure: Stack Architecture (Input Bus, Stack, ALU, Output Bus), AC Architecture (Input Bus, AC, Other Registers, ALU, Output Bus), GPR Architecture (General Purpose Registers, ALU)] • Computer Architectures are classified into three classes according to the Register Structures for operand storage • Stack Computer Architecture • AC Computer Architecture • General Purpose Register Computer Architecture

  10. Stack Computer Architecture [Figure: stack S with slots 0 to n-1, stack pointer SP, Full (F) and Empty (E) flags, ALU]
      Instruction: Operation
      PUSH X: if F=1, then S overflow; else SP <- SP+1, S[SP] <- M[X], if SP=(n-1), then F <- 1, E <- 0
      POP X: if E=1, then empty S; else M[X] <- S[SP], SP <- SP-1, F <- 0, if SP=(n-1), E <- 1
      Unary Instr. (Shift Left): if E=1, then empty S; else ALU <- S[SP], then S[SP] <- ALU
      Binary Instr. (ADD): if E=1, then empty S; else ALU <- S[SP], SP <- SP-1, if SP=(n-1), then E <- 1, empty S; else ALU <- S[SP], then S[SP] <- ALU
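      The PUSH, POP and ADD rules on this slide can be sketched as a small simulator. This is a minimal sketch, not the lecture's own code: the class name StackMachine, the wrap-around of SP modulo n (so that SP = n-1 signals both "just became full" after a push and "just became empty" after a pop), and clearing E on every push are assumptions made to get a runnable model.

```python
class StackMachine:
    """Hypothetical model of the slide's n-slot hardware stack with F/E flags."""

    def __init__(self, n, memory):
        self.n = n
        self.s = [0] * n          # stack registers S[0..n-1]
        self.sp = n - 1           # assumed initial SP so the first push lands in S[0]
        self.full = False         # F flag
        self.empty = True         # E flag
        self.m = memory           # main memory, addressed by name here

    def push(self, x):
        if self.full:
            raise OverflowError("S overflow")
        self.sp = (self.sp + 1) % self.n
        self.s[self.sp] = self.m[x]
        self.empty = False        # assumption: E is cleared on every push
        if self.sp == self.n - 1:
            self.full = True

    def pop(self, x):
        if self.empty:
            raise IndexError("empty S")
        self.m[x] = self.s[self.sp]
        self._drop_top()

    def add(self):
        if self.empty:
            raise IndexError("empty S")
        top = self.s[self.sp]     # first operand goes to the ALU
        self._drop_top()
        if self.empty:            # only one operand was on the stack
            raise IndexError("empty S")
        self.s[self.sp] = self.s[self.sp] + top

    def _drop_top(self):
        self.sp = (self.sp - 1) % self.n
        self.full = False
        if self.sp == self.n - 1:
            self.empty = True


mem = {"a": 2, "b": 3, "c": 0}
sm = StackMachine(4, mem)
sm.push("a")
sm.push("b")
sm.add()
sm.pop("c")                       # mem["c"] now holds a + b
```

      Note that ADD itself carries no operand address; all memory traffic comes from the surrounding PUSH and POP instructions.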

  11. Characteristics of the Stack Architecture • Instruction length is short • No need to represent the address(es) of operand(s) in functional instructions • Instruction execution time is fast • Operand(s) access is fast because they are in the stack(register) • Operand(s) must be stored in the stack before operating on them • Inconvenient to prepare data in the stack • Frequent use of PUSH and POP instructions to prepare data in the stack - memory access

  12. AC Computer Architecture [Figure: Input Bus, AC, Other Registers, ALU, Output Bus]
      Instruction: Operation
      Unary Instruction (CPA): AC <- f(AC)
      Binary Instruction (ADD X): AC <- f(AC, M[X])
      Transfer Instruction (LDA X): AC <- M[X]; (STA X): M[X] <- AC
      Characteristics:
      - Instruction execution time of binary instructions is slow: one of the operands must be read from memory
      - Instruction length is longer than in the stack architecture: the memory address of one of the operands must be specified in the instruction, although AC (a data register) can be implied
      - Frequency of LDA/STA instructions is high: there is only one data register

  13. GPR Computer Architecture [Figure: Input Bus, Registers, ALU, Output Bus]
      Instruction: Operation
      Unary Instruction (COMP R1, R2): R1 <- f(R2)
      Binary Instruction (ADD R1, R2): R1 <- f(R1, R2), or (ADD R1, R2, R3): R3 <- f(R1, R2)
      Transfer Instruction (LD R1, X): R1 <- M[X]; (ST R1, X): M[X] <- R1
      Characteristics:
      - Instruction length is short because register addresses are used for operands
      - Instruction execution time is fast because all the operands are in the registers
      - Frequency of using LD/ST instructions depends on the number of registers
      - Opportunities for storing the results of operations in GPRs are high because there are many registers

  14. Computer Architecture? ". . . the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation." - Amdahl, Blaauw, and Brooks, 1964

  15. Towards Evaluation of ISA and Organization [Figure: instruction set between software and hardware]

  16. Interface Design [Figure: an Interface with several uses and implementations imp 1, imp 2, imp 3 over time] • A Good Interface: • Lasts through many implementations (portability, compatibility) • Is used in many different ways (generality) • Provides convenient functionality to higher levels • Permits an efficient implementation at lower levels

  17. Evolution of Instruction Sets • Single Accumulator (EDSAC 1950) -> Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953) -> Separation of Instruction Set from Implementation: High-level Language Based (B5000 1963), Concept of a Family (IBM S/360 1964) -> General Purpose Register Machines: Complex Instruction Sets (Vax, Intel 432 1977-80), Load/Store Architecture (CDC 6600, Cray 1 1963-76) -> RISC (Mips, Sparc, 88000, IBM RS6000, . . . 1987)

  18. Evolution of Instruction Sets • Major advances in computer architecture are typically associated with landmark instruction set designs • Ex: Stack (B1700) vs GPR (System S/360) • Design decisions must take into account: • technology (component) • machine organization • programming languages • compiler technology • operating systems • And they in turn influence these

  19. Design Space of ISA • Five Primary Dimensions • Number of explicit operands: (0, 1, 2, 3) • Operand Storage: Where besides memory? • Effective Address: How is memory location specified? • Type and Size of Operands: byte, int, float, vector, . . . How is it specified? • Operations: add, sub, mul, . . . How is it specified? • Other Aspects • Successor: How is it specified? • Conditions: How are they determined? • Encoding: Fixed or variable? Wide? • Parallelism

  20. Number of Explicit Operands • Maximum number of operands to be specified is 3: 2 source operands and 1 result operand • To optimize the memory bandwidth required by instructions (for fetching from Memory), the number of explicitly specified operands in the instruction needs to be reduced • 2 operands (GPR machine): 2 source operands (1 of the source operands is destroyed after execution to store the result) • 1 operand (AC machine): 1 of the operands is implied to be in a specific hardware register called the Accumulator (AC) (the result of the execution is also stored in this register) • 0 operand (Stack machine): Both of the operands and the result are implied to be in a stack

  21. Operand Storage • Storage • Memory • - Long memory addressing • - Need to represent the address with a few bits • Relative addressing with displacement • Page/Segment addressing • Register • - General purpose register • Short register addressing • - AC • Stack (register) • - Does not need addresses

  22. Address Space and Storage Space • Address Space • Consists of addresses that programmers can use • Storage Space • Consists of physical storage locations • For a simple low cost machine, the Address Space and the Storage Space are identical • Programmers program with the actual storage addresses • Modern computers provide the storage systems with Independent Address and Storage Spaces • An Effective Address(EA) needs to be obtained from the Address used in the program to access the operand from the memory • Usually the Address Space is much larger than the Storage Space • Virtual Storage System

  23. Effective Address • Address and Physical Storage Location are two different concepts. • Addresses of Operands are represented or implied in the instruction. • Operand’s address needs to be mapped into an Effective Address of the physical storage location
      Basic Addressing Modes (A or R in instructions)
      Mode: Algorithm; Advantage; Disadvantage
      Immediate: opd=A; no M refer; limited value
      Direct: EA=A; simple; limited addr space
      Indirect: EA=M[A]; large addr space; multiple M refer
      Register: EA=R; no M refer; limited addr space
      R Indirect: EA=[R]; large addr space; extra M refer
      Displacement: EA=A+[R]; flexibility; complexity
      Stack: opd=S[TOP]; no M refer; limited applications
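      The seven basic modes can be compared side by side by writing out how each one produces its operand. The sketch below is illustrative only: the function name fetch_operand, the mode strings, and the dictionary-based memory and register file are assumptions, and register indirect is read as "the register holds the operand's memory address".

```python
def fetch_operand(mode, a, r, mem, regs, stack):
    """Return the operand selected by one of the slide's basic addressing modes.

    a     : address/immediate field A of the instruction
    r     : register field R of the instruction
    mem   : main memory (address -> value)
    regs  : register file (name -> value)
    stack : operand stack, top of stack is the last element
    """
    if mode == "immediate":           # opd = A, no memory reference
        return a
    if mode == "direct":              # EA = A
        return mem[a]
    if mode == "indirect":            # EA = M[A], two memory references
        return mem[mem[a]]
    if mode == "register":            # EA = R, operand is in the register
        return regs[r]
    if mode == "register_indirect":   # EA = contents of R
        return mem[regs[r]]
    if mode == "displacement":        # EA = A + contents of R
        return mem[a + regs[r]]
    if mode == "stack":               # opd = S[TOP]
        return stack[-1]
    raise ValueError(f"unknown addressing mode: {mode}")


mem = {3: 13, 5: 7, 7: 9, 8: 11}
regs = {"R1": 3}
stack = [1, 2, 42]
operands = {
    mode: fetch_operand(mode, 5, "R1", mem, regs, stack)
    for mode in ("immediate", "direct", "indirect", "register",
                 "register_indirect", "displacement", "stack")
}
```

      Counting the mem[...] lookups in each branch reproduces the "M refer" entries of the table: none for immediate, register and stack, one for direct, register indirect and displacement, two for indirect.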

  24. Specification of Type and Size of Operand • Specification of the Type of the operand • Usually different op-codes for different types of operands • Specification of the Size of the operand • op-code represents the resolution of the operand address • bit, byte, half word (upper/lower half), word, ... • Length of operands • Implicit • Variable length • Specified explicitly in the instruction • Specified by a designated register • Specified by the delimiter marks in the operand • reserved-bit delimiter (field or word mark) • reserved-bit configuration (record or group mark)

  26. Operation • Specification • Encoded (reason: to reduce the instruction length) • Types • Minimal Instruction Set • Complex Instruction Set vs RISC

  27. Four Types of Operations • Functional: ADD, AND, CPA, CPC, ROL, CLA, CLC, INC, … • Transfer: LDA, STA (LD, ST), … • Control: JMP, JNA, JZA, JZC (SMA, SZA, SZC), … • Input/Output: INP, OUT, …

  28. Minimal Instruction Set [Figure: datapath with PC, PC+1, X1, X2, Temp, M[X1], M[X2], ALU(f), AC] • Bn instruction • Bn X1,X2,X3 • M[X1] <- M[X1] - M[X2] • AC <- M[X1] - M[X2] • If AC < 0, PC <- X3 • M instruction • Move the content of Source to Destination • A 2-address instruction (X1, X2) • A <- M[PC], PC <- PC+1 • Temp <- M[A] • B <- M[PC], PC <- PC+1 • M[B], AC <- SUB(Temp, M[B]) • Memory mapped ALU

  29. Why NOT Use a Minimal Instruction Set?
      ADD X, Y:
        BN a1, a1, a3 /M[a1] <- 0
        BN a1, Y, a3 /M[a1] <- -M[Y]
        a3: BN X, a1, a3 /M[X] <- M[X] + M[Y]
      JMP X:
        BN a1, a1, a3 /M[a1] <- 0
        BN a1, 1, X /AC <- -1, PC <- X
      • Inefficient • Program Size (M bandwidth) - Large IC and CPI • Programming difficulty
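      The cost of the single-instruction approach shows up when the ADD X, Y expansion is actually executed. The following is a minimal sketch under stated assumptions: the names run_bn and add_via_bn are hypothetical, the program is held in a separate list rather than in data memory, and every branch target in the ADD sequence is taken to be the next instruction, since the transcript does not make the placement of the a3 label clear.

```python
def run_bn(program, mem, max_steps=1000):
    """Execute Bn triples (x1, x2, x3): M[x1] <- M[x1] - M[x2]; if negative, PC <- x3.

    Returns the number of instructions executed (the instruction count, IC).
    """
    pc = 0
    executed = 0
    while pc < len(program):
        if executed >= max_steps:
            raise RuntimeError("step limit reached")
        x1, x2, x3 = program[pc]
        mem[x1] = mem[x1] - mem[x2]
        pc = x3 if mem[x1] < 0 else pc + 1
        executed += 1
    return executed


def add_via_bn(x, y):
    """Compute x + y with three Bn instructions, mirroring the ADD X, Y expansion."""
    mem = {"X": x, "Y": y, "a1": 0}   # a1 is a scratch cell
    program = [
        ("a1", "a1", 1),              # M[a1] <- 0
        ("a1", "Y", 2),               # M[a1] <- -M[Y]
        ("X", "a1", 3),               # M[X] <- M[X] + M[Y]
    ]
    ic = run_bn(program, mem)
    return mem["X"], ic


total, ic = add_via_bn(5, 7)          # one ADD costs three Bn instructions
```

      Each Bn also carries three memory addresses, so both instruction count and instruction-fetch bandwidth grow compared with a single ADD.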

  30. Instruction Set Design: Operations to Include in the Instruction Set • Trade-off among the 3 Es (Elegance, Efficiency, Environment) • Elegance • Completeness (even the Bn instruction is complete) • Symmetry: AC <= f(AC, M[X]) and M[X] <= f(AC, M[X]) • Flexibility, Generality • Efficiency • Space • Bit budget • Efficient specification of address • Fewer instructions require fewer bits to encode OP-code • Frequency of use arguments • Bandwidth arguments (NOP simply wastes memory bandwidth) • Ratio of overheads: non-functional to functional • Environment • Multiprogramming (Relocation, Protection, Sharing) • Code generation by compilers (compilers favor only a small portion of the instruction set)

  31. ISA Metrics • Aesthetics: • Orthogonal • No special registers, few special cases, all operand modes available with any data type or instruction type • Completeness • Support for a wide range of operations and target applications • Regularity • No overloading for the meanings of instruction fields • Streamlined • Resource needs easily determined • Ease of compilation (programming?) • Ease of implementation • Scalability

  32. Powerful Instruction • Conventional Instruction Cycle: I-F, I-P, O-F, E • Overhead for Execution (O): Instruction Fetch, Decode, Opd addr decision, and fetch (IF, IP, OF) • Execution (Operation) (E) • Rich, Powerful Instruction: • Instruction with longer Execution Time (E) to balance the overhead penalty (O) • Instruction which has a large E/O

  33. Powerful Instructions • Extended Arithmetic Function • Multiply, Divide, Trigonometric Functions, etc. • Automatic Indexing • BCT R1, addr (R1 <- R1 - 1, if R1 = 0 then PC <- addr) • BXLE R1, R3, addr (R1 <- R1 + R3; if R3=odd, R1 < R3, PC <- addr; if R3=even, R1 < R3+1, PC <- addr) • Subroutine Linkage • JMS X (M[X] <- PC, PC <- X+1)

  34. Powerful Instructions • Process State Exchange (Context Switch): Instructions required in the multiprogramming environments • LM R1, R5, addr: R1 <- M[addr], R2 <- M[addr+1], … , R5 <- M[addr+4] • SM R1, R5, addr: M[addr] <- R1, M[addr+1] <- R2, … , M[addr+4] <- R5 • Otherwise: LD R1, addr; LD R2, addr+1; … ; LD R5, addr+4 • XJ (Exchange Jump of CDC 6000 series)

  35. Basic ISA Classes: Type of Internal Storage
      Stack: 0-address: ADD: S[TOS] <- S[TOS] + S[TOS+1]
      Accumulator: 1-address: ADD A: AC <- AC + M[EA(A)]
                   (1+x)-address: ADDX A: AC <- AC + M[EA(A + [X])]
      General Purpose Register: 2-address: ADD A B: S[EA(A)] <- S[EA(A)] + S[EA(B)]
                                3-address: ADD A B C: S[EA(A)] <- S[EA(B)] + S[EA(C)]
      Load/Store: 3-address: ADD Ra Rb Rc: Ra <- Rb + Rc
                  LD Ra B: Ra <- M[EA(B)]
                  ST Ra B: M[EA(B)] <- Ra

  36. Stack Machines • Instruction set: Arithmetic operators (+, -, *, /, . . .), push A, pop A • Example: a*b - (a+c*b), in postfix ab*(a(cb)*+)- • push a • push b • * • push a • push c • push b • * • + • - [Figure: stack contents after each instruction]

  37. The Case Against Stacks • Performance is derived from the existence of several fast registers, not from the way they are organized • Data does not always “surface” when needed • Constants, repeated operands, common sub-expressions, so TOP and Swap instructions are required • Code density is about equal to that of GPR instruction sets • Registers have short addresses • Keep things in registers and reuse them • Slightly simpler to write a poor compiler, but not an optimizing compiler

  39. GPR Machines • GPR (General Purpose Register) • Faster than memory • Easier for a compiler to use • Used to hold variables, intermediate operands • memory traffic is reduced • code density improves • How many registers? • depends on how they are used by the compiler

  40. How Many Registers in RF [Figure: Register Life: . . . R1 . . . f(R1) . . . f(R1) . . . R1 . . .] • 6 algorithms from CALGO (ACM) written in 4 languages: ALGOL, BASIC, BLISS, FORTRAN
      Life Length   No. of Lives   Fraction   Cum. Fraction
      1~1           174,927        0.09       0.09
      2~3           728,346        0.38       0.48
      4~7           547,072        0.29       0.77
      8~15          252,508        0.13       0.90
      16~31         116,404        0.06       0.96
      32~63         41,673         0.02       0.98
      64~127        17,790         0.01       0.99
      128~          15,603         0.01       1.00
      • Avg number of simultaneous RL (Register Lives): 2 ~ 6 • No program uses more than 15 registers simultaneously • We need to try to keep the live registers in the RF
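      The Fraction and Cum. Fraction columns follow directly from the No. of Lives column, which makes the table easy to re-derive. A minimal sketch, with the names LIFE_COUNTS and life_fractions chosen here for illustration; only the counts themselves come from the slide.

```python
# (life length bucket, number of register lives) as listed on the slide.
LIFE_COUNTS = [
    ("1~1", 174_927), ("2~3", 728_346), ("4~7", 547_072), ("8~15", 252_508),
    ("16~31", 116_404), ("32~63", 41_673), ("64~127", 17_790), ("128~", 15_603),
]


def life_fractions(rows):
    """Return (bucket, fraction, cumulative fraction) rounded to two decimals."""
    total = sum(count for _, count in rows)
    running = 0
    table = []
    for bucket, count in rows:
        running += count
        table.append((bucket, round(count / total, 2), round(running / total, 2)))
    return table


table = life_fractions(LIFE_COUNTS)
for bucket, fraction, cumulative in table:
    print(f"{bucket:>7}  {fraction:.2f}  {cumulative:.2f}")
```

      The cumulative column shows that about 90% of register lives last at most 15 instructions, which is why a modest register file already captures most short-lived values.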

  41. GPR Machines • Maximum number of operands (O) • two or three operands • Number of memory addresses (M) • 0, 1, 2, 3
      Type (M,O)   No of memory addresses   Maximum No of operands allowed   Examples
      (0,3)        0                        3                                SPARC, MIPS, PowerPC, ALPHA
      (1,2)        1                        2                                Intel 80x86, Motorola 68000
      (2,2)        2                        2                                VAX
      (3,3)        3                        3                                VAX

  42. GPR Machines
      Example
      (0,3): ADD R1,R2,R3: R[R1] <- R[R2] + R[R3]
      (1,2): ADD R1, X: R[R1] <- R[R1] + M[X]
      (3,3): ADD X1,X2,X3: M[X1] <- M[X2] + M[X3]
      Type: Register-register (0,3)
        Advantages: Simple, fixed-length instr. encoding. Simple code generation model.
        Disadvantages: Higher instruction count. Some instructions are short and bit encoding may be wasteful.
      Type: Register-memory (1,2)
        Advantages: Data can be accessed without loading first. Instruction format tends to be easy to encode and yields good density.
        Disadvantages: A source operand is destroyed. Clocks per instruction varies by operand location.
      Type: Memory-memory (3,3)
        Advantages: Program becomes most compact. No waste of registers for temporaries.
        Disadvantages: Large variation in instruction sizes and in work per instruction. Memory accesses create memory bottleneck.

  43. R-R vs RM: A+B+C
      RR instructions: LD R1,A; LD R2,B; LD R3,C; ADD R4,R1,R2; ADD R5,R4,R3
      RM instructions: LD R1,A; ADD R1,B; ADD R1,C
      RM instructions reduce IC
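      The two sequences for A+B+C can be executed on a toy interpreter to confirm that they compute the same value with different instruction counts. This is a sketch under assumptions: the function run, the tuple encoding, and the rule that a three-operand ADD is register-register while a two-operand ADD takes its second operand from memory are all introduced here for illustration.

```python
def run(program, memory):
    """Execute LD/ADD instructions; return (registers, instruction count)."""
    regs = {}
    for op, dst, *src in program:
        if op == "LD":                          # LD Rd, X: Rd <- M[X]
            regs[dst] = memory[src[0]]
        elif op == "ADD" and len(src) == 2:     # ADD Rd, Rs1, Rs2 (register-register)
            regs[dst] = regs[src[0]] + regs[src[1]]
        elif op == "ADD" and len(src) == 1:     # ADD Rd, X (register-memory)
            regs[dst] = regs[dst] + memory[src[0]]
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return regs, len(program)


RR = [("LD", "R1", "A"), ("LD", "R2", "B"), ("LD", "R3", "C"),
      ("ADD", "R4", "R1", "R2"), ("ADD", "R5", "R4", "R3")]
RM = [("LD", "R1", "A"), ("ADD", "R1", "B"), ("ADD", "R1", "C")]

memory = {"A": 1, "B": 2, "C": 3}
rr_regs, rr_ic = run(RR, memory)
rm_regs, rm_ic = run(RM, memory)
```

      Both versions read memory three times for data; the RM form saves two instructions, while the RR form uses five registers and leaves every loaded value available for reuse.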

  44. What About Actual Programs • Consider a GPR machine with a large register file. - It is highly probable that the intermediate data can be found in a register - Thus, LD/ST instructions will be used less frequently - However, the frequency of using LD/ST instructions in computers that use RM instructions will be reduced further

  45. VAX-11 • Variable format, 2- and 3-address instructions [Figure: instruction format by byte: OpCode and A/M fields] • 32-bit word size, 16 GPR (4 reserved) • Rich set of addressing modes (apply to any operand) • Rich set of operations • bit field, stack, call, case, loop, string, poly, system • Rich set of data types (B, W, L, Q, O, F, D, G, H) • Condition codes

  46. Kinds of Addressing Modes [Figure: instruction fields OP, Ri, Rj, v; reg. file R; memory M] (Addressing Mode: value; in [ ] is the operand) • Register direct: [Ri] • Immediate (literal): v • Direct (absolute): M[v] • Register indirect: M[[Ri]] • Base+Displacement: M[[Ri] + v] • Base+Index: M[[Ri] + [Rj]] • Scaled Index: M[[Ri] + [Rj]*d + v], e.g. d=8 • Autoincrement: M[[Ri]+1] • Autodecrement: M[[Ri] - 1] • Memory Indirect: M[ M[Ri] ] • [Indirection Chains]

  47. Memory Addressing Modes (VAX) [Figure: bar chart of usage (%) of the Memory Indirect, Scaled, Register deferred, Immediate, and Displacement modes for TeX, spice, and gcc]

  48. Operand Address bits: Displacement Values • This value is related to the operand address field when the address is represented by the displacement from the base address • Wide distribution • The vast majority: positive • A majority of the large displacements: negative

  49. Operand Address bits: Immediate Addressing Mode [Figure: bar chart of the percentage of operations that use immediates, for Loads, Compares, ALU op, and All instr; Integer Avg. vs FP Avg.]

  50. Operand Address bits: Immediate Addressing Mode [Figure: distribution (%) of the number of bits needed for an immediate value, 0 to 32 bits, for gcc, TeX, and spice] • Cumulatively, 8 bits: 50% to 70%; 16 bits: 75% to 80%
