Lecture 5: Instruction Set Architecture
Computer Engineering 585, Fall 2001
Summary, #1
• Designing to Last through Trends

           Capacity         Speed
  Logic    2x in 3 years    2x in 3/2 years
  DRAM     4x in 3 years    2x in 10 years
  Disk     4x in 3 years    2x in 10 years

• 6 years to graduate => 16x CPU speed and 16x DRAM/disk size
• Time to run the task
  • Execution time, response time, latency
• Tasks per day, hour, week, sec, ns, …
  • Throughput, bandwidth
• "X is n times faster than Y" means
  $$\frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = n$$
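The rates in the trends table compound, which is where the "16x in 6 years" figure comes from. A minimal sketch, assuming a quantity that grows by a constant factor per fixed period; the helper names `growth` and `times_faster` are hypothetical, and the 10 s / 15 s timings are made up for illustration.

```python
def growth(factor: float, period_years: float, years: float) -> float:
    """Total improvement after `years` if a quantity grows `factor`x every `period_years`."""
    return factor ** (years / period_years)


def times_faster(extime_x: float, extime_y: float) -> float:
    """n in "X is n times faster than Y": ExTime(Y) / ExTime(X)."""
    return extime_y / extime_x


# Six years to graduate, using the rates from the trends table.
cpu_speed = growth(2, 1.5, 6)   # logic speed: 2x every 3/2 years
dram_size = growth(4, 3, 6)     # DRAM capacity: 4x every 3 years
disk_size = growth(4, 3, 6)     # disk capacity: 4x every 3 years
dram_speed = growth(2, 10, 6)   # DRAM speed: 2x every 10 years
print(cpu_speed, dram_size, disk_size)  # 16.0 16.0 16.0
print(round(dram_speed, 2))             # 1.52

# Hypothetical timings: Y takes 15 s and X takes 10 s.
n = times_faster(extime_x=10.0, extime_y=15.0)
print(n)  # 1.5
```

The contrast between 16x and roughly 1.5x over the same six years is the gap the "designing to last" advice is about.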
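Summary, #2
• Amdahl's Law:
  $$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$
• CPI Law:
  $$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
• Execution time is the REAL measure of computer performance!
• Good products are created when we have:
  • Good benchmarks, good ways to summarize performance
• Die cost goes roughly with $(\text{die area})^4$
• Can the PC industry support engineering/research investment?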
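Both laws are direct formulas, so a small sketch can make the arithmetic concrete. The function names and the example numbers (40% of the time sped up 10x; 10^9 instructions at CPI 1.5 on a 500 MHz clock) are hypothetical.

```python
import math


def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only `fraction_enhanced` of the old execution
    time is sped up by `speedup_enhanced` (Amdahl's Law)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def cpu_time(instructions: float, cpi: float, clock_hz: float) -> float:
    """CPU time = Instructions/Program x Cycles/Instruction x Seconds/Cycle,
    where Seconds/Cycle = 1 / clock rate."""
    return instructions * cpi / clock_hz


# Hypothetical enhancement: 40% of the time runs 10x faster.
print(round(amdahl_speedup(0.4, 10), 4))        # 1.5625
# Even an infinitely fast enhancement is capped at 1 / (1 - 0.4).
print(round(amdahl_speedup(0.4, math.inf), 4))  # 1.6667

# Hypothetical program: 10^9 instructions, CPI 1.5, 500 MHz clock.
print(cpu_time(1e9, 1.5, 500e6))  # 3.0 (seconds)
```

The second Amdahl call shows why the unenhanced fraction dominates: no matter how large Speedup_enhanced gets, the overall speedup cannot exceed 1 / (1 - Fraction_enhanced).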
Computer Architecture Is …
"the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation."
Amdahl, Blaauw, and Brooks, 1964
Computer Architecture's Changing Definition
• 1950s to 1960s: Computer Architecture Course: Computer Arithmetic
• 1970s to mid 1980s: Computer Architecture Course: Instruction Set Design, especially ISA appropriate for compilers
• 1990s-2000s: Computer Architecture Course: Design of CPU, memory system, I/O system, Multiprocessors
Instruction Set Architecture (ISA)
[Figure: the instruction set as the interface between software (above) and hardware (below)]
Interface Design
• A good interface:
  • Lasts through many implementations (portability, compatibility)
  • Is used in many different ways (generality)
  • Provides convenient functionality to higher levels
  • Permits an efficient implementation at lower levels
[Figure: one interface used over time by successive implementations imp 1, imp 2, imp 3]
Evolution of Instruction Sets
• Single Accumulator (EDSAC 1950)
• Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
• Separation of Programming Model from Implementation
  • High-level Language Based (B5000 1963)
  • Concept of a Family (IBM 360 1964)
• General Purpose Register Machines
  • Complex Instruction Sets (Vax, Intel 432 1977-80)
  • Load/Store Architecture (CDC 6600, Cray 1 1963-76)
• RISC (Mips, Sparc, HP-PA, IBM RS6000, . . . 1987)
A "Typical" RISC
• 32-bit fixed-format instructions (3 formats)
• 32 32-bit GPRs (R0 contains zero; double-precision values take a register pair)
• 3-address, reg-reg arithmetic instructions
• Single address mode for load/store: base + displacement
  • no indirection
• Simple branch conditions
• Delayed branch
See: SPARC, MIPS, HP PA-RISC, DEC Alpha, IBM PowerPC, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3
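Two of these properties, a hardwired-zero R0 and base + displacement as the only load/store addressing mode, can be shown with a tiny register-file model. This is a hypothetical sketch (the class `ToyRisc`, its method names, the 16-bit signed displacement, and big-endian word order are all assumptions); it does not model branches or delay slots.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of `value` as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


class ToyRisc:
    """Hypothetical toy "typical" RISC: 32 x 32-bit GPRs with R0 hardwired
    to zero, byte-addressed memory, big-endian word loads and stores that
    use only base + displacement addressing."""

    def __init__(self) -> None:
        self.regs = [0] * 32
        self.mem = {}  # byte address -> byte value

    def read_reg(self, r: int) -> int:
        return 0 if r == 0 else self.regs[r]  # R0 always reads as zero

    def write_reg(self, r: int, value: int) -> None:
        if r != 0:  # writes to R0 are discarded
            self.regs[r] = value & MASK32

    def addi(self, rd: int, rs1: int, imm16: int) -> None:
        self.write_reg(rd, self.read_reg(rs1) + sign_extend16(imm16))

    def add(self, rd: int, rs1: int, rs2: int) -> None:  # 3-address reg-reg
        self.write_reg(rd, self.read_reg(rs1) + self.read_reg(rs2))

    def effective_address(self, base: int, disp16: int) -> int:
        # The only addressing mode: register base + sign-extended displacement.
        return (self.read_reg(base) + sign_extend16(disp16)) & MASK32

    def store_word(self, rs: int, base: int, disp16: int) -> None:
        addr = self.effective_address(base, disp16)
        for i, byte in enumerate(self.read_reg(rs).to_bytes(4, "big")):
            self.mem[(addr + i) & MASK32] = byte

    def load_word(self, rd: int, base: int, disp16: int) -> None:
        addr = self.effective_address(base, disp16)
        data = bytes(self.mem.get((addr + i) & MASK32, 0) for i in range(4))
        self.write_reg(rd, int.from_bytes(data, "big"))


cpu = ToyRisc()
cpu.addi(1, 0, 0x100)        # R1 = R0 + 0x100 (R0 reads as 0)
cpu.addi(2, 0, 42)           # R2 = 42
cpu.store_word(2, 1, 8)      # MEM[R1 + 8] = R2
cpu.addi(5, 0, 0x110)        # R5 = 0x110
cpu.load_word(4, 5, 0xFFF8)  # R4 = MEM[R5 - 8], i.e. MEM[0x108]
cpu.addi(0, 0, 7)            # attempted write to R0 is ignored
print(cpu.read_reg(4), cpu.read_reg(0))  # 42 0
```

Because R0 always reads as zero, "load immediate" and "move" need no opcodes of their own: they are `addi` with R0 as a source.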
Evolution of Instruction Sets
• Major advances in computer architecture are typically associated with landmark instruction set designs
  • Ex: Stack vs. GPR (System 360)
• Design decisions must take into account:
  • technology
  • machine organization
  • programming languages
  • compiler technology
  • operating systems
• And instruction set designs in turn influence these
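Example: MIPS (instruction formats; bit positions in brackets)
  Register-Register:   Op [31:26] | Rs1 [25:21] | Rs2 [20:16]     | Rd [15:11] | (unlabeled) [10:6] | Opx [5:0]
  Register-Immediate:  Op [31:26] | Rs1 [25:21] | Rd [20:16]      | immediate [15:0]
  Branch:              Op [31:26] | Rs1 [25:21] | Rs2/Opx [20:16] | immediate [15:0]
  Jump / Call:         Op [31:26] | target [25:0]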
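The fixed field positions are what make decoding cheap: every field is a shift and a mask. A minimal encoder/decoder sketch using the bit positions from the MIPS format slide; the function names are hypothetical, bits 10..6 of the register-register format are left zero, and the opcode values are illustrative. With op = 0 and Opx = 0x20 the register-register word happens to match the real MIPS `add $3, $1, $2` encoding.

```python
def place(value: int, hi: int, lo: int) -> int:
    """Shift `value` into bit positions hi..lo, checking that it fits."""
    width = hi - lo + 1
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value:#x} does not fit in bits {hi}..{lo}")
    return value << lo


def field(word: int, hi: int, lo: int) -> int:
    """Extract bit positions hi..lo of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend16(value: int) -> int:
    return value - 0x10000 if value & 0x8000 else value


def encode_reg_reg(op, rs1, rs2, rd, opx):
    # Bits 10..6 are unlabeled on the slide; this sketch leaves them zero.
    return (place(op, 31, 26) | place(rs1, 25, 21) | place(rs2, 20, 16)
            | place(rd, 15, 11) | place(opx, 5, 0))


def encode_reg_imm(op, rs1, rd, imm):
    # 16-bit two's-complement immediate in bits 15..0.
    return (place(op, 31, 26) | place(rs1, 25, 21) | place(rd, 20, 16)
            | place(imm & 0xFFFF, 15, 0))


def encode_branch(op, rs1, rs2_opx, offset):
    return (place(op, 31, 26) | place(rs1, 25, 21) | place(rs2_opx, 20, 16)
            | place(offset & 0xFFFF, 15, 0))


def encode_jump(op, target):
    return place(op, 31, 26) | place(target, 25, 0)


def decode(word: int, fmt: str) -> dict:
    """Split a 32-bit word into the fields of the given format."""
    op = field(word, 31, 26)
    if fmt == "reg-reg":
        return {"op": op, "rs1": field(word, 25, 21), "rs2": field(word, 20, 16),
                "rd": field(word, 15, 11), "opx": field(word, 5, 0)}
    if fmt == "reg-imm":
        return {"op": op, "rs1": field(word, 25, 21), "rd": field(word, 20, 16),
                "imm": sign_extend16(field(word, 15, 0))}
    if fmt == "branch":
        return {"op": op, "rs1": field(word, 25, 21), "rs2_opx": field(word, 20, 16),
                "offset": sign_extend16(field(word, 15, 0))}
    if fmt == "jump":
        return {"op": op, "target": field(word, 25, 0)}
    raise ValueError(f"unknown format {fmt!r}")


add_word = encode_reg_reg(op=0, rs1=1, rs2=2, rd=3, opx=0x20)
print(f"{add_word:#010x}")  # 0x00221820
imm_word = encode_reg_imm(op=0x08, rs1=1, rd=2, imm=-4)
print(decode(imm_word, "reg-imm"))  # {'op': 8, 'rs1': 1, 'rd': 2, 'imm': -4}
```

Note that Op and Rs1 sit in the same bits in every format, so hardware can read the register file before it knows which format it is decoding.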
Architecture, Implementation
• Architecture deals with the functions provided to the programmer: addressing, addition, interrupts, and I/O
• Implementation deals with the method used to achieve these functions, such as a parallel datapath and microprogrammed control
• Realization is the means used to materialize this method: electrical, magnetic, or mechanical devices; power and packaging
Clock Architecture
[Figure: one clock architecture (a dial numbered 1-12) and several variant realizations]
• Architecture: two arms, a short one for the hour and a longer one for the minutes; possibly an alarm.
• Realization: shape of the clock arms and dial, numbers; mechanical or digital mechanism; energy source a wound spring or a battery.
Instruction Set Design: (1) Ease of Use
• consistency: with partial knowledge of the system, one can predict the remainder.
  • e.g., including square root as an instruction should almost fully define everything else.
  • The FP op halve was added to the IBM 360 as an afterthought and lacked post-normalization.
• orthogonality: two independent concerns should be handled independently.
  • e.g., clock architecture: (1) luminous dial, (2) alarm.
  • IBM 650: the low-order address bits determine the amount of shift, yet if the address exceeds the address space, a violation occurs.
• transparency: an architectural function is transparent if its implementation produces no architecturally visible side effects.
  • e.g., pipelining should not affect the compiler-visible machine.
• generality: the designer should not limit a function by his/her own notions about its use.
  • The Intel 8080 has a restart (RST) op intended to restart after an interrupt. Because it was designed in full generality, its larger use turned out to be as a short subroutine call.
• open-endedness: provision for future expansion.
• completeness: all functions of a given class are provided.
  • special case, symmetry: the inverse of each function is also provided.
Instruction Set Design:
• (2) Program size: memory size and CPU-main memory bandwidth; frequently used (written-down) instructions should be short.
• (3) Execution speed: time required to execute an instruction.
  • Can instructions be pipelined? Are they uniform in execution length?
  • Control and cache are often in the critical path of a processor design.
  • The uniform-length requirement conflicts with (2) above.
• (4) Complexity of control unit: some instructions should not even be in the instruction set (RISC).
Instruction Set Classification
Instruction: Opcode, Operands (e.g., ADD R1, 20)
• internal CPU operand storage mechanism: registers, stack, accumulator
• # explicit operands per instruction: 0, 1, 2, 3
• presumed operand locations: memory, stack
• operations
• type and size of operands
Stack/Reg/Acc Architectures
[Figure: code for C = A+B on each architecture]
• Stack: short instructions; post-fix model of expression evaluation.
  • Sequential operand access is hard for compilers.
  • Implementation issues: how deep should the stack be, and how are exceptions handled (e.g., popping when empty)?
• Accumulator: short instructions and relatively small machine state (easier context switch); high memory traffic.
• Reg-Reg: easiest for compiler optimization, the most general model; long instructions and large state.
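The transcript names the example C = A+B but its code sequences did not survive, so the sequences below are conventional textbook ones, assumed rather than taken from the slide. This sketch runs them on three hypothetical mini-interpreters (`run_stack`, `run_accumulator`, `run_load_store`) to show how the number of explicit operands changes instruction count.

```python
def run_stack(program, mem):
    """0-address machine: operands live on an implicit evaluation stack."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(mem[args[0]])       # memory -> top of stack
        elif op == "add":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)          # no explicit operands
        elif op == "pop":
            mem[args[0]] = stack.pop()       # top of stack -> memory
        else:
            raise ValueError(f"unknown op {op!r}")


def run_accumulator(program, mem):
    """1-address machine: the accumulator is the implicit second operand."""
    acc = 0
    for op, addr in program:
        if op == "load":
            acc = mem[addr]
        elif op == "add":
            acc += mem[addr]
        elif op == "store":
            mem[addr] = acc
        else:
            raise ValueError(f"unknown op {op!r}")


def run_load_store(program, mem):
    """3-address reg-reg machine: only loads and stores touch memory."""
    regs = [0] * 8
    for op, *args in program:
        if op == "load":
            rd, addr = args
            regs[rd] = mem[addr]
        elif op == "add":
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
        elif op == "store":
            rs, addr = args
            mem[addr] = regs[rs]
        else:
            raise ValueError(f"unknown op {op!r}")


PROGRAMS = {
    "stack": (run_stack, [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]),
    "accumulator": (run_accumulator, [("load", "A"), ("add", "B"), ("store", "C")]),
    "load-store": (run_load_store,
                   [("load", 1, "A"), ("load", 2, "B"), ("add", 3, 1, 2), ("store", 3, "C")]),
}

results = {}
for name, (run, program) in PROGRAMS.items():
    mem = {"A": 2, "B": 3, "C": 0}
    run(program, mem)
    results[name] = (mem["C"], len(program))
    print(f"{name:<12} C={mem['C']} instructions={len(program)}")
```

All three compute C = 5 for A = 2, B = 3; the stack and load-store versions need four instructions, the accumulator version three. The shorter stack and accumulator instructions trade against the reg-reg machine's freedom to keep values in registers across statements.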
Endian-ness of Memory Addressing
Cohen's article: "On Holy Wars and a Plea for Peace," IEEE Computer, Oct. 1981.
[Figure: units passed between CPU and memory: bits, bytes, words, pages]
In what order are units composed to form the next object in the hierarchy?
• LSB (less-significant unit) travels first: little-endians (Lilliputians)
• MSB (more-significant unit) travels first: big-endians (Blefuscians)
Endian-ness
• Big-endian: IBM 360, MIPS, Motorola 680xx, SPARC, DLX
• Little-endian: DEC VAX, Compaq/HP Alpha, Intel 80x86
• Selectable: PowerPC, MIPS (mode bit: 0 = big, 1 = little)
• Example: if the bytes at addresses 4-7 are 0x10, 0x20, 0x30, 0x40, the word at address 4 reads as 0x10203040 (big-endian) or 0x40302010 (little-endian).
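The same four bytes yield two different words depending only on which end is treated as most significant. A small sketch of the word-at-address-4 example, assuming an 8-byte memory modeled as a `bytearray`; `load_word` is a hypothetical helper, while `int.from_bytes`, `int.to_bytes`, `struct.unpack_from`, and `sys.byteorder` are standard Python.

```python
import struct
import sys

# Eight bytes of byte-addressed memory; the word of interest starts at address 4.
mem = bytearray(8)
mem[4:8] = bytes([0x10, 0x20, 0x30, 0x40])


def load_word(memory: bytearray, addr: int, byteorder: str) -> int:
    """Read 4 bytes starting at `addr` and assemble them in the given byte order."""
    return int.from_bytes(memory[addr:addr + 4], byteorder)


big = load_word(mem, 4, "big")        # byte at the lowest address is most significant
little = load_word(mem, 4, "little")  # byte at the lowest address is least significant
print(hex(big), hex(little))  # 0x10203040 0x40302010

# The struct module gives the same answers with explicit byte-order prefixes.
assert struct.unpack_from(">I", mem, 4)[0] == big
assert struct.unpack_from("<I", mem, 4)[0] == little

# Storing is the mirror image: a little-endian store puts the low byte first.
assert (0x10203040).to_bytes(4, "little") == bytes([0x40, 0x30, 0x20, 0x10])

print("this host is", sys.byteorder, "endian")
```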
Memory addressing contd.: data alignment
Most machines are byte addressable.

  Object         Aligned at byte addr   Misaligned at byte addr
  Byte           0,1,2,3,4,5,6,7        Never
  Half word      0,2,4,6                1,3,5,7
  Word           0,4                    1,2,3,5,6,7
  Double word    0                      1,2,3,4,5,6,7
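The alignment table follows from one rule: an object of size s is aligned when its address is a multiple of s. A short sketch that regenerates the table for addresses 0-7; the names `SIZES`, `is_aligned`, and `is_aligned_pow2` are hypothetical. The second test is the form hardware typically uses for power-of-two sizes, since it only inspects the low address bits.

```python
SIZES = {"Byte": 1, "Half word": 2, "Word": 4, "Double word": 8}


def is_aligned(addr: int, size: int) -> bool:
    """An object of `size` bytes is aligned if its address is a multiple of its size."""
    return addr % size == 0


def is_aligned_pow2(addr: int, size: int) -> bool:
    """Equivalent test for power-of-two sizes: the low log2(size) address bits are zero."""
    return (addr & (size - 1)) == 0


table = {}
for name, size in SIZES.items():
    aligned = [a for a in range(8) if is_aligned(a, size)]
    misaligned = [a for a in range(8) if not is_aligned(a, size)]
    # Both tests agree on every address.
    assert all(is_aligned_pow2(a, size) == is_aligned(a, size) for a in range(8))
    table[name] = (aligned, misaligned)
    print(f"{name:<12} aligned={aligned} misaligned={misaligned or 'never'}")
```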
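Physical Rationale for Alignment
[Figure: two organizations of 1K bytes of memory]
• 1K x 1B memory: all 10 address bits drive a 1K-to-1 decoder that selects one byte.
• 32 x 32B memory: the 5 MSB address bits drive a 32-to-1 decoder that selects one 32-byte row; the 5 LSB address bits drive a 32-to-1 multiplexor that selects 1 byte of that row.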
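The 32 x 32B organization splits each 10-bit address into a row number and a byte-within-row number. A hypothetical model of that split (the names `split_address`, `RowMemory`, and `same_row` are assumptions) shows the physical point behind alignment: an aligned word never straddles two rows, so one decoder selection suffices.

```python
ROW_BITS = 5               # 5 MSB address bits drive the 32-to-1 row decoder
COL_BITS = 5               # 5 LSB address bits drive the 32-to-1 byte multiplexor
ROW_BYTES = 1 << COL_BITS  # 32 bytes per row


def split_address(addr: int) -> tuple:
    """Split a 10-bit byte address into (row, byte-within-row)."""
    if not 0 <= addr < 1 << (ROW_BITS + COL_BITS):
        raise ValueError(f"address {addr} is outside the 1K space")
    return addr >> COL_BITS, addr & (ROW_BYTES - 1)


class RowMemory:
    """Hypothetical model of the 32 x 32B organization: a decoder selects
    one 32-byte row, and a multiplexor selects one byte of that row."""

    def __init__(self) -> None:
        self.rows = [bytearray(ROW_BYTES) for _ in range(1 << ROW_BITS)]

    def read_byte(self, addr: int) -> int:
        row, col = split_address(addr)
        selected_row = self.rows[row]  # decoder: one row out of 32
        return selected_row[col]       # multiplexor: one byte out of 32

    def write_byte(self, addr: int, value: int) -> None:
        row, col = split_address(addr)
        self.rows[row][col] = value


def same_row(addr: int, size: int) -> bool:
    """True if all `size` bytes starting at `addr` lie in one 32-byte row."""
    return split_address(addr)[0] == split_address(addr + size - 1)[0]


mem = RowMemory()
for addr in range(1024):
    mem.write_byte(addr, addr % 256)

print(split_address(0x3A7))  # (29, 7)
print(mem.read_byte(0x3A7))  # 167

# An aligned 4-byte word always lies inside one row; a misaligned one may not.
print(all(same_row(a, 4) for a in range(0, 1024, 4)))  # True
print(same_row(30, 4))  # False: bytes 30..33 span rows 0 and 1
```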
Costs of misalignment
[Figure: memory and multiplexor delivering bytes 0-7, steered by 3 address bits a3, a2, a1 (a3 = 0/1, a2 = 0/1)]
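One concrete cost: if memory delivers only aligned words, a misaligned access needs two memory accesses plus shifting and merging. A sketch under stated assumptions (a hypothetical 4-byte-wide memory, big-endian byte order, and hypothetical helpers `accesses_needed` and `load_word_misaligned`):

```python
WIDTH = 4  # hypothetical memory that delivers one aligned 4-byte word per access


def accesses_needed(addr: int, size: int, width: int = WIDTH) -> int:
    """Number of aligned `width`-byte memory words touched by a `size`-byte access."""
    first = addr // width
    last = (addr + size - 1) // width
    return last - first + 1


def load_word_misaligned(mem: bytes, addr: int):
    """Assemble a big-endian 4-byte word at any address from aligned reads.
    Returns (value, number_of_memory_accesses)."""
    base, offset = addr & ~(WIDTH - 1), addr & (WIDTH - 1)
    first = int.from_bytes(mem[base:base + WIDTH], "big")
    if offset == 0:
        return first, 1  # aligned: one access, no shifting
    second = int.from_bytes(mem[base + WIDTH:base + 2 * WIDTH], "big")
    combined = (first << (WIDTH * 8)) | second  # bytes of both aligned words
    shift = (WIDTH - offset) * 8                # drop the bytes after the wanted word
    return (combined >> shift) & 0xFFFFFFFF, 2


mem = bytes(range(8))  # byte at address i holds the value i
print([accesses_needed(a, 4) for a in range(5)])     # [1, 2, 2, 2, 1]
print(accesses_needed(2, 2), accesses_needed(3, 2))  # 1 2
value, n = load_word_misaligned(mem, 1)
print(hex(value), n)  # 0x1020304 2
```

Every misaligned word load here costs twice the memory accesses of an aligned one, plus the shift-and-merge logic that an aligned-only design can omit.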