480 likes | 922 Views
Chapter 13 Reduced Instruction Set Computers (RISC). CISC – Complex Instruction Set Computer RISC – Reduced Instruction Set Computer. Some Major Advances in Computers in 50 years. VLSI The family concept Microprogrammed control unit Cache memory MiniComputers Microprocessors Pipelining
E N D
Chapter 13Reduced Instruction Set Computers (RISC) • CISC – Complex Instruction Set Computer • RISC – Reduced Instruction Set Computer
Some Major Advances in Computers in 50 years • VLSI • The family concept • Microprogrammed control unit • Cache memory • MiniComputers • Microprocessors • Pipelining • PC’s • Multiple processors • RISC processors • Hand helds
RISC • Reduced Instruction Set Computer • Key features • Large number of general purpose registers (or use of compiler technology to optimize register use) • Limited and simple instruction set • Emphasis on optimising the instruction pipeline & memory management
Driving force for CISC • Software costs far exceed hardware costs • Increasingly complex high level languages • A “Semantic” gap between HHL & ML • This Leads to: • Large instruction sets • More addressing modes • Hardware implementations of HLL statements • e.g. CASE (switch) on VAX (long, complex structure)
Intention of CISC • Ease compiler writing • Improve execution efficiency • Complex operations in microcode • Support more complex HLLs
Execution Characteristics Studied What was studied? • Operations performed • Operands used • Execution sequencing How was it Studied? • Studies was done based on programs written in HLLs • Dynamic studies measured during the execution of the program
Operations • Assignments • Movement of data • Conditional statements (IF, LOOP) • Sequence control Observations? • Procedure call-return is very time consuming • Some HLL instructions lead to very many machine code operations
Dynamic Occurrence Machine-Instruction Weighted Memory-Reference Weighted Pascal C Pascal C Pascal C ASSIGN 45% 38% 13% 13% 14% 15% LOOP 5% 3% 42% 32% 33% 26% CALL 15% 12% 31% 33% 44% 45% IF 29% 43% 11% 21% 7% 13% GOTO — 3% — — — — OTHER 6% 1% 3% 1% 2% 1% Weighted Relative Dynamic Frequency of HLL Operations [Patterson]
Pascal C Average Integer Constant 16% 23% 20% Scalar Variable 58% 53% 55% Array/Structure 26% 24% 25% Operands Observations? • Predominately local scalar variables Implications? • Optimization should concentrate on accessing local variables
Procedure Calls Observations? • Context switching is quite time consuming • Depends on number of parameters passed • Depends on level of nesting • Most programs do not do a lot of calls followed by lots of returns • Most variables used are local
Implications Characterize RISC • Best support is provided by optimising: • most utilized features and • most time consuming features • Conclusions: • Large number of registers • Used for operand referencing • Careful design of pipelines • Address branching - Branch prediction etc. • Simplified instruction set • Reduced length • Reduced number
Register File • Software solution • Require compiler to allocate registers • Allocate based on most used variables in a given time • Requires sophisticated program analysis • Hardware solution • Have more registers • Thus more variables will be in registers
Using Registers for Local Variables • Store local scalar variables in registers • Reduces memory accesses • Every procedure (function) call changes locality • Parameters must be passed • Partial context switch • Results must be returned • Variables from calling program must be restored • Partial Context switch
Using “Register Windows” Observations: • Typically only few local & Pass parameters • Typically limited range of depth of calls Implications: • Use multiple small sets of registers • Calls switch to a different set of registers • Returns switch back to a previously used set of registers • Partition register set
Using “Register Windows” cont. Partition register set into: • Local registers • Parameter registers (Passed Parameters) • Temporary registers (Passing Parameters) Then: • Temporary registers from one set overlap parameter registers from the next This provides parameter passing without moving data (just move one pointer)
Overlapping Register Windows Picture of Calls & Returns:
Operation of Circular Buffer • When a call is made, a current window pointer is moved to show the currently active register window • If all windows are in use, an interrupt is generated and the oldest window (the one furthest back in the call nesting) is saved to memory • A saved window pointer indicates where the next saved windows should be restored
Global Variables How should we accommodate Global Variables? • Allocate by the compiler to memory • Have a static set of registers for global variables • Put them in cache
Large Register File Cache All local scalars Recently-used local scalars Individual variables Blocks of memory Compiler-assigned global variables Recently-used global variables Save/Restore based on procedure nesting depth Save/Restore based on cache replacement algorithm Register addressing Memory addressing Registers v Cache – which is better?
Compiler Based Register Optimization Basis: • Assuming relatively small number of registers (16-32) • Optimizing the use is up to compiler • HLL programs have no explicit references to registers Process: • Assign symbolic or virtual register to each candidate variable • Map (unlimited) symbolic registers to (limited) real registers • Symbolic registers that do not overlap can share real registers • If you run out of real registers some variables use memory
Graph Coloring Algorithm for Reg Assign Given: • A graph of nodes and edges • Nodes are symbolic registers • Two symbolic registers that are live in the same program fragment are joined by an edge Then: • Assign a color to each node • Adjacent nodes must have different colors • Assign minimum number of colors And then: • Try to color the graph with n colors, where n is the number of real registers • Nodes that can not be colored are placed in memory
The debate: Why CISC (1 of 2)? • Compiler simplification? • Dispute… - Complex machine instructions are harder to exploit - Optimization actually may be more difficult • Smaller programs? (Memory is now cheap) • Programs may take up less instructions, but… • May not occupy less memory, just look shorter in symbolic form • More instructions require longer op-codes, more memory references • Register references require fewer bits
The Debate: Why CISC (2 of 2)? • Faster programs? • More complex control unit • Microprogram control store larger • Thus instructions take longer to execute • Bias towards use of simpler instructions ? • It is far from clear that CISC is the appropriate solution
Early RISC Computers • MIPS – Microprocessor without Interlocked Pipeline Stages Stanford (John Hennessy) MIPS Technology • SPARC – Scalable Processor Architecture Berkeley (David Patterson) Sun Microsystems • 801 – IBM Research (George Radin)
Concentrating on RISC: Major Characteristics: • One instruction per cycle • Register to register operations • Few, simple addressing modes • Few, simple instruction formats Also: • Hardwired design (no microcode) • Fixed instruction format But: • More compile time/effort
Memory to memory vs Register to memory Operations Lab Project 1
Controversy: CISC vs RISC • Challenges of comparison • There are no pair of RISC and CISC that are directly comparable • There are no definitive set of test programs • It is difficult to separate hardware effects from complier effects • Most comparisons are done on “toy” rather than production machines • Most commercial machines are a mixture
Controversy: RISC v CISC • Not clear cut • Today’s designs borrow from both philosophies
RISC Pipelining basics • Two phases of execution for register based instructions • I: Instruction fetch • E: Execute • ALU operation with register input and output • For load and store there need to be three • I: Instruction fetch • E: Execute • Calculate memory address • D: Memory • Register to memory or memory to register operation
Effects of RISC Pipelining (Allows 2 memory accesses per stage) (E1 register read, E2 execute & register write Particularly beneficial if E phase can be longer)
Optimization of RISC Pipelining • Delayed branch • Leverages branch that does not take effect until after execution of following instruction • This, following instruction becomes the delay slot
Example of Delayed Branch (cleaver!) What is wrong with this example? Why is there a Write back
More Options for RISC Architectures • RISC Pipelining • Superpipelined – more fine grained pipeline (more stages in pipeline) • Superscaler – replicates stages of pipeline (multiple pipelines)
MIPS 4000 RISC Machine • 64 bit architecture (4 Gig address space) (1 Terabyte of file mapping) • Partitioned into CPU & MMU • 32 registers (R0=0), but • 128K Cache – ½ Instructions, ½ Data • One 32 bit word for each instruction (94 Instructions) • All operations are register to register • No condition codes! Flags are stored in general registers for explicit use – simplifies branch optimization • Only on load/Store Format – Base, Offset extended addressing synthesized with multiple instructions • Uses branch prediction • Especially designed for Embedded computing • Has multiple FPU’s – FP likely stalls pipeline
MIPS 4000 RISC Machine • See MIPS instructions - page 485 • See Formats – page 486