Machine Instruction Rulebook • CPU design must support machine instructions • Thus a CPU has an instruction set • Intel, for example, has the x86 instruction set • As instruction set grows old, designers have choice • One option: start over (e.g. IBM's S/360) • Typical choice: backward compatibility (x86) • Middle option: modularize the instruction set (e.g. extensions for Intel, ARM)
CISC versus RISC • RISC = Reduced Instruction Set Computer • CISC = Complex Instruction Set Computer • RISC circuitry is simpler than that of CISC • Translates to speed, reliability, lower costs • RISC doesn’t have lots of “high-level” instructions at hardware level • RISC puts burden on compilers • Even CISC chips are typically RISC on the inside (complex instructions are broken into simpler internal operations) • Intel x86 is CISC, ARM is RISC
Pipelining • Push multiple instructions through the instruction cycle at the same time • Trick: divide instruction processing into stages • For example, fetch first instruction • Then do two things at same time: • Decode first instruction • Fetch second instruction • Then do three things at same time: • Execute first instruction • Decode second instruction • Fetch third instruction
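The overlap described above can be sketched as a small schedule generator. This is a minimal illustration, assuming an idealized three-stage pipeline (Fetch, Decode, Execute) with no stalls; the function name `pipeline_schedule` is hypothetical.

```python
STAGES = ["Fetch", "Decode", "Execute"]  # the three stages named on the slide


def pipeline_schedule(num_instructions, stages=STAGES):
    """Return, for each clock tick, the (instruction, stage) pairs active in that tick.

    Assumes an ideal pipeline: every stage takes one tick and nothing stalls.
    """
    total_ticks = num_instructions + len(stages) - 1
    schedule = []
    for tick in range(total_ticks):
        active = []
        for s, stage in enumerate(stages):
            instr = tick - s  # instruction i (0-based) is in stage s at tick i + s
            if 0 <= instr < num_instructions:
                active.append((instr + 1, stage))  # report instructions 1-based
        schedule.append(active)
    return schedule


for tick, active in enumerate(pipeline_schedule(3), start=1):
    work = ", ".join(f"{stage} I{instr}" for instr, stage in active)
    print(f"tick {tick}: {work}")
# tick 1: Fetch I1
# tick 2: Fetch I2, Decode I1
# tick 3: Fetch I3, Decode I2, Execute I1
# tick 4: Decode I3, Execute I2
# tick 5: Execute I3
```

Tick 3 is the point where all three stages are busy, matching the "three things at same time" step.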
An Example Six-Stage Pipeline • FI: Fetch Instruction • DI: Decode Instruction • CO: Calculate Operands (figure out where operands are) • FO: Fetch Operands • EI: Execute Instruction • WO: Write Operand
Adding Stages, Saving Time • Each clock pulse redefined • was: “go through all stages for one instruction” • now: “go through one stage for many instructions” • The “old” clock tick included all six stages • The “new” clock tick advances every stage at once, each stage working on a different instruction • How do we quantify? The new tick must be at least as long as the delay of the longest stage • Modern pipelines can have as many as 30 or more stages • So, how do we save time? Marginal time per instruction • Consider timing for the nine instructions in last slide • Without pipeline: 9 “old” clock ticks • With pipeline?
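One way to answer the closing question is to count ticks. The sketch below assumes an ideal pipeline: every stage takes exactly one "new" tick (the delay of the longest stage), one "old" tick equals six "new" ticks, and nothing stalls. The function names are hypothetical.

```python
def pipelined_ticks(n_instructions, n_stages):
    """New ticks needed with a pipeline: fill it once, then finish one instruction per tick."""
    return n_stages + n_instructions - 1


def unpipelined_ticks(n_instructions, n_stages):
    """New ticks needed without a pipeline: each instruction walks all stages alone."""
    return n_instructions * n_stages


n, k = 9, 6  # nine instructions, six stages (FI, DI, CO, FO, EI, WO)
with_pipe = pipelined_ticks(n, k)       # 14 new ticks
without_pipe = unpipelined_ticks(n, k)  # 54 new ticks = 9 old ticks

print(f"with pipeline:    {with_pipe} new ticks = {with_pipe / k:.2f} old ticks")
print(f"without pipeline: {without_pipe} new ticks = {without_pipe // k} old ticks")
print(f"speedup: {without_pipe / with_pipe:.2f}x")
# with pipeline:    14 new ticks = 2.33 old ticks
# without pipeline: 54 new ticks = 9 old ticks
# speedup: 3.86x
```

Under these assumptions the first instruction still takes six new ticks, but each later one adds only a single tick, which is the "marginal time per instruction" the slide refers to.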
Creating Stages with Memory • Need some way to separate stages • Need to “pipe” output of one stage to input of next • Regular memory uses flip-flops (edge-triggered) • The pipeline can remember using latches (level-triggered) • So establish latches to remember output of each stage • Latch also serves as input for the next stage
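The role of the latches can be sketched in software: each latch holds the output of one stage and is read as the input of the next stage on the following tick. This is only a behavioral model under assumed names (`step`, `run`) with made-up stage functions; it says nothing about real latch timing.

```python
def step(latches, stage_funcs, new_input):
    """Advance one clock tick.

    latches[i] holds the value waiting at the input of stage i (None = empty).
    Every stage computes from the value latched on the previous tick, then
    each output is latched as the input of the next stage.
    """
    outputs = [f(v) if v is not None else None
               for f, v in zip(stage_funcs, latches)]
    next_latches = [new_input] + outputs[:-1]  # output of stage i feeds stage i + 1
    return next_latches, outputs[-1]           # last output leaves the pipeline


def run(inputs, stage_funcs):
    """Feed inputs one per tick, then keep ticking until the pipeline drains."""
    latches = [None] * len(stage_funcs)
    results = []
    for item in list(inputs) + [None] * len(stage_funcs):
        latches, done = step(latches, stage_funcs, item)
        if done is not None:
            results.append(done)
    return results


# Hypothetical three-stage pipeline computing (x + 1) * 2 - 3
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(run([1, 2, 3], stages))  # [1, 3, 5]
```

Because every stage reads only what was latched on the previous tick, all stages can work in the same tick without seeing each other's half-finished results.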
A Third Way: EPIC • Explicitly Parallel Instruction Computing • Co-developed by Intel and HP • Itanium was first implementation of EPIC • Innovative solution to branch prediction problem • don't try to predict at all! • execute all the paths in the code (up to a point) • keep track of register copies for each path • when branch decided, keep the taken path's results and free up the “extra” registers
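The "execute all the paths" idea can be sketched as follows. This is a toy model, not Itanium's actual mechanism: registers are a Python dict, each path gets its own copy, and the function name `execute_both_paths` and the register names are hypothetical.

```python
def execute_both_paths(regs, condition, then_path, else_path):
    """Run both sides of a branch on private register copies, then keep one.

    Neither path touches the original registers, so the work can start
    before the branch condition is known.
    """
    then_regs = then_path(dict(regs))  # register copy for the "taken" path
    else_regs = else_path(dict(regs))  # register copy for the "not taken" path
    taken = condition(regs)            # branch finally decided
    # Keep the winning copy; the losing copy (the "extra" registers) is dropped.
    return then_regs if taken else else_regs


def then_path(r):
    r["r2"] = r["r1"] * 2
    return r


def else_path(r):
    r["r2"] = r["r1"] + 100
    return r


regs = {"r1": 5, "r2": 0}
result = execute_both_paths(regs, lambda r: r["r1"] > 3, then_path, else_path)
print(result)  # {'r1': 5, 'r2': 10}
```

The cost is visible even in the toy: both paths always run, so half of the work is thrown away, which is why the slide says "up to a point".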
Hazard: Data Conflict • RAW: read after write (a later instruction fetches an operand before an earlier instruction has written the new value)
Superscalar Architecture • Use multiple function units • multiple instructions can execute in parallel • each uses its own circuitry (e.g. multiple ALUs) • Issues • some instructions shouldn’t execute in parallel • difficult to design CPU that decides • put burden on compiler (e.g. a Pentium-optimized compiler versus a generic compiler)
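The decision about which instructions may execute in parallel can be sketched as a greedy, in-order grouping. This is a simplified model under stated assumptions: two identical function units, instructions with one destination and two sources, and register conflicts as the only reason not to pair. `Instr`, `independent`, and `issue_groups` are hypothetical names.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    text: str
    dest: str
    srcs: tuple


def independent(earlier, later):
    """True if `later` has no register conflict with `earlier`."""
    return (earlier.dest not in later.srcs      # later does not read earlier's result
            and later.dest not in earlier.srcs  # later does not overwrite earlier's input
            and earlier.dest != later.dest)     # they do not write the same register


def issue_groups(program, units=2):
    """Greedily pack instructions, in program order, into groups that can run together."""
    groups, current = [], []
    for instr in program:
        if len(current) < units and all(independent(p, instr) for p in current):
            current.append(instr)
        else:
            groups.append(current)
            current = [instr]
    if current:
        groups.append(current)
    return groups


program = [
    Instr("r1 = r2 + r3", "r1", ("r2", "r3")),
    Instr("r4 = r1 + r5", "r4", ("r1", "r5")),  # needs r1 from the first instruction
    Instr("r6 = r7 + r8", "r6", ("r7", "r8")),
    Instr("r9 = r6 + r4", "r9", ("r6", "r4")),
]
for cycle, group in enumerate(issue_groups(program), start=1):
    print(f"cycle {cycle}: " + " | ".join(i.text for i in group))
# cycle 1: r1 = r2 + r3
# cycle 2: r4 = r1 + r5 | r6 = r7 + r8
# cycle 3: r9 = r6 + r4
```

A compiler that knows the target has two units could reorder the program so that independent instructions sit next to each other, which is the "burden on compiler" option.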
Multiple Function Units (figure: Fetch → Decode → three parallel Execute units → Store)
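Doing Things Out of Order • Dispatcher can grab any instruction • while waiting on fetch, do something useful • Look out for data hazards • RAW: read after write (later instruction must not read before the earlier write) • WAR: write after read (later instruction must not write before the earlier read) • WAW: write after write (later write must not land before the earlier write) • Dispatcher must be aware of dependencies • Renaming registers: pros and cons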
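The three hazards, and what register renaming does to them, can be sketched with instructions written as `(dest, srcs)` tuples. This is a minimal illustration: `hazards` and `rename` are hypothetical helpers, and the physical register names `p0`, `p1`, ... are made up.

```python
def hazards(earlier, later):
    """Return the data hazards that `later` has with respect to `earlier`."""
    e_dest, e_srcs = earlier
    l_dest, l_srcs = later
    found = set()
    if e_dest in l_srcs:
        found.add("RAW")  # later reads what earlier writes
    if l_dest in e_srcs:
        found.add("WAR")  # later writes what earlier reads
    if l_dest == e_dest:
        found.add("WAW")  # both write the same register
    return found


def rename(program):
    """Give every write a fresh register so only true (RAW) dependencies remain."""
    mapping = {}  # architectural register -> most recent renamed register
    renamed = []
    for n, (dest, srcs) in enumerate(program):
        new_srcs = tuple(mapping.get(s, s) for s in srcs)  # read the latest copy
        new_dest = f"p{n}"                                 # fresh name for each write
        mapping[dest] = new_dest
        renamed.append((new_dest, new_srcs))
    return renamed


program = [
    ("r1", ("r2", "r3")),  # I1: r1 = r2 + r3
    ("r4", ("r1", "r5")),  # I2: r4 = r1 + r5   (RAW with I1)
    ("r2", ("r6", "r7")),  # I3: r2 = r6 + r7   (WAR with I1)
    ("r1", ("r8", "r9")),  # I4: r1 = r8 + r9   (WAW with I1)
]
print([hazards(program[0], later) for later in program[1:]])
# [{'RAW'}, {'WAR'}, {'WAW'}]

renamed = rename(program)
print([hazards(renamed[0], later) for later in renamed[1:]])
# [{'RAW'}, set(), set()]
```

In this sketch renaming removes the WAR and WAW conflicts, since they come only from reusing a register name, while the RAW dependency stays because I2 really needs I1's result. The cost is a larger register pool and the bookkeeping to map names back.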
Improving Performance • Rev up the clock speed • Redesign circuits to reduce delay, e.g. • Replace ripple-carry adder with carry-lookahead adder: decrease worst-case ALU propagation delay • Reorder some microinstructions • Add an incrementer to PC register • Add pre-fetcher, pipeline, superscalar • Add lots of registers (RISC) • Branch prediction, speculative execution • Note tradeoff among speed, cost, and space • Find things to do in parallel