William Stallings Computer Organization and Architecture, 6th Edition • Chapter 12 CPU Structure and Function
Processor Organization • CPU must: • Fetch instructions • The CPU reads an instruction from memory. • Interpret instructions • The instruction is decoded to determine what action is required. • Fetch data • Reading data from memory or an I/O module. • Process data • Performing some arithmetic or logical operation on data. • Write data • Writing the result to memory or an I/O module.
Register Organization • CPU must have some working space (temporary storage) • Called registers • Number and function vary between processor designs • Top level of memory hierarchy • The registers in the CPU perform two roles: • User-visible registers • These enable the machine- or assembly-language programmer to minimize main memory references by optimizing the use of registers. • Control and status registers • These are used by the control unit to control the operation of the CPU and by privileged operating system programs to control the execution of programs.
User Visible Registers • A user-visible register is one that may be referenced by means of the machine language that the CPU executes. • Four categories of user-visible registers: • General Purpose • Can be assigned to a variety of functions by the programmer. • Data • Address • Condition Codes
General Purpose Registers (1) • May be true general purpose • Contain the operand for any opcode • May be restricted • Dedicated registers for floating point and stack operation • May be used for data or addressing • Data register • May be used only to hold data. • Accumulator • Address register • Segment pointers • Index registers • Stack pointer
General Purpose Registers (2) • Make them general purpose • Increase flexibility and programmer options • Increase instruction size & complexity • Make them specialized • Smaller (faster) instructions • Less flexibility
How Many GP Registers? • Typically between 8 and 32 • Fewer = more memory references • More does not noticeably reduce memory references and takes up processor real estate • See also RISC
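How big? • Large enough to hold full address • Large enough to hold full word • Often possible to combine two data registers to hold a double-length value • C programming (declarations that may need a register pair when the type is wider than one register) • double a; • long int a;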
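As a rough illustration of combining two data registers into one double-length value, the sketch below packs and unpacks a register pair. The 16-bit register width and the function names `combine` and `split` are assumptions made for the example, not something the slides specify.

```python
REGISTER_BITS = 16  # assumed register width, for illustration only


def combine(high, low):
    """Treat two registers as one double-length value (high:low)."""
    return (high << REGISTER_BITS) | low


def split(value):
    """Split a double-length value back into its (high, low) register halves."""
    mask = (1 << REGISTER_BITS) - 1
    return (value >> REGISTER_BITS) & mask, value & mask


print(hex(combine(0x1234, 0xABCD)))  # 0x1234abcd
```

With 16-bit registers, the pair behaves like a single 32-bit quantity, which is how a compiler might hold a wider integer type on a narrow machine.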
Condition Code Registers • Sets of individual bits • e.g. result of last operation was zero • Can be read (implicitly) by programs • e.g. Jump if zero • Cannot (usually) be set by programs
Control & Status Registers • Program Counter (PC) • Instruction Register (IR) • Memory Address Register (MAR) • Memory Buffer Register (MBR) • Revision: what do these all do?
Program Status Word • Program status word (PSW) • The PSW is a register or set of registers that contains status information and condition codes, including: • Sign: the sign bit of the result of the last arithmetic operation • Zero: set when the result is 0 • Carry: set if an operation resulted in a carry (addition) into or a borrow (subtraction) out of a high-order bit • Equal: set if a logical compare result is equality • Overflow: used to indicate arithmetic overflow • Interrupt enable/disable: used to enable or disable interrupts • Supervisor: indicates whether the CPU is executing in supervisor or user mode
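To make the condition codes concrete, here is a minimal sketch of how an ALU might derive the sign, zero, carry, and overflow flags after an 8-bit addition. The 8-bit width, the function name `add8`, and the dictionary layout are assumptions chosen for illustration; real processors differ in flag names and details.

```python
def add8(a, b):
    """Add two 8-bit values and return (result, flags) like a simple ALU."""
    raw = a + b
    result = raw & 0xFF  # keep only the low 8 bits
    flags = {
        "sign": bool(result & 0x80),   # high-order bit of the result
        "zero": result == 0,           # result is 0
        "carry": raw > 0xFF,           # carry out of the high-order bit
        # Signed overflow: both operands share a sign that differs
        # from the sign of the result.
        "overflow": bool(~(a ^ b) & (a ^ result) & 0x80),
    }
    return result, flags


result, flags = add8(0x7F, 0x01)
print(hex(result), flags)
```

Adding 0x7F and 0x01 gives 0x80: no carry out, but the sign and overflow flags are set because two positive signed operands produced a negative signed result.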
Supervisor Mode • Intel ring zero • Kernel mode • Allows privileged instructions to execute • Used by operating system • Not available to user programs
Other Registers • May have registers pointing to: • Process control blocks (see O/S) • Interrupt Vectors (see O/S) • N.B. CPU design and operating system design are closely linked
Foreground Reading • Stallings Chapter 12 • Manufacturer web sites & specs
Instruction Cycle • Fetch • Read the next instruction from memory into the CPU • Execute • Interpret the opcode and perform the indicated operation • Interrupt • If interrupts are enabled and an interrupt has occurred, save the current process state and service the interrupt.
Indirect Cycle • May require memory access to fetch operands • Indirect addressing requires more memory accesses • Can be thought of as additional instruction subcycle
Data Flow (Instruction Fetch) • Depends on CPU design • In general: • Fetch • PC contains address of next instruction • Address moved to MAR • Address placed on address bus • Control unit requests memory read • Result placed on data bus, copied to MBR, then to IR • Meanwhile PC incremented by 1
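The fetch sequence can be sketched as a few register transfers. This is a simplified model under stated assumptions: memory is word-addressed (so the PC advances by 1), the buses are not modelled separately, and the class name `CPU` and its attribute names are hypothetical.

```python
class CPU:
    """Toy model of the registers involved in an instruction fetch."""

    def __init__(self, memory):
        self.memory = memory  # list of words, indexed by address
        self.pc = 0           # program counter
        self.mar = 0          # memory address register
        self.mbr = 0          # memory buffer register
        self.ir = 0           # instruction register

    def fetch(self):
        self.mar = self.pc                # address moved to MAR
        self.mbr = self.memory[self.mar]  # memory read: data bus -> MBR
        self.pc += 1                      # meanwhile PC is incremented
        self.ir = self.mbr                # MBR copied to IR
        return self.ir


cpu = CPU([0x1940, 0x5941, 0x2941])
print(hex(cpu.fetch()), cpu.pc)  # 0x1940 1
```

Calling `fetch()` repeatedly walks through memory in order, which is the behaviour the later slides on prefetching and branching build on.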
Data Flow (Data Fetch) • IR is examined • If indirect addressing, indirect cycle is performed • Rightmost N bits of MBR transferred to MAR • Control unit requests memory read • Result (address of operand) moved to MBR
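A minimal sketch of the indirect cycle follows. The address-field width of 12 bits, the dictionary used as memory, and the function name `indirect_cycle` are assumptions for illustration only.

```python
ADDRESS_BITS = 12  # assumed width N of the address field


def indirect_cycle(memory, mbr):
    """Return (mar, mbr) after one indirect cycle."""
    mar = mbr & ((1 << ADDRESS_BITS) - 1)  # rightmost N bits of MBR -> MAR
    mbr = memory[mar]                      # memory read: operand address -> MBR
    return mar, mbr


memory = {0x940: 0x123}       # location 0x940 holds the operand's address
mar, mbr = indirect_cycle(memory, 0x1940)
print(hex(mar), hex(mbr))     # 0x940 0x123
```

After the cycle, the MBR holds the address of the operand rather than the operand itself, so one further memory read is still needed to obtain the data.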
Data Flow (Execute) • May take many forms • Depends on instruction being executed • May include • Memory read/write • Input/Output • Register transfers • ALU operations
Data Flow (Interrupt) • Simple • Predictable • Current PC saved to allow resumption after interrupt • Contents of PC copied to MBR • Special memory location (e.g. stack pointer) loaded to MAR • MBR written to memory • PC loaded with address of interrupt handling routine • Next instruction (first of interrupt handler) can be fetched
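The interrupt data flow can be sketched the same way. This example assumes the return address is saved on a stack that grows downward and that registers live in a plain dictionary; both choices, and the name `interrupt_cycle`, are illustrative rather than taken from any particular processor.

```python
def interrupt_cycle(regs, memory, handler_address):
    """Save the current PC and redirect execution to the interrupt handler."""
    regs["MBR"] = regs["PC"]            # contents of PC copied to MBR
    regs["SP"] -= 1                     # assumed: stack grows downward
    regs["MAR"] = regs["SP"]            # save location loaded into MAR
    memory[regs["MAR"]] = regs["MBR"]   # MBR written to memory
    regs["PC"] = handler_address        # PC loaded with handler address


regs = {"PC": 0x105, "SP": 0x200, "MAR": 0, "MBR": 0}
memory = {}
interrupt_cycle(regs, memory, handler_address=0x040)
print(hex(regs["PC"]), hex(memory[0x1FF]))  # 0x40 0x105
```

The next fetch then reads the first instruction of the handler, and the saved value at the top of the stack allows the interrupted program to resume later.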
Prefetch • Fetch accesses main memory • Execution usually does not access main memory • Can fetch next instruction during execution of current instruction • Called instruction prefetch
Improved Performance • But not doubled: • Fetch usually shorter than execution • Prefetch more than one instruction? • Any jump or branch means that prefetched instructions are not the required instructions • Add more stages to improve performance
Pipelining • Decomposition of instruction processing: • Fetch instruction (FI) • Read the next expected instruction into a buffer. • Decode instruction (DI) • Determine the opcode and the operand specifiers. • Calculate operands (CO) • Calculate the effective address of each source operand. • Fetch operands (FO) • Fetch each operand from memory. • Execute instruction (EI) • Perform the indicated operation and store the result. • Write operand (WO) • Store the result in memory. • With this decomposition, the various stages will be of more nearly equal duration • Overlap these operations
Timing of Pipeline • A 6-stage pipeline can reduce the execution time for 9 instructions from 54 time units to 14 time units.
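The 54 versus 14 figure can be checked with a small scheduling sketch. It assumes an ideal pipeline: every stage takes one time unit, there are no stalls or branches, and a new instruction enters on every time unit. The function name `pipeline_schedule` is hypothetical.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def pipeline_schedule(n_instructions, stages=STAGES):
    """Return {instruction: {stage: time_unit}} for an ideal, stall-free pipeline."""
    return {
        i: {stage: i + s for s, stage in enumerate(stages)}
        for i in range(1, n_instructions + 1)
    }


schedule = pipeline_schedule(9)
total_pipelined = max(schedule[9].values())  # time unit of the last WO stage
total_sequential = 9 * len(STAGES)           # every stage of every instruction in turn
print(total_sequential, total_pipelined)     # 54 14
```

Instruction 1 occupies time units 1 to 6, and each later instruction starts one unit after its predecessor, so instruction 9 finishes its WO stage in time unit 14.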
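Branch in a Pipeline • The marked instruction is a conditional branch that will jump to instruction 15 • Before this stage, it is not known where the branch will go • When the branch is taken, the prefetched instructions are flushed from the pipeline • Instruction 15 is then loaded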
Pipeline Performance • The cycle time can be determined as $\tau = \max_i[\tau_i] + d = \tau_m + d$, for $1 \le i \le k$, where $\tau_m$ = maximum stage delay, $k$ = number of stages in the instruction pipeline, and $d$ = time delay of a latch, needed to advance signals and data from one stage to the next. • The total time required to execute all $n$ instructions is $T_k = [k + (n - 1)]\tau$
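Pipeline Performance (cont.) • The speedup factor for the instruction pipeline compared to execution without the pipeline is $S_k = \dfrac{T_1}{T_k} = \dfrac{nk\tau}{[k + (n - 1)]\tau} = \dfrac{nk}{k + (n - 1)}$, where $T_1 = nk\tau$ is the time for $n$ instructions without pipelining, $k$ is the number of stages, and $\tau$ is the cycle time.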
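A short numerical check of the pipeline timing and speedup is sketched below. It assumes the usual ideal-pipeline result that n instructions through k stages take k + (n - 1) cycles, with no branches or stalls; the function names are hypothetical.

```python
def pipeline_time(n, k, tau=1.0):
    """Total time for n instructions on an ideal k-stage pipeline with cycle time tau."""
    return (k + (n - 1)) * tau


def speedup(n, k):
    """Speedup of the k-stage pipeline over unpipelined execution (n * k cycles)."""
    return (n * k) / (k + (n - 1))


print(pipeline_time(9, 6))          # 14.0
print(round(speedup(9, 6), 2))      # 3.86
print(round(speedup(10**6, 6), 2))  # 6.0
```

For 9 instructions the 6-stage pipeline is a little under 4 times faster, and as the number of instructions grows the speedup approaches the number of stages.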
Dealing with Branches • A variety of approaches for dealing with conditional branches: • Multiple Streams • Prefetch Branch Target • Loop buffer • Branch prediction • Delayed branching
Multiple Streams • Have two pipelines • Prefetch each branch into a separate pipeline • Use appropriate pipeline • Drawbacks • Leads to bus & register contention • Multiple branches lead to further pipelines being needed
Prefetch Branch Target • Target of branch is prefetched in addition to instructions following branch • Keep target until branch is executed • Used by IBM 360/91
Loop Buffer • A loop buffer is a small, very fast memory • Maintained by fetch stage of pipeline • Contains the n most recently fetched instructions • When a branch is to be taken, the hardware first checks whether the branch target is within the buffer • If so, the next instruction is fetched from the buffer • Very good for small loops or jumps • Similar in principle to a cache • Used by CRAY-1
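The idea of a loop buffer can be sketched in software as a small store of the most recently fetched instructions, checked before going to memory on a branch. The class name `LoopBuffer`, its methods, and the use of an ordered dictionary are assumptions for illustration; real hardware typically indexes the buffer with low-order address bits instead.

```python
from collections import OrderedDict


class LoopBuffer:
    """Keeps the `size` most recently fetched instructions, keyed by address."""

    def __init__(self, size):
        self.size = size
        self.entries = OrderedDict()  # address -> instruction, oldest first

    def record_fetch(self, address, instruction):
        self.entries[address] = instruction
        self.entries.move_to_end(address)     # mark as most recent
        if len(self.entries) > self.size:
            self.entries.popitem(last=False)  # drop the oldest fetch

    def lookup(self, target):
        """Return the buffered instruction, or None if memory must be accessed."""
        return self.entries.get(target)


buf = LoopBuffer(size=4)
for addr in range(100, 106):
    buf.record_fetch(addr, f"instr@{addr}")
print(buf.lookup(104), buf.lookup(100))  # instr@104 None
```

A backward branch to address 104 hits the buffer, while a branch to address 100 misses because that instruction has already been displaced, which is why the technique suits small loops.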
Branch Prediction • Used to predict whether a branch will be taken • Predict never taken • Predict always taken • Predict by opcode • Taken/not taken switch • Branch history table • Static approaches (the first three) • Do not depend on the execution history up to the time of the conditional branch instruction • Dynamic approaches (the last two) • Depend on the execution history
Branch Prediction (1) • Predict never taken • Assume that jump will not happen • Always fetch next instruction • 68020 & VAX 11/780 • Predict always taken • Assume that jump will happen • Always fetch target instruction
Branch Prediction (2) • Predict by Opcode • Some instructions are more likely to result in a jump than others • Can get up to 75% success • Taken/Not taken switch • Records the history of conditional branch instructions in a program • One or more bits can be associated with each conditional branch instruction to reflect the recent history of that instruction • One possibility is to associate these bits with any conditional branch instruction that is in a cache • The use of history bits has one drawback • If the decision is made to take the branch, the target instruction cannot be fetched until the target address is decoded
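One common way to realise history bits is a two-bit saturating counter per branch, sketched below. This is an illustrative variant rather than the exact scheme in the slides; the class name `TwoBitPredictor`, the initial counter value, and the use of a dictionary keyed by branch address are all assumptions.

```python
class TwoBitPredictor:
    """Per-branch two-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.counters = {}  # branch address -> counter in 0..3

    def predict(self, address):
        return self.counters.get(address, 1) >= 2  # assumed start: weakly not taken

    def update(self, address, taken):
        c = self.counters.get(address, 1)
        self.counters[address] = min(c + 1, 3) if taken else max(c - 1, 0)


# A loop-closing branch: taken 9 times, then not taken, executed twice.
outcomes = ([True] * 9 + [False]) * 2
predictor = TwoBitPredictor()
correct = 0
for taken in outcomes:
    if predictor.predict(0x200) == taken:
        correct += 1
    predictor.update(0x200, taken)
print(correct, len(outcomes))  # 17 20
```

With two bits, a single loop exit does not flip the prediction, so the second pass through the loop is predicted correctly from its first iteration; only the warm-up miss and the two loop exits are mispredicted.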