PIPELINED PROCESSORS Chapter No. 5
Pipeline Evolution in Processors • Pipelining first appeared at the end of the 1960s in the supercomputers of that time, such as the IBM 360/91 (1967) and the CDC 7600 (1970). • In 1970, pipelining at the instruction level was used in the B7700 mainframe.
Principle of Pipelining • A number of functional units are employed in sequence to perform a single computation. • Each functional unit represents a certain stage of the computation. • A pipeline allows overlapped execution of instructions, that is, temporal overlapping of processing. • It increases the processor's overall throughput. • In pipelined operation, each task is divided into a number of subtasks.
Principle of Pipelining • Each pipeline stage is associated with one subtask and performs the operation that subtask requires. • In a basic pipeline, every stage has the same amount of time available to perform its subtask. • The pipeline stages operate like an assembly line: each stage typically receives its input from the previous stage and delivers its output to the next stage. • The basic pipeline is clocked (synchronous): each stage accepts a new input at the start of the clock cycle.
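The overlap described above can be made concrete with a small timing sketch. It assumes an idealized basic pipeline in which every stage takes exactly one clock cycle and no stalls occur; the function names and the four stage names are illustrative and do not come from the slides.

```python
def pipeline_schedule(num_instructions, stages):
    """Return {cycle: [(instruction_index, stage_name), ...]}.

    Idealized model: one cycle per stage, no stalls, one new
    instruction entering the pipeline per cycle.
    """
    schedule = {}
    for i in range(num_instructions):
        for s, stage in enumerate(stages):
            cycle = i + s  # instruction i occupies stage s in cycle i + s
            schedule.setdefault(cycle, []).append((i, stage))
    return schedule


def pipelined_cycles(num_instructions, num_stages):
    """Cycles needed when execution is overlapped (ideal pipeline)."""
    return num_stages + num_instructions - 1


def sequential_cycles(num_instructions, num_stages):
    """Cycles needed when each instruction finishes before the next starts."""
    return num_instructions * num_stages


STAGES = ["F", "D", "E", "W"]  # hypothetical stage names

if __name__ == "__main__":
    sched = pipeline_schedule(3, STAGES)
    for cycle in sorted(sched):
        print(cycle, sched[cycle])
    print(pipelined_cycles(5, len(STAGES)), sequential_cycles(5, len(STAGES)))
```

In this model, five instructions on four stages need 8 cycles when overlapped and 20 cycles when executed one after another, which is the throughput gain the slide refers to.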
Processor Pipelines in Reality • A real pipeline may include a few extensions to the basic pipeline. • Pipelined execution is often performed using half-cycles. • In certain cases, one or more pipeline stages may have to be recycled to accomplish a given task. • These additional cycles may be required to perform certain arithmetic operations.
General Structure of Pipelines • A pipeline consists of a number of stages, one for each subtask. The stages are decoupled from each other by registers, called latches. • As each clock cycle ends, the latches gate in their inputs and forward them to the associated stage, where the required operation is performed. • In reality, each stage is often implemented by a number of different functional units (FUs) or execution units (EUs) that perform the required operations. • The latches are extended with multiplexers that select data from the outputs of the preceding EUs and transfer it to the inputs of the subsequent EUs.
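The latch-and-stage structure can be sketched as a small clocked simulation. This is a minimal model under stated assumptions: a linear pipeline, one Python function per stage, and `None` standing for an empty slot (so stage functions must not return `None`). The name `run_pipeline` is hypothetical.

```python
def run_pipeline(stage_funcs, inputs):
    """Clock a linear pipeline whose stages are separated by latches.

    latches[k] holds the value waiting at the input of stage k.
    Returns (results in completion order, number of clock cycles).
    """
    n = len(stage_funcs)
    pending = list(inputs)
    latches = [None] * n
    if pending:
        latches[0] = pending.pop(0)  # first input is latched before cycle 1

    results = []
    cycles = 0
    while any(v is not None for v in latches):
        # During the cycle, each stage operates on its latched input.
        outputs = [f(v) if v is not None else None
                   for f, v in zip(stage_funcs, latches)]
        if outputs[-1] is not None:
            results.append(outputs[-1])  # the last stage delivers a result
        # End of cycle: every latch gates in the preceding stage's output,
        # and the first latch accepts the next input, if any.
        next_input = pending.pop(0) if pending else None
        latches = [next_input] + outputs[:-1]
        cycles += 1
    return results, cycles


if __name__ == "__main__":
    stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    print(run_pipeline(stages, [1, 2]))
```

With three stages and two inputs, the sketch finishes in four cycles, because the second input enters the first stage while the first input is still being processed in the second.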
Pipeline Performance Measures • Non-pipelined processor • characterized by its instruction cycle time and execution time • Pipelined processor • execution time is of no importance • three different measures are used instead: cycle time, latency and repetition rate • Cycle time • specifies the time available for each stage to accomplish the required operations
Pipeline Performance Measures • The cycle time is determined by the worst-case processing time of the longest stage • Latency • specifies the amount of time that the result of a particular instruction takes to become available in the pipeline for a subsequent dependent instruction • used in the context of processing a subsequent RAW-dependent instruction • Two kinds of latency: define-use latency and load-use latency (corresponding to the two types of RAW dependency, define-use and load-use)
Pipeline Performance Measures • Define-use latency, example: • mul r1, r2, r3 • add r5, r1, r4 • Define-use delay • the time a subsequent RAW-dependent instruction has to be stalled in the pipeline • Load-use latency, example: • load r1, x • add r5, r1, r2 • Load-use delay • interpreted in the same way as the define-use delay
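The two instruction pairs above can be checked for RAW dependencies with a short sketch. Two things here are assumptions rather than content of the slides: the latency values in `LATENCY` are invented for illustration, and the delay is taken to be one cycle less than the latency (a dependent instruction that could otherwise issue one cycle later is stalled for the remaining cycles).

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


# Hypothetical latencies in cycles; real values depend on the processor.
LATENCY = {"mul": 3, "add": 1, "load": 2}


def is_raw_dependent(producer, consumer):
    """True if the consumer reads the register the producer writes."""
    return producer.dest in consumer.srcs


def stall_cycles(producer, consumer, latency=LATENCY):
    """Stall of a consumer issued directly after the producer.

    Assumption: delay = latency - 1, and no delay without a RAW dependency.
    """
    if not is_raw_dependent(producer, consumer):
        return 0
    return max(latency[producer.op] - 1, 0)


if __name__ == "__main__":
    mul = Instr("mul", "r1", ("r2", "r3"))
    add1 = Instr("add", "r5", ("r1", "r4"))
    load = Instr("load", "r1", ("x",))
    add2 = Instr("add", "r5", ("r1", "r2"))
    print(stall_cycles(mul, add1))   # define-use delay under the assumptions
    print(stall_cycles(load, add2))  # load-use delay under the assumptions
```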
Pipeline Performance Measures • Repetition rate • also known as throughput • specifies the shortest possible time interval between subsequent instructions in the pipeline • the repetition rate of a basic pipeline is one cycle • the repetition rate determines the performance potential of a pipeline • The performance potential of a pipeline in which no define-use or load-use delays occur between instructions can be calculated as: P = 1/(R × tc)
Pipeline Performance Measures • where: • R is the repetition rate of the pipeline, in cycles • tc is the cycle time of the pipeline
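As a worked example of the formula, the following sketch derives the cycle time from the slowest stage and then computes the performance potential. The stage delays are made-up numbers chosen only for illustration, and the function names are hypothetical.

```python
def cycle_time(stage_delays):
    """Cycle time tc: the worst-case delay of the longest stage."""
    return max(stage_delays)


def performance_potential(repetition_rate_cycles, cycle_time_seconds):
    """P = 1 / (R * tc), in instructions per second.

    Valid only when no define-use or load-use delays occur.
    """
    return 1.0 / (repetition_rate_cycles * cycle_time_seconds)


if __name__ == "__main__":
    # Hypothetical stage delays in seconds (8 ns, 10 ns, 9 ns, 7 ns).
    delays = [8e-9, 10e-9, 9e-9, 7e-9]
    tc = cycle_time(delays)
    p = performance_potential(1, tc)  # basic pipeline: R = 1 cycle
    print(f"tc = {tc * 1e9:.0f} ns, P = {p / 1e6:.0f} million instructions/s")
```

With R = 1 and a 10 ns cycle time, the potential is about 100 million instructions per second; doubling R to 2 cycles halves it.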
Design Space of Pipelines • [Figure: key aspects of the design space of pipelines]
Basic Pipeline Layout • The number of pipeline stages • when more pipeline stages are used, more parallel execution and thus higher performance can be expected • disadvantage: a larger number of stages leads to more frequent data and control dependencies, which decreases performance • Specification of the subtasks to be performed in each stage • the subtasks can be specified at a number of levels of increasing detail
Basic Pipeline Layout • Layout of the stage sequence • concerns how the pipeline stages are used • Use of bypassing • intended to reduce or eliminate pipeline stalls due to RAW dependencies • Problem: unless special arrangements are made, the result of an operate instruction is written into the register file or into memory, and is then fetched from there as a source operand • Solution: the result of the EU is immediately forwarded to its input for use in the next pipeline cycle
Basic Pipeline Layout • Implementing bypassing requires an additional data bus for forwarding the results of the execution stage to its input, and an appropriate extension of the associated multiplexers and latches • Timing of the pipeline operations • self-timed (asynchronous) • clocked (synchronous)
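The role of the bypass multiplexer can be illustrated with a small sketch. It models only two back-to-back instructions, `mul r1, r2, r3` followed by `add r5, r1, r4`, and assumes that the `add` reads its operands before the `mul` has written `r1` back to the register file. The function names and register values are hypothetical.

```python
def select_operand(reg, regfile, forwarded):
    """Bypass multiplexer for one source operand.

    forwarded is None or a (register_name, value) pair carrying the
    result the EU produced in the previous cycle.
    """
    if forwarded is not None and forwarded[0] == reg:
        return forwarded[1]  # take the result from the forwarding bus
    return regfile[reg]      # otherwise read the register file


def execute_pair(regfile, use_bypass):
    """Run mul r1, r2, r3 then add r5, r1, r4 with no stall in between."""
    regs = dict(regfile)
    mul_result = regs["r2"] * regs["r3"]
    forwarded = ("r1", mul_result) if use_bypass else None

    # The add executes in the next cycle, before mul's write-back.
    a = select_operand("r1", regs, forwarded)
    b = select_operand("r4", regs, forwarded)
    add_result = a + b

    regs["r1"] = mul_result  # write-back of mul
    regs["r5"] = add_result  # write-back of add
    return regs


if __name__ == "__main__":
    initial = {"r1": 0, "r2": 3, "r3": 4, "r4": 5, "r5": 0}
    print(execute_pair(initial, use_bypass=True)["r5"])
    print(execute_pair(initial, use_bypass=False)["r5"])
```

Without the bypass, the `add` in this model picks up the stale value of `r1`, which is why a pipeline lacking forwarding has to stall the dependent instruction until the write-back completes.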
Dependency Resolution • Methods of dependency resolution • Static resolution: performed by the compiler • Dynamic resolution: performed by extra hardware • Combined resolution: performed partly by the compiler and partly by the hardware • [Figure: trend arrow across the resolution methods]
Logical Layout • It specifies the tasks to be accomplished. This includes: • the declaration of the pipelines to be implemented • usually there are separate pipelines: the FX pipeline for FX and logical data, the FP pipeline for FP data, the L/S pipeline for loads and stores, and the B pipeline for branches • the DEC Alpha 21164 provides two types of FX (integer) pipelines • a detailed specification of the subtasks to be performed and their execution sequence for each pipeline • a detailed description of the subtasks to be performed in each stage
Layout of the Physical Pipelines • Multifunction pipelines • Only one published design of a multifunction pipeline is available, the MIPS R4200, which implements all the FX, FP, L/S and B instructions • The classical (master pipeline) approach is implemented in the IBM 801, MIPS, MIPS-X, the MIPS R-series (up to the R6000), the i486 and the Pentium • Dedicated pipelines • Dedicated pipelines are implemented in the PowerPC 603, PowerPC 604, DEC Alpha, etc.
Multiplicity of Pipelines • Multiplicity refers to whether a single instance or multiple instances of a physical pipeline are used. • Two aspects should be considered when deciding on pipeline multiplicity: • the frequency of instructions • out-of-order execution of instructions due to multiple pipelines