CS5365 Pipelining
Pipelining • Divide a task into a sequence of subtasks. Each subtask is executed by a stage (segment) of the pipe.
Linear Pipeline Structure • All stages execute different subtasks simultaneously. • A stage is specialized hardware: combinational circuits, A/L operations, processors, etc.
Pipelining • Ideally, all stages take the same time to execute their task. Otherwise, the pipe operates at the speed of the slowest subtask.
Note: • Once the pipeline is full, it yields one result every clock period. • A linear pipeline with k stages can process n tasks in $T_k = k + (n - 1)$ clock periods, where k cycles are used to fill the pipe and complete the first task, and n - 1 additional cycles are needed to complete the remaining n - 1 tasks.
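The cycle count in the note above (k cycles to fill the pipe, then one result per cycle) can be checked with a small simulation. This is a minimal sketch, assuming every stage takes exactly one clock period and a new task is available every cycle; the function name `pipeline_cycles` is made up for illustration.

```python
def pipeline_cycles(k, n):
    """Count the clock periods a k-stage linear pipeline needs to finish n tasks.

    Assumption: each stage takes one clock period and tasks never stall.
    """
    stages = [None] * k   # stages[i] holds the id of the task in stage i
    next_task = 0
    completed = 0
    cycles = 0
    while completed < n:
        # Every task advances one stage; the task in the last stage leaves.
        stages = [None] + stages[:-1]
        if next_task < n:
            stages[0] = next_task   # a new task enters the first stage
            next_task += 1
        cycles += 1
        # A task sitting in the last stage completes at the end of this cycle.
        if stages[-1] is not None:
            completed += 1
    return cycles


if __name__ == "__main__":
    for k, n in [(4, 1), (4, 10), (6, 100)]:
        print(k, n, pipeline_cycles(k, n), k + n - 1)
```

For each pair the simulated count and the closed form k + (n - 1) should agree.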
Speedup • Denote by $T_{seq}$ the time required by a non-pipelined uniprocessor to execute n tasks. Then $T_{seq} = n k \tau$, where $\tau$ is the clock period, assuming each of the k operations needs the same length of time to execute.
Speedup • Otherwise, $T_{seq} = n \sum_{i=1}^{k} t_i$, where $t_i$ is the execution time of operation i.
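Speedup • The speedup S(k) obtained by a k-stage pipelined processor is given by $S(k) = \frac{T_{seq}}{T_{pipe}} = \frac{n k \tau}{(k + n - 1)\tau} = \frac{n k}{k + n - 1}$, where $\tau$ is the clock period and $T_{pipe} = (k + n - 1)\tau$ is the pipelined execution time.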
Ideal Pipeline • Speedup $S(k) = \frac{n k}{k + n - 1}$ • Note that when n >> k, $S(k) \to k$.
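A short numeric check of how the speedup approaches k as n grows. This is a sketch under the equal-stage-time assumption, taking the sequential time as n·k clock periods and the pipelined time as k + n - 1 clock periods; the helper name `speedup` is illustrative only.

```python
def speedup(k, n):
    """Speedup of a k-stage pipeline over sequential execution of n tasks.

    Assumption: all k stages take one clock period, so the sequential time
    is n * k periods and the pipelined time is k + n - 1 periods.
    """
    return (n * k) / (k + n - 1)


if __name__ == "__main__":
    k = 5
    for n in (1, 10, 100, 10_000):
        print(f"n = {n:>6}: S({k}) = {speedup(k, n):.3f}")
```

With a single task there is no gain (speedup 1), and for n much larger than k the value gets close to k without reaching it.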
Ideal Pipeline • However, the maximum (ideal) speedup is not achievable because of overhead due to: • Data dependencies between tasks. • Interrupts. • Program branches.
Efficiency • The efficiency is obtained by dividing the speedup by the number of stages k: $E(k) = \frac{S(k)}{k} = \frac{n}{k + n - 1}$
Efficiency • Note also that the efficiency approaches 1 as the number of tasks n grows: $E(k) \to 1$ as $n \to \infty$.
Throughput • It is defined as the number of tasks completed per unit of time: $\omega = \frac{n}{(k + n - 1)\tau}$, where $\tau$ is the clock period.
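Efficiency and throughput can be computed side by side. The sketch below assumes equal stage times, a clock period `tau`, and a pipelined execution time of (k + n - 1)·tau; the function names are hypothetical.

```python
def efficiency(k, n):
    """Speedup divided by the number of stages: n / (k + n - 1)."""
    return n / (k + n - 1)


def throughput(k, n, tau):
    """Tasks completed per unit of time, for a clock period tau (assumed)."""
    return n / ((k + n - 1) * tau)


if __name__ == "__main__":
    k, tau = 4, 2.0   # illustrative values, not taken from the slides
    for n in (1, 10, 1000):
        print(n, round(efficiency(k, n), 4), round(throughput(k, n, tau), 4))
```

Under these assumptions the throughput equals the efficiency divided by the clock period, so it tends to 1/tau for long task streams.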
Throughput • Techniques to increase throughput. • Consider a three-stage pipeline S1, S2, S3 with stage delays T1 = T3 = T and T2 = 3T. Clearly the bottleneck is S2, with a 3T delay.
Throughput • Recall that the throughput ω is inversely proportional to the pipe clock period $\tau$, and $\tau = \max(T_1, T_2, T_3)$. So $\tau = 3T$, with a total delay of 3 × 3T = 9T, and $\omega \approx \frac{1}{3T}$ for a large number of tasks (steady-state operation).
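The bottleneck arithmetic can be reproduced in a few lines. This is a sketch, assuming the clock period is set by the slowest stage and every stage is clocked at that period; `pipe_timing` is a made-up helper name.

```python
def pipe_timing(stage_delays):
    """Return (clock period, total delay, steady-state throughput).

    Assumption: all stages share one clock, set by the slowest stage.
    """
    clock = max(stage_delays)
    total_delay = len(stage_delays) * clock
    steady_throughput = 1 / clock
    return clock, total_delay, steady_throughput


if __name__ == "__main__":
    T = 1.0
    clock, delay, omega = pipe_timing([T, 3 * T, T])
    print(f"clock = {clock}T, total delay = {delay}T, throughput = {omega:.3f}/T")
```

With T1 = T3 = T and T2 = 3T this gives a clock period of 3T and a total delay of 9T, matching the figures on the slide.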
Throughput • How might the throughput be increased? • Subdivisions?
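Throughput • Subdivisions • What is the total delay now? With S2 subdivided into three stages of delay T each, the pipe has five stages and a clock period of T, for a total delay of 5T.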
Throughput • Subdivisions
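The effect of subdividing the slow stage can be compared numerically with the original configuration. The sketch assumes the 3T stage splits evenly into three substages of delay T and ignores latch overhead; `subdivide` and `timing` are illustrative names.

```python
def subdivide(stage_delays, index, parts):
    """Split the stage at `index` into `parts` equal substages (assumed possible)."""
    d = stage_delays[index]
    return stage_delays[:index] + [d / parts] * parts + stage_delays[index + 1:]


def timing(stage_delays):
    """Clock period is the slowest stage; total delay is stages * clock."""
    clock = max(stage_delays)
    return clock, len(stage_delays) * clock


if __name__ == "__main__":
    T = 1.0
    original = [T, 3 * T, T]
    split = subdivide(original, 1, 3)
    print("original :", original, timing(original))
    print("subdivided:", split, timing(split))
```

The clock period drops from 3T to T and the total delay from 9T to 5T, at the cost of two extra stages.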
Throughput • What are the disadvantages of this solution? • increased hardware, additional latches.
Throughput • Replication: • Stage 2 is replicated into three stages which are then interleaved
Space-time diagram • replication
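A rough schedule for the replicated configuration can be computed as follows. This is a sketch under several assumptions not stated on the slides: S1 and S3 take T, each copy of S2 takes 3T, tasks are assigned to the copies round-robin, and a latch between stages holds a task that has to wait. The function name `replicated_schedule` is hypothetical.

```python
def replicated_schedule(n, T=1, copies=3):
    """Completion time of each of n tasks for S1 (T) -> S2 copies (3T) -> S3 (T).

    Assumptions: round-robin interleaving over the S2 copies, and buffering
    between stages so S1 never blocks.
    """
    s1_free = 0                 # time at which S1 can accept a new task
    s2_free = [0] * copies      # same, for each copy of S2
    s3_free = 0
    finish = []
    for i in range(n):
        s1_end = s1_free + T
        s1_free = s1_end
        c = i % copies                          # interleave over the copies
        s2_end = max(s1_end, s2_free[c]) + 3 * T
        s2_free[c] = s2_end
        s3_end = max(s2_end, s3_free) + T
        s3_free = s3_end
        finish.append(s3_end)
    return finish


if __name__ == "__main__":
    print("3 copies:", replicated_schedule(5))
    print("1 copy  :", replicated_schedule(3, copies=1))
```

With three interleaved copies a result leaves the pipe every T after an initial delay of 5T, whereas a single S2 yields one result every 3T.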
Control strategies and configurations • Unifunctional vs. multi-functional pipelines • Unifunctional pipelines execute a fixed and dedicated function. • A multi-functional pipeline may perform several functions, either at the same time or at different times. Multiple functions are made possible by interconnecting (reconfiguring) several stages at different times.
Control strategies and configurations • Static vs. dynamic pipelines • A static pipeline may assume only one functional configuration (unifunctional or multi-functional) at a time. • A dynamic pipeline allows several functional configurations at any time (multi-functional), which requires more complex control mechanisms than those required for static pipelines.
Control strategies and configurations • Scalar vs. vector pipelines • Scalar pipelines process a sequence of scalar operands under the control of a DO loop. Instructions are prefetched and stored in an instruction buffer. As instructions are executed, operands are fetched from a data cache. • Vector pipelines (vector processors) handle vector instructions over vector operands under firmware and hardware control.
Levels of processing • Arithmetic Pipelines – ALUs are partitioned for pipelined operations. Ex: 4-stage pipes are used in the Star-100, the Cray-1 uses 14 pipeline stages, the Cyber 205 uses 26 stages, etc. • Instruction Pipelines (instruction lookahead) – overlap the execution of the current instruction with the fetch, decode and operand fetch of subsequent instructions. • Processor Pipeline – a cascade of processors, each executing a different task (a job is divided into different tasks).
Instruction Pipelining • Consider the execution of a single instruction in a uniprocessor system. • A sequence of steps can be identified and implemented using a pipeline design:
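How consecutive instructions overlap in such a design can be shown as a space-time diagram. This is a sketch only: the four stage names IF, ID, OF and EX are assumptions based on the fetch, decode and operand fetch steps named under instruction pipelines, not the stages from the original figure, and `space_time` is a made-up helper.

```python
def space_time(stages, n):
    """Return one text row per stage showing which instruction it holds each cycle.

    Assumption: one clock period per stage and no stalls.
    """
    k = len(stages)
    rows = []
    for s, name in enumerate(stages):
        cells = []
        for t in range(k + n - 1):
            i = t - s   # index of the instruction in stage s at cycle t
            cells.append(f"I{i + 1}" if 0 <= i < n else "--")
        rows.append(f"{name:>3} | " + " ".join(cells))
    return rows


if __name__ == "__main__":
    for row in space_time(["IF", "ID", "OF", "EX"], 3):
        print(row)
```

Each column is one clock period; reading down a column shows the different instructions being processed at the same time.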
Problems • Instruction dependency • Pipeline Stalling • Branching • Conflicts • Interrupts
Instruction dependency • An instruction I+1 being fetched may need the results of a previous instruction I currently in the pipe, so I+1 must be delayed (stalled) until the results are known. • An instruction I+1 must not destroy data that may still be needed by a previous instruction I in the pipe.
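The two dependency conditions above can be expressed as a simple check on register names. This is a minimal sketch: the `Instr` record and the labels RAW (read after write) and WAR (write after read) are standard terminology added here for illustration, not taken from the slides.

```python
from collections import namedtuple

# Hypothetical instruction record: a destination register and source registers.
Instr = namedtuple("Instr", ["name", "dest", "srcs"])


def hazards(prev, nxt):
    """List the dependencies that instruction `nxt` has on the earlier `prev`."""
    found = []
    # nxt needs a result that prev has not produced yet.
    if prev.dest is not None and prev.dest in nxt.srcs:
        found.append("RAW")
    # nxt would overwrite data that prev still has to read.
    if nxt.dest is not None and nxt.dest in prev.srcs:
        found.append("WAR")
    return found


if __name__ == "__main__":
    add = Instr("ADD R1, R2, R3", dest="R1", srcs=("R2", "R3"))
    sub = Instr("SUB R4, R1, R5", dest="R4", srcs=("R1", "R5"))
    mov = Instr("MOV R2, R6", dest="R2", srcs=("R6",))
    print(hazards(add, sub))   # SUB reads R1, which ADD writes
    print(hazards(add, mov))   # MOV writes R2, which ADD reads
```

A real pipeline would apply such a check against every instruction still in flight, not only the immediately preceding one.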
Stalling • [Figure: pipeline diagram with three "Memory Access" labels]
Stalling • Assume a data cache access. • Assume an instruction cache access. • Stall condition?
Branching • This problem is caused by conditional and unconditional branches