CSCI206 - Computer Organization & Programming
Introduction to Pipelining
Revised by Alexander Fuchsberger and Xiannong Meng in spring 2019 based on the notes by other instructors.
zyBook: 11.5
What is the Critical Path?
• Instruction fetch = 200 ps
• Register file = 100 ps read / 100 ps write
• ALU = 200 ps
• Data access = 200 ps
• Mux / sign extend / shift left = negligible
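The critical path is the slowest chain of components that any one instruction must pass through. The sketch below adds up the latencies from the slide for a few instruction classes. The mapping from instruction class to components (the `USES` table) is an assumption based on the usual single-cycle datapath; it is not given on the slide.

```python
# Component latencies in picoseconds, taken from the slide.
LATENCY_PS = {
    "ifetch": 200,
    "reg_read": 100,
    "alu": 200,
    "data_access": 200,
    "reg_write": 100,
}

# ASSUMPTION: which components each instruction class uses.
# This table is illustrative and is not stated on the slide.
USES = {
    "lw":       ["ifetch", "reg_read", "alu", "data_access", "reg_write"],
    "sw":       ["ifetch", "reg_read", "alu", "data_access"],
    "r_format": ["ifetch", "reg_read", "alu", "reg_write"],
    "beq":      ["ifetch", "reg_read", "alu"],
}

def path_delay_ps(instr):
    """Total delay of the components one instruction class passes through."""
    return sum(LATENCY_PS[component] for component in USES[instr])

def critical_path():
    """Return (instruction class, delay) for the slowest path."""
    slowest = max(USES, key=path_delay_ps)
    return slowest, path_delay_ps(slowest)

for instr in USES:
    print(instr, path_delay_ps(instr), "ps")
print("critical path:", critical_path())
```

Under these assumptions the load instruction sets the clock period, because it is the only class that uses every component.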
What is the Component Utilization?
Utilization
• No component is used for more than 25% of the cycle!
• This means every component sits idle for at least 75% of each cycle!
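The 25% figure can be checked with a few lines of arithmetic. This sketch assumes the single-cycle clock period is 800 ps (the sum 200 + 100 + 200 + 200 + 100 for a load) and that the instruction being executed is a load, which keeps every component as busy as it can be; neither assumption is stated on the slide.

```python
# ASSUMPTION: single-cycle clock period, set by a load instruction.
CYCLE_PS = 200 + 100 + 200 + 200 + 100  # 800 ps

# Busy time per component during one load, using the slide's latencies.
BUSY_PS = {
    "instruction memory": 200,
    "register file": 100 + 100,  # one read plus one write
    "ALU": 200,
    "data memory": 200,
}

def utilization(busy_ps, cycle_ps=CYCLE_PS):
    """Fraction of the clock cycle during which a component is doing work."""
    return busy_ps / cycle_ps

for name, busy in BUSY_PS.items():
    print(f"{name}: {utilization(busy):.0%}")
```

Instructions that skip a component (for example, a store never writes a register) push that component's utilization below 25%.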
Pipelining Increases Utilization
• Break the process into independent steps and overlap their execution
• The classic analogy is doing laundry
• Total cycle time for one load is 2 hours
• Throughput is 0.5 loads per hour
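Increasing Throughput
• Suppose you have 4 loads of laundry
• And only one washer and one dryer!
• Using the single-cycle approach: each load finishes completely before the next one starts, so 4 loads take 4 × 2 = 8 hours
• No one does laundry like this!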
Pipelining Laundry
• We don’t have to complete the entire first cycle before starting the next!
• New time = 3.5 hours
• Speedup = 8 / 3.5 ≈ 2.3
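The 3.5-hour figure follows from the first load taking the full 2 hours and each later load finishing one stage time after the previous one. A minimal sketch, assuming the 2-hour cycle splits into four equal 30-minute stages (the stage count and length are assumptions consistent with the slide's numbers, not stated on it):

```python
def sequential_hours(loads, stages=4, stage_hours=0.5):
    """Single-cycle approach: every load runs all stages before the next starts."""
    return loads * stages * stage_hours

def pipelined_hours(loads, stages=4, stage_hours=0.5):
    """Pipelined approach: fill the pipeline once, then finish one load per stage time."""
    if loads == 0:
        return 0.0
    return (stages + loads - 1) * stage_hours

def speedup(loads, stages=4, stage_hours=0.5):
    """Ratio of sequential time to pipelined time for the same number of loads."""
    return sequential_hours(loads, stages, stage_hours) / pipelined_hours(
        loads, stages, stage_hours
    )

print(sequential_hours(4))    # 8.0
print(pipelined_hours(4))     # 3.5
print(round(speedup(4), 1))   # 2.3
```

As the number of loads grows, the speedup approaches the number of stages, because the one-time cost of filling the pipeline matters less and less.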
Pipeline Utilization
• Washer was 25%, now 2 / 3.5 ≈ 57%
• What if we had 10 loads?
• What about
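The 10-load question can be answered the same way as the 4-load case: the washer is busy for one stage time per load, and the total time is the pipeline fill time plus one stage time per extra load. This sketch again assumes four 30-minute stages, which is not stated on the slide.

```python
def pipelined_hours(loads, stages=4, stage_hours=0.5):
    """Total time to finish all loads when stages overlap (loads >= 1)."""
    return (stages + loads - 1) * stage_hours

def washer_utilization(loads, stages=4, stage_hours=0.5):
    """Fraction of the total time during which the washer is running."""
    busy = loads * stage_hours
    return busy / pipelined_hours(loads, stages, stage_hours)

for n in (1, 4, 10, 100):
    print(n, "loads:", f"{washer_utilization(n):.0%}")
```

With 10 loads the washer runs for 5 of 6.5 hours, about 77%, and the figure keeps climbing toward 100% as more loads are added.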
Ideal Pipeline Speedup
• In reality the CPU is not infinitely pipelineable
• Pipeline depth peaked at about 31 stages with the Intel Pentium 4 (Prescott)
• Modern Intel Core CPUs have roughly 14-19 stages
• Simpler ARM CPUs have 4-12 stages
• Even cheap (<$1) modern microcontrollers use 2-4 stage pipelines
Terms
• Latency
  • Time to complete one instruction
  • Unchanged, or perhaps increased, by pipelining
• Throughput
  • Number of instructions completed per unit of time
  • Ideally N times that of the unpipelined machine, where N is the number of pipeline stages
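To make the two terms concrete, the sketch below compares an unpipelined datapath with a pipelined one. It assumes five stages with the latencies quoted earlier in the deck (200, 100, 200, 200, 100 ps) and a pipelined clock set by the slowest stage; the five-stage split is an assumption, not something these slides specify.

```python
# ASSUMPTION: five stages (fetch, register read, ALU, data access, register write).
STAGE_PS = [200, 100, 200, 200, 100]

def unpipelined(stage_ps):
    """Return (latency, time between completed instructions) in ps."""
    latency = sum(stage_ps)
    return latency, latency  # one instruction finishes per full pass

def pipelined(stage_ps):
    """Return (latency, time between completed instructions) in ps."""
    clock = max(stage_ps)              # every stage gets the slowest stage's time
    latency = clock * len(stage_ps)    # one instruction still visits every stage
    return latency, clock              # in steady state one finishes per clock

u_latency, u_interval = unpipelined(STAGE_PS)
p_latency, p_interval = pipelined(STAGE_PS)

print("latency:", u_latency, "ps ->", p_latency, "ps")
print("throughput gain:", u_interval / p_interval)
```

Under these assumptions latency rises from 800 ps to 1000 ps, while throughput improves by a factor of 4 rather than the ideal 5, because the stages are not equally long.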
Register Access
• Registers are (possibly) accessed twice per instruction
  • The first access reads the operands
  • The second access writes the result
• To accommodate this, the register stage is split in half
  • In the first half-cycle registers are written
  • In the second half-cycle registers are read
• This ensures writes happen before reads in the same cycle, so the newest data is always returned
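The write-then-read ordering can be illustrated with a small behavioral model. This is only a sketch: the `RegisterFile` class, its `cycle` method, and the 32-register size are hypothetical names and choices made for illustration, and real hardware does this with clock edges rather than sequential statements.

```python
class RegisterFile:
    """Hypothetical behavioral model of a register file split into half-cycles."""

    def __init__(self, num_regs=32):
        self.regs = [0] * num_regs

    def cycle(self, write=None, reads=()):
        """Simulate one clock cycle.

        write: optional (register index, value) pair from an older instruction.
        reads: register indices requested by a younger instruction.
        """
        # First half-cycle: perform the write, if any.
        if write is not None:
            index, value = write
            self.regs[index] = value
        # Second half-cycle: perform the reads, which now see the new value.
        return tuple(self.regs[index] for index in reads)


rf = RegisterFile()
# An older instruction writes register 5 in the same cycle
# in which a younger instruction reads registers 5 and 6.
print(rf.cycle(write=(5, 42), reads=(5, 6)))  # (42, 0)
```

If the order of the two halves were swapped, the younger instruction would read the stale value 0 from register 5 instead of 42.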