Effect of pipeline depth on CPU efficiency. Carlos David Bula R. Computer Architecture, Electrical and Computer Engineering Department, University of Puerto Rico, Mayagüez Campus. April 2008
Outline • Introduction • Background • Introductory Real life Example • Pipeline depth Vs. Performance • Introducing the power consumption factor • Conclusions
Introduction • In recent years the use of microprocessor-based mobile devices has become widespread. • These devices must balance processing power against battery life. • More powerful servers and workstations that consume less power are desirable. • Heat dissipation is undesirable. • CPU design: the pipeline structure has a great impact on power and performance.
Background • In the past, microprocessor design was driven only by performance. • Nanometer fabrication technologies have taken VLSI designs to higher levels; today they are at 45 nm. • Powerful superscalar CPUs have hundreds of millions of transistors (e.g. Core 2 Quad: 582M transistors in the current version at 65 nm, 820M transistors in the next version at 45 nm). • The pipeline depth has a very important impact on CPU efficiency. [Figure: AMD quad-core chip layout. AMD Phenom X4 CPU, 462M transistors.]
Background • When designing an energy-efficient CPU, the pipeline structure must be decided in the early stages of the design. • Slicing the CPU into more pipeline stages may lead to some performance gains, but care is needed because tradeoffs are involved.
A real-life example: NetBurst architecture, a.k.a. Pentium 4 • The P4 CPUs and their derivatives were designed to reach high clock speeds. • 20 pipeline stages initially • 31 pipeline stages in later revisions (Prescott) • Strategy: • Marketing-driven design • Performance scaling by means of clock speed scaling • Poor power efficiency, especially in later models (Prescott) • Not suited for mobile devices • Ceded this space to the Pentium M and Athlon 64 • Poor IPC
The Core 2: a well-designed CPU • Designed from scratch to be efficient • It has only 14 pipeline stages • Core 2 is a 4-wide issue CPU • Executes many more instructions per cycle than the P4 • Consumes much less power
Pipeline depth Vs. Performance • Adding more pipeline stages allows higher clock speeds. • Each stage has less logic depth and therefore introduces less delay. • Tradeoff: an excessive number of stages reduces IPC. • Branch misprediction is catastrophic for long-pipeline CPUs. • Inter-stage latches introduce some delay overhead. [Figure: a 5-stage pipeline (S1 to S5) with stage delay Tda, more delay per stage, compared with a 10-stage pipeline (S1 to S10) with stage delay Tdb, less delay per stage. Fmax = 1/Tdmax, where Tdmax is the delay of the stage with the greatest delay.]
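The relation Fmax = 1/Tdmax and the latch overhead can be illustrated with a minimal sketch. It assumes the logic is split evenly across stages, which real designs only approximate, and the delay values (10 ns of logic, 0.1 ns per latch) and function names are hypothetical, chosen only for illustration.

```python
def stage_delay_s(total_logic_delay_s, stages, latch_overhead_s):
    """Delay of one stage when the logic is split evenly (an idealization)."""
    if stages < 1:
        raise ValueError("stages must be >= 1")
    return total_logic_delay_s / stages + latch_overhead_s


def fmax_hz(stage_delays_s):
    """Fmax = 1 / Tdmax: the slowest stage sets the clock period."""
    return 1.0 / max(stage_delays_s)


# Hypothetical numbers, not taken from any real CPU.
LOGIC_S = 10e-9   # total combinational delay of the unpipelined logic
LATCH_S = 0.1e-9  # overhead added by each inter-stage latch

f5 = fmax_hz([stage_delay_s(LOGIC_S, 5, LATCH_S)] * 5)
f10 = fmax_hz([stage_delay_s(LOGIC_S, 10, LATCH_S)] * 10)

print(f"5 stages:  {f5 / 1e6:.0f} MHz")
print(f"10 stages: {f10 / 1e6:.0f} MHz")
print(f"speed-up from doubling the depth: {f10 / f5:.2f}x")
```

Under these assumptions, doubling the number of stages raises the clock by less than a factor of two, because the latch overhead does not shrink with the logic.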
Pipeline depth Vs. Performance • Studies by Hartstein and Puzak show the effect of pipeline depth on performance. • They found that there is an optimal number of pipeline stages that maximizes performance. Power was not considered. [Harstein02]
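Why an optimum exists can be sketched with a simplified analytical model of the kind used in this line of work: time per instruction is a busy term that shrinks with depth plus a hazard-stall term that grows with depth. The parameter names and values below (total logic delay t_p, latch overhead t_o, superscalar degree alpha, stall fraction gamma, hazards per instruction) are illustrative assumptions, not figures from the cited study.

```python
import math


def time_per_instruction(p, t_p, t_o, alpha, gamma, hazards_per_instr):
    """Average time per instruction for a p-stage pipeline (arbitrary delay units)."""
    busy = (t_o + t_p / p) / alpha                        # cycle time shared by alpha-wide issue
    stall = gamma * hazards_per_instr * (t_o * p + t_p)   # each hazard costs part of the pipeline length
    return busy + stall


def optimum_depth(t_p, t_o, alpha, gamma, hazards_per_instr):
    """Depth where the derivative of time_per_instruction with respect to p is zero."""
    return math.sqrt(t_p / (alpha * gamma * hazards_per_instr * t_o))


# Assumed, illustrative parameter values.
PARAMS = dict(t_p=140.0, t_o=2.5, alpha=2.0, gamma=0.6, hazards_per_instr=0.1)

best_p = min(range(1, 61), key=lambda p: time_per_instruction(p, **PARAMS))
closed_form = optimum_depth(**PARAMS)

print(f"best integer depth by search: {best_p}")
print(f"closed-form optimum:          {closed_form:.1f}")
```

In this model, more frequent hazards or a larger latch overhead both push the optimum toward shallower pipelines.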
Introducing the power consumption factor • The heat produced by a microprocessor is directly related to its power consumption, which in turn is proportional to the clock frequency. • Pipelining is employed to achieve higher clock frequencies and lower supply voltages. • Higher clock frequencies are needed to compensate for the IPC loss of deeper pipelines. • The increased clock frequency can raise power consumption further, canceling the savings from the lowered voltage.
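The interplay between voltage and frequency can be made concrete with the standard CMOS dynamic power relation, P = a·C·V²·f. The sketch below uses hypothetical values for the activity factor, switched capacitance, voltage, and frequency; only the ratios matter.

```python
def dynamic_power_w(activity, capacitance_f, voltage_v, frequency_hz):
    """Standard CMOS dynamic power estimate: P = a * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical baseline design.
base = dynamic_power_w(activity=0.2, capacitance_f=1e-9,
                       voltage_v=1.2, frequency_hz=2.0e9)

# Hypothetical deeper pipeline: supply voltage lowered by 10%, but the clock
# raised by 30% to make up for the lost IPC.
deeper = dynamic_power_w(activity=0.2, capacitance_f=1e-9,
                         voltage_v=1.2 * 0.9, frequency_hz=2.0e9 * 1.3)

ratio = deeper / base
print(f"power relative to baseline: {ratio:.3f}")
```

With these assumed numbers the quadratic saving from the lower voltage (0.81) is more than offset by the linear cost of the higher frequency (1.3), so total dynamic power rises.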
Optimal power/performance pipeline depth • There is a number of pipeline stages that maximizes performance. • When power is taken into account, the optimum power/performance depth was found to be 7 stages. [Harstein03] • A purely performance-driven design may select an overly deep pipeline that operates inefficiently. [Zyuban04] • That is, when power is considered, the optimal pipeline length decreases.
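The shift toward shallower pipelines can be sketched by scoring each depth with a performance-per-power metric instead of raw performance. Everything below is an assumption made for illustration: the delay model, the power model (switched capacitance growing with the latch count, multiplied by the clock frequency), the performance-cubed-per-watt metric, and all parameter values. It does not reproduce the cited studies.

```python
def time_per_instruction(p, t_p=140.0, t_o=2.5, alpha=2.0, stall_weight=0.06):
    """Busy time plus hazard stalls for a p-stage pipeline (arbitrary delay units)."""
    return (t_o + t_p / p) / alpha + stall_weight * (t_o * p + t_p)


def relative_power(p, t_p=140.0, t_o=2.5, logic_cap=1.0,
                   latch_cap=0.05, latch_growth=1.1):
    """Assumed power model: (logic + latch capacitance) * clock frequency."""
    frequency = 1.0 / (t_o + t_p / p)
    switched_cap = logic_cap + latch_cap * p ** latch_growth
    return switched_cap * frequency


def performance(p):
    """Instructions per unit time."""
    return 1.0 / time_per_instruction(p)


def perf_per_power(p, exponent=3):
    """Assumed efficiency metric: performance^exponent / power."""
    return performance(p) ** exponent / relative_power(p)


depths = range(1, 61)
best_for_speed = max(depths, key=performance)
best_for_efficiency = max(depths, key=perf_per_power)

print(f"depth that maximizes performance:       {best_for_speed}")
print(f"depth that maximizes performance/power: {best_for_efficiency}")
```

Because power grows with depth in this model while performance flattens near its peak, the efficiency-optimal depth lands below the performance-optimal one; lowering `exponent` (weighting power more heavily) moves it shallower still.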
Conclusions • There is a trend toward designing and fabricating ever more energy-efficient processors, driven by the widespread use of mobile computers and the growing need to save energy. • Maximizing CPU performance consists of finding a balance between clock frequency and IPC. • Pipelining can serve as a method for reducing energy consumption. However, the reduction in power consumption achieved by increasing the number of pipeline stages may be canceled by the power consumed by the added pipeline latches and by the higher clock frequency needed to compensate for the IPC loss.
References • [Harstein02] Hartstein, A. and Puzak, T., “The Optimum Pipeline Depth for a Microprocessor”, in Proceedings of the 29th Annual International Symposium on Computer Architecture (Anchorage, USA, May 2002). • [Harstein03] Hartstein, A. and Puzak, T., “Optimum Power/Performance Pipeline Depth”, in Proceedings of the 36th International Symposium on Microarchitecture (New York, USA, Dec. 2003). • [Heo04] Heo, S. and Asanović, K., “Power Optimal Pipelining in Deep Submicron Technology”, in Proceedings of the 2004 International Symposium on Low Power Electronics and Design (Newport Beach, USA, 2004). • [Lotfi08] Lotfi-Kamran, P., Rahmani, A., Salehpour, A., Afzali-Kusha, A. and Navab, Z., “Stall Power Reduction in Pipelined Architecture Processors”, in 21st International Conference on VLSI Design (Hyderabad, India, Jan. 2008). • [Peng07] Peng, L., Peir, J., Prakash, T., Chen, Y. and Koppelman, D., “Memory Performance and Scalability of Intel’s and AMD’s Dual-Core Processors: A Case Study”, in Performance, Computing, and Communications 26th IEEE International Conference (New Orleans, USA, April 2007). • [Sangireddy08] Sangireddy, R. and Shah, J., “Operand-Load-Based Split Pipeline Architecture for High Clock Rate and Commensurable IPC”, IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 4, April 2008. • [Sprangle02] Sprangle, E. and Carmean, D., “Increasing Processor Performance by Implementing Deeper Pipelines”, in Proceedings of the 29th Annual International Symposium on Computer Architecture (Anchorage, USA, May 2002). • [Zyuban04] Zyuban, V., Brooks, D. and Srinivasan, V., “Integrated Analysis of Power and Performance for Pipelined Microprocessors”, IEEE Transactions on Computers, vol. 53, no. 8, 2004. • [Anandtech06] www.anandtech.com