ECE 510 • Brendan Crowley • Paper Review • October 31, 2006
“Processor Power Reduction Via Single-ISA Heterogeneous Multi-Core Architectures” Rakesh Kumar, Keith Farkas, Norman P. Jouppi, Partha Ranganathan, Dean M. Tullsen
Presentation Overview • Introduction • The Architecture • Modeling the Architecture • Results • Critical Analysis / Conclusion
Introduction • Background • Processors continue to gain speed and transistor count as transistor sizes shrink • This increases power consumption, which causes problems with: • Heat dissipation • Chip failure • Battery life • Designers are always searching for new ways to decrease power consumption
Introduction (2) • Most work on reducing power consumption falls under one of two categories: • Voltage and frequency scaling • “Gating” – the ability to turn on/off portions of the core • Some designs have included the use of multiple identical (homogeneous) cores • Others have included processors with co-processors that run a different instruction set
Introduction (3) • The Main Idea • Different software applications have different resource requirements • This leads the authors to argue that core diversity is of greater value than uniformity • Therefore, the proposed design is a single-ISA heterogeneous multi-core architecture • Each core runs the same instruction set, but has different capabilities and performance characteristics
The Architecture • One method is to take a family of previously designed cores, modify their interfaces, and combine them on one die • Each core executes the same instruction set, but contains different resources, and therefore achieves different performance and energy efficiency on the same application
The Architecture (2) • The operating system determines the application’s requirements and decides which core is best to use (which core will be the most energy efficient) • To accommodate a wide variety of applications, the cores should span a wide range of performance levels
The Architecture (3) • The authors chose a 5-core design, using existing cores with a few changes: • Hypothetical single-threaded version of the EV8 (Alpha 21464), which they call the “EV8-” • MIPS R4700 • EV4 (Alpha 21064) • EV5 (Alpha 21164) • EV6 (Alpha 21264)
The Architecture (4) • Assumptions • Each core has a private L1 data and instruction cache • All cores share an L2 cache, phase-locked-loop circuitry and pins • Implemented in 0.10 micron technology • One application running at a time (one thread running)
The Architecture (5) • Figure: relative core sizes
The Architecture (6) • Different parts of a program may require different resources • To take full advantage of the core diversity it is necessary to switch between cores in the middle of program execution • This is done at operating system timeslice intervals, with user-state already saved to memory • If the OS decides to switch cores, the data is saved to the shared L2 cache, where the next core can retrieve it
The Architecture (7) • The authors assume the unused cores are powered down to avoid static leakage and dynamic switching power • This means time must be spent powering up the cores • Experimental results show that this doesn’t affect performance when core-switching is done at OS timer intervals, even with pessimistic assumptions about power-up time and software overhead
Modeling the Architecture • Data on the EV8 was based on predictions and reported data • Data on the other cores was taken from published literature • All of the Alpha cores are assumed to run at 2.1 GHz (given the assumed 0.10 micron process), and the R4700 at 1 GHz
Modeling the Architecture (2) • All architectures were modeled as accurately as possible on a highly detailed instruction-level simulator, using the configurations in the table below
Modeling the Architecture (3) • The table below shows the area and peak power statistics of the cores • Areas were found from die photos • Total die area is approximately 400 mm²
Modeling the Architecture (4) • Benchmark execution simulated using SMTSIM • Simulator was modified to simulate a multi-core processor with a shared L2 cache • Assume a single thread running on one core at a time • Switching cores requires flushing the active core’s pipeline and writing its L1 cache lines back to the L2 cache
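To get a feel for why a core switch can be cheap relative to an OS timeslice, the steps listed above (pipeline flush, L1 write-back, plus powering up the target core) can be put into a rough cost model. This is only a sketch: the class name, the function name, and every cycle count below are hypothetical placeholders, not figures from the paper, and the three steps are assumed to happen serially, which is pessimistic.

```python
from dataclasses import dataclass


@dataclass
class SwitchCost:
    """Cycle counts for one core switch (all values are illustrative)."""
    flush_cycles: int      # flush the active core's pipeline
    writeback_cycles: int  # write L1 cache lines back to the shared L2
    power_up_cycles: int   # wake the target core, which was powered down

    def total(self) -> int:
        # Assume the three steps do not overlap (pessimistic).
        return self.flush_cycles + self.writeback_cycles + self.power_up_cycles


def overhead_fraction(cost: SwitchCost, timeslice_cycles: int) -> float:
    """Fraction of one timeslice spent switching cores."""
    if timeslice_cycles <= 0:
        raise ValueError("timeslice_cycles must be positive")
    return cost.total() / timeslice_cycles


# Hypothetical numbers: a 1 ms timeslice at 2.1 GHz is 2.1 million cycles.
cost = SwitchCost(flush_cycles=100, writeback_cycles=20_000, power_up_cycles=1_000)
timeslice = 2_100_000
print(f"switch overhead: {overhead_fraction(cost, timeslice):.2%}")
```

With these made-up inputs the switch consumes roughly 1% of the timeslice; the real figure depends on cache size, the number of lines written back, and power-up latency.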
Results • The following figure shows results for the SPEC application applu • The Y-axis, IPS²/W, is the inverse of the energy-delay product (higher is better) • Constraint: • Never choose a core that sacrifices more than 50% performance relative to EV8- over an interval
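The selection rule on this slide can be sketched as a per-interval oracle: among the cores that keep at least half of the EV8- throughput, pick the one with the highest IPS²/W. The code below is a minimal illustration, not the authors' simulator. The names `Sample` and `pick_core` are made up here, and the throughput and power values are invented purely to exercise the logic.

```python
from typing import Dict, NamedTuple


class Sample(NamedTuple):
    ips: float    # instructions per second over the interval
    watts: float  # average power over the interval


def ips2_per_watt(s: Sample) -> float:
    """Metric from the slide: IPS squared divided by power."""
    return s.ips ** 2 / s.watts


def pick_core(samples: Dict[str, Sample],
              baseline: str = "EV8-",
              max_perf_loss: float = 0.5) -> str:
    """Return the eligible core with the highest IPS^2/W.

    A core is eligible if it loses at most `max_perf_loss` of the
    baseline core's throughput in this interval.
    """
    floor = (1.0 - max_perf_loss) * samples[baseline].ips
    eligible = {name: s for name, s in samples.items() if s.ips >= floor}
    return max(eligible, key=lambda name: ips2_per_watt(eligible[name]))


# Invented measurements for a single interval (NOT data from the paper).
interval = {
    "EV8-":  Sample(ips=2.0e9, watts=60.0),
    "EV6":   Sample(ips=1.5e9, watts=15.0),
    "EV5":   Sample(ips=1.1e9, watts=6.0),
    "EV4":   Sample(ips=0.6e9, watts=1.5),
    "R4700": Sample(ips=0.3e9, watts=0.5),
}

print(pick_core(interval))                     # constrained choice
print(pick_core(interval, max_perf_loss=1.0))  # constraint removed
```

With these sample numbers the constraint excludes EV4 and R4700, so the constrained choice is EV5, while removing the constraint lets the slower but more efficient EV4 win.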
Results (3) • Compared to a single-core architecture, this design could ideally reduce the energy-delay product (EDP) by 74% • Combination of 25% performance loss and 81% energy savings • Could change the constraint to achieve greater EDP savings (sacrificing performance, of course) • Another design point gives 36% energy savings with 4% performance loss
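As a sanity check on the figures above, the 74% number follows from multiplying the energy ratio by the delay ratio. The sketch below assumes that "performance loss" means throughput drops by that fraction, so run time grows by a factor of 1/(1 − loss); the function name is illustrative.

```python
def edp_reduction(energy_savings: float, perf_loss: float) -> float:
    """Fractional reduction in energy x delay relative to the baseline.

    energy_savings: fraction of energy saved (0.81 means 81% less energy)
    perf_loss:      fraction of throughput lost (0.25 means 25% slower)
    """
    if not 0.0 <= perf_loss < 1.0:
        raise ValueError("perf_loss must be in [0, 1)")
    energy_ratio = 1.0 - energy_savings      # new energy / old energy
    delay_ratio = 1.0 / (1.0 - perf_loss)    # new run time / old run time
    return 1.0 - energy_ratio * delay_ratio


# First design point: 81% energy savings, 25% performance loss.
print(round(edp_reduction(0.81, 0.25), 3))   # about 0.747
# Second design point: 36% energy savings, 4% performance loss.
print(round(edp_reduction(0.36, 0.04), 3))   # about 0.333
```

The first design point gives 1 − 0.19/0.75 ≈ 0.747, matching the reported 74% once rounding in the inputs is allowed for; the second works out to roughly a one-third reduction.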
Results (4) • Could optimize metrics other than the energy-delay product, depending on the design goals • Different power and performance tradeoffs can be made simply by changing the core switching algorithm (no need to change the hardware)
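One way to picture "changing the core switching algorithm" is to treat the optimization target as a pluggable function while the hardware measurements stay the same. The following is a hedged sketch under that assumption. The `choose` helper, the objective names, and the per-core numbers are all hypothetical and not taken from the paper.

```python
from typing import Callable, Dict, Tuple

Measurement = Tuple[float, float]  # (instructions per second, watts)


def choose(cores: Dict[str, Measurement],
           objective: Callable[[float, float], float]) -> str:
    """Return the core that maximizes the given objective."""
    return max(cores, key=lambda name: objective(*cores[name]))


# Each objective maps (ips, watts) to a score where higher is better.
OBJECTIVES: Dict[str, Callable[[float, float], float]] = {
    "performance": lambda ips, watts: ips,
    "energy per instruction": lambda ips, watts: -watts / ips,
    "energy-delay": lambda ips, watts: ips ** 2 / watts,
}

# Invented per-core measurements (NOT data from the paper).
cores = {
    "EV8-":  (2.0e9, 60.0),
    "EV6":   (1.5e9, 15.0),
    "EV5":   (1.1e9, 6.0),
    "EV4":   (0.6e9, 1.5),
    "R4700": (0.3e9, 0.5),
}

for label, objective in OBJECTIVES.items():
    print(f"{label}: {choose(cores, objective)}")
```

With these sample values each objective selects a different core (EV8- for raw performance, R4700 for energy per instruction, EV4 for energy-delay), which is the point of the slide: the policy changes, the silicon does not.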
Critical Analysis / Conclusion • The paper makes many assumptions, for example about frequency scaling and the power consumption of the cores • This paper only reports results for one benchmark application • Multiple cores/threads running at the same time would likely be used in practice • How would this affect the core switching complexity and latency?
Critical Analysis / Conclusion (2) • This technique seems like a very good one • Homogeneous multi-core chips are already on the market • Potential for significant energy savings