Morphable Computer Architectures for Highly Energy Aware Systems
PACC Kickoff: May 23, 24, 2000; Scottsdale, AZ
Peter M. Kogge: CSE Dept., University of Notre Dame; kogge@cse.nd.edu
Kanad Ghose: CS Dept., SUNY-Binghamton; ghose@cs.binghamton.edu
Nikzad “Benny” Toomarian: Center for Integrated Space Microsystems (CISM), Jet Propulsion Lab; benny@cism.jpl.nasa.gov
MORPH: Dynamic Low Energy Architectures
MORPH Adds An “Energy Gear” to Embedded Systems
• New Ideas
  • Multi-cluster microarchitecture to allow dynamic changes in energy expended per cycle
  • Energy efficient ISA extensions to process data more energy efficiently
  • Energy efficient morphable memory hierarchies
  • Adaptive algorithms to select best configuration
  • Energy aware run-time which can reconfigure system
• Impact
  • Changes focus to energy, not power, management
  • Adds extra degrees of freedom to dynamic energy control
  • Provides an inherently more energy efficient architecture
  • Designed with real embedded missions in mind
[Figure: schedule timeline at 0, 6 mo, 1 yr, 18 mo, 2 yr covering Profiles, Baseline, Morphable Node, Data Placement, Adaptive Algorithms, Run-time, Demo & Eval]
Why is PACC Important?
• Real world: limited energy sources
  • Renewable energy: 12-15 watts at high noon
  • Fixed capacity batteries for off-peak sunlight or emergencies in shade
• Multiple operational modes, all compute/energy constrained
  • Movement: collision avoidance
  • Spectroscopy: data gathering vs analysis
  • Communication: compression vs transmission
• Today:
  • Select computers for peak performance needs
  • Limited ability to “downshift”
The Future at the Low End: Microexplorers
Extremely limited energy sources => peak computing only when absolutely necessary
[Figure: microexplorer mass roadmap: 1997, 10 kg; 2002?, 1 kg; 2007?, 100 gm; 2012?, 10 gm. Subsystem labels: sensors, communication, temperature control, structure, advanced mobility, computing power, navigation]
“Larger” Systems Have More Diverse Energy/Performance Profiles
[Figure: example platforms: RLV, Hydrobot, Nano-Spacecraft, Integrated Inflatable Sailcraft, Atmospheric Probes, Nano-Rovers, Distributed Sensors, Penetrators]
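Recasting The Classical Power Equation
$$\mathrm{Power} = \tfrac{1}{2} \times C \times T \times V^2$$
where Power is energy/sec and $T$ is logic transitions/sec.
Power = energy/cycle x cycles/sec, and $T$ = transitions/cycle x cycles/sec, so
$$\mathrm{EnergyPerCycle\ (EPC)} = \tfrac{1}{2} \times C \times N_a \times V^2$$
where $N_a$ is transitions/cycle.
EPC is independent of clock rate! Lowering EPC is our focus!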
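A minimal numeric sketch of the recast equation may help. The operating point below (capacitance, transitions per cycle, supply voltage) is hypothetical and chosen only for illustration; the function names are likewise not from the slides.

```python
def energy_per_cycle(c_farads, transitions_per_cycle, vdd_volts):
    """EPC = 1/2 * C * Na * V^2, in joules per cycle."""
    return 0.5 * c_farads * transitions_per_cycle * vdd_volts ** 2


def power_watts(epc_joules, freq_hz):
    """Power = EPC * F, in watts."""
    return epc_joules * freq_hz


# Hypothetical operating point (assumed values, not from the slides).
C = 1e-12      # farads switched per logic transition
NA = 1000      # logic transitions per cycle
VDD = 1.8      # supply voltage in volts

epc = energy_per_cycle(C, NA, VDD)
for f in (100e6, 200e6):
    # EPC stays the same at both clock rates; only power scales with F.
    print(f"F = {f / 1e6:.0f} MHz: EPC = {epc:.3e} J, "
          f"Power = {power_watts(epc, f):.3f} W")
```

Doubling F doubles power but leaves EPC untouched, which is why the clock alone cannot improve energy per unit of work.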
Why is This Important?
• Power = EPC x F, where F = cycles/second
• Performance = IPC x F
• Today’s designs: Performance/Power = IPC/EPC
  • EPC & IPC are fixed at design time (other than voltage scaling)
  • THUS: ratio is fixed at design time
  • Only runtime “knobs” are V and F
• Real embedded scenarios:
  • Short periods of very high peak performance need => high IPC
  • Followed by long periods of much lower performance need
• Result: long periods of lower performance still running at inefficient EPC!!
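The ratio IPC/EPC also fixes the energy of a given task. A short derivation, where the instruction count $N$ and run time $t$ are symbols introduced here for illustration and static or idle energy is ignored:

```latex
\begin{align*}
t &= \frac{N}{\mathrm{IPC} \times F} \\
E_{\text{task}} &= \mathrm{Power} \times t
  = (\mathrm{EPC} \times F) \times \frac{N}{\mathrm{IPC} \times F}
  = N \times \frac{\mathrm{EPC}}{\mathrm{IPC}}
\end{align*}
```

F cancels, so under this simple model slowing the clock alone stretches the run without reducing the energy spent on it.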
This Project: A “Morphable” System Architecture
• Today’s microarchitectures: $\mathrm{EPC} = \mathrm{IPC}^k$ where $k \gg 1$
• Our approach:
  • Inherently lower EPC (lower k)
  • With variable IPC (in turn varying EPC)
  • Thus IPC/EPC can be varied dynamically
  • Lowering IPC lowers EPC even more
• Result: additional runtime “knobs” for run-time energy management
  • Adjust configuration so IPC x F matches performance needs
  • Reap energy savings of lower EPC
Allow systems to change the “Energy Gear” on demand!
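The gear-shifting idea can be sketched as a small selection routine. Everything concrete here is an assumption made for illustration: the gear list, the IPC values, the scale constant `E0`, the clock ceiling `F_MAX`, and the choice of exponent `K`. Only the relations EPC = IPC^k, Power = EPC x F and Performance = IPC x F come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gear:
    name: str
    ipc: float  # sustained instructions per cycle in this configuration


K = 1.9        # assumed exponent in EPC = E0 * IPC**K (illustrative value)
E0 = 1e-9      # hypothetical joules per cycle at IPC = 1
F_MAX = 500e6  # hypothetical maximum clock rate in Hz

# Hypothetical gears; the IPC values are invented for this sketch.
GEARS = [Gear("1 cluster", 1.0), Gear("2 clusters", 1.8), Gear("4 clusters", 3.0)]


def epc(gear):
    """Energy per cycle under the assumed EPC = E0 * IPC**K model."""
    return E0 * gear.ipc ** K


def pick_gear(gears, required_ips):
    """Return (gear, clock_hz, watts) with the lowest power that still
    delivers required_ips instructions/second, or None if no gear can."""
    feasible = []
    for g in gears:
        f = required_ips / g.ipc          # clock needed so IPC * F meets demand
        if f <= F_MAX:
            feasible.append((epc(g) * f, g, f))
    if not feasible:
        return None
    watts, gear, f = min(feasible, key=lambda t: t[0])
    return gear, f, watts
```

Under this model power at a fixed performance level scales as IPC^(k-1), so with k > 1 the routine downshifts to the narrowest gear that still meets demand and only upshifts when the clock ceiling forces it.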
The Team
• Overall Goals:
  • Architectures with variable IPC, EPC
  • Tools & S/W to manage morphing
  • Realistic demonstrations
• University of Notre Dame: Peter Kogge, Vincent Freeh, Jay Brockman
  • Morphable multi-cluster architecture
  • “At the sense amps” ISA extension
  • Runtime with hooks for dynamic morphing control
• Energy Aware Data Placement
• SUNY-Binghamton: Kanad Ghose
  • Morphable Caches, RFs
  • Energy Eff VLIW archs
  • Supporting compiler techniques
• Jet Propulsion Laboratory: Nikzad Toomarian, Mohammed Mojarradi, Savio Chau
  • Scenarios & benchmarks
  • Baseline characterizations
  • Runtime adaptation algorithms
Project Components
• Morphable, inherently low EPC design
• Memory system allowing both width and placement shaping
• Dynamic algorithms to select best “shape” for current energy/performance profile
• Augmented run-time to allow dynamic reconfiguration
Our Background
• NSF MIPS: Inherently Low Power Architectures
  • The Multi-cluster microarchitecture
  • Cache-In-Memory
  • Energy Efficient Caches
• IEEC Binghamton: Reducing power on interconnects
• DARPA Processing-In-Memory Projects: HTMT & DIVA
  • Utilizing wide bandwidth on-chip storage macros
  • Data placement in deep memory hierarchies
  • Multi-threading
• NASA
  • X2000: highly scalable low power systems for deep space missions
  • Evolvable Computing Program: adaptive algorithms to select system parameters to meet some mission objective
Starting A Solution: Multi Cluster Architecture
[Figure: (a) Simple Pipeline; (b) Classical Superscalar with issue width IW; (c) New Multi Cluster with w clusters, each of issue width IW/w]
• Problem: single large centralized register files with many ports
• Solution: multiple smaller register files with few ports
• $\mathrm{EPC}/\mathrm{IPC} \sim (IW)^k$, with k as high as 1.9
• $w \times (IW/w)^k \ll (IW)^k$
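The inequality on this slide is easy to check numerically. The sketch below evaluates both sides in arbitrary relative units; the function names are made up for illustration, and the model deliberately ignores any inter-cluster communication cost, which the slide does not quantify.

```python
def centralized_cost(issue_width, k=1.9):
    """Relative energy cost of one wide machine: IW**k."""
    return issue_width ** k


def clustered_cost(issue_width, clusters, k=1.9):
    """Relative energy cost of w clusters of width IW/w: w * (IW/w)**k."""
    return clusters * (issue_width / clusters) ** k


iw = 8
for w in (1, 2, 4):
    ratio = clustered_cost(iw, w) / centralized_cost(iw)
    # The ratio simplifies to w**(1 - k), so it shrinks as w grows when k > 1.
    print(f"w = {w}: clustered/centralized = {ratio:.3f}")
```

Because the ratio reduces to w^(1-k), the saving depends only on the cluster count and the exponent, not on the total issue width.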
Multi-Cluster vs Conventional Results
[Figure: results plot comparing Conventional with configurations labeled 1x8, 2x6, 4x4, 1x6, 2x4, 1x4, 4x2]
Up to 1/2 the energy at same IPC, or 20% better IPC at same energy
Insertion into PACC
• Implement CPU as nominal 4 cluster configuration
• Modify Instruction Issue to target variable # of clusters
  • Equivalent need for separating memory disambiguation units
  • Make this a runtime settable parameter
  • Unused clusters turned off
• Additional CPU options
  • Implement selected subset of “wide word” & VLIW-like operations within a cluster
  • Utilize unused clusters for additional concurrent threads
Another Starting Point: Low Energy Caches & Register Files
• Approach: exploit locality to reduce energy requirements of on-chip storage resources
• Example: multiple line buffers [figure not preserved]
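Since the line-buffer figure did not survive, here is a rough sketch of the idea: a few small buffers hold recently used cache lines, and any access that hits a buffer avoids activating the main cache array. The class name, the buffer count, the line size and the LRU replacement policy are all assumptions made for this illustration; the slides do not specify them.

```python
from collections import OrderedDict


class LineBuffers:
    """Toy model of a few line buffers in front of a cache data array."""

    def __init__(self, n_buffers=4, line_bytes=32):
        self.n_buffers = n_buffers
        self.line_bytes = line_bytes
        self.buffers = OrderedDict()  # line address -> True, oldest first
        self.buffer_hits = 0          # accesses served without the array
        self.array_accesses = 0       # accesses that activate the cache array

    def access(self, addr):
        """Record one access; return True if it was served by a line buffer."""
        line = addr // self.line_bytes
        if line in self.buffers:
            self.buffers.move_to_end(line)    # mark as most recently used
            self.buffer_hits += 1
            return True
        self.array_accesses += 1              # fetch the line from the array
        self.buffers[line] = True
        if len(self.buffers) > self.n_buffers:
            self.buffers.popitem(last=False)  # evict least recently used line
        return False


lb = LineBuffers(n_buffers=2, line_bytes=32)
for addr in range(0, 128, 4):                 # sequential 4-byte accesses
    lb.access(addr)
print(lb.buffer_hits, lb.array_accesses)
```

With strong spatial locality most accesses land in a buffer, so the energy-hungry array is activated only once per line.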
Storage System Morphs
• Exploit locality to reduce dynamic AND static energy dissipation of on-chip storage resources:
  • Selective substrate biasing to reduce leakage; reverse body bias removed when storage component is accessed
  • Clustered data placement to maximize access to each partition within on-chip and off-chip RAMs
  • Compiler/OS prefetching to avoid/reduce turn-on delay
• Changeable widths of interconnect & storage resources
  • Sub-banking for caches and on-chip/off-chip RAM
  • FU-driven selection of activation width of dispatch buffer and reservation stations, data register files
  • Operand-width driven activation of FU slices
ISA Extensions with Energy Reduction Potential
• VLIW-like multiple move instructions
  • Use compiler to optimize number of moves/energy
  • Useful for many signal processing loops, numerical computations
• “Wide word” multiple operation per instruction
  • Utilize existing bandwidth more completely
• Inclusion of simultaneous multi-threading extensions
  • Allow for pipelines without costly hazard detection/forwarding
Run-Time Considerations
• Application must have freedom to provide
  • expected energy/performance of code
  • requests for levels of service
• But, only run-time sees global picture
  • All current running applications & their requests
  • Existing energy/power resources and mission profiles
  • Measurements on current activities
• Run-time modifications: changing the “energy gear”
  • Number of clusters per thread
  • Number of threads
  • Active width of on-chip storage resources & substrate biases
  • Active width of off-chip memory & interfaces
  • Placement of data within hierarchy
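The run-time knobs listed above could be gathered into a single configuration record. The sketch below is purely hypothetical: the slides list the run-time API as a future deliverable, so the class name, field names and validation rules are invented here, and data placement is left out for brevity. The default of four clusters follows the nominal configuration mentioned earlier in the deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyGear:
    """Hypothetical record of one 'energy gear' setting."""
    clusters_per_thread: int
    threads: int
    onchip_active_fraction: float   # fraction of on-chip storage width active
    offchip_active_fraction: float  # fraction of off-chip memory width active
    reverse_body_bias: bool         # substrate bias applied to idle storage

    def validate(self, total_clusters=4):
        """Raise ValueError if the setting cannot fit the assumed hardware."""
        if self.clusters_per_thread < 1 or self.threads < 1:
            raise ValueError("need at least one cluster and one thread")
        if self.clusters_per_thread * self.threads > total_clusters:
            raise ValueError("threads x clusters exceeds available clusters")
        for frac in (self.onchip_active_fraction, self.offchip_active_fraction):
            if not 0.0 < frac <= 1.0:
                raise ValueError("active fractions must be in (0, 1]")


low_gear = EnergyGear(clusters_per_thread=1, threads=1,
                      onchip_active_fraction=0.25,
                      offchip_active_fraction=0.5,
                      reverse_body_bias=True)
low_gear.validate()
```

Keeping the record immutable means a gear change is always an explicit swap of one setting for another, which suits a run-time that must arbitrate between applications.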
Determining the Gear: Reconfiguration Algorithms
Outgrowth of JPL’s Evolvable Computing Program
• Objective:
  • Develop reconfigurable computing capability which will allow:
    • Self-reconfiguration and adaptation to unforeseen conditions
    • Faster, cheaper development cycles
• Approach:
  • Use powerful parallel searches (e.g., genetic algorithms, neural nets, etc.), possibly including hardware, to determine the optimal performance.
• Payoff:
  • Achieve high autonomy on-board spacecraft
  • The best schedule for highest science return with lowest power consumption
  • Maintain functionality under changes in operating conditions
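As one illustration of the search approach, here is a very small genetic algorithm over configuration knobs. It is a sketch under heavy assumptions: the knob domains and the cost model in `evaluate` are invented toy numbers, not measurements or models from the project, and a real search would score configurations with simulation or on-board measurement.

```python
import random

# Hypothetical knob domains: clusters, threads, active storage banks.
DOMAINS = [(1, 2, 4), (1, 2), (1, 2, 4)]


def evaluate(cfg):
    """Toy cost model (invented): returns (performance, power)."""
    clusters, threads, banks = cfg
    perf = clusters ** 0.7 * threads ** 0.5 * (1 + 0.1 * banks)
    power = clusters ** 1.3 + 0.3 * threads + 0.2 * banks
    return perf, power


def fitness(cfg, required_perf):
    """Lower is better: power plus a penalty for missing the target."""
    perf, power = evaluate(cfg)
    return power + 100.0 * max(0.0, required_perf - perf)


def mutate(cfg, rng):
    """Re-draw one randomly chosen knob from its domain."""
    i = rng.randrange(len(cfg))
    new = list(cfg)
    new[i] = rng.choice(DOMAINS[i])
    return tuple(new)


def crossover(a, b, rng):
    """Take each knob from one of the two parents at random."""
    return tuple(rng.choice(pair) for pair in zip(a, b))


def search(required_perf, generations=30, pop_size=12, seed=0):
    rng = random.Random(seed)  # seeded so runs are repeatable
    pop = [tuple(rng.choice(d) for d in DOMAINS) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, required_perf))
        parents = pop[: pop_size // 2]  # keep the better half (elitism)
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    return min(pop, key=lambda c: fitness(c, required_perf))


best = search(required_perf=1.5)
print(best, evaluate(best))
```

The knob space here is tiny enough to enumerate; a population-based search only pays off when the number of knobs and settings makes exhaustive evaluation impractical.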
Program Plan
[Figure: timeline at 0, 6 mo, 1 yr, 18 mo, 2 yr covering Profiles, Baseline, Morphable Node, Data Placement, Adaptive Algorithms, Run-time, Demo & Eval]
Optional 3rd year:
• high level design & demo on FPLA or MOSIS
• prototype of run-time
• investigation of needed program development environment
• demo in JPL test bed
• analysis for insertion into real JPL mission
Expected Deliverables
• Benchmark suite & corresponding mission energy profiles
• Detailed morphable architecture
• System simulator with energy & performance projections & evaluation against profiles
• Demonstration of data placement & architectural adaptation algorithms
• Specification of energy aware run-time & API
Some Recent References
• Zyuban, Victor and Peter M. Kogge, “Inherently Lower-Power High-Performance Superscalar Architectures,” submitted to IEEE Trans. on Computers.
• Zyuban, Victor and Peter M. Kogge, “Optimization of High-Performance Super-Scalar Architectures for Energy-Delay Product,” accepted for ISLPED 2000.
• K. Ghose, “Reducing Energy Requirements for Instruction Issue and Dispatch in Superscalar Processors,” accepted for ISLPED 2000.
• K. Ghose and M. B. Kamble, “Reducing Power in Superscalar Caches Using Subbanking, Multiple Line Buffers and Bit-Line Segmentation,” ISLPED’99, pp. 70-75.
• Zyuban, Victor and Peter M. Kogge, “The Energy Complexity of Register Files,” ISLPED’98, pp. 305-310.
• K. Ghose and M. B. Kamble, “Energy-efficient Cache Organizations for Superscalar Processors,” Workshop on Power-Driven Microarchitecture, in conjunction with ISCA’98.
• Zyuban, Victor and Peter M. Kogge, “Split Register File Architecture for Inherently Lower Power Architectures,” Workshop on Power-Driven Microarchitecture, in conjunction with ISCA’98.
• Zawodny, Jason T., Jay B. Brockman, Peter M. Kogge, Eric Johnson, “Cache-In-Memory: A Lower Power Alternative,” Workshop on Power-Driven Microarchitecture, in conjunction with ISCA’98.
• M. B. Kamble and K. Ghose, “Analytical Energy Dissipation Models for Low Power Caches,” ISLPED’97, pp. 143-148.
• M. B. Kamble and K. Ghose, “Energy-Efficiency of VLSI Caches: A Comparative Study,” IEEE 10th Int’l. Conf. on VLSI Design, Jan. 1997, pp. 261-267.