No Free Lunch, No Hidden Cost. How Can Co-Design Help? X. Sharon Hu, Department of Computer Science and Engineering, University of Notre Dame. The Salishan Conference on High-Speed Computing.
Theme: Exposing Hidden Execution Costs • Cost of execution: performance and power • Computation • Communication • Data motion • Synchronization • … • How can we strike a balance between the extremes? • Hide as much as possible? • Explicitly manage “all” costs? • My “position”: • Expose widely and choose wisely • Focus on power
Why Take This Position? • Expose widely • Better understanding of the contribution of each component • Allows application-specific tradeoffs • Provides opportunities for powerful co-design tools • Choose wisely • Requires sophisticated co-design tools • Explores more algorithm/software options
But Easier Said Than Done! • Heterogeneity • Compute nodes: (multi-core) CPU, GP-GPU, FPGA, … • Memory components: on-chip, on-board, disks, … • Communication infrastructure: bus, NoC, networks, … • Parallelism (“non-determinism”) • Data access: movement, coherence, … • Resource contention • Synchronization
Outline • Why expose widely? • How to benefit from exposing widely? • How to choose wisely? • Going forward
Why Expose Widely? (1) • Different programs have different power distributions [Figure: GPU power distribution (NVIDIA GTX 280); legend: GPU Cores, ConstCache, Memory, ConstSM, TextCache] Hong and Kim, ISCA 2010
Why Expose Widely? (2) • Data movement impacts different algorithms differently [Figure: energy consumption of three sorting algorithms (Pentium 4 + GeForce 570)]
Why Expose Widely? (3) • The impact is application dependent [Figure: performance degradation due to memory bus contention] Masaaki Kondo et al., SIGARCH 2007
How to Benefit from “Exposing Widely”? • Co-design is the key • Expose all factors impacting the “execution model” • Computation: processing resource • Data motion: memory components and hierarchy • Communication: bus and network • Resource contention, synchronization… • Some examples • Software macromodeling • Hardware module-based modeling • Optimize through power management • Keep in mind Amdahl’s law
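The Amdahl's law reminder can be made concrete with a small calculation. The sketch below assumes the usual form of the law and applies it to an energy budget, which is an interpretation rather than something the slide spells out: if a component accounts for a fraction `fraction` of the total and is improved by `factor`, the overall gain is bounded by the untouched remainder. The function name and the sample numbers are hypothetical.

```python
def amdahl_gain(fraction: float, factor: float) -> float:
    """Overall improvement when only `fraction` of the total is improved by `factor`.

    Works for execution time (classic Amdahl) and, by assumption here,
    for the share of energy consumed by one component.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if factor <= 0.0:
        raise ValueError("factor must be positive")
    untouched = 1.0 - fraction
    return 1.0 / (untouched + fraction / factor)


# Hypothetical example: memory is 20% of the energy budget.
# Even a very large reduction of that share gains at most about 1.25x overall.
half_the_budget = amdahl_gain(0.5, 2.0)
memory_only = amdahl_gain(0.2, 1e9)
```

The second call illustrates why exposing the full power distribution matters: optimizing a small contributor cannot pay off much, however aggressive the optimization.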
Macromodeling: Algorithm Complexity Based • Relate the power/energy of a program to its complexity • Example: $E = C_1 S + C_2 S^2 + C_3 S^3$ (Tan et al., DAC’01), where $S$ is the size of the array for a sorting algorithm • Example: $E_{comm} = C_0 + C_1 S$ (Loghi et al., ACM TECS’07), where $S$ is the size of the exchanged messages • More sophisticated models account for both computing and communication • How to handle resource contention?
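One way such a macromodel could be characterized is a least-squares fit of the coefficients to measured (size, energy) pairs. The sketch below is an illustration under that assumption, not the procedure from the cited paper; the function names and the sample data are hypothetical, and the fit is done with plain Gaussian elimination on the normal equations to stay dependency-free.

```python
def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: need more distinct sizes")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            scale = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= scale * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_energy_macromodel(sizes, energies):
    """Least-squares fit of E = c1*S + c2*S**2 + c3*S**3 (no constant term)."""
    basis = [[s, s ** 2, s ** 3] for s in sizes]
    normal = [[sum(row[i] * row[j] for row in basis) for j in range(3)]
              for i in range(3)]
    rhs = [sum(row[i] * e for row, e in zip(basis, energies)) for i in range(3)]
    return solve_linear(normal, rhs)


def predict_energy(coeffs, size):
    """Evaluate the fitted macromodel at a new input size."""
    c1, c2, c3 = coeffs
    return c1 * size + c2 * size ** 2 + c3 * size ** 3


# Hypothetical measurements generated from known coefficients (2.0, 0.5, 0.1).
sizes = [1.0, 2.0, 3.0, 4.0, 5.0]
energies = [2.0 * s + 0.5 * s ** 2 + 0.1 * s ** 3 for s in sizes]
coeffs = fit_energy_macromodel(sizes, energies)
```

For real measurements spanning several orders of magnitude in S, a better-conditioned solver (for example a QR-based least-squares routine) would likely be preferable to the normal equations used here.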
Power Modeling of Bus Contention • Penolazzi, Sander and Hemani, DATE’11 • Characterization step • $C\%_{N,1}$: percentage cycle difference between the N-processor case and the 1-processor case • Can be done by IP providers on chosen benchmarks • Prediction step
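The details of the prediction step did not survive in this transcript, so the following is only a sketch of how a characterized percentage might be used, under an assumed and deliberately simple form: the single-processor cycle count of a new workload is inflated by the percentage measured on benchmarks. All names and numbers are hypothetical, and the cited paper's actual method may differ.

```python
def contention_percentage(cycles_n: float, cycles_1: float) -> float:
    """Characterization: percentage cycle difference, N-processor vs 1-processor run."""
    if cycles_1 <= 0:
        raise ValueError("cycles_1 must be positive")
    return 100.0 * (cycles_n - cycles_1) / cycles_1


def predict_cycles(cycles_1: float, c_percent: float) -> float:
    """Prediction (assumed form): scale the contention-free cycle count."""
    return cycles_1 * (1.0 + c_percent / 100.0)


# Hypothetical benchmark: 1000 cycles alone, 1300 cycles with N processors on the bus.
c_pct = contention_percentage(1300.0, 1000.0)
# Hypothetical new workload with 2000 contention-free cycles.
predicted = predict_cycles(2000.0, c_pct)
```

An energy estimate could then follow by multiplying the predicted cycle count by a per-cycle energy figure, which is again an assumption and would need its own characterization.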
Hierarchical Module-Based Power Modeling • Accumulate energy/power of modules • CPU+GPU example • Access rate: software dependent • Data movement contributes to memory power • Resource contention modifies access rate (Adapted from Isci and Martonosi, MICRO’03)
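The accumulation idea can be sketched as a sum over modules, where each module's power is its access rate times a maximum power plus an idle term. This is a simplified form assumed for illustration (the cited work also includes per-module scaling factors), and the module names and wattages below are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Module:
    name: str
    max_power_w: float      # power at full activity (hypothetical values below)
    access_rate: float      # software-dependent activity in [0, 1]
    idle_power_w: float = 0.0


def module_power(module: Module) -> float:
    """Power of one module: activity-dependent part plus idle part."""
    return module.access_rate * module.max_power_w + module.idle_power_w


def total_power(modules: Iterable[Module]) -> float:
    """Accumulate the power of all modules in the system."""
    return sum(module_power(m) for m in modules)


# Hypothetical CPU + GPU + memory system.
system = [
    Module("cpu", max_power_w=40.0, access_rate=0.5, idle_power_w=5.0),
    Module("gpu", max_power_w=100.0, access_rate=0.25, idle_power_w=10.0),
    Module("memory", max_power_w=20.0, access_rate=0.5, idle_power_w=2.0),
]
system_power = total_power(system)
```

In this form, data movement shows up as a higher `access_rate` on the memory module, and contention can be modeled by adjusting the access rates that the software would otherwise achieve.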
Managing Bus Contention to Reduce Energy • M. Kondo, H. Sasaki and H. Nakamura, 2006 • Counter for memory requests • Register for PU identification • Thresholds for selecting which PU uses which Vdd value
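The counter-and-threshold mechanism can be sketched in software as a lookup from per-PU memory request counts to supply voltage levels. The policy direction (PUs issuing more memory requests get a lower Vdd), the thresholds, and the voltage values are all assumptions made for illustration; the slide does not give them.

```python
from typing import Dict, Sequence


def select_vdd(mem_requests: int,
               thresholds: Sequence[int],
               vdd_levels: Sequence[float]) -> float:
    """Pick a Vdd level from the number of memory requests in one interval.

    `thresholds` must be ascending; `vdd_levels` has one more entry than
    `thresholds` and is ordered from the level used below the first
    threshold to the level used at or above the last one.
    """
    if len(vdd_levels) != len(thresholds) + 1:
        raise ValueError("need exactly len(thresholds) + 1 voltage levels")
    crossed = sum(1 for t in thresholds if mem_requests >= t)
    return vdd_levels[crossed]


def assign_vdd(counters: Dict[int, int],
               thresholds: Sequence[int],
               vdd_levels: Sequence[float]) -> Dict[int, float]:
    """Map each PU id (the 'register for PU identification') to a Vdd value."""
    return {pu: select_vdd(count, thresholds, vdd_levels)
            for pu, count in counters.items()}


# Hypothetical interval: PU 0 is compute bound, PU 2 is memory bound.
counters = {0: 20, 1: 250, 2: 900}
plan = assign_vdd(counters, thresholds=[100, 500], vdd_levels=[1.2, 1.0, 0.8])
```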
Application Mapping to Reduce Energy (1) • Application mapping for heterogeneous systems [Figure: PE 1 to PE 4 and a Memory block, with jobs J1 to J4, each annotated ([minRi, maxRi], Di)] R. Racu, R. Ernst, A. Hamann, B. Mochocki and X. Hu, “Methods for power optimization in distributed embedded systems with real-time requirements,” CASES’06.
Application Mapping to Reduce Energy (2) • Optimization: • Minimize power/energy dissipation • Satisfy timing properties (e.g., average path latency, average lateness, etc.) • … • Search space: • Scheduling parameters, traffic shaping, … • Task-level DVFS, i.e., task speed assignment • Resource-level DVFS, i.e., resource speed assignment • …
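A toy version of task-level DVFS makes the optimization and its search space concrete. The sketch below assumes tasks run back to back on one processing element, execution time is cycles divided by speed, and energy grows as cycles times speed squared (a common simplification when voltage scales with frequency). These modeling choices, the exhaustive search, and the numbers are all assumptions for illustration, not the method of the cited work.

```python
from itertools import product
from typing import Optional, Sequence, Tuple


def best_speed_assignment(
    task_cycles: Sequence[float],
    speeds: Sequence[float],
    deadline: float,
) -> Optional[Tuple[float, Tuple[float, ...]]]:
    """Exhaustively search per-task speeds for minimum energy under a deadline.

    Returns (energy, speeds_per_task), or None if no assignment meets the deadline.
    """
    best = None
    for combo in product(speeds, repeat=len(task_cycles)):
        latency = sum(c / f for c, f in zip(task_cycles, combo))
        if latency > deadline:
            continue  # timing constraint violated
        energy = sum(c * f ** 2 for c, f in zip(task_cycles, combo))
        if best is None or energy < best[0]:
            best = (energy, combo)
    return best


# Hypothetical instance: two tasks, two normalized speed levels.
result = best_speed_assignment([4.0, 2.0], speeds=[1.0, 2.0], deadline=5.0)
tight = best_speed_assignment([4.0, 2.0], speeds=[1.0, 2.0], deadline=3.0)
infeasible = best_speed_assignment([4.0, 2.0], speeds=[1.0, 2.0], deadline=2.0)
```

Exhaustive search grows exponentially with the number of tasks, which is one reason heuristic searches such as genetic algorithms are attractive for realistic systems.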
Application Mapping (3): Sensitivity Analysis [Figure not preserved] R. Racu, R. Ernst, A. Hamann, B. Mochocki and X. Hu, “Methods for power optimization in distributed embedded systems with real-time requirements,” CASES’06.
Application Mapping (4): GA-Based Approach [Figure: GA-based flow; labeled steps include 2’. Scheduling Trace and 3’. Power Dissipation, with a Power Analyzer block] • Power model needed
Going Forward: Systematic Co-Design Effort • Expose more • More hardware counters / registers • More efficient/accurate high-level power models • Better models for resource contention and synchronization • Choose better • Handling parallelism • Algorithm, OS, hardware • Resource contention • Synchronization • Handling non-determinism • Worst-case bounds • Statistical analysis • Interval-based techniques
ES Design vs. HPCS Design • Differences (maybe) • Application-specific workloads vs. domain-specific workloads • Constraints, objectives, desirables? • latency, throughput, energy, cost, reliability, fault tolerance, IP protection/privacy, ToM, … • Other issues: homogeneous vs. heterogeneous, levels of complexity, user expertise, … • Similarities • Ever-increasing hardware capability: multi-core, multi-thread, complex communication fabrics, memory hierarchy, … • Productivity gap • Common concerns: latency, throughput, energy, cost, reliability, fault tolerance, …
Leverage Co-Design for HPC • Systematic performance estimation • Formal methods: scenario-based, statistical analysis • Hybrid approaches: analytical+simulation • Seamless migration from one abstraction level to the next • Efficient design space exploration • Efficient search techniques • Multiple-level abstraction models • Multiple-attribute optimization • Others: memory and communication analysis and design
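Multiple-attribute optimization during design space exploration is often framed as finding the Pareto-optimal design points. The sketch below assumes every attribute is to be minimized (for example latency and energy); the helper names and the candidate points are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if `a` is no worse than `b` everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Keep the design points that no other point dominates (input order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (latency, energy) results for five candidate designs.
candidates = [(10.0, 5.0), (8.0, 7.0), (12.0, 4.0), (9.0, 9.0), (8.0, 6.0)]
front = pareto_front(candidates)
```

This quadratic filter is fine for a handful of candidates; larger explorations would typically combine it with the efficient search techniques and multi-level abstraction models listed on the slide.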