
Understanding Sources of Inefficiency in General-Purpose Chips

Written by: Hameed et al., ISCA 2010. Presented by: David Schlais.



Presentation Transcript


  1. Understanding Sources of Inefficiency in General-Purpose Chips • Written by: Hameed et al., ISCA 2010 • Presented by: David Schlais

  2. Motivations • Most modern systems are power critical: cell phones, servers, tablets, etc. • ASIC: low power / low flexibility • Chip Multiprocessors (CMP): higher power / high flexibility • Goal: low power / high flexibility • The paper attempts to identify what causes CMP power inefficiencies

  3. Application-Specific Integrated Circuits (ASICs) • Application specific • Performs a specific task efficiently in both power and area • Low overhead in order to ‘execute’ tasks

  4. Chip Multiprocessors • General-purpose • High overhead: execution units in red, the rest is overhead • High flexibility: can perform many instructions • http://pages.cs.wisc.edu/~david/courses/cs752/Fall2014/handouts/lecture/03_pipeline.pdf

  5. What is the energy gap between these two? And how do we minimize it?

  6. Exploring CMP vs ASIC gap • H.264 encoding (MPEG-4 Advanced Video Coding) • Compared an ASIC for H.264 encoding against a Tensilica CMP • Why H.264 encoding? • Large ASIC vs CMP gap (150-500x power gap), which makes inefficiencies in the CMP easier to see • ASICs for H.264 are commercially available, so commercial products serve as the benchmark • Contains a variety of computational motifs, so results are applicable to a larger set of applications

  7. H.264 (MPEG-4 AVC) • 99% of tasks are summed up in 5 steps: • Integer Motion Estimation (IME): computes the vector of image-block motion • Fractional Motion Estimation (FME): refines the initial match to quarter-pixel resolution • Intra Prediction (IP): makes a prediction based on surrounding image-blocks • Transform and Quantization (DCT/Quant): determines the difference between prediction and current • Context Adaptive Binary Arithmetic Coding (CABAC): encodes coefficients
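
To make the IME step concrete, here is a minimal sketch of block matching by exhaustive search with a sum-of-absolute-differences (SAD) cost. The slides do not say which search strategy or cost the encoder uses, so full search with SAD is an assumption chosen for simplicity; the function names, the toy frames, and the block and search sizes are all hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def integer_motion_search(cur, ref, bx, by, n, search_range):
    """Exhaustive search for the integer displacement with the lowest SAD."""
    height, width = len(ref), len(ref[0])
    best = None  # (cost, dx, dy)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + n <= width):
                continue
            if not (0 <= by + dy and by + dy + n <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]


# Toy data (hypothetical): the current frame is the reference frame
# shifted right by 2 pixels and down by 1 pixel.
W, H = 16, 16
ref_frame = [[(7 * x + 13 * y * y + x * y) % 251 for x in range(W)] for y in range(H)]
cur_frame = [[ref_frame[(y - 1) % H][(x - 2) % W] for x in range(W)] for y in range(H)]

motion = integer_motion_search(cur_frame, ref_frame, bx=6, by=6, n=4, search_range=3)
print(motion)  # (dx, dy, cost) -> (-2, -1, 0)
```

Every candidate displacement repeats the same subtract, absolute-value, and accumulate pattern over neighbouring pixels, which is the kind of regular, data-parallel work the later slides target with SIMD and custom instructions.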

  8. CMP power breakdown • Only ~6% of total energy goes towards computation
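
A quick back-of-the-envelope check shows what the ~6% figure implies. This is a sketch, not a result from the paper: it treats power and energy ratios interchangeably (a simplification), and the variable names are made up for illustration.

```python
compute_fraction = 0.06      # ~6% of energy reaches computation (slide 8)
reported_gap = (150, 500)    # ASIC vs CMP gap quoted on slide 6

# If all overhead energy vanished, total energy would shrink to
# compute_fraction of its original value, so the best possible gain
# from removing overhead alone is the reciprocal.
max_gain_from_overhead = 1 / compute_fraction

# Improvement still needed from the compute units themselves.
remaining_gap = tuple(g / max_gain_from_overhead for g in reported_gap)

print(f"max gain from removing overhead: {max_gain_from_overhead:.1f}x")
print(f"gap left for the functional units: {remaining_gap[0]:.0f}x to {remaining_gap[1]:.0f}x")
```

Under these assumptions, removing every bit of overhead yields well under 20x, so closing a 150-500x gap also requires making the computation itself cheaper.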

  9. Instruction Decode Logic • Decoding is ~30% of overall power • Repetitive decoding of similar instructions • Solution? 16 and 18-way SIMD datapaths: many operations can be done in parallel • Solution? 2 and 3-slot VLIW instructions • Impact: 10x performance increase and 10x energy decrease, but still ~30% of power goes to decoding
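
The amortization argument behind SIMD and VLIW can be sketched by counting how many instruction words must be fetched and decoded for a fixed amount of work. This is an idealized model that assumes every lane and every slot does useful work; the workload size and function names are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def fetch_decode_events(num_ops, simd_width=1, vliw_slots=1):
    """Instruction words fetched and decoded to perform num_ops operations,
    assuming full utilization of all SIMD lanes and VLIW slots."""
    ops_per_word = simd_width * vliw_slots
    return ceil_div(num_ops, ops_per_word)


# Hypothetical workload: three operations per pixel of a 16x16 block.
ops = 16 * 16 * 3

scalar_words = fetch_decode_events(ops)                                # 768
simd_words = fetch_decode_events(ops, simd_width=16)                   # 48
simd_vliw_words = fetch_decode_events(ops, simd_width=16, vliw_slots=3)  # 16

print(scalar_words, simd_words, simd_vliw_words)
```

Each fetched word still pays the full fetch and decode cost, which is consistent with the slide's observation that decoding remains a large share of power even after total energy drops.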

  10. Work done per instruction • Application-specific instructions vs. RISC instructions (figure)

  11. “Magic” Instructions • Desire: a single instruction executes 100s of operations; reuse/shift pixels to prevent repetitive x-1, x0, x1, x2, x3 loads • Requirements: • Hardware support: custom data storage elements; links to provide large amounts of data to these storage units; a path from storage units to functional units • Instruction support: a new instruction to dictate when to do these operations
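
The pixel-reuse idea can be illustrated in software with a sliding window held in a shift register. The sketch below compares the number of pixel loads with and without reuse. It uses the six-tap H.264 half-pixel luma filter coefficients without the final rounding and scaling; the load-counting model and all names are assumptions for illustration, not the paper's hardware design.

```python
from collections import deque

# H.264 half-pixel luma filter taps (rounding and division by 32 omitted).
TAPS = (1, -5, 20, 20, -5, 1)


def filter_naive(row):
    """Reload all neighbouring pixels for every output position."""
    loads = 0
    out = []
    for i in range(len(row) - len(TAPS) + 1):
        window = row[i:i + len(TAPS)]
        loads += len(window)
        out.append(sum(t * p for t, p in zip(TAPS, window)))
    return out, loads


def filter_shift_register(row):
    """Keep the window in a shift register and load one new pixel per step."""
    window = deque(maxlen=len(TAPS))
    loads = 0
    out = []
    for pixel in row:
        window.append(pixel)  # the oldest pixel is shifted out automatically
        loads += 1
        if len(window) == len(TAPS):
            out.append(sum(t * p for t, p in zip(TAPS, window)))
    return out, loads


row = [(17 * i) % 256 for i in range(21)]  # hypothetical pixel row
naive_out, naive_loads = filter_naive(row)
shift_out, shift_loads = filter_shift_register(row)

print(naive_loads, shift_loads)  # 96 21
```

Both versions produce identical outputs, but the shift-register version touches each pixel once. A custom storage element with a direct path to the functional units gives hardware the same reuse without issuing separate load instructions.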

  12. Results

  13. Results

  14. Takeaways • Fetching instructions is expensive • Current techniques (SIMD, VLIW) only provide partial improvement • Processors have large overheads, and only a small % goes to computing • Power of the FUs alone exceeds ASIC designs • Motivates automatic tools that create modified instructions

  15. Takeaways Cont. • Application-specific instructions and hardware reduce inefficiencies • “Instructions” performing hundreds of operations are ideal • An extensible processor alone is not enough • Current RISC architectures contain inefficiencies not present in ASIC designs • Still many areas to improve processor power & performance

  16. Progress since 2010 • Neural acceleration: Esmaeilzadeh, Hadi, et al. "Neural acceleration for general-purpose approximate programs." Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 2012. • Dynamically specialized datapaths: Govindaraju, Venkatraman, Chen-Han Ho, and Karthikeyan Sankaralingam. "Dynamically specialized datapaths for energy efficient computing." High Performance Computer Architecture (HPCA), 2011 IEEE 17th International Symposium on. IEEE, 2011. • Power-efficient compute-intensive GPGPU: Gilani, Syed Zulqarnain, Nam Sung Kim, and Michael J. Schulte. "Power-efficient computing for compute-intensive GPGPU applications." High Performance Computer Architecture (HPCA 2013), 2013 IEEE 19th International Symposium on. IEEE, 2013.

  17. Thank you • Questions?
