Understanding Sources of Inefficiency in General-Purpose Chips • Written by: Hameed et al., ISCA 2010 • Presented by: David Schlais
Motivations • Most modern systems are power-critical • Cell phones, servers, tablets, etc. • ASIC • Low power / Low flexibility • Chip Multiprocessors (CMP) • Higher power / High flexibility • Goal: Low power / High flexibility • The paper attempts to identify what causes CMP power inefficiencies
Application-Specific Integrated Circuits (ASICs) • ASIC • Application Specific • Performs specific task efficiently • Power • Area • Low overhead in order to ‘execute’ tasks
Chip Multiprocessors • General-Purpose • High overhead • In the figure, execution units are shown in red • Rest is overhead • High Flexibility • Can execute many different instructions • Figure source: http://pages.cs.wisc.edu/~david/courses/cs752/Fall2014/handouts/lecture/03_pipeline.pdf
What is the energy gap between these two? And how do we minimize it?
Exploring CMP vs ASIC gap • H.264 encoding (MPEG-4 Advanced Video Coding) • Compared ASIC for H.264 encoding vs Tensilica CMP • Why H.264 encoding? • Large ASIC vs CMP gap (150-500x power gap) • Easier to see inefficiencies in CMP • ASICs for H.264 are commercially available • Commercial products serve as benchmark • Contains variety of computational motifs • Results are applicable to larger set of applications
H.264 (MPEG-4 AVC) • 99% of tasks are summed up in 5 steps: • Integer Motion Estimation (IME) • Computes vector of image-block motion • Fractional Motion Estimation (FME) • Refines initial match to quarter-pixel resolution • Intra Prediction (IP) • Based on surrounding image-blocks, makes prediction • Transform and Quantization (DCT/Quant) • Determines difference between prediction and current • Context Adaptive Binary Arithmetic Coding (CABAC) • Encodes coefficients
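To make the IME step concrete, here is a minimal sketch of block matching by exhaustive search, assuming a sum-of-absolute-differences (SAD) cost. The function names, the tiny 8x8 frames, and the search range are illustrative assumptions, not details from the paper or from any particular encoder.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def integer_motion_search(cur, ref, bx, by, n, search_range):
    """Test every integer displacement within +/- search_range and
    return (best_dx, best_dy, best_sad)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + n <= width):
                continue
            if not (0 <= by + dy and by + dy + n <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


def make_frame(width, height, patch_x, patch_y):
    """Hypothetical test frame: all zeros except one 2x2 patch."""
    frame = [[0] * width for _ in range(height)]
    values = [[10, 20], [30, 40]]
    for y in range(2):
        for x in range(2):
            frame[patch_y + y][patch_x + x] = values[y][x]
    return frame


cur = make_frame(8, 8, 2, 2)   # patch at (2, 2) in the current frame
ref = make_frame(8, 8, 3, 4)   # same patch at (3, 4) in the reference frame
motion = integer_motion_search(cur, ref, bx=2, by=2, n=2, search_range=2)
print(motion)  # (1, 2, 0)
```

Every candidate displacement repeats the same subtract, absolute-value, and accumulate pattern over neighbouring pixels, which is the kind of regular, data-parallel work the later slides target with SIMD and custom instructions.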
CMP power breakdown • Only ~6% of total energy goes towards computation
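A back-of-the-envelope sketch of what the ~6% figure implies, under the simplifying (and hypothetical) assumption that all non-compute energy could be removed while compute energy stayed unchanged. The function name and constants are illustrative; only the 6% and 150-500x numbers come from the slides.

```python
def max_gain_from_removing_overhead(compute_fraction):
    """Best-case energy reduction factor if every joule not spent on
    computation were eliminated and compute energy stayed the same."""
    if not 0.0 < compute_fraction <= 1.0:
        raise ValueError("compute_fraction must be in (0, 1]")
    return 1.0 / compute_fraction


COMPUTE_FRACTION = 0.06        # "~6% of total energy" from the slide
GAP_LOW, GAP_HIGH = 150, 500   # "150-500x power gap" from the earlier slide

bound = max_gain_from_removing_overhead(COMPUTE_FRACTION)
residual_low = GAP_LOW / bound
residual_high = GAP_HIGH / bound

print(f"best case from removing overhead: {bound:.1f}x")
print(f"gap left for the datapath itself: {residual_low:.0f}x to {residual_high:.0f}x")
```

Under this assumption, stripping all overhead yields at most roughly 17x, so about 9x to 30x of the gap would still have to come from making the computation itself cheaper.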
Instruction Decode Logic • Decoding is ~30% of overall power • Repetitive decoding of similar instructions • Solution? • 16- and 18-way SIMD datapaths • Many operations can be done in parallel • Solution? • 2- and 3-slot VLIW instructions • Impact: • 10x performance increase • 10x energy decrease • Still ~30% of power goes to decoding
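A rough sketch of why SIMD and VLIW cut the number of fetch/decode events, assuming an idealized model with perfect packing and no dependencies. The function name and the 256-element workload are hypothetical; the widths and slot counts are the ones listed on the slide.

```python
import math


def fetches_needed(num_elements, simd_width=1, vliw_slots=1):
    """Idealized count of instruction-word fetches to apply one operation
    to every element, assuming perfect SIMD and VLIW packing."""
    simd_ops = math.ceil(num_elements / simd_width)   # one op covers simd_width elements
    return math.ceil(simd_ops / vliw_slots)           # one word carries vliw_slots ops


N = 256  # hypothetical number of pixel operations
scalar = fetches_needed(N)
simd16_vliw2 = fetches_needed(N, simd_width=16, vliw_slots=2)
simd18_vliw3 = fetches_needed(N, simd_width=18, vliw_slots=3)

print(scalar, simd16_vliw2, simd18_vliw3)
```

Each remaining fetch still pays the full decode cost in this model, which is one plausible reading of why decoding stays at a similar share of power even after the large overall savings.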
Work done per instruction • [Figure: application-specific instructions compared with RISC instructions]
“Magic” Instructions • Desire: • A single instruction executes hundreds of operations • Reuse/shift pixels to avoid repeated loads of x[-1], x[0], x[1], x[2], x[3] • Requirements: • Hardware support • Custom data storage elements • Links to provide large amounts of data to these storage units • Path from storage units to functional units • Instruction support • New instructions to dictate when to do these operations
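The pixel reuse idea can be sketched in software as a shift register that keeps the current window of pixels and loads only one new pixel per output. This is an illustrative model, not the paper's hardware: the `CountingRow` helper, the five tap weights, and the row contents are all hypothetical.

```python
from collections import deque


class CountingRow:
    """Wraps a pixel row and counts how many element loads are issued."""

    def __init__(self, pixels):
        self.pixels = pixels
        self.loads = 0

    def load(self, i):
        self.loads += 1
        return self.pixels[i]


def filter_naive(row, taps, n_out):
    """Reload every pixel of the window for each output."""
    out = []
    for i in range(n_out):
        out.append(sum(w * row.load(i + k) for k, w in enumerate(taps)))
    return out


def filter_shift_register(row, taps, n_out):
    """Keep the window in a shift register; load one new pixel per output."""
    window = deque((row.load(k) for k in range(len(taps))), maxlen=len(taps))
    out = []
    for i in range(n_out):
        out.append(sum(w * p for w, p in zip(taps, window)))
        if i + 1 < n_out:
            # Appending to a full deque drops the oldest pixel (the shift).
            window.append(row.load(i + len(taps)))
    return out


TAPS = [1, -2, 4, -2, 1]        # illustrative weights for a 5-pixel window
PIXELS = list(range(20))        # hypothetical pixel row

naive_row = CountingRow(PIXELS)
shift_row = CountingRow(PIXELS)
naive_out = filter_naive(naive_row, TAPS, n_out=16)
shift_out = filter_shift_register(shift_row, TAPS, n_out=16)

print(naive_row.loads, shift_row.loads)  # 80 20
```

Both versions produce the same outputs, but the shift-register version issues one load per output after the initial fill, which is the behaviour the custom storage elements and data paths on the slide are meant to provide in hardware.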
Takeaways • Fetching instructions is expensive • Current techniques only provide partial improvement • SIMD • VLIW • Processors have large overheads and spend only a small % of energy on computation • Power of the functional units (FUs) alone exceeds that of ASIC designs • Motivates automatic tools that create modified instructions
Takeaways Cont. • Application-specific instructions and hardware reduce inefficiencies • “Instructions” that perform hundreds of operations are ideal • An extensible processor alone is not enough • Current RISC architectures contain inefficiencies not present in ASIC designs • Still many areas to improve processor power & performance
Progress since 2010 • Neural acceleration • Esmaeilzadeh, Hadi, et al. "Neural acceleration for general-purpose approximate programs." Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 2012. • Dynamically specialized datapaths • Govindaraju, Venkatraman, Chen-Han Ho, and Karthikeyan Sankaralingam. "Dynamically specialized datapaths for energy efficient computing." High Performance Computer Architecture (HPCA), 2011 IEEE 17th International Symposium on. IEEE, 2011. • Power-efficient compute-intensive GPGPU • Gilani, Syed Zulqarnain, Nam Sung Kim, and Michael J. Schulte. "Power-efficient computing for compute-intensive GPGPU applications." High Performance Computer Architecture (HPCA), 2013 IEEE 19th International Symposium on. IEEE, 2013.
Thank you • Questions?