
Single-Chip Heterogeneous Computing: Does the Future Include Custom Logics, FPGA, and GPGPUs?

Single-Chip Heterogeneous Computing: Does the Future Include Custom Logics, FPGA, and GPGPUs? Presented by Kittisak Sajjapongse. Objective of the study: observe the trends of integrating unconventional cores (U-cores) into single-chip multicores.


Presentation Transcript


  1. Single-Chip Heterogeneous Computing: Does the Future Include Custom Logics, FPGA, and GPGPUs? Presented by Kittisak Sajjapongse

  2. Introduction to the study

  3. Objective of the study • Observe the trends of integrating unconventional cores (U-cores) into single-chip multicores • Identify the factors that affect the decision to include U-cores

  4. Model in the study • Symmetric: multiple fast, complex cores (FastCore), highly optimized to minimize single-thread latency • Asymmetric: one fast, complex core (FastCore) plus multiple simple cores (BCE), intended for applications that have parallelism • Heterogeneous: one fast, complex core (FastCore) plus U-cores (ASICs, FPGAs, GPGPUs) • This study focuses on U-cores

  5. ASIC, FPGA, and GPGPU • ASIC (Application-Specific Integrated Circuit): an integrated circuit customized for a specific application domain, e.g. an H.264 codec or a JPEG codec • FPGA (Field-Programmable Gate Array): a configurable digital integrated circuit capable of implementing different hardware architectures • GPGPU (General-Purpose Graphics Processing Unit): a graphics device that provides APIs (application programming interfaces) for running parallelizable applications

  6. ASIC, FPGA, and GPGPU • They are all used to exploit parallelism!

  7. What is the study about? • Constraints: power and bandwidth • Questions posed, under bandwidth and power constraints: • Would single-chip multicores benefit significantly from U-cores? • Would ASICs be the best choice?

  8. Model for U-core

  9. What is BCE? • Baseline Core Equivalent • Refers to a basic processor core • Used as the baseline reference for performance and power consumption

  10. What is BCE? • Two parameters used later • n: total number of BCEs available • r: amount of resources dedicated to the complex core (in units of BCE)

  11. Amdahl’s Law • Reference: http://en.wikipedia.org/wiki/Amdahl_law
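
The formula on this slide did not survive in the transcript. As a minimal sketch, the standard form of Amdahl's Law can be written in Python as follows; the function name `amdahl_speedup` and the sample values are illustrative assumptions, not taken from the presentation.

```python
def amdahl_speedup(f: float, n: float) -> float:
    """Amdahl's Law: overall speedup when a fraction f of the work
    is accelerated by a factor n and the rest (1 - f) is unchanged.

    Speedup = 1 / ((1 - f) + f / n)
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 / ((1.0 - f) + f / n)


if __name__ == "__main__":
    # Illustrative values only: 16 cores, varying parallel fraction.
    for f in (0.5, 0.9, 0.99):
        print(f"f={f}: speedup={amdahl_speedup(f, 16):.2f}")
```

The serial fraction (1 - f) bounds the achievable speedup at 1 / (1 - f) no matter how large n becomes.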

  12. Hill & Marty’s extended Amdahl’s Law • Reference: M. D. Hill et al., “Amdahl’s Law in the Multicore Era,” Computer
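
The equations on this slide are also missing from the transcript. The sketch below follows the symmetric and asymmetric forms from Hill and Marty's article, with n and r as defined on slide 10. It assumes perf(r) = sqrt(r), the performance model Hill and Marty use; the function names are illustrative.

```python
import math


def perf(r: float) -> float:
    """Sequential performance of a core built from r BCEs.
    Assumption: perf(r) = sqrt(r), as in Hill and Marty's article."""
    return math.sqrt(r)


def symmetric_speedup(f: float, n: float, r: float) -> float:
    """n/r identical cores, each built from r BCEs.
    Speedup = 1 / ((1 - f)/perf(r) + f*r / (perf(r)*n))"""
    return 1.0 / ((1.0 - f) / perf(r) + f * r / (perf(r) * n))


def asymmetric_speedup(f: float, n: float, r: float) -> float:
    """One core of r BCEs plus (n - r) single-BCE cores; all cores
    work together on the parallel fraction.
    Speedup = 1 / ((1 - f)/perf(r) + f / (perf(r) + n - r))"""
    return 1.0 / ((1.0 - f) / perf(r) + f / (perf(r) + n - r))


if __name__ == "__main__":
    # Illustrative values only.
    print(symmetric_speedup(0.9, 16, 4))
    print(asymmetric_speedup(0.9, 16, 4))
```

With r = 1 both expressions reduce to plain Amdahl's Law with n cores.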

  13. What about heterogeneous architectures? • Speedup_heterogeneous(?) = ? • Under power and bandwidth constraints

  14. Deriving model for U-core • Speedup_Amdahl is a function of (f, n) • Speedup_Hill&Marty is a function of (f, n, r) • Speedup_het(U-core) is a function of (f, n, r, B, P, µ, φ) • New parameters: • B: memory bandwidth of U-core (in units of BCE compulsory bandwidth) • P: active power of U-core relative to BCE • µ: performance of U-core relative to BCE • φ: power efficiency of U-core relative to BCE

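  15. Deriving model for U-core
      $\mathrm{Speedup}_{\mathrm{asymmetric}} = \dfrac{1}{\dfrac{1-f}{\mathrm{perf}(r)} + \dfrac{f}{\mathrm{perf}(r) + n - r}}$
      $\mathrm{Speedup}_{\mathrm{asym(offload)}} = \dfrac{1}{\dfrac{1-f}{\mathrm{perf}(r)} + \dfrac{f}{n - r}}$
      $\mathrm{Speedup}_{\mathrm{het(U\text{-}core)}} = \dfrac{1}{\dfrac{1-f}{\mathrm{perf}(r)} + \dfrac{f}{\mu\,(n - r)}}$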
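
The derivation on slide 15 moves from the asymmetric model to an offload model (the fast core idles during parallel sections) and then scales the parallel resources by µ. A minimal Python sketch of the last two steps is given below. It assumes perf(r) = sqrt(r) and ignores the power (P, φ) and bandwidth (B) limits, which the transcript does not spell out; the function names and sample values are illustrative.

```python
import math


def perf(r: float) -> float:
    """Assumption: sequential performance of an r-BCE core is sqrt(r)."""
    return math.sqrt(r)


def offload_speedup(f: float, n: float, r: float) -> float:
    """Asymmetric model in which only the (n - r) simple cores run
    the parallel fraction.
    Speedup = 1 / ((1 - f)/perf(r) + f / (n - r))"""
    if n <= r:
        raise ValueError("n must exceed r so parallel resources exist")
    return 1.0 / ((1.0 - f) / perf(r) + f / (n - r))


def het_speedup(f: float, n: float, r: float, mu: float) -> float:
    """Heterogeneous model: the (n - r) BCEs of area are spent on a
    U-core that is mu times faster than a BCE per unit of resource.
    Speedup = 1 / ((1 - f)/perf(r) + f / (mu * (n - r)))"""
    if n <= r:
        raise ValueError("n must exceed r so parallel resources exist")
    return 1.0 / ((1.0 - f) / perf(r) + f / (mu * (n - r)))


if __name__ == "__main__":
    # Hypothetical values: mu = 10 is not a measured figure.
    print(offload_speedup(0.9, 16, 4))
    print(het_speedup(0.9, 16, 4, mu=10.0))
```

Setting mu = 1 makes `het_speedup` coincide with `offload_speedup`, which is a quick consistency check on the two expressions.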

  16. Obtaining µ, φ for U-core

  17. Devices & Workload • Devices: [table not preserved in transcript] • Workloads: • Dense Matrix Multiplication (MMM) • Fast Fourier Transform (FFT, with input sizes from 2^4 to 2^20) • Black-Scholes (BS)

  18. Deriving µ for ASIC in FFT-1024 (case study) 350 0.5

  19. Deriving φ for ASIC in FFT-1024 (case study) 100 0.8

  20. Obtained Parameters

  21. Applying the Model for Results

  22. Scaling Projection

  23. Budget and Constraints

  24. Results for FFT-1024

  25. Results for MMM

  26. Results for Black-Scholes

  27. Answering the questions • Would single-chip multicores benefit significantly from U-cores? Yes, if the application has enough parallelism (>90%) to exploit. • Would ASICs be the best choice? It depends on the application: if there is not much parallelism, an ASIC might not be worth implementing.
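
The dependence on the parallel fraction f can be illustrated with a small sweep. The sketch below compares the heterogeneous speedup with the asymmetric multicore speedup for a few values of f. All numbers (n = 64, r = 4, µ = 50) are hypothetical rather than results from the study, perf(r) = sqrt(r) is assumed, and power and bandwidth limits are ignored.

```python
import math


def asym_speedup(f: float, n: float, r: float) -> float:
    """Asymmetric multicore: 1 / ((1-f)/sqrt(r) + f/(sqrt(r) + n - r))."""
    p = math.sqrt(r)  # assumed perf(r)
    return 1.0 / ((1.0 - f) / p + f / (p + n - r))


def het_speedup(f: float, n: float, r: float, mu: float) -> float:
    """Heterogeneous U-core: 1 / ((1-f)/sqrt(r) + f/(mu*(n - r)))."""
    p = math.sqrt(r)  # assumed perf(r)
    return 1.0 / ((1.0 - f) / p + f / (mu * (n - r)))


def ucore_gain(f: float, n: float = 64, r: float = 4, mu: float = 50.0) -> float:
    """Ratio of heterogeneous to asymmetric speedup.
    Defaults are hypothetical values chosen only for illustration."""
    return het_speedup(f, n, r, mu) / asym_speedup(f, n, r)


if __name__ == "__main__":
    for f in (0.5, 0.9, 0.99):
        print(f"f={f}: U-core gain over asymmetric = {ucore_gain(f):.2f}x")
```

Under these assumptions the gain from a much faster U-core stays close to 1x at f = 0.5 and only becomes large as f approaches 1, because the serial term (1 - f)/perf(r) dominates the denominator when parallelism is limited.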

  28. Conclusions • Sufficient parallelism must exist to obtain a significant performance improvement from U-cores • Flexible U-cores tend to be competitive with ASICs under limited bandwidth and limited parallelism • U-cores such as ASICs are useful when power is the primary goal
