
To DSP or Not to DSP?


Presentation Transcript


  1. To DSP or Not to DSP? Chad Erven

  2. Words to Bits – Your Options • ASIC • FPGA • DSP • Embedded RISC • General Purpose Processor (GPP)

  3. Why Go Programmable? • Building the chip wrong • Systems are increasingly too complex to be described efficiently by RTL designers • Errors are orders of magnitude more difficult to find in hardware than in software • Defects are extremely costly in hardware • Building the wrong chip • Only software is flexible enough to adapt during and after system design HARDWARE IS TOO HARD!

  4. So Software and Processors, Right? • Using processors has its drawbacks, especially in SoC designs • Never a perfect match between the application and the hardware • Performance costs, power penalties, and wasted silicon will ALWAYS happen to some extent • Multiple disparate cores must be integrated with each other

  5. Splitting the Difference – ASIPs • Ever wish you were the processor designer? • Now you are! Write the exact instructions you need and nothing more. • An Application-Specific Instruction-set Processor (ASIP) offers the best of both worlds

  6. Back Up! • Isn’t hardware too much work? • Yes • So doesn’t an ASIP defeat the purpose? • No • Why not? • Extending a base processor is much easier • Readily amenable to automation • You only have to verify the instruction description; integration into the processor is guaranteed

  7. Cool, Show Me How It Works • ASIPs derive their performance by addressing three problems of conventional processors • Operations that are innately parallel must be expressed serially • Somewhat solved by SIMD or MIMD processors • Memory is addressed as one continuous space • Somewhat solved by modifiers and/or pragmas (dm/pm) • Applications are complicated by their expression as operations on C types • Somewhat alleviated by powerful instructions in hardware

  8. Working with the Innate Nature of the Algorithm • Example – byte swap (common telecom task):
       int *a, *b;
       …
       for (int i = 0; i < 4096; i++) {
         a[i] = ( ((b[i] & 0x000000ff) << 24) |
                  ((b[i] & 0x0000ff00) << 8)  |
                  ((b[i] & 0x00ff0000) >> 8)  |
                  ((b[i] & 0xff000000) >> 24) );
       }

  9. Working with the Innate Nature of the Algorithm • Write your own instruction:
       operation swap {in AR x, out AR y} {}
       { y = {x[7:0], x[15:8], x[23:16], x[31:24]}; }
     • Making the C code:
       for (int i = 0; i < 4096; i++)
         a[i] = swap(b[i]);
     5X SPEED UP!!!
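The byte swap on slides 8 and 9 can be modeled in portable C to check what the custom instruction is expected to compute. This is only a sketch under stated assumptions: `swap_model`, `swap_buffer`, and `swap_self_check` are hypothetical names that do not appear in the slides, and unsigned 32-bit words are used in place of `int` so that every shift is well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of the custom "swap" instruction:
   reverses the order of the four bytes in a 32-bit word. */
static uint32_t swap_model(uint32_t x)
{
    return ((x & 0x000000ffu) << 24) |
           ((x & 0x0000ff00u) << 8)  |
           ((x & 0x00ff0000u) >> 8)  |
           ((x & 0xff000000u) >> 24);
}

/* The loop from the slides, with the instruction replaced by the model.
   The slides use a fixed count of 4096; n is a parameter here. */
static void swap_buffer(uint32_t *a, const uint32_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = swap_model(b[i]);
}

/* Sanity check: a known pattern, and swapping twice is the identity. */
static void swap_self_check(void)
{
    assert(swap_model(0x11223344u) == 0x44332211u);
    assert(swap_model(swap_model(0xdeadbeefu)) == 0xdeadbeefu);
}
```

A model like this can serve as a reference when verifying the instruction description, since the hardware result can be compared word by word against `swap_model`.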

  10. Instruction Fusion [Figure: two dataflow diagrams. Unfused operation: op1 reads reg1 and reg2 and writes reg3; op2 reads reg3 and reg4 and writes reg5. Fused operation: op1 reads reg1 and reg2 and feeds op2 directly; op2 also reads reg4 and writes reg5.]

  11. Example
       for (i = 0; i < n; i++)
         c[i] = (a[i] * b[i]) >> 4;
     Assembly:
       loop:
         l8ui   a12,a11,0
         l8ui   a13,a10,0
         addi   a11,a11,1
         addi   a10,a10,1
         mul16u a8,a12,a13
         srai   a8,a8,4
         s8i    a8,a9,0
         addi   a9,a9,1

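  12. Example [Figure: dataflow graph of the loop body. Two l8ui loads (from a11 and a10, offset 0) feed mul16u, then srai by 4, then s8i to a9; three addi operations increment a11, a10, and a9 by 1.]

  13. Example [Figure: the same dataflow graph with mul16u, srai, s8i, and the addi on a9 collapsed into a single fusion.mul16u.srai.s8i.addi instruction.]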

  14. Example New assembly code:
       loop:
         l8ui a12,a11,0
         l8ui a13,a10,0
         addi a10,a10,1
         addi a11,a11,1
         fusion.mul16u.srai.s8i.addi a9,a12,a13
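The fusion in slides 11 to 14 can be illustrated with a C model that computes the same result both ways. This is a minimal sketch, not the actual instruction definition: `scale_unfused`, `fused_mul_shift_store`, `scale_fused`, and `scale_self_check` are hypothetical names, and the arrays are assumed to hold unsigned bytes because the assembly uses `l8ui` and `s8i`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unfused form: each step corresponds to one instruction in the loop. */
static void scale_unfused(uint8_t *c, const uint8_t *a,
                          const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t prod = (uint32_t)a[i] * b[i];  /* mul16u */
        uint32_t shifted = prod >> 4;           /* srai by 4 */
        c[i] = (uint8_t)shifted;                /* s8i stores the low byte */
    }
}

/* Hypothetical model of the fused instruction: multiply, shift,
   store the low byte, and return the post-incremented pointer. */
static uint8_t *fused_mul_shift_store(uint8_t *dst, uint8_t x, uint8_t y)
{
    *dst = (uint8_t)(((uint32_t)x * y) >> 4);
    return dst + 1;
}

/* Fused form: one modeled instruction replaces four in the loop body. */
static void scale_fused(uint8_t *c, const uint8_t *a,
                        const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c = fused_mul_shift_store(c, a[i], b[i]);
}

/* Sanity check: both forms agree on a small input. */
static void scale_self_check(void)
{
    const uint8_t a[3] = {16, 255, 3};
    const uint8_t b[3] = {2, 255, 5};
    uint8_t u[3], f[3];
    scale_unfused(u, a, b, 3);
    scale_fused(f, a, b, 3);
    for (size_t i = 0; i < 3; i++)
        assert(u[i] == f[i]);
}
```

The loop body drops from eight instructions to five in the slides; the model only confirms that the fused and unfused forms store the same bytes, and says nothing about cycle counts.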

  15. Benchmarking • [Figure: EEMBC ConsumerMarks (performance). From [Rowen].] • [Figure: EEMBC Summary (performance/MHz). From [Rowen].] • Hand-coded assembly for the other processors

  16. And I Haven’t Even Gotten To… • Sharing input operands • Substituting variables with constants • Replacing memory tables with logic • Limiting immediate values to the minimum required width • Placing operands in special registers • Creating SIMD instructions • Reducing the size of operand specifiers • Custom input/output queues

  17. Ok, Let Me Have It Dr. Smith (The rest of you can ask questions too)
