
Eliminating the Hardware/Software Divide

Satnam Singh, Microsoft Research Cambridge, UK



  1. Eliminating the Hardware/Software Divide Satnam Singh, Microsoft Research Cambridge, UK

  2. ! IRQ, NMI

  3. t

  4. locks, monitors, condition variables, spin locks, priority inversion

  5. multiple independent multi-ported memories; hard and soft embedded processors; fine-grain parallelism and pipelining

  6. LUTs are just higher order functions [diagram: lut1, lut2, lut3 and lut4 blocks with inputs i0 to i3 and output o] inv = lut1 not; and2 = lut2 (&&); mux = lut3 (λs d0 d1 . if s then d1 else d0)
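The slide's idea can be sketched outside Haskell as well. The following is a minimal Python illustration, not the speaker's library: `lut` is a hypothetical helper that takes an ordinary function, tabulates it over every input combination (which is all an n-input lookup table stores), and returns a gate that answers from that table.

```python
from itertools import product

def lut(n, fn):
    """Build an n-input LUT from a Python function (hypothetical helper).

    The function is evaluated once per input combination; the resulting
    2**n-entry truth table is what a hardware LUT would hold.
    """
    table = {bits: bool(fn(*bits)) for bits in product((False, True), repeat=n)}

    def gate(*inputs):
        # Look the answer up instead of recomputing fn.
        return table[tuple(bool(i) for i in inputs)]

    gate.table = table
    return gate

# The three definitions from the slide, restated with Python lambdas.
inv = lut(1, lambda a: not a)
and2 = lut(2, lambda a, b: a and b)
mux = lut(3, lambda s, d0, d1: d1 if s else d0)
```

For example, `mux(False, True, False)` selects `d0` and returns `True`, and `len(mux.table)` is 8 because a 3-input LUT has eight entries.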

  7. 14,820 sim-adds; 1,037,400,000,000 additions/second; 32-bit integer adder (32/474,240), >700MHz; 332x1440; XC6VLX760: 758,784 logic cells, 864 DSP blocks, 1,440 dual-ported 18Kb RAMs

  8. XD2000i: FPGA in-socket accelerator for Intel FSB; XD2000F: FPGA in-socket accelerator for AMD socket F; XD1000: FPGA co-processor module for socket 940
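The numbers on this slide can be checked with a few lines of arithmetic. The sketch below assumes that "(32/474,240)" means one 32-bit adder occupies 32 of the device's 474,240 LUTs, and that every adder completes one addition per clock cycle; both readings are assumptions, as are the constant and function names.

```python
LUTS_ON_DEVICE = 474_240   # assumed meaning of the "474,240" on the slide
LUTS_PER_ADDER = 32        # assumed cost of one 32-bit integer adder

def adders_that_fit(luts_on_device, luts_per_adder):
    """How many adders fit if LUTs are the only limiting resource."""
    return luts_on_device // luts_per_adder

def additions_per_second(adders, clock_hz):
    """Aggregate throughput if every adder finishes one add per cycle."""
    return adders * clock_hz

adders = adders_that_fit(LUTS_ON_DEVICE, LUTS_PER_ADDER)
rate_at_700mhz = additions_per_second(adders, 700_000_000)
rate_at_70mhz = additions_per_second(adders, 70_000_000)
```

Under these assumptions `adders` is 14,820, matching the slide. At 700 MHz the product is 10,374,000,000,000 additions per second, while the quoted 1,037,400,000,000 corresponds to 70 MHz, so one of the two figures may have lost a digit in transcription.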

  9. Case Study – Spam Filtering (Alessandro Forin, MSR Redmond) • Benchmark: ~50,000 regular expressions from the Forefront team (snapshot from their Exchange server in Aug '09) • Performance: up to 6000x faster than standard Intel processors; capable of processing at the line rate of gigabit Ethernet • Power requirement: 7–10 watts rather than 200++ watts

  10. Software version: "E-mail Server" with Reg Ex processing, ~1 message/sec, 200++ watts. FPGA version: "E-mail Server" with Reg Ex processing, ~6000 messages/sec, <10 watts.
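Throughput and power can be combined into energy per message, which makes the comparison easier to read. This is a rough sketch that treats "200++ watts" as 200 W and "<10 watts" as 10 W; those rounded values and the function name are assumptions for illustration only.

```python
def energy_per_message(power_watts, messages_per_second):
    """Joules spent per message: power divided by throughput."""
    return power_watts / messages_per_second

software_joules = energy_per_message(200, 1)      # ~1 message/sec at 200++ W
fpga_joules = energy_per_message(10, 6000)        # ~6000 messages/sec at <10 W
improvement = software_joules / fpga_joules
```

With these rounded inputs the software path costs about 200 J per message and the FPGA path about 0.0017 J, a ratio of roughly 120,000.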

  11. René Müller (ETH) FPGAs + SQL [VLDB]

  12. CPU FPGA

  13. 541 seconds 1896 seconds

  14. scientific computing, data mining, search, image processing, financial analytics; opportunity, challenge

  15. The Accidental Semi-colon ;

  16. public static int[] SequentialFIRFunction(int[] weights, int[] input)
      {
          // Assumption: the window is as long as the weight vector (one slot per tap).
          int size = weights.Length;
          int[] window = new int[size];
          int[] result = new int[input.Length];
          // Clear the window of x values to all zero.
          for (int w = 0; w < size; w++)
              window[w] = 0;
          // For each sample...
          for (int i = 0; i < input.Length; i++)
          {
              // Shift in the new x value
              for (int j = size - 1; j > 0; j--)
                  window[j] = window[j - 1];
              window[0] = input[i];
              // Compute the result value
              int sum = 0;
              for (int z = 0; z < size; z++)
                  sum += weights[z] * window[z];
              result[i] = sum;
          }
          return result;
      }
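The sequential FIR function fixes an order of operations that the filter itself does not require. As an illustration only (this is not the Accelerator API, and all function names here are hypothetical), the same filter can be written in Python both as a direct port of the loop and as whole-array operations: one delayed copy of the input per tap, scaled and summed elementwise.

```python
def sequential_fir(weights, samples):
    """Direct port of the loop version: shift a window, then multiply-accumulate."""
    window = [0] * len(weights)
    result = []
    for x in samples:
        window = [x] + window[:-1]  # shift in the new sample
        result.append(sum(w * v for w, v in zip(weights, window)))
    return result

def shift_right(samples, k):
    """Delay the whole signal by k samples, padding the front with zeros."""
    return ([0] * k + list(samples))[:len(samples)]

def data_parallel_fir(weights, samples):
    """Same filter as whole-array steps: one shifted, scaled copy per tap."""
    acc = [0] * len(samples)
    for k, w in enumerate(weights):
        delayed = shift_right(samples, k)
        # Elementwise multiply-add; every position is independent of the others.
        acc = [a + w * d for a, d in zip(acc, delayed)]
    return acc
```

Both functions return the same list for the same inputs. In the second form each elementwise step has no ordering between positions, which is the property a data-parallel compiler can exploit.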

  17. PLDI 1998

  18. PLDI 2003

  19. PLDI 2010

  20. POPL 1998

  21. POPL 2002

  22. POPL 2010

  23. ray of light: Signal, Esterel, SHIM, Accelerator, RapidMind/Ct, Streams-C, Bluespec, Liquid Metal, Feldspar, PRET-C

  24. embedded DSLs high level software machine learning universal language? GPU FPGA DSP Gannet grand unification theory polyglots

  25. Our High Level Synthesis Projects • Kiwi: concurrent C# programs for control-oriented applications [David Greaves, Univ. Cambridge] • shape analysis: synthesis of dynamic data structures (C) [MPI and CMU] • Accelerator/FPGA: synthesis of data parallel programs in C++/C#/F# [MSR Redmond] • HLINQ eDSLs [Gavin Bierman] • + compilation of self-recursive Haskell functions to FPGA circuits!

  26. Redmond Accelerator Team: Barry Bond, Kerry Hammil, Lubomir Litchev, <anonymous other person>

  27. Effort vs. Reward: CUDA, OpenCL, HLSL, DirectCompute, Accelerator [chart regions: low effort/low reward, medium effort/medium reward, high effort/high reward]

  28. Accelerator
