1 / 15

Ideas for the design of an ASIP for LQCD

Ideas for the design of an ASIP for LQCD. Target Compiler Technologies CASTNESS’11, Rome, Italy. Agenda. ASIPs and IP Designer EURETILE platform An ASIP for LQCD. ASIPs in Multi-Core SoC. ASIP: Application-Specific Processor Anything between general-purpose  P and hardwired data-path

taffy
Download Presentation

Ideas for the design of an ASIP for LQCD

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Ideas for the design of an ASIP for LQCD Target Compiler TechnologiesCASTNESS’11, Rome, Italy

  2. Agenda • ASIPs and IP Designer • EURETILE platform • An ASIP for LQCD

  3. ASIPs in Multi-Core SoC • ASIP: Application-Specific Processor • Anything between general-purpose P and hardwired data-path • Flexibility through programmability and design-time reconfigurability • High throughput, low energy through parallelism and specialization • ASIP is foundation of heterogeneous multi-core SoC • Balanced SoC architecture offers best performance at lowest energy and lowest cost

  4. Why ASIPs? • Maximise performance • Specialisation • Parallelism: VLIW, SIMD, multi-core • Minimise power dissipation • Specialisation • Parallelism: VLIW, SIMD, multi-core • Power-optimised RTL generation • Leverage the benefits of programmability • React to changing requirements • Ship first for evolving standards • Remedy defects • Extend products to new markets without an SoC respin

  5. IP Designer Tool Suite

  6. Structural skeleton Instruction-set grammar nML – ASIP description language • Example: architectural specialisation • Absolute-difference instruction in motion estimation regV[4]<vector>; trn vecr<vector>; trnvecs<vector>; trnvecd<vector>; trnvect<vector>; fuvec; fuvabs; ... opn vec_adiff_opn(t:c2u, r:c2u) { action { stage E1: vecd = vsub(vecr=V[r],vecs=V[t]) @vec; V[t] = vect = vabs(vecd) @vabs; } syntax : "vadiff v"t ",v"r ",v"t; image : t::r; } • Registers, busses, functional units • Application specific data type ‘vector’ • Primitive functions: • vsub() • vabs() Operation pattern: V vabs()  vsub()  V, V

  7. Agenda • ASIPs and IP Designer • EURETILE platform • An ASIP for LQCD

  8. EURETILE hardware platform DSP ASIP1 DNP MEM • Communication • DNP • Control • RISC • Computation • DSP • ASIPs: specialised towards the application • Lattice quantum chromo dynamics (LQCD) • Neural network (Izhikevich) RISC ***

  9. Agenda • ASIPs and IP Designer • EURETILE platform • An ASIP for LQCD

  10. LQCD ASIP • Goals • Increase performance • Decrease gate count or usage of FPGA blocks • Means • Task level parallelism (multi tile architecture) • Data level parallelism • Instruction level parallelism • Architecture specialisation

  11. LQCD ASIP • Data level parallelism • 3-way SIMD fits with SU(3) matrix algebra • 3x speed improvement over scalar architecture • Instruction level parallelism • VLIW instruction word • Arithmetic operations in parallel with load/store operations • Appropriate mix of n and m based on feedback from compilation of Qphi() function • n*m speed improvement over scalar architecture

  12. LQCD ASIP • Architecture specialisation: complex floating point operations: C + C, C + i*C → 2x speedup over scalar architecture C – C, C – i*C C * R → 4x speedup over scalar architecture C * C → 8x speedup over scalar architecture … • Behaviour of floating point operations • Defined in a C dialect intended for the modelling of functional units • Translated into simulation and implementation (RTL) models • Synthesis on standard cell library, mapping on FPGA primitives • Vector types and operators defined for the C compiler vector v1, va[4], vb[4]; v1 += va[0] * vb[1];

  13. LQCD ASIP • Architecture specialisation: address generation • Goal: Vector units should be used every cycle, address generation must be done in parallel • How: to be investigated, after feedback from C compiler! • Deliverables • SDK (Compiler, Assembler, Linker, Simulator, Debugger) based on IP Designer • SystemC model • RTL Model + FPGA mapping

More Related