150 likes | 325 Views
Ideas for the design of an ASIP for LQCD. Target Compiler Technologies CASTNESS’11, Rome, Italy. Agenda. ASIPs and IP Designer EURETILE platform An ASIP for LQCD. ASIPs in Multi-Core SoC. ASIP: Application-Specific Processor Anything between general-purpose P and hardwired data-path
E N D
Ideas for the design of an ASIP for LQCD Target Compiler TechnologiesCASTNESS’11, Rome, Italy
Agenda • ASIPs and IP Designer • EURETILE platform • An ASIP for LQCD
ASIPs in Multi-Core SoC • ASIP: Application-Specific Processor • Anything between general-purpose P and hardwired data-path • Flexibility through programmability and design-time reconfigurability • High throughput, low energy through parallelism and specialization • ASIP is foundation of heterogeneous multi-core SoC • Balanced SoC architecture offers best performance at lowest energy and lowest cost
Why ASIPs? • Maximise performance • Specialisation • Parallelism: VLIW, SIMD, multi-core • Minimise power dissipation • Specialisation • Parallelism: VLIW, SIMD, multi-core • Power-optimised RTL generation • Leverage the benefits of programmability • React to changing requirements • Ship first for evolving standards • Remedy defects • Extend products to new markets without an SoC respin
Structural skeleton Instruction-set grammar nML – ASIP description language • Example: architectural specialisation • Absolute-difference instruction in motion estimation regV[4]<vector>; trn vecr<vector>; trnvecs<vector>; trnvecd<vector>; trnvect<vector>; fuvec; fuvabs; ... opn vec_adiff_opn(t:c2u, r:c2u) { action { stage E1: vecd = vsub(vecr=V[r],vecs=V[t]) @vec; V[t] = vect = vabs(vecd) @vabs; } syntax : "vadiff v"t ",v"r ",v"t; image : t::r; } • Registers, busses, functional units • Application specific data type ‘vector’ • Primitive functions: • vsub() • vabs() Operation pattern: V vabs() vsub() V, V
Agenda • ASIPs and IP Designer • EURETILE platform • An ASIP for LQCD
EURETILE hardware platform DSP ASIP1 DNP MEM • Communication • DNP • Control • RISC • Computation • DSP • ASIPs: specialised towards the application • Lattice quantum chromo dynamics (LQCD) • Neural network (Izhikevich) RISC ***
Agenda • ASIPs and IP Designer • EURETILE platform • An ASIP for LQCD
LQCD ASIP • Goals • Increase performance • Decrease gate count or usage of FPGA blocks • Means • Task level parallelism (multi tile architecture) • Data level parallelism • Instruction level parallelism • Architecture specialisation
LQCD ASIP • Data level parallelism • 3-way SIMD fits with SU(3) matrix algebra • 3x speed improvement over scalar architecture • Instruction level parallelism • VLIW instruction word • Arithmetic operations in parallel with load/store operations • Appropriate mix of n and m based on feedback from compilation of Qphi() function • n*m speed improvement over scalar architecture
LQCD ASIP • Architecture specialisation: complex floating point operations: C + C, C + i*C → 2x speedup over scalar architecture C – C, C – i*C C * R → 4x speedup over scalar architecture C * C → 8x speedup over scalar architecture … • Behaviour of floating point operations • Defined in a C dialect intended for the modelling of functional units • Translated into simulation and implementation (RTL) models • Synthesis on standard cell library, mapping on FPGA primitives • Vector types and operators defined for the C compiler vector v1, va[4], vb[4]; v1 += va[0] * vb[1];
LQCD ASIP • Architecture specialisation: address generation • Goal: Vector units should be used every cycle, address generation must be done in parallel • How: to be investigated, after feedback from C compiler! • Deliverables • SDK (Compiler, Assembler, Linker, Simulator, Debugger) based on IP Designer • SystemC model • RTL Model + FPGA mapping