Zhiduo Liu Supervisor: Guy Lemieux Sep. 28th, 2012

Accelerator Compiler for the VENICE Vector Processor Zhiduo Liu Supervisor: Guy Lemieux Sep. 28th, 2012

Outline:
• Motivation
• Background
• Implementation
• Results
• Conclusion

Motivation
• Platforms: FPGA, Multi-core, GPU, Many-core, Computer clusters, Vector Processor
• Languages, APIs and programming models: VHDL, Verilog, System Verilog, Bluespec, ParC, Cilk, Erlang, OpenMP, OpenCL, aJava, SSE, MPI, OpenGL, Pthread, X10, CUDA, StreamIt, Sh, OpenHMPP, Fortress, Sponge, Chapel, …

Motivation: Simplification
[Figure: the same platforms, languages and programming models as on the previous slide, under the heading “Simplification”]

Motivation Single Description …

Contributions
• The compiler serves as a new back-end of a single-description multiple-device language.
• The compiler makes VENICE easier to program and debug.
• The compiler provides auto-parallelization and optimization.

[1] Z. Liu, A. Severance, S. Singh and G. Lemieux, “Accelerator Compiler for the VENICE Vector Processor,” in FPGA 2012.
[2] C. Chou, A. Severance, A. Brant, Z. Liu, S. Sant and G. Lemieux, “VEGAS: soft vector processor with scratchpad memory,” in FPGA 2011.

Complicated
[Figure: block diagram with stages labeled ALIGN, WR, RD, ALIGN, EX1, EX2, ACCUM]

Program in VENICE assembly
• Allocate vectors in scratchpad
• Move data from main memory to scratchpad
• Wait for DMA transaction to be completed
• Setup for vector instructions
• Perform vector computations
• Wait for vector operations to be completed
• Move data from scratchpad to main memory
• Wait for DMA transaction to be completed
• Deallocate memory from scratchpad

    #include "vector.h"

    int main()
    {
        int A[] = {1,2,3,4,5,6,7,8};
        const int data_len = sizeof(A);
        int *va = (int *) vector_malloc(data_len);
        vector_dma_to_vector(va, A, data_len);
        vector_wait_for_dma();
        vector_set_vl(data_len / sizeof(int));
        vector(SVW, VADD, va, 42, va);
        vector_instr_sync();
        vector_dma_to_host(A, va, data_len);
        vector_wait_for_dma();
        vector_free();
    }

Program in Accelerator
• Create a Target
• Create Parallel Array objects
• Write expressions
• Call ToArray to evaluate expressions
• Delete Target object

    #include "Accelerator.h"
    using namespace ParallelArrays;
    using namespace MicrosoftTargets;

    int main()
    {
        int A[] = {1,2,3,4,5,6,7,8};
        Target *tgt = CreateVectorTarget();
        IPA b = IPA(A, sizeof(A)/sizeof(int));
        IPA c = b + 42;
        tgt->ToArray(c, A, sizeof(A)/sizeof(int));
        tgt->Delete();
    }

Other targets, selected by changing only the line that creates the Target:

    Target *tgt = CreateMulticoreTarget();
    Target *tgt = CreateDX9Target();

Assembly Programming:
• Write Assembly
• Compile with Gcc (Doesn’t compile? Go back and edit.)
• Download to board
• Get Result (Result incorrect? Go back and edit.)

Accelerator Programming:
• Write in Accelerator
• Compile with Microsoft Visual Studio (Doesn’t compile? Or result incorrect? Go back and edit.)
• Compile with Gcc
• Download to board
• Get Result

Assembly Programming:
• Hard to program
• Long debug cycle
• Not portable
• Manual – Not always optimal or correct (wysiwyg)

Accelerator Programming:
• Easy to program
• Easy to debug
• Can also target other devices
• Automated compiler optimizations

    #include "Accelerator.h"
    using namespace ParallelArrays;
    using namespace MicrosoftTargets;

    int main()
    {
        Target *tgtVector = CreateVectorTarget();
        const int length = 8192;
        int a[] = {1,2,3,4, … , 8192};
        int d[length];
        IPA A = IPA(a, length);
        IPA B = Evaluate( Rotate(A, [1]) + 1 );
        IPA C = Evaluate( Abs( A + 2 ));
        IPA D = ( A + B ) * C;
        tgtVector->ToArray(D, d, length * sizeof(int));
        tgtVector->Delete();
    }

[Figure: expression tree for D. Root ×, with children Abs(A + 2) and A + (Rot(A) + 1)]

[Figure: expression tree for D = (A + (Rot(A) + 1)) × Abs(A + 2)]
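As a rough illustration of how such a tree can arise, the sketch below uses overloaded operators that record each operation instead of computing it. The names `Node`, `Expr`, `make`, `Rot`, `Abs`, `show` and `slide_example` are hypothetical and chosen for this example only; they are not Accelerator's internal representation or API.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expression-tree types, for illustration only.
struct Node;
struct Expr { std::shared_ptr<const Node> node; };
struct Node {
    std::string op;          // operator symbol, or the name of a leaf
    std::vector<Expr> args;  // empty for leaves
};

Expr make(std::string op, std::vector<Expr> args = {}) {
    return Expr{std::make_shared<const Node>(Node{std::move(op), std::move(args)})};
}

// Each operator builds a tree node; nothing is evaluated here.
Expr operator+(const Expr& a, const Expr& b) { return make("+", {a, b}); }
Expr operator+(const Expr& a, int k) { return make("+", {a, make(std::to_string(k))}); }
Expr operator*(const Expr& a, const Expr& b) { return make("*", {a, b}); }
Expr Rot(const Expr& a) { return make("Rot", {a}); }
Expr Abs(const Expr& a) { return make("Abs", {a}); }

// Prints the tree in infix form so its shape can be inspected.
std::string show(const Expr& e) {
    const Node& n = *e.node;
    if (n.args.empty()) return n.op;
    if (n.args.size() == 1) return n.op + "(" + show(n.args[0]) + ")";
    return "(" + show(n.args[0]) + " " + n.op + " " + show(n.args[1]) + ")";
}

// The expression from the slide: D = (A + B) * C.
Expr slide_example() {
    Expr A = make("A");
    Expr B = Rot(A) + 1;
    Expr C = Abs(A + 2);
    return (A + B) * C;
}
```

Calling `show(slide_example())` returns the tree in infix form, with `Rot(A) + 1` nested under the left operand of `*` and `Abs` applied to `A + 2` on the right.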

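[Figure: the same expression tree with the Rot node folded into its operand, shown as the leaf A (rot): D = (A + (A (rot) + 1)) × Abs(A + 2)]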

[Figure: the tree split into three sub-trees: B = A (rot) + 1, C = Abs(A + 2), D = (A + B) × C]

Combine Operations
[Figure: sub-trees B = A (rot) + 1, C = Abs(A + 2) and D = (A + B) × C, before Abs and + in C are combined]

Combine Operations
[Figure: in sub-tree C, Abs and + are combined into a single node, C = |+|(A, 2); B = A (rot) + 1 and D = (A + B) × C are unchanged]
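The combining step on this slide can be sketched as a small bottom-up tree rewrite. This is a minimal illustration, not the compiler's actual pass: `Node`, `node`, `combine` and `count_ops` are hypothetical names, and the only rule implemented is the one visible on the slide, an Abs directly above a + becoming one |+| node.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node, for illustration only.
struct Node {
    std::string op;                            // operator symbol or leaf name
    std::vector<std::shared_ptr<Node>> args;   // empty for leaves
};
using NodePtr = std::shared_ptr<Node>;

NodePtr node(std::string op, std::vector<NodePtr> args = {}) {
    return std::make_shared<Node>(Node{std::move(op), std::move(args)});
}

// Rewrites the tree bottom-up and returns the new root.
// Rule: Abs(+(x, y)) becomes |+|(x, y).
NodePtr combine(const NodePtr& n) {
    std::vector<NodePtr> args;
    for (const auto& a : n->args) args.push_back(combine(a));
    if (n->op == "Abs" && args.size() == 1 && args[0]->op == "+")
        return node("|+|", args[0]->args);
    return node(n->op, std::move(args));
}

// Counts operator (non-leaf) nodes, i.e. operations left to issue.
int count_ops(const NodePtr& n) {
    int total = n->args.empty() ? 0 : 1;
    for (const auto& a : n->args) total += count_ops(a);
    return total;
}
```

For C = Abs(A + 2), `count_ops` drops from 2 to 1 after `combine`, which is the saving the slide shows.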

Scratchpad Memory: “Virtual Vector Register File”
• Number of vector registers = ?
• Vector register size = ?

Evaluation Order
[Figure: the sub-trees for B, C and D, with each node annotated by numeric labels that determine the evaluation order]
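The numeric labels on this slide did not survive extraction. One classic way to choose an evaluation order that keeps register use low is Sethi-Ullman numbering, sketched below. It is an assumption that the slide's labels follow this scheme or a variant of it; the names `Node`, `leaf`, `make_op`, `label` and `order` are hypothetical, every leaf is assumed to need one register, and the code relies on C++17 for a `std::vector` of the enclosing type.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node, for illustration only.
struct Node {
    std::string op;          // operator name or leaf name
    std::vector<Node> args;  // empty for leaves
    int need;                // registers needed; filled in by label()
};

Node leaf(std::string name) { return Node{std::move(name), {}, 0}; }
Node make_op(std::string name, std::vector<Node> args) {
    return Node{std::move(name), std::move(args), 0};
}

// Sethi-Ullman numbering: a leaf needs 1 register, a unary node needs what
// its child needs, and a binary node needs max(l, r), or l + 1 when l == r.
int label(Node& n) {
    if (n.args.empty()) return n.need = 1;
    if (n.args.size() == 1) return n.need = label(n.args[0]);
    int l = label(n.args[0]);
    int r = label(n.args[1]);
    return n.need = (l == r) ? l + 1 : std::max(l, r);
}

// Emits operators in evaluation order: the child that needs more
// registers is evaluated first. Call label() on the root beforehand.
void order(const Node& n, std::vector<std::string>& out) {
    if (n.args.empty()) return;
    std::vector<const Node*> kids;
    for (const Node& a : n.args) kids.push_back(&a);
    std::stable_sort(kids.begin(), kids.end(),
                     [](const Node* a, const Node* b) { return a->need > b->need; });
    for (const Node* k : kids) order(*k, out);
    out.push_back(n.op);
}
```

Applied to the whole expression (A + (A (rot) + 1)) × Abs(A + 2) as a single tree, this scheme labels the root 3 and evaluates the inner A (rot) + 1 before the outer addition.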

Count number of virtual vector registers
[Figure: the sub-trees for B, C and D]
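Once the number of virtual vector registers is known, the earlier question about vector register size becomes simple arithmetic on the scratchpad capacity. The sketch below shows one way to do it, assuming the scratchpad is divided evenly among the registers and the data is processed in several passes when it does not fit. The even split, the function name `plan_register_file` and the example capacity are assumptions for illustration, not figures taken from the slides.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type, for illustration only.
struct RegisterFile {
    std::size_t num_registers;       // virtual vector registers required
    std::size_t elems_per_register;  // vector length of each register
    std::size_t passes;              // chunks needed to cover all the data
};

// Splits the scratchpad evenly among the registers and works out how many
// passes are needed to process data_elems elements.
RegisterFile plan_register_file(std::size_t scratchpad_bytes,
                                std::size_t num_registers,
                                std::size_t elem_bytes,
                                std::size_t data_elems) {
    assert(num_registers > 0 && elem_bytes > 0);
    std::size_t per_reg = scratchpad_bytes / num_registers / elem_bytes;
    assert(per_reg > 0);  // scratchpad too small for this many registers
    if (per_reg > data_elems) per_reg = data_elems;  // no need to exceed the data
    std::size_t passes =
        data_elems == 0 ? 0 : (data_elems + per_reg - 1) / per_reg;
    return {num_registers, per_reg, passes};
}
```

For example, with a hypothetical 65536-byte scratchpad, 4 registers and 4-byte elements, each register holds 4096 elements, so the 8192-element array from the earlier example would take 2 passes.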
