Synthesis with Sketching. Armando Solar-Lezama UC Berkeley
Gilad Arnold, Ras Bodik, Bob Brayton, Chris Jones, Alan Mishchenko, Armando Solar-Lezama, Koushik Sen, Sanjit Seshia, Liviu Tancau (UC Berkeley); Vijay Saraswat, Satish Chandra, Eran Yahav (IBM); Mooly Sagiv (Tel Aviv University)
Synthesis. The promise: automate program development.
Insight (the big picture, the strategy) comes from the user; exhaustive exploration (the details, the tactics) is the synthesizer's job. Challenge: establish a synergy between synthesizer and user.
Key observation: insight and mechanics are both reflected in the source code. The Sketch solution:
• Write only the code corresponding to insight
• Let the synthesizer derive the mechanics
A stencil: the spec
[Figure: the (t, i) iteration space]

    void sten1d(float[4,N] X) {
      for(int t=1; t<4; ++t)
        for(int i=1; i<N-1; ++i)
          X[t, i] = X[t-1, i-1] + X[t-1, i+1];
    }
Fast implementation
[Figure: iteration order in the (t, i) space]

    void sten1dSK(float[4,N] X) {
      assume(N >= 3);
      for(int i=0; i<4; ++i)
        for(int t=1; t<i; ++t)
          X[t, i-t] = X[t-1, i-1-t] + X[t-1, i+1-t];
      for(int i=4; i<N; ++i)
        for(int t=1; t<4; ++t)
          X[t, i-t] = X[t-1, i-1-t] + X[t-1, i+1-t];
      for(int i=N; i<N+4; ++i)
        for(int t=i-N+2; t<4; ++t)
          X[t, i-t] = X[t-1, i-1-t] + X[t-1, i+1-t];
    }
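To see that the skewed loops compute exactly what the spec computes, one can transcribe both into Python and compare them on random inputs. This is a minimal sketch under assumptions: Python lists stand in for float[4,N], the names sten1d_sk and same_result are hypothetical, and a handful of sizes is a test, not a proof. The check starts at N = 4 because, as transcribed, the first loop nest reads X[0][3] when N = 3, which is past the end of the row.

```python
import random

def sten1d(X):
    """Spec: time t in the outer loop, space i in the inner loop."""
    n = len(X[0])
    for t in range(1, 4):
        for i in range(1, n - 1):
            X[t][i] = X[t - 1][i - 1] + X[t - 1][i + 1]

def sten1d_sk(X):
    """Skewed order from the slide: i walks diagonals, X[t, i-t] is updated."""
    n = len(X[0])
    for i in range(0, 4):                 # prologue: short diagonals near i = 0
        for t in range(1, i):
            X[t][i - t] = X[t - 1][i - 1 - t] + X[t - 1][i + 1 - t]
    for i in range(4, n):                 # steady state: all three time steps
        for t in range(1, 4):
            X[t][i - t] = X[t - 1][i - 1 - t] + X[t - 1][i + 1 - t]
    for i in range(n, n + 4):             # epilogue: diagonals that run off the end
        for t in range(i - n + 2, 4):
            X[t][i - t] = X[t - 1][i - 1 - t] + X[t - 1][i + 1 - t]

def same_result(n, seed=0):
    """Run both versions on identical random inputs and compare every cell."""
    rng = random.Random(seed)
    X0 = [[rng.random() for _ in range(n)] for _ in range(4)]
    spec, fast = [row[:] for row in X0], [row[:] for row in X0]
    sten1d(spec)
    sten1d_sk(fast)
    return spec == fast

print(all(same_result(n) for n in range(4, 12)))  # True
```

Exact equality is expected here, not just closeness: every cell is written once, and it reads the same operand values in both versions.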
What are the hard fragments? The loop bounds of the three loop nests in the fast implementation. They encode the order in which the skewed traversal visits the iteration space, including its start-up and wind-down corner cases.
Sketch the hard fragments: the loop bounds become holes (left blank below), and the solver completes this sketch in 4.4 minutes.

    void sten1dSK(float[4,N] X) {
      assume(N >= 3);
      for(int i= ; i<4 ; ++i)
        for(int t= ; t< ; ++t)
          X[t, i-t] = X[t-1, i-1-t] + X[t-1, i+1-t];
      for(int i= ; i< ; ++i)
        for(int t= ; t< ; ++t)
          X[t, i-t] = X[t-1, i-1-t] + X[t-1, i+1-t];
      for(int i= ; i<4 ; ++i)
        for(int t=i ; t< ; ++t)
          X[t, i-t] = X[t-1, i-1-t] + X[t-1, i+1-t];
    }
The final sketch. Each hole is a call to the generator linexpG, which produces a linear expression over its arguments; "implements sten1d" requires the completed code to agree with the spec.

    void sten1dSK(float[4,N] X) implements sten1d {
      assume(N >= 3);
      for(int i=linexpG(N, 4); i<linexpG(N, 4); ++i)
        for(int t=linexpG(N, 4, i); t<linexpG(N, 4, i); ++t)
          X[t, i-t] = X[t-1, i-1-t] + X[t-1, i+1-t];
      for(int i=linexpG(N, 4); i<linexpG(N, 4); ++i)
        for(int t=linexpG(N, 4, i); t<linexpG(N, 4, i); ++t)
          X[t, i-t] = X[t-1, i-1-t] + X[t-1, i+1-t];
      for(int i=linexpG(N, 4); i<linexpG(N, 4); ++i)
        for(int t=linexpG(N, 4, i); t<linexpG(N, 4, i); ++t)
          X[t, i-t] = X[t-1, i-1-t] + X[t-1, i+1-t];
    }
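To make the role of the holes concrete, the toy search below fills a single hole by brute force. It is an illustration under stated assumptions, not how the Sketch synthesizer works (that solver is SAT-based rather than enumerative): everything in the fast implementation is kept fixed except the lower bound of t in the last loop nest, the hole is modeled as a hypothetical linear expression a*i + b*N + c with a, b, c in -2..2 (a stand-in for linexpG), and a candidate is accepted if it matches the spec for N = 4..9 without indexing out of bounds.

```python
from itertools import product

def fresh_grid(n):
    # Row t = 0 holds large positive inputs; all other cells start at 0, so every
    # interior cell the spec computes ends up positive and a skipped write is visible.
    return [[10**6 + j if t == 0 else 0 for j in range(n)] for t in range(4)]

def step(X, t, j):
    """X[t, j] = X[t-1, j-1] + X[t-1, j+1], rejecting out-of-range indices
    (plain Python lists would silently wrap negative indices)."""
    n = len(X[0])
    for a, b in ((t, j), (t - 1, j - 1), (t - 1, j + 1)):
        if not (0 <= a < 4 and 0 <= b < n):
            raise IndexError((a, b))
    X[t][j] = X[t - 1][j - 1] + X[t - 1][j + 1]

def sten1d(X):
    for t in range(1, 4):
        for i in range(1, len(X[0]) - 1):
            step(X, t, i)

def sten1d_sk(X, hole):
    """The fast implementation with one hole: the lower t bound of the last nest."""
    n = len(X[0])
    for i in range(0, 4):
        for t in range(1, i):
            step(X, t, i - t)
    for i in range(4, n):
        for t in range(1, 4):
            step(X, t, i - t)
    for i in range(n, n + 4):
        for t in range(hole(i, n), 4):
            step(X, t, i - t)

def linear(a, b, c):
    """Hypothetical stand-in for linexpG: the expression a*i + b*N + c."""
    return lambda i, n: a * i + b * n + c

def matches_spec(hole, sizes):
    for n in sizes:
        ref, cand = fresh_grid(n), fresh_grid(n)
        sten1d(ref)
        try:
            sten1d_sk(cand, hole)
        except IndexError:
            return False          # the completion reads or writes out of bounds
        if cand != ref:
            return False          # wrong values, e.g. a skipped update
    return True

def synthesize(sizes=range(4, 10), coeffs=range(-2, 3)):
    """Enumerate every (a, b, c) and keep the completions that match the spec."""
    return [abc for abc in product(coeffs, repeat=3) if matches_spec(linear(*abc), sizes)]

print(synthesize())  # [(1, -1, 2)], i.e. t starts at i - N + 2
```

The only surviving candidate is the bound i-N+2 used in the fast implementation.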
The sketching experience: specification + sketch => implementation (completed sketch)
Case Study: Sketching MultiGrid
• MultiGrid is an important PDE solver
  • applications in fluid dynamics and solid mechanics
• Composed of three basic stencil kernels
  • Relax
  • Interpolate
  • Restrict
• Sketched implementations for Relax & Interpolate (3D)
Relax: Red-Black Gauss-Seidel
• 5 different implementations
  • for 2D and 3D
  • ideas from a paper by Douglas et al.
• User's insight:
  • Blocking
  • Do one pass instead of two
• Hard part:
  • Black cells must be computed at an offset from red cells
  • Offset & blocking => corner cases
• Synthesis times under 5 ½ minutes
• One sketched implementation was 3 times faster than the spec
Relax: Blocking

    for(int i=; i<; ++i) {
      for(int j=; j<; ++j) {
        for(int k=; k<; ++k)
          BODY(i, j, k, A);
        CORNER1(i, j, A);
      }
      CORNER2(i, A);
    }
    CORNER3(A);

    CORNER1(int i, int j, []A) {
      A[] = F(A, , , N+??);
      A[] = F(A, , , N+??);
    }
    CORNER2(int i, []A) {
      for(int k=; k<; ++k)
        A[] = F(A, , N+??, );
      A[] = F(A, , N+??, );
      A[] = F(A, , N+??, N+??);
    }
    CORNER3([]A) {
      for(int j=; j<; ++j)
        for(int k=; k<; ++k)
          A[] = F(A, N+??, , );
    }
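Even without an offset, blocking alone splits a loop nest into pieces. The sketch below, a hypothetical 2D illustration rather than the slide's 3D code, lists the cells visited by a B-by-B blocked traversal: full tiles, a strip of leftover columns after each band of rows, and leftover rows at the end (which include the corner). In 3D, and with the red/black offset added, there are more such leftover pieces, which is roughly what the CORNER routines above cover.

```python
def blocked_cells(n, m, B):
    """Cells of an n-by-m grid in the order a B-by-B blocked loop nest visits them.
    Each band of B rows visits its full tiles, then its leftover columns;
    the leftover rows (including the bottom-right corner) come last."""
    ni = n - n % B          # rows covered by full tiles
    nj = m - m % B          # columns covered by full tiles
    order = []
    for ii in range(0, ni, B):
        for jj in range(0, nj, B):                  # full B-by-B tiles
            for i in range(ii, ii + B):
                for j in range(jj, jj + B):
                    order.append((i, j))
        for i in range(ii, ii + B):                 # leftover columns of this band
            for j in range(nj, m):
                order.append((i, j))
    for i in range(ni, n):                          # leftover rows, incl. the corner
        for j in range(m):
            order.append((i, j))
    return order

print(blocked_cells(3, 3, 2))
# [(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (1, 2), (2, 0), (2, 1), (2, 2)]
```

Three separate code regions already appear for a 3-by-3 grid with 2-by-2 tiles; a sketch lets the synthesizer work out the bounds of each region.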
Relax: Single Pass
• Computing reds and blacks in the same pass is harder
  • needs an offset between reds and blacks
• Sketching still takes care of much of the complexity
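Why a single pass needs an offset can be seen in a simplified 2D model. This sketch assumes a 5-point Poisson stencil on a grid with fixed boundary values, red cells where (i + j) is even, and a row-by-row sweep; the function names are hypothetical and the slides' blocked 3D implementations are not reproduced. Every neighbor of a black cell is red, so the black cells of row i-1 can be updated as soon as the red cells of row i are done, while the red cells of row i still see the old black values. The last black row is left over after the loop, which is the kind of corner case the offset creates.

```python
RED, BLACK = 0, 1  # a cell (i, j) is red when (i + j) is even

def update(u, f, h, i, j):
    """Gauss-Seidel value for the 5-point discretization of -(u_xx + u_yy) = f."""
    return 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1] + h * h * f[i][j])

def relax_row(u, f, h, i, color):
    """Update the interior cells of row i that have the given color."""
    for j in range(1, len(u[0]) - 1):
        if (i + j) % 2 == color:
            u[i][j] = update(u, f, h, i, j)

def relax_two_pass(u, f, h):
    """Reference: sweep all red cells, then sweep all black cells."""
    for color in (RED, BLACK):
        for i in range(1, len(u) - 1):
            relax_row(u, f, h, i, color)

def relax_one_pass(u, f, h):
    """Single sweep: the black cells of row i-1 trail the red cells of row i."""
    n = len(u)
    if n < 3:
        return                                # no interior rows
    for i in range(1, n - 1):
        relax_row(u, f, h, i, RED)
        if i >= 2:
            relax_row(u, f, h, i - 1, BLACK)  # offset: one row behind the reds
    relax_row(u, f, h, n - 2, BLACK)          # corner case: the last black row

n, m, h = 8, 7, 0.125
u0 = [[float((3 * i + 5 * j) % 7) for j in range(m)] for i in range(n)]
f = [[1.0] * m for _ in range(n)]
a, b = [row[:] for row in u0], [row[:] for row in u0]
relax_two_pass(a, f, h)
relax_one_pass(b, f, h)
print(a == b)  # True
```

Because each cell reads exactly the same neighbor values in both versions, the results agree bit for bit, not just approximately.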
Interpolate
• 2 different implementations for the 3D case
  • idea from the NAS Parallel Benchmarks
• Implementations included
  • Tiling
  • Pre-computing common subexpressions in temporary arrays
  • Loop rearrangements for vectorization
• 111 and 74 different holes, respectively
• All synthesis times under 2 ½ minutes
• Sketched implementations up to 8 times faster than the spec
Interpolate: for each original cell, 7 new cells and 26 additions
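Interpolate: Optimized
• Precompute sums of neighboring values, e.g. a[k] + a[k+1], c[k] + c[k+1], a[k] + b[k], …
• 8 additions per original cell
• The precomputation is vectorized
• The details are messy
• They can be sketched
[Figure: a cell with corner values a[k], b[k], c[k], a[k+1], b[k+1], c[k+1] along axes i, j, k]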
Interpolate

    for(int k=; k<; ++k) {
      float ta = in[] + in[];
      float tb = in[] + in[];
      float tc = in[] + in[];
      apb[k] = ta + tb;
      a[k] = ta;
      c[k] = tc;
    }
    for(int k=; k<; ++k) {
      A[] = in[];
      A[] = in[] + in[];
    }
    for(int k=; k<; ++k) {
      A[] = apb[];
      A[] = apb[] + apb[];
    }
    for(int k=; k<; ++k) {
      A[] = a[];
      A[] = a[] + a[];
    }
    for(int k=; k<; ++k) {
      A[] = c[];
      A[] = c[] + c[];
    }
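The saving comes from computing each pairwise sum once and reusing it for several new cells. A 2D analogue shows the idea; it is an assumption-laden sketch, not the 3D NAS-derived code above, and the names interp_direct and interp_precomputed are hypothetical. In bilinear interpolation each interior coarse cell needs 5 additions when every new value is summed from scratch, and 3 when the vertical pair sums s[k] are precomputed in a separate loop over k and shared by neighboring new cells.

```python
def interp_direct(z):
    """Bilinear interpolation of a coarse R x C grid z onto a (2R-1) x (2C-1)
    fine grid; every new value is summed from scratch."""
    R, C = len(z), len(z[0])
    u = [[0.0] * (2 * C - 1) for _ in range(2 * R - 1)]
    for j in range(R):
        for k in range(C):
            u[2 * j][2 * k] = z[j][k]                                # copy
            if k + 1 < C:                                            # 1 addition
                u[2 * j][2 * k + 1] = 0.5 * (z[j][k] + z[j][k + 1])
            if j + 1 < R:                                            # 1 addition
                u[2 * j + 1][2 * k] = 0.5 * (z[j][k] + z[j + 1][k])
            if j + 1 < R and k + 1 < C:                              # 3 additions
                u[2 * j + 1][2 * k + 1] = 0.25 * (z[j][k] + z[j][k + 1]
                                                  + z[j + 1][k] + z[j + 1][k + 1])
    return u

def interp_precomputed(z):
    """Same interpolation, but the vertical pair sums s[k] = z[j][k] + z[j+1][k]
    are computed once per pair of coarse rows in a simple loop over k, then reused."""
    R, C = len(z), len(z[0])
    u = [[0.0] * (2 * C - 1) for _ in range(2 * R - 1)]
    for j in range(R):
        for k in range(C):
            u[2 * j][2 * k] = z[j][k]
        for k in range(C - 1):
            u[2 * j][2 * k + 1] = 0.5 * (z[j][k] + z[j][k + 1])
        if j + 1 == R:
            break                                     # last coarse row: no odd fine row below
        s = [z[j][k] + z[j + 1][k] for k in range(C)]  # 1 addition per column
        for k in range(C):
            u[2 * j + 1][2 * k] = 0.5 * s[k]
        for k in range(C - 1):
            u[2 * j + 1][2 * k + 1] = 0.25 * (s[k] + s[k + 1])  # 1 addition
    return u

z = [[1.0, 2.0, 4.0], [3.0, 5.0, 7.0]]
print(interp_direct(z) == interp_precomputed(z))  # True
```

With integer-valued inputs the two versions agree exactly; with general floating-point inputs the regrouped sums can differ in the last bits, so a tolerance-based comparison would be appropriate.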
Sketching by example (future work). Sketch the stencil:

    Stencil(p) := X[p] = X[p+(-1,-1)] + X[p+(-1,1)]
Sketching by example. Sketch the iteration order:

    for(int i=†; i<; ++i) {
      plast = (,);
      for(int t=; t<; ++t) {
        p = (,);
        assert p-plast == (-1,1);
        Stencil(p);
        plast = p;
      }
    }

where † := linexpG(N,i,t)
Sketching + Autotuning
• Autotuning
  • Empirical exploration of the implementation space
  • Best hope for producing close-to-optimal solutions
• Tool support could make it easier
  • Defining the implementation space is a laborious process
  • Search heuristics must be embedded in the code generator
• Sketching is great at defining implementation spaces
Sketching + Autotuning
• Example: cache blocking
  • Need to determine the optimal block size
Defining the space
• Write an underconstrained sketch
• The autotuner searches over correct completions of the sketch

    void sten1D(float[4,N] X) {
      for(int i= ; i<4 ; ++i)
        for(int t=; t< ; ++t)
          X[t, i-t] = X[t-1, i-1-t] + X[t-1, i+1-t];
      for(int i= ; i<4 ; i+=??)
        for(int t=; t< ; ++t)
          for(int k= ; k< ; ++k)
            X[t, i+k-t] = X[, ] + X[, ];
      for(int i= ; i<4 ; ++i)
        for(int t=; t< ; ++t)
          X[t, i-t] = X[t-1, i-1-t] + X[t-1, i+1-t];
    }
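An autotuner built on this idea could enumerate completions that differ only in the block size, discard any that disagree with the spec, and time the rest. The sketch below assumes a hypothetical completion family, sten1d_tiled, that groups the diagonals d = t + j into blocks of width B and runs all time steps on one block before moving on; a range guard on j replaces the separate corner-case loops. In pure Python the timings mostly reflect interpreter overhead rather than cache behavior, so the chosen B is only meaningful as a demonstration of the search loop.

```python
import time

def sten1d(X):
    """Spec: X[t, i] = X[t-1, i-1] + X[t-1, i+1] for every interior i."""
    n = len(X[0])
    for t in range(1, 4):
        for i in range(1, n - 1):
            X[t][i] = X[t - 1][i - 1] + X[t - 1][i + 1]

def sten1d_tiled(X, B):
    """Hypothetical completion family: diagonals d = t + j (2 <= d <= n+1) are
    processed in blocks of width B, all time steps per block."""
    n = len(X[0])
    for d0 in range(2, n + 2, B):
        for t in range(1, 4):
            for d in range(d0, min(d0 + B, n + 2)):
                j = d - t
                if 1 <= j <= n - 2:               # guard instead of corner-case loops
                    X[t][j] = X[t - 1][j - 1] + X[t - 1][j + 1]

def autotune(n=2000, candidates=(1, 8, 64, 512), reps=3):
    """Keep the block sizes whose output matches the spec; return the fastest."""
    X0 = [[float(j % 17) for j in range(n)]] + [[0.0] * n for _ in range(3)]
    ref = [row[:] for row in X0]
    sten1d(ref)
    best = None
    for B in candidates:
        X = [row[:] for row in X0]
        sten1d_tiled(X, B)
        if X != ref:                              # discard incorrect completions
            continue
        copies = [[row[:] for row in X0] for _ in range(reps)]
        start = time.perf_counter()
        for Xc in copies:
            sten1d_tiled(Xc, B)
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best[1]:
            best = (B, elapsed)
    return best

B, seconds = autotune()
print(f"fastest block size: {B} ({seconds:.4f} s)")
```

Every block size gives a valid order here: a cell on diagonal d depends only on cells from diagonals d and d-2 at the previous time step, and those are always computed earlier.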
Conclusion
• Sketching is synthesis made practical
  • The programmer provides the high-level implementation idea
  • Low-level details are synthesized
• Sketching can support new optimization ideas
  • Not limited to the transformations hard-coded into the compiler
  • Optimization ideas are expressed in the same language
• Sketching is orthogonal to autotuning
  • Autotuning could be made easier with sketching