Synthesis with Sketching

Synthesis with Sketching. Armando Solar-Lezama UC Berkeley


Presentation Transcript


  1. Synthesis with Sketching. Armando Solar-Lezama, UC Berkeley. Gilad Arnold, Ras Bodik, Bob Brayton, Chris Jones, Alan Mishchenko, Armando Solar-Lezama, Koushik Sen, Sanjit Seshia, Liviu Tancau (UC Berkeley); Vijay Saraswat, Satish Chandra, Eran Yahav (IBM); Mooly Sagiv (Tel Aviv University)

  2. Synthesis. The promise: automate program development

  3. Insight, big picture, strategy versus exhaustive exploration, details, tactics. Challenge: establish a synergy between synthesizer and user

  4. Key Observation: insight and mechanics are both reflected in the source code. The Sketch solution: • Write only the code corresponding to insight • Let the synthesizer derive the mechanics

  5. A stencil: spec (axes: t, i)
      void sten1d (float[4,N] X) {
        for(int t=1; t<4; ++t)
          for(int i=1; i<N-1; ++i)
            X[t, i] = X[t-1, i-1] + X[t-1, i+1];
      }
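As a rough illustration, the spec on slide 5 can be ported to Python and run on a small grid. The list-of-rows layout, the explicit `N` parameter, and the sample input are assumptions of this port, not part of the slides.

```python
def sten1d(X, N):
    """Python port of the slide-5 spec: X has 4 rows (time steps) and N columns."""
    for t in range(1, 4):
        for i in range(1, N - 1):
            # Each interior cell is the sum of its two diagonal predecessors.
            X[t][i] = X[t - 1][i - 1] + X[t - 1][i + 1]
    return X


N = 5
X = [[1, 2, 3, 4, 5]] + [[0] * N for _ in range(3)]  # row 0 is the input
sten1d(X, N)
for row in X:
    print(row)
# [1, 2, 3, 4, 5]
# [0, 4, 6, 8, 0]
# [0, 6, 12, 6, 0]
# [0, 12, 12, 12, 0]
```

Boundary columns 0 and N-1 are never written, so they keep their initial values at every time step.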

  6. Fast implementation (axes: t, i)
      void sten1dSK (float[4,N] X) {
        assume ( N >= 3 );
        for(int i=0; i<4; ++i)
          for(int t=1; t<i; ++t)
            X[t, i-t] = X[t-1, i-1-t] + X[t-1, i+1-t];
        for(int i=4; i<N; ++i)
          for(int t=1; t<4; ++t)
            X[t, i-t] = X[t-1, i-1-t] + X[t-1, i+1-t];
        for(int i=N; i<N+4; ++i)
          for(int t=i-N+2; t<4; ++t)
            X[t, i-t] = X[t-1, i-1-t] + X[t-1, i+1-t];
      }
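A minimal Python port can check that the skewed loop nest on slide 6 computes the same grid as the spec. The names `sten1d_fast`, `make_grid`, and `agree` are hypothetical helpers for this sketch. One caveat: as transcribed, the first loop nest reads column 3 when i = 3, so this port asserts N >= 4 rather than the slide's N >= 3.

```python
def sten1d(X, N):
    """Reference spec: straightforward time-major traversal."""
    for t in range(1, 4):
        for i in range(1, N - 1):
            X[t][i] = X[t - 1][i - 1] + X[t - 1][i + 1]


def sten1d_fast(X, N):
    """Port of the slide-6 loop nest: prologue, steady state, epilogue."""
    assert N >= 4  # assumption of this port, see the note above

    def update(t, i):
        # Same stencil, written at the skewed position (t, i - t).
        X[t][i - t] = X[t - 1][i - 1 - t] + X[t - 1][i + 1 - t]

    for i in range(0, 4):                  # prologue: the wavefront fills up
        for t in range(1, i):
            update(t, i)
    for i in range(4, N):                  # steady state: all three time steps
        for t in range(1, 4):
            update(t, i)
    for i in range(N, N + 4):              # epilogue: the wavefront drains
        for t in range(i - N + 2, 4):
            update(t, i)


def make_grid(N):
    """Deterministic 4 x N test grid with non-trivial values in every row."""
    return [[(7 * c + 3 * r * r + 1) % 11 for c in range(N)] for r in range(4)]


def agree(N):
    a, b = make_grid(N), make_grid(N)
    sten1d(a, N)
    sten1d_fast(b, N)
    return a == b


print(all(agree(N) for N in range(4, 12)))
```

The three loop nests differ only in their bounds, which is exactly the part the later slides leave to the synthesizer.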

  7. What are the hard fragments? (axes: t, i)
      void sten1dSK (float[4,N] X) {
        assume ( N >= 3 );
        for(int i=0; i<4; ++i)
          for(int t=1; t<i; ++t)
            X[t, i-t] = X[t-1, i-1-t] + X[t-1, i+1-t];
        for(int i=4; i<N; ++i)
          for(int t=1; t<4; ++t)
            X[t, i-t] = X[t-1, i-1-t] + X[t-1, i+1-t];
        for(int i=N; i<N+4; ++i)
          for(int t=i-N+2; t<4; ++t)
            X[t, i-t] = X[t-1, i-1-t] + X[t-1, i+1-t];
      }

  8. Sketch the hard fragments (axes: t, i). Solver completes this in 4.4 minutes
      void sten1dSK (float[4,N] X) {
        assume ( N >= 3 );
        for(int i= ; i<4 ; ++i)
          for(int t=; t< ; ++t)
            X[t, i-t] = X[t-1, i-1-t] + X[t-1, i+1-t];
        for(int i=; i<; ++i)
          for(int t=; t<; ++t)
            X[t, i-t] = X[t-1, i-1-t] + X[t-1, i+1-t];
        for(int i=; i<4; ++i)
          for(int t=i; t<; ++t)
            X[t, i-t] = X[t-1, i-1-t] + X[t-1, i+1-t];
      }

  9. The final sketch
      void sten1dSK (float[4,N] X) implements sten1d {
        assume ( N >= 3 );
        for(int i=linexpG(N, 4); i<linexpG(N, 4); ++i)
          for(int t=linexpG(N, 4, i); t<linexpG(N, 4, i); ++t)
            X[t, i-t] = X[t-1, i-1-t] + X[t-1, i+1-t];
        for(int i=linexpG(N, 4); i<linexpG(N, 4); ++i)
          for(int t=linexpG(N, 4, i); t<linexpG(N, 4, i); ++t)
            X[t, i-t] = X[t-1, i-1-t] + X[t-1, i+1-t];
        for(int i=linexpG(N, 4); i<linexpG(N, 4); ++i)
          for(int t=linexpG(N, 4, i); t<linexpG(N, 4, i); ++t)
            X[t, i-t] = X[t-1, i-1-t] + X[t-1, i+1-t];
      }
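To make the idea of filling loop-bound holes concrete, here is a toy Python sketch on a much smaller problem: a single-step 1D stencil whose two loop bounds are holes of the form c0 + c1*N. Reading `linexpG` as "some linear expression over its arguments" is an assumption, and the brute-force enumeration below is only a stand-in for illustration, not the algorithm the Sketch solver uses. All function names are hypothetical.

```python
from itertools import product


def spec(x):
    """Reference: one stencil step over the interior cells."""
    out = list(x)
    for i in range(1, len(x) - 1):
        out[i] = x[i - 1] + x[i + 1]
    return out


def linexp(coeffs, n):
    """A hole filled with coefficients (c0, c1) denotes c0 + c1 * n."""
    c0, c1 = coeffs
    return c0 + c1 * n


def candidate(x, lo_coeffs, hi_coeffs):
    """The sketch: same loop body as the spec, but both bounds are holes."""
    n = len(x)
    out = list(x)
    for i in range(linexp(lo_coeffs, n), linexp(hi_coeffs, n)):
        if i - 1 < 0 or i + 1 >= n:
            return None  # out-of-bounds access: reject this completion
        out[i] = x[i - 1] + x[i + 1]
    return out


def synthesize(sizes=range(3, 8), coeff_range=range(-2, 3)):
    """Enumerate every completion and keep those matching the spec on all tests."""
    tests = [[(3 * k + 1) * (k + 2) for k in range(n)] for n in sizes]
    space = list(product(coeff_range, repeat=2))
    return [(lo, hi) for lo, hi in product(space, repeat=2)
            if all(candidate(x, lo, hi) == spec(x) for x in tests)]


print(synthesize())  # [((1, 0), (-1, 1))], i.e. i runs from 1 to N - 1
```

The programmer supplies the loop body and the shape of the bounds; the search recovers the constants. The 625 candidates here are trivial to enumerate, whereas the sketch on slide 9 has twelve such holes.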

  10. The sketching experience: sketch + spec (specification) => implementation (completed sketch)

  11. Demo

  12. Case Study: Sketching MultiGrid • MultiGrid is an important PDE solver • applications in fluid dynamics and solid mechanics • Composed of three basic stencil kernels • Relax • Interpolate • Restrict • Sketched implementations for Relax & Interpolate (3D)

  13. Relax: Red-Black Gauss-Seidel • 5 different implementations • for 2D and 3D • ideas from paper by Douglas et al. • User's insight: • Blocking • Do 1 pass instead of 2 • Hard part: • Black cells must be computed at an offset from red cells • Offset & blocking => corner cases • Synthesis times under 5 ½ minutes • One sketched implementation was 3 times faster than the spec

  14. Relax: Blocking
      for(int i=; i<; ++i) for(int j=; j<; ++j) for(int k=; k<; ++k) BODY(i,j,k, A); CORNER1(i, j,A); CORNER2(i, A); CORNER3(A);
      CORNER1(int i, int j, []A){ A[] = F(A, , , N+??); A[] = F(A, , , N+??); }
      CORNER2(int i, []A){ for(int k=; k<; ++k) A[] = F(A, , N+??, ); A[] = F(A, , N+??, ); A[] = F(A, , N+??, N+??); }
      CORNER3(int i, []A){ for(int j=; j<; ++j) for(int k=; k<; ++k) A[] = F(A, N+??, , ); }

  15. Relax: Single Pass • Reds and Blacks in the same pass is harder • need offset between reds and blacks • Sketching still takes care of much of the complexity
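The offset between reds and blacks can be illustrated with a small 2D Python sketch. It assumes a 5-point averaging stencil, which the slides do not specify, and the fused version shown is one possible fusion rather than the implementation from the talk. Function names are hypothetical.

```python
def update_row(u, i, colour):
    """Relax the cells of one colour in row i (colour 0 = red, 1 = black)."""
    for j in range(1, len(u[0]) - 1):
        if (i + j) % 2 == colour:
            u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])


def relax_two_pass(u):
    """Textbook sweep: every red cell first, then every black cell."""
    for colour in (0, 1):
        for i in range(1, len(u) - 1):
            update_row(u, i, colour)
    return u


def relax_single_pass(u):
    """Fused sweep: blacks trail reds by one row, so each black cell
    already sees its updated red neighbours above, beside, and below."""
    n = len(u)
    for i in range(1, n):
        if i < n - 1:
            update_row(u, i, 0)        # reds in row i
        if i - 1 >= 1:
            update_row(u, i - 1, 1)    # blacks one row behind
    return u


grid = [[float((3 * i * i + 5 * j) % 7) for j in range(6)] for i in range(7)]
a = relax_two_pass([row[:] for row in grid])
b = relax_single_pass([row[:] for row in grid])
print(a == b)
```

The extra loop iteration and the two guards are the corner cases: the first row has no black row behind it, and the last black row needs one iteration after the reds are finished. Adding blocking multiplies such cases, which is the part the slides hand to the synthesizer.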

  16. Interpolate • 2 different implementations for the 3D case • idea from the NAS Parallel Benchmarks • Implementations included • Tiling • Pre-computing common subexpressions in temporary arrays • Loop rearrangements for vectorization • 111 and 74 different holes, respectively • All synthesis times under 2 ½ minutes • Sketched implementations up to 8 times faster than the spec

  17. Interpolate: for each original cell, 7 new cells and 26 additions

  18. Interpolate: Optimized • 8 additions per original cell • precomputation is vectorized • The details are messy • They can be sketched [Figure: cells a_k, b_k, c_k and a_{k+1}, b_{k+1}, c_{k+1} on axes i, j, k, with labelled sums a_k + a_{k+1}, c_k + c_{k+1}, a_k + b_k]
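The effect of precomputing shared sums can be seen on a simplified 2D analogue. This Python sketch is not the 3D kernel from the slides: it uses unnormalised sums, a 2D grid, and hypothetical function names, and it only counts additions to show how a temporary array removes repeated work.

```python
def interp_naive(c):
    """Each fine cell sums its coarse neighbours from scratch."""
    n, m = len(c) - 1, len(c[0]) - 1          # coarse cells per axis
    f = [[0] * (2 * m) for _ in range(2 * n)]
    adds = 0
    for i in range(n):
        for j in range(m):
            f[2 * i][2 * j] = c[i][j]                              # copy
            f[2 * i + 1][2 * j] = c[i][j] + c[i + 1][j]            # 1 addition
            f[2 * i][2 * j + 1] = c[i][j] + c[i][j + 1]            # 1 addition
            f[2 * i + 1][2 * j + 1] = (c[i][j] + c[i + 1][j]
                                       + c[i][j + 1] + c[i + 1][j + 1])  # 3 additions
            adds += 5
    return f, adds


def interp_precomputed(c):
    """Row-wise pair sums are computed once and shared by neighbouring cells."""
    n, m = len(c) - 1, len(c[0]) - 1
    s = [[c[i][j] + c[i][j + 1] for j in range(m)] for i in range(n + 1)]
    adds = (n + 1) * m                        # cost of building the temporary
    f = [[0] * (2 * m) for _ in range(2 * n)]
    for i in range(n):
        for j in range(m):
            f[2 * i][2 * j] = c[i][j]
            f[2 * i + 1][2 * j] = c[i][j] + c[i + 1][j]
            f[2 * i][2 * j + 1] = s[i][j]                  # reuse, no addition
            f[2 * i + 1][2 * j + 1] = s[i][j] + s[i + 1][j]
            adds += 2
    return f, adds


coarse = [[(i * i + 3 * j) % 5 for j in range(5)] for i in range(5)]
f1, a1 = interp_naive(coarse)
f2, a2 = interp_precomputed(coarse)
print(f1 == f2, a1, a2)  # True 80 52
```

The inner comprehension that builds `s` is a simple, regular loop, which is the kind of precomputation that lends itself to vectorization; the index arithmetic around it is the messy part.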

  19. Interpolate
      for(int k=; k<; ++k) float ta = in[] + in[]; float tb = in[] + in[]; float tc = in[] + in[]; apb[k] = ta + tb; a[k] = ta; c[k] = tc;
      for(int k=; k<; ++k) A[] = in[]; A[] = in[] + in[];
      for(int k=; k<; ++k) A[] = apb[]; A[] = apb[] + apb[];
      for(int k=; k<; ++k) A[] = a[]; A[] = a[] + a[];
      for(int k=; k<; ++k) A[] = c[]; A[] = c[] + c[];

  20. Future Directions

  21. Sketching by example (future work) Sketch the stencil: Stencil(p) := X[p] ← X[p+(-1,-1)] + X[p+(-1,1)] +

  22. Sketching by example. Sketch the iteration order:
      for(int i=†; i<; ++i) plast = (,); for(int t=; t<; ++t) p = (,); assert p-plast == (-1,1); Stencil(p); plast = p
      † := linexpG(N,i,t)

  23. Sketching + Autotuning • Autotuning • Empirical exploration of implementation space • Best hope for producing close-to-optimal solutions • Tool support could make it easier • Defining the implementation space is a laborious process • Search heuristics must be embedded in code generator • Sketching is great at defining implementation spaces

  24. Sketching + Autotuning • Ex: Cache Blocking • Need to determine optimal block size

  25. Defining the space • Write an underconstrained sketch • Autotuner searches over correct completions of sketch
      sten1D( real[N][T] input, real[N][T] output){
        for(int i= ; i<4 ; ++i)
          for(int t=; t< ; ++t)
            X[t, i-t] = X[t-1, i-1-t] + X[t-1, i+1-t];
        for(int i= ; i<4 ; i+=??)
          for(int t=; t< ; ++t)
            for(int k= ; k< ; ++k)
              X[t, i+k-t] = X[, ] + X[, ];
        for(int i= ; i<4 ; ++i)
          for(int t=; t< ; ++t)
            X[t, i-t] = X[t-1, i-1-t] + X[t-1, i+1-t];
      }
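A minimal Python sketch of the division of labour described here: the block size plays the role of the unconstrained hole, every candidate is first checked against the spec, and only correct completions are timed. The names `blocked` and `autotune` are hypothetical, and a single-step 1D stencil stands in for the real kernel. Timings in pure Python will not reflect cache behaviour; the point is the structure of the search.

```python
import time


def spec(x):
    """Reference: one stencil step over the interior cells."""
    out = list(x)
    for i in range(1, len(x) - 1):
        out[i] = x[i - 1] + x[i + 1]
    return out


def blocked(x, block):
    """One completion of the sketch: the same stencil, traversed in blocks."""
    n = len(x)
    out = list(x)
    for start in range(1, n - 1, block):
        for i in range(start, min(start + block, n - 1)):
            out[i] = x[i - 1] + x[i + 1]
    return out


def autotune(x, candidates, repeats=3):
    """Time every correct completion and return the fastest block size."""
    expected = spec(x)
    timings = {}
    for block in candidates:              # candidates must be positive integers
        if blocked(x, block) != expected:
            continue                      # incorrect completion: never timed
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            blocked(x, block)
            best = min(best, time.perf_counter() - start)
        timings[block] = best
    return min(timings, key=timings.get), timings


best_block, timings = autotune(list(range(2000)), [1, 4, 16, 64, 256])
print(best_block, sorted(timings))
```

Here every block size happens to be correct, so the check is trivial. With a real sketch, the synthesizer would be the component that rules out incorrect completions before the tuner measures anything.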

  26. Conclusion • Sketching is synthesis made practical • Programmer provides high level implementation idea • low level details are synthesized • Sketching can support new optimization ideas • not limited to the transformations hard-coded into compiler • optimization ideas expressed in the same language • Sketching is orthogonal to Autotuning • Autotuning could be made easier with sketching

  27. Questions?
