
Runtime Optimization with Specialization

Johnathon Jamison, CS265, Susan Graham, 4-30-2003


Presentation Transcript


  1. Runtime Optimization with Specialization Johnathon Jamison CS265 Susan Graham 4-30-2003

  2. What is Runtime Code Generation (RTCG)? • Dynamic addition of code to the instruction stream • Restricted to instructions executed directly by hardware

  3. Problems with RTCG • Reentrant code • Portability (HL languages vs. assembly) • Data vs. Code issues • Caches and memory • Standard compilation schema • Maintainability and understandability

  4. Benefits of RTCG • Adaptation to the architecture (cache sizes, various latencies, etc.) • JIT compilation • No profiling (actual data is available) • Literals enable optimizations unknown or impossible at static compile time • Potentially more compact (for caches)

  5. Dynamic Compilation Trade-offs • Execution time is linear in run count • Choice between lower startup cost and lower incremental cost
     [Figure: execution time vs. input, with curves for unoptimized static code, optimized static code, low-optimized dynamic code, and high-optimized dynamic code]
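
The linear model on this slide can be sketched in a few lines of C. This is only an illustration: the `cost_line` type, the function names, and the cycle counts in the usage note are assumptions made for the example, not figures from the talk.

```c
#include <stdint.h>

/* One line of the trade-off plot: total = startup + per_run * runs.
   Names are hypothetical; the slide only states that the relationship is linear. */
typedef struct {
    uint64_t startup_cycles;  /* one-time cost (zero for static code) */
    uint64_t per_run_cycles;  /* incremental cost of each execution */
} cost_line;

uint64_t total_cycles(cost_line c, uint64_t runs) {
    return c.startup_cycles + c.per_run_cycles * runs;
}

/* Smallest positive run count at which `dyn` is strictly cheaper than `stat`.
   Returns 0 if the dynamic version never wins. Assumes the dynamic version
   does not start out cheaper when its per-run cost is no better. */
uint64_t break_even_runs(cost_line stat, cost_line dyn) {
    if (dyn.per_run_cycles >= stat.per_run_cycles) {
        return 0;  /* no per-run saving, so the startup cost is never repaid */
    }
    uint64_t extra = dyn.startup_cycles > stat.startup_cycles
                         ? dyn.startup_cycles - stat.startup_cycles
                         : 0;
    uint64_t saving = stat.per_run_cycles - dyn.per_run_cycles;
    /* Need runs > extra / saving, so take the floor and add one. */
    return extra / saving + 1;
}
```

With made-up numbers, static code at 100 cycles per run and dynamic code at 10000 cycles of startup plus 60 cycles per run tie at 250 runs, and the dynamic version pulls ahead from run 251 onward.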

  6. Observation • Programmers write the common case • blit routines • image display • Applications have repetitious data • simulators • Regexp matching • Optimizations • sparse matrices

  7. One Tack: Specialization • Take a piece of code and replace variables with constants • Enables various optimizations • Strength reduction • Constant propagation • etc.... • Generate explicitly or implicitly • Possibly reuse

  8. Example Of Specialization
       int dot_product(int size, int u[], int v[]) {
           int res = 0;
           for (int i = 0; i < size; i++) {
               res = res + u[i] * v[i];
           }
           return res;
       }
     • Suppose size == 5, u == {14,0,38,12,1}

  9. Example Of Specialization
       int dot_product_1(int v[]) {
           int res = 0;
           for (int i = 0; i < 5; i++) {
               /* u written inline as a C99 compound literal */
               res = res + ((const int[]){14, 0, 38, 12, 1})[i] * v[i];
           }
           return res;
       }
     • Substitute in the values

  10. Example Of Specialization
       int dot_product_1(int v[]) {
           int res = 0;
           res = res + 14 * v[0];
           res = res + 0 * v[1];
           res = res + 38 * v[2];
           res = res + 12 * v[3];
           res = res + 1 * v[4];
           return res;
       }
     • Unroll the loop

  11. Example Of Specialization
       int dot_product_1(int v[]) {
           int res;
           res = 14 * v[0];
           res = res + 38 * v[2];
           res = res + 12 * v[3];
           res = res + v[4];
           return res;
       }
     • Eliminate unneeded code
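
As a sanity check on the four specialization steps, the general and the fully specialized routines can be compared directly. The sketch below restates both in standard C; the helper `specialization_agrees` is a hypothetical name added for the comparison and is not part of the talk.

```c
/* General routine, restated from the example slides (with `i` declared). */
int dot_product(int size, const int u[], const int v[]) {
    int res = 0;
    for (int i = 0; i < size; i++) {
        res = res + u[i] * v[i];
    }
    return res;
}

/* Specialized for size == 5, u == {14,0,38,12,1}, after unrolling
   and removing the zero term and the multiplication by one. */
int dot_product_1(const int v[]) {
    int res;
    res = 14 * v[0];
    res = res + 38 * v[2];
    res = res + 12 * v[3];
    res = res + v[4];
    return res;
}

/* Hypothetical helper: returns 1 when both versions agree on `v`. */
int specialization_agrees(const int v[5]) {
    static const int u[5] = {14, 0, 38, 12, 1};
    return dot_product(5, u, v) == dot_product_1(v);
}
```

For v == {1,2,3,4,5} both versions return 14 + 114 + 48 + 5 = 181.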

  12. DyC • make_static annotation indicates which variables to specialize with respect to • @ annotation indicates static loads (a reload is not needed)
       int dot_product(int size, int u[], int v[]) {
           make_static(size, u);
           int res = 0;
           for (int i = 0; i < size; i++) {
               res = res + u@[i] * v[i];
           }
           return res;
       }

  13. DyC Specializer • Each region has a runtime specializer • Setup computations are run • The values are plugged into holes in code templates • The resultant code is optimized • The result is translated to machine code and run
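
The "holes in code templates" idea can be imitated in portable C. The sketch below is not DyC and emits no machine code: a one-instruction template with two holes stands in for a code template, and a small loop stands in for running the result. The `muladd` type and both function names are hypothetical.

```c
#include <stddef.h>

/* Toy stand-in for a code template: "acc += coeff * v[index]".
   `coeff` and `index` are the holes filled in at specialization time. */
typedef struct {
    int coeff;
    int index;
} muladd;

/* Setup computation plus hole filling for the dot-product region.
   Unrolls the loop over the static values `size` and `u`, filling one
   template copy per iteration, and drops iterations whose coefficient
   is zero. `out` must have room for `size` entries. Returns the number
   of instructions emitted. */
size_t specialize_dot(int size, const int u[], muladd out[]) {
    size_t n = 0;
    for (int i = 0; i < size; i++) {
        if (u[i] == 0) {
            continue;  /* term contributes nothing, so emit no code */
        }
        out[n].coeff = u[i];
        out[n].index = i;
        n++;
    }
    return n;
}

/* Stand-in for executing the generated code on the dynamic input `v`. */
int run_specialized(const muladd code[], size_t n, const int v[]) {
    int acc = 0;
    for (size_t k = 0; k < n; k++) {
        acc += code[k].coeff * v[code[k].index];
    }
    return acc;
}
```

A real specializer would fill holes in machine-code templates and jump to the result; the interpretive loop here only keeps the sketch portable.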

  14. DyC Optimizations • Polyvariant specialization • Internal dynamic-to-static promotions • Unchecked dispatching • Complete loop unrolling • Polyvariant division and conditional specialization • Static loads and calls • Strength reduction, zero and copy propagation, and dead-assignment elimination (precomputed!)

  15. DyC Annotations • Runtime constants and constant functions • Specialization/division should be mono-/poly-variant • Disable/enable internal promotions • Compile eagerly/lazily downstream of branches • Code caching style at merges/promotions • Interprocedural specialization

  16. Manual Annotation • Profile to find potential gains • Concentrate on areas with high execution times • If unobvious, log values of parameters to find runtime constants • Trial and error loop unrolling

  17. Applications

  18. Optimizations used

  19. Break Even Points

  20. Performance

  21. Speedup without a given feature

  22. Calpa • A system that automatically generates DyC annotations • Profiles code, collecting statistics • Analyses results • Annotates code • Basically, automates what previously was done manually

  23. Calpa, Step 1 • Instrumentation tool instruments the original binary • Executed on representative input • Generates summarized value and frequency data • Fed into next step

  24. The Instrumenter • Three types of information collected • Basic block execution frequencies • Variable definitions • Variable uses • Points-to info for invalidation of constants, necessary for safety • Uses stored as value/occurrence pairs, with procedure invocation noted, for groups of related values in a procedure
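
A value/occurrence table of the kind described for variable uses might look like the following. This is a minimal sketch under assumptions: the fixed capacity, the type names, and the overflow counter are all invented for the example, and Calpa's actual data layout is not given in the slides.

```c
#include <stddef.h>

#define MAX_VALUES 8  /* arbitrary capacity chosen for the sketch */

typedef struct {
    long value;           /* an observed value of the variable */
    unsigned long count;  /* how many times it was observed */
} value_count;

typedef struct {
    value_count pairs[MAX_VALUES];
    size_t used;             /* number of distinct values stored */
    unsigned long overflow;  /* observations dropped once the table is full */
} value_profile;

/* Record one observed use of a variable. */
void profile_record(value_profile *p, long value) {
    for (size_t i = 0; i < p->used; i++) {
        if (p->pairs[i].value == value) {
            p->pairs[i].count++;
            return;
        }
    }
    if (p->used < MAX_VALUES) {
        p->pairs[p->used].value = value;
        p->pairs[p->used].count = 1;
        p->used++;
    } else {
        p->overflow++;
    }
}

/* A variable with a single observed value is a candidate runtime constant. */
int profile_looks_constant(const value_profile *p) {
    return p->used == 1 && p->overflow == 0;
}
```

A profile must be zero-initialized before use, for example with `value_profile p = {0};`.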

  25. Profiling Data

  26. Profiling • Seconds to hours • Naive profiling was sufficient for their purposes, and so left unoptimized • Another paper describes more efficient profile gathering

  27. Calpa, Step 2 • Annotation tool searches possible space of annotations • Selects annotations and creates annotated program • Passed to DyC, which compiles the program • Calpa == policy, DyC == mechanism

  28. Candidate Static Variable (CSV) Sets • A CSV set is the set of CSVs that make an instruction static • Propagate if exactly one definition exists

  29. CSV Sets Example
       i = 0                        {}
       L1: if i >= size goto L2     {i, size}
       uelem = u[i]                 {i, u[]}
       velem = v[i]                 {i, v[]}
       t = uelem * velem            {i, u[], v[]}
       sum = sum + t                {i, sum, u[], v[]}
       i = i + 1                    {i}
       goto L1                      {}
       L2:

  30. Candidate Division (CD) Sets • A CD is a set of CSVs • The static instructions in a CD are those instructions whose CSV sets are subsets of the CD • The CD Set is all CDs produced from some combination of the CSV sets • No need to consider other CDs (in the example, 21 of the 32 possible subsets are skipped)

  31. CD Sets Example
       CSV sets                     CDs
       {}                           {}
       {i}                          {i}
       {i, size}                    {i, size}
       {i, u[]}                     {i, u[]}
       {i, v[]}                     {i, v[]}
       {i, u[], v[]}                {i, u[], v[]}
       {i, sum, u[], v[]}           {i, sum, u[], v[]}
                                    {i, size, u[]}
                                    {i, size, v[]}
                                    {i, size, u[], v[]}
                                    {i, size, sum, u[], v[]}
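
The CD set for the dot-product example can be reproduced by closing the CSV sets under union. The sketch below encodes each variable as one bit; that encoding and the function name are assumptions made for illustration, not how Calpa represents its sets.

```c
#include <stddef.h>

/* One bit per candidate static variable in the dot-product example. */
enum { V_I = 1, V_SIZE = 2, V_U = 4, V_V = 8, V_SUM = 16 };

#define NUM_SUBSETS 32  /* 2^5 subsets of the five variables */

/* Marks in `is_cd` every subset that is a union of one or more CSV sets
   and returns how many subsets were marked. Each mask must be < 32. */
size_t enumerate_cds(const unsigned csv[], size_t n,
                     unsigned char is_cd[NUM_SUBSETS]) {
    for (size_t s = 0; s < NUM_SUBSETS; s++) {
        is_cd[s] = 0;
    }
    for (size_t k = 0; k < n; k++) {
        is_cd[csv[k]] = 1;
    }
    /* Close under union: repeat until no new subset appears. */
    int changed = 1;
    while (changed) {
        changed = 0;
        for (unsigned a = 0; a < NUM_SUBSETS; a++) {
            if (!is_cd[a]) {
                continue;
            }
            for (size_t k = 0; k < n; k++) {
                unsigned merged = a | csv[k];
                if (!is_cd[merged]) {
                    is_cd[merged] = 1;
                    changed = 1;
                }
            }
        }
    }
    size_t count = 0;
    for (size_t s = 0; s < NUM_SUBSETS; s++) {
        count += is_cd[s];
    }
    return count;
}
```

Fed the seven distinct CSV sets from the example, the closure contains 11 subsets, so the remaining 21 of the 32 never need to be considered.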

  32. Search of CD Space • The CDs are enumerated, starting with the variables that vary least • As additional CDs are enumerated, the "best" one is kept • The search terminates if • All CDs are enumerated • A time quota expires • The improvement over the "best" so far drops below a threshold
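
The three termination conditions can be sketched as a single loop. Several things here are assumptions: candidate scores are precomputed in an array rather than produced by a cost/benefit analysis, the time quota is modeled as a cap on candidates examined, and "improvement drops below a threshold" is read as "a new best beats the old best by less than `min_gain`". All names are hypothetical.

```c
#include <stddef.h>

typedef struct {
    size_t best_index;  /* index of the best candidate seen */
    double best_score;  /* its score */
    size_t examined;    /* how many candidates were looked at */
} search_result;

/* Walks the candidates in enumeration order and keeps the best one.
   Stops when all `n` are examined, when `quota` candidates have been
   examined, or when a new best improves by less than `min_gain`. */
search_result search_cds(const double scores[], size_t n,
                         size_t quota, double min_gain) {
    search_result r = {0, 0.0, 0};
    int have_best = 0;
    for (size_t k = 0; k < n; k++) {
        if (r.examined >= quota) {
            break;  /* quota expired */
        }
        r.examined++;
        if (!have_best) {
            r.best_index = k;
            r.best_score = scores[k];
            have_best = 1;
            continue;
        }
        if (scores[k] > r.best_score) {
            double gain = scores[k] - r.best_score;
            r.best_index = k;
            r.best_score = scores[k];
            if (gain < min_gain) {
                break;  /* diminishing returns */
            }
        }
    }
    return r;
}
```

With scores {1.0, 5.0, 5.1, 9.0}, a generous quota, and a `min_gain` of 0.5, the search stops at the third candidate and never sees the fourth, which shows how an early cutoff can miss a later, better CD.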

  33. Cost Model • Specialization cost • Basic block size * # of values • Loop size * # of values of the induction variable (scale for multiway loops) • Total instructions * instruction generation cost • Cache cost • Lookup cost • Hash key construction (# of vars * cost per var) • Except if unchecked policy is used • Invalidation cost • Sum of execution frequency * invalidate cost for all invalidation points
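
The individual terms of the cost model translate directly into arithmetic. The sketch below writes each term as its own function; the function names and the choice of `unsigned long` cycle counts are assumptions, and the slide does not say how the terms are weighted or combined, so no total is computed.

```c
#include <stddef.h>

/* Instructions generated for a region: its size times the number of
   values it is specialized for (for a loop, the number of values of
   the induction variable). */
unsigned long generated_instructions(unsigned long region_size,
                                     unsigned long num_values) {
    return region_size * num_values;
}

/* Specialization cost: total instructions times the cost of
   generating one instruction. */
unsigned long specialization_cost(unsigned long total_instructions,
                                  unsigned long gen_cost_per_instruction) {
    return total_instructions * gen_cost_per_instruction;
}

/* Cache lookup cost: building the hash key costs a fixed amount per
   variable, unless the unchecked policy skips the lookup entirely. */
unsigned long lookup_cost(unsigned long num_vars,
                          unsigned long cost_per_var, int unchecked) {
    return unchecked ? 0 : num_vars * cost_per_var;
}

/* Invalidation cost: sum over all invalidation points of
   execution frequency times the cost of one invalidation. */
unsigned long invalidation_cost(const unsigned long frequency[], size_t n,
                                unsigned long invalidate_cost) {
    unsigned long total = 0;
    for (size_t k = 0; k < n; k++) {
        total += frequency[k] * invalidate_cost;
    }
    return total;
}
```

As a made-up example, a 6-instruction loop body specialized for 5 values of the induction variable yields 30 generated instructions, or 300 cycles at 10 cycles per generated instruction.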

  34. Benefit Model • Runs a simplified DyC analysis • Assumes whole procedure specialization (overestimating costs) • Count number of saved cycles assuming the given CD • Only looks at the critical path (a simplifying assumption) • A win if saved cycles > cycle cost

  35. Calpa is safe • Static, unchecked, and eager annotations are selected when profile information hints at this • However, these are unsafe • Calpa does invalidations at all points where safety could be upset • Also makes pessimistic assumptions about external routines • It is always safe to avoid these annotations

  36. Testing • Tested on previously annotated programs • The annotation process was much quicker • Found all manual annotations • Plus more annotations

  37. Annotations found • All the manual ones, plus two more • Search key in search program • Vector v in dotproduct • The unvarying nature of these variables was an artifact of atypical use • But getting good profiling input is someone else's research

  38. Related Work
     • Value Profiling: Work that builds an efficient system to profile programs, with the aim of using the collected information to drive specialization. They do not collect value sequence information, which Calpa needs. They also do binary instrumentation.
     • Fabius: This system takes curried functions and generates code for partially evaluated functions. Thus, the idiom of currying is leveraged to optimize code at runtime.
     • Tick C: `C extends C with a few additional constructs that allow explicit manual runtime code compilation. You specify exactly what code you wish to have compiled in C-like fragments. In the spirit of runtime trade-offs, code generation can take one of two forms: one quick to create, and one more efficient.
     • Tempo: Tempo can be either a source-to-source compiler or a source-to-runtime code generator. It is much more limited in scope than DyC/Calpa. However, it does have an automatic side-effect/alias analysis.
