Domain-specific interpreters (a nested talk). Paul Kelly (Imperial College London) Joint work with Olav Beckmann, Karen Osmond, Tony Field and others. Dagstuhl, January 2006. Domain-specific optimisation. Libraries extend general-purpose languages
Domain-specific interpreters (a nested talk)
Paul Kelly (Imperial College London)
Joint work with Olav Beckmann, Karen Osmond, Tony Field and others
Dagstuhl, January 2006

Domain-specific optimisation
• Libraries extend general-purpose languages
• Good libraries promote problem-focused code
• “Active libraries” apply library-specific optimisations to client code
• Client calling context may enable optimisation:
  • fusion,
  • redundancy elimination,
  • incrementalisation, etc.

Client:
    C a = new C(…);
    C b = new C(…);
    …
    c = a.f(…);
    …
    print( b.g(c) );

Library:
    constructor C(…);
    constructor C(…);
    f(…) {…}
    g(…) {…}

Active library technologies
• How to deliver “active libraries”?
  • Domain-specific compiler?
  • Source-to-source transformation?
  • Plug-in-based compiler architecture?
  • Plug-in-based virtual machine?
  • “Domain-specific optimisation components”
  • Aspect weaver?
• This talk is about an appealingly low-tech solution, which we glorify with a big name – the “Domain-Specific Interpreter”

Domain-specific interpreter
• DSI is interposed between client and library

Client:
    C a = new C(…);
    C b = new C(…);
    …
    c = a.f(…);
    …
    print( b.g(c) );

DSI:
    Delay execution, build “recipe”
    Plan optimised execution, execute

Library:
    constructor C(…);
    constructor C(…);
    f(…) {…}
    g(…) {…}

• Inject proxy between application and library
• Use proxy to capture, delay and optimise the calls

Domain-specific interpreter
• DSI is a design pattern
• Standard questions:
  • When is DSI a good idea?
  • When is it applicable?
  • How do you implement it (in your favoured language)?
  • Show me an example!
• Let’s do the example first…

MayaVi
• Tool for visualising fluid flows
• GUI supports interactive construction of visualisation pipelines
• Eg Fluid flow past a heated sphere: temperature isosurface with temperature-shaded streamtubes
• I’m going to show you how we dramatically improved MayaVi interactivity
  • By parallel execution on SMP
  • By parallel execution on Linux cluster
  • By caching pre-calculated results
  • Without changing a single line of MayaVi or VTK code*
  • Without writing a compiler
* Actually we did change a few lines in VTK to fix a problem with Python’s Global Interpreter Lock

MayaVi: Working on partitioned data
• Our ocean simulations are generated in parallel
• Input data consists of a set of partitions (and an XML index)
• Normally, VTK fuses these partitions into one mesh as they are read
• Some – many – analyses can operate partition-by-partition
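
The difference between fusing partitions first and working partition-by-partition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the partitions are plain lists of numbers, `select_above` is a hypothetical stand-in for an analysis such as isosurface extraction, and no VTK call is involved. Threads are used purely for brevity.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for partitioned input: each partition is a list of
# scalar samples (a real dataset would hold mesh cells, not plain numbers).
partitions = [[0.1, 0.9, 0.4], [0.7, 0.2], [0.8, 0.95, 0.3, 0.6]]


def select_above(samples, threshold):
    """Toy per-partition analysis: keep the samples above a threshold."""
    return [s for s in samples if s > threshold]


def fused_then_analyse(parts, threshold):
    """Default style: fuse all partitions into one dataset, then analyse."""
    fused = [s for part in parts for s in part]
    return select_above(fused, threshold)


def analyse_per_partition(parts, threshold, workers=2):
    """Partition-wise style: analyse each partition independently, then
    concatenate the per-partition results in the original order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: select_above(p, threshold), parts))
    return [s for part in results for s in part]
```

For this toy analysis both routes give the same answer, which is the property a DSI would need before it may replace the fused plan with the partition-wise one.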

MayaVi: what the DSI has to do
• Capture all delayable calls to methods from a DSL through a proxy layer
• A force point is a call which requires an immediate result – in this case to render on screen
• A recipe is the set of calls between consecutive force points (in parallel)
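
The terms above (delayable call, force point, recipe) can be made concrete with a small hand-written proxy. This is a minimal sketch, not the MayaVi/VTK implementation: `RecordingProxy`, `Pipeline`, its methods and the file name `ocean.vtu` are all hypothetical, and the "optimisation" step is just a replay of the recorded calls.

```python
class RecordingProxy:
    """Hypothetical proxy: delays calls on `target` until a force point."""

    def __init__(self, target, force_points):
        self._target = target
        self._force_points = set(force_points)
        self._recipe = []            # delayed calls since the last force point
        self.executed_recipes = []   # kept only so the example can be inspected

    def __getattr__(self, name):
        # Only reached for names not found normally, i.e. library methods.
        def call(*args):
            if name in self._force_points:
                return self._force(name, args)
            self._recipe.append((name, args))   # delay: just record the call
            return None
        return call

    def _force(self, name, args):
        recipe, self._recipe = self._recipe, []
        self.executed_recipes.append([n for n, _ in recipe])
        # A real DSI would optimise `recipe` here; this sketch just replays it.
        for delayed_name, delayed_args in recipe:
            getattr(self._target, delayed_name)(*delayed_args)
        return getattr(self._target, name)(*args)


class Pipeline:
    """Toy library object standing in for a visualisation pipeline."""

    def __init__(self):
        self.log = []

    def set_input(self, path):
        self.log.append("input " + path)

    def set_isovalue(self, value):
        self.log.append("iso %s" % value)

    def render(self):
        self.log.append("render")
        return len(self.log)


real = Pipeline()
proxy = RecordingProxy(real, force_points={"render"})
proxy.set_input("ocean.vtu")     # recorded, not executed
proxy.set_isovalue(0.5)          # recorded, not executed
log_before_force = list(real.log)
proxy.render()                   # force point: replay the recipe, then render
```

Until `render` is called the library object has done no work, so the proxy is free to reorder, fuse or parallelise the recorded calls at that point.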

Implementing a generic DSI proxy in Python
• Self-generating proxy module:

    import vtkpython_real
    from vtkdsi import proxyObject

    # One proxy class per name exported by the real VTK module
    for className in dir(vtkpython_real):
        exec("class " + className + "(proxyObject): pass")

• Proxy implementation:

    class proxyObject:
        def __getattr__(self, callName):
            # Any method lookup yields a callable that routes to proxyCall
            return lambda *callArgs: self.proxyCall(callName, callArgs)

        def proxyCall(self, callName, callArgs):
            # if forcepoint: optimise and apply recipe
            # else: add call to the current recipe
            ...

• Actually, the real implementation generates dummies for all the methods and members as well as the classes
  • So when MayaVi reflects on the module to generate the GUI configuration forms it finds the right stuff
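
The self-generating proxy idea can also be tried without VTK installed. The sketch below is a Python 3 variant under stated assumptions: `real_lib`, `Reader`, `Contour` and the file name `mesh.vtu` are hypothetical stand-ins for the real library, and the proxy classes are created with `type(...)` rather than `exec`, which is a swapped-in technique and not what the slide shows.

```python
import inspect
import types

# Hypothetical stand-in for the real library module (the slides use
# vtkpython_real, which is not assumed to be installed here).
real_lib = types.ModuleType("real_lib")


class Reader:
    def load(self, path):
        return "data from " + path


class Contour:
    def set_value(self, value):
        return "contour at %s" % value


real_lib.Reader = Reader
real_lib.Contour = Contour


class ProxyObject:
    """Base class for generated proxies: every unknown attribute becomes a
    callable that records the call instead of running it."""
    recipe = []   # shared recipe for this sketch

    def __getattr__(self, call_name):
        if call_name.startswith("__"):      # leave dunder lookups alone
            raise AttributeError(call_name)
        return lambda *args: self.proxy_call(call_name, args)

    def proxy_call(self, call_name, args):
        ProxyObject.recipe.append((type(self).__name__, call_name, args))


def make_proxy_module(module):
    """Build a module with one ProxyObject subclass per class in `module`."""
    proxy = types.ModuleType(module.__name__ + "_proxy")
    for name, obj in vars(module).items():
        if inspect.isclass(obj):
            setattr(proxy, name, type(name, (ProxyObject,), {}))
    return proxy


proxy_lib = make_proxy_module(real_lib)
reader = proxy_lib.Reader()
reader.load("mesh.vtu")              # recorded in ProxyObject.recipe
proxy_lib.Contour().set_value(0.5)   # recorded in ProxyObject.recipe
```

Client code that imports `proxy_lib` in place of `real_lib` sees the same class names, which is what lets the proxy be interposed without editing the client.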

How well does it work?
• Benchmark:
  • Plot isosurfaces for seven pressure values in flow past heated sphere
  • Each isosurface is several hundred MB
• Hardware:
  • For SMP: Athlon 1600+, dual SMP, 256 KB L2, 1 GB RAM, Linux 2.4
  • For distributed-memory: Cluster of 4 Pentium 4 2.8 GHz, 512 KB L2, 1 GB RAM, Linux 2.4

[Chart: Isosurface benchmark: cluster of four 2GHz Pentium 4 PCs]
• Tiling optimisation yields substantial speedup
• Modest further speedup from two-way shared-memory parallel
• Parallel execution on a four-processor Linux cluster also offers substantial speedup

Further MayaVi DSI optimisations
• Caching:
  • check whether results of this recipe (or part thereof) are available in cache
  • Multiple frames per second…
• Region of Interest (RoI):
  • Load from disk only those partitions which intersect a cuboid specified by the user
• Level of Detail (LoD):
  • Each dataset is stored in full-resolution form but also in a hierarchy of coarsened, decimated versions
• Put together… “Google Earth” for global ocean flow
• Large space of possible execution plans for each visualisation task - choose
  • Appropriate parallelisation
  • recalculate or retrieve from (remote, persistent, peer?) cache
  • Which intermediate results to save to cache
  • Partition size
  • Level of detail (eg to satisfy response-time budget)
  • Whether to decimate surfaces to fit in graphics RAM
  • Whether to construct (and cache) index for multiple isosurfaces
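
The caching bullet ("results of this recipe, or part thereof") can be illustrated with a cache keyed on recipe prefixes. This is a sketch under stated assumptions, not the MayaVi DSI's cache: `RecipeCache`, the `execute` callback and the call names in the example are hypothetical, and hashing `repr()` of the recipe is only adequate because the toy recipe holds strings and numbers.

```python
import hashlib


class RecipeCache:
    """Hypothetical cache keyed by a digest of the recipe's calls."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(recipe):
        # repr() is adequate for this sketch because the recipe holds only
        # strings and numbers; real arguments would need a stable encoding.
        return hashlib.sha256(repr(recipe).encode("utf-8")).hexdigest()

    def run(self, recipe, execute):
        """Reuse the result of the longest cached prefix of `recipe` and
        execute only the remaining calls, caching each new prefix."""
        state, start = None, 0
        for end in range(len(recipe), 0, -1):
            k = self.key(recipe[:end])
            if k in self._store:
                state, start = self._store[k], end
                self.hits += 1
                break
        else:
            self.misses += 1
        for i in range(start, len(recipe)):
            state = execute(state, recipe[i])
            self._store[self.key(recipe[:i + 1])] = state
        return state


executed = []   # records which calls actually ran


def execute(state, call):
    """Toy executor: the 'result' is just the tuple of calls applied so far."""
    executed.append(call)
    return (state or ()) + (call,)


cache = RecipeCache()
base = [("read", "ocean"), ("isosurface", 0.5)]
cache.run(base, execute)                         # miss: runs both calls
cache.run(base, execute)                         # hit: runs nothing
cache.run(base + [("decimate", 0.1)], execute)   # partial hit: runs one call
```

Whether a cached intermediate is worth storing at all is one of the plan choices listed above; this sketch simply stores every prefix.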

Back to DSI…
• Standard questions:
  • When is DSI a good idea?
  • When is it applicable?
  • How do you implement it?
  • Show me an example!
• When is DSI a good idea? When:
  • You can’t analyse the client code
  • The client code is too complex to analyse statically
  • The client composes library code dynamically
  • The overheads are small compared to library functions’ execution time

Back to DSI…
• When is it applicable? When:
  • Execution of library code can be delayed
  • All dependencies between client and library code are explicit in library API
  • Library data structures are opaque

Back to DSI…
• How do you implement it? Interpose proxy:
  • Built by hand
  • Using generic proxy mechanism based on reflection – as shown in Python
  • Using IDL-based parameter marshalling
  • Using aspect weaver (but…)

Back to DSI…
• Show me an example!
  • We have used the DSI trick several times
  • So have lots of other people…
  • MayaVi/Python/VTK
  • Message fusion and scheduling in parallel programming
  • Loop fusion in a matrix/vector library
  • Aggregation of Java RMI (correctness issues are tricky)

What makes DSI hard to implement?
• Non-opaque return values
  • Eg vector type is opaque, but dot-product returns a non-opaque scalar
• Exceptions
  • Delayed execution shifts the point where errors are discovered
• Unnecessary force-points
  • Eg property getter methods
• Hidden dependencies
  • Eg we can aggregate remote method calls provided none of them results in a call back that can affect the caller JVM
• Antidependencies
  • Client overwrites operand of delayed call
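
The antidependence hazard in the last bullet is easy to reproduce. The sketch below is illustrative only: `DelayedCalls` and `demo` are hypothetical names, and snapshotting operands with `copy.deepcopy` is one simple remedy chosen for the example, not necessarily what any of the systems mentioned in this talk does.

```python
import copy


class DelayedCalls:
    """Hypothetical recorder for delayed calls, with optional operand snapshots."""

    def __init__(self, snapshot_operands):
        self._snapshot = snapshot_operands
        self._recipe = []

    def delay(self, func, *operands):
        if self._snapshot:
            # Private copy: later client writes cannot reach the delayed call.
            operands = tuple(copy.deepcopy(op) for op in operands)
        self._recipe.append((func, operands))

    def force(self):
        recipe, self._recipe = self._recipe, []
        return [func(*operands) for func, operands in recipe]


def demo(snapshot_operands):
    v = [1.0, 2.0, 3.0]
    calls = DelayedCalls(snapshot_operands)
    calls.delay(sum, v)     # client expects the sum of [1.0, 2.0, 3.0]
    v[0] = 100.0            # client overwrites the operand before the force point
    return calls.force()[0]
```

Without the snapshot, `demo(False)` sums the overwritten list and returns 105.0; with it, `demo(True)` returns the 6.0 the client expected. Copying every operand costs memory, so a real DSI might instead treat the write as a force point or copy only when a write is detected.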

Conclusions/discussion
• DSI is not new
  • But just keeps popping up, solves tricky problems
• DSI programs are program generators
  • Type safety of the recipe derives from type safety of the client (so DSI interpreter could be tagless)
  • Safety of optimising transformations is another matter…
• DSIs can be JITs
  • Eg our C++ matrix/vector library uses a multistage programming library to generate C loops at runtime (and fuse them)
• There is a useful catalogue of techniques to enhance DSI applicability, overheads etc
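
The "DSIs can be JITs" point can be illustrated with a toy Python analogue of runtime loop generation and fusion. This is a sketch only and does not reproduce the C++ library mentioned above: `Vec` is a hypothetical delayed-vector class, the generated code is Python source rather than C, and all leaf vectors are assumed to have the same length.

```python
class Vec:
    """Hypothetical delayed vector: holds an expression tree, not a result."""

    def __init__(self, data=None, op=None, args=()):
        self.data, self.op, self.args = data, op, args

    def __add__(self, other):
        return Vec(op="+", args=(self, other))   # delay: build the recipe

    def __mul__(self, other):
        return Vec(op="*", args=(self, other))   # delay: build the recipe

    def _emit(self, leaves):
        """Return a source expression for element i, collecting leaf arrays."""
        if self.op is None:
            leaves.append(self.data)
            return "a%d[i]" % (len(leaves) - 1)
        left, right = [arg._emit(leaves) for arg in self.args]
        return "(%s %s %s)" % (left, self.op, right)

    def force(self):
        """Force point: generate source for one fused loop, compile, run it."""
        leaves = []
        expr = self._emit(leaves)
        params = ", ".join("a%d" % k for k in range(len(leaves)))
        source = ("def fused(%s):\n"
                  "    return [%s for i in range(len(a0))]\n" % (params, expr))
        namespace = {}
        exec(source, namespace)
        return namespace["fused"](*leaves), source


x = Vec([1, 2, 3])
y = Vec([10, 20, 30])
z = (x + y) * x              # nothing is computed yet
result, source = z.force()   # one generated loop evaluates the whole expression
```

The add and the multiply are evaluated in a single pass with no intermediate vector, which is the effect loop fusion is after; the recipe here is the expression tree itself.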

Related stuff…
• Lazy evaluation – with reflection
• Template metaprogramming – encode recipe in type
• Proxy interposition trick is common in dynamically-typed languages:
  • Redefining the lookup function in Common Lisp
  • The “doesNotUnderstand: hack” in Smalltalk
  • The idea of converting a call to a message…
    • Message-Oriented Programming: The Case for First Class Messages (Dave Thomas, JOT 2004)
• Tomasulo-style renaming
  • to prevent antidependences from forcing execution
• Compare with explicit recipe construction
  • workflow systems, command objects, LINQ