Domain-specific interpreters (a nested talk)

Domain-specific interpreters (a nested talk). Paul Kelly (Imperial College London). Joint work with Olav Beckmann, Karen Osmond, Tony Field and others. Dagstuhl, January 2006.

Presentation Transcript


  1. Domain-specific interpreters (a nested talk) • Paul Kelly (Imperial College London) • Joint work with Olav Beckmann, Karen Osmond, Tony Field and others • Dagstuhl, January 2006

  2. Domain-specific optimisation • Libraries extend general-purpose languages • Good libraries promote problem-focused code • “Active libraries” apply library-specific optimisations to client code • Client calling context may enable optimisation: fusion, redundancy elimination, incrementalisation, etc. [Figure: client code calling into a library. Client: C a = new C(…); C b = new C(…); … c = a.f(…); … print( b.g(c) ); Library: constructor C(…); constructor C(…); f(…) {…} g(…) {…}]

  3. Active library technologies • How to deliver “active libraries”? • Domain-specific compiler? • Source-to-source transformation? • Plug-in-based compiler architecture? • Plug-in-based virtual machine? • “Domain-specific optimisation components” • Aspect weaver? • This talk is about an appealingly low-tech solution, which we glorify with a big name: the “Domain-Specific Interpreter”

  4. Domain-specific interpreter • DSI is interposed between client and library • Inject proxy between application and library • Use proxy to capture, delay and optimise the calls [Figure: the DSI sits between client and library. Client: C a = new C(…); C b = new C(…); … c = a.f(…); … print( b.g(c) ); DSI: delay execution, build “recipe”; plan optimised execution, execute. Library: constructor C(…); constructor C(…); f(…) {…} g(…) {…}]

  5. Domain-specific interpreter • DSI is a design pattern • Standard questions: • When is DSI a good idea? • When is it applicable? • How do you implement it (in your favoured language)? • Show me an example! • Let’s do the example first…

  6. MayaVi • Tool for visualising fluid flows • GUI supports interactive construction of visualisation pipelines • Eg Fluid flow past a heated sphere: temperature isosurface with temperature-shaded streamtubes

  8. MayaVi • Tool for visualising fluid flows • GUI supports interactive construction of visualisation pipelines • Eg Fluid flow past a heated sphere: temperature isosurface with temperature-shaded streamtubes • I’m going to show you how we dramatically improved MayaVi interactivity • By parallel execution on SMP • By parallel execution on Linux cluster • By caching pre-calculated results • Without changing a single line of MayaVi or VTK code • Without writing a compiler

  9. MayaVi • Tool for visualising fluid flows • GUI supports interactive construction of visualisation pipelines • Eg Fluid flow past a heated sphere: temperature isosurface with temperature-shaded streamtubes • I’m going to show you how we dramatically improved MayaVi interactivity • By parallel execution on SMP • By parallel execution on Linux cluster • By caching pre-calculated results • Without changing a single line of MayaVi or VTK code* • Without writing a compiler (* Actually, we did change a few lines in VTK to fix a problem with Python’s Global Interpreter Lock.)

  10. MayaVi: Working on partitioned data • Our ocean simulations are generated in parallel • Input data consists of a set of partitions (and an XML index) • Normally, VTK fuses these partitions into one mesh as they are read

  11. MayaVi: Working on partitioned data • Our ocean simulations are generated in parallel • Input data consists of a set of partitions (and an XML index) • Normally, VTK fuses these partitions into one mesh as they are read • Some – many – analyses can operate partition-by-partition
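
The partition-by-partition idea can be sketched as follows. This is only an illustration, not the MayaVi/VTK code: partitions are plain lists of scalar samples, and `analyse_partition` and `analyse_partitioned` are hypothetical names. Threads are used for brevity; in CPython they only give real speedup when the library call releases the Global Interpreter Lock.

```python
from concurrent.futures import ThreadPoolExecutor


def analyse_partition(partition, threshold):
    """Hypothetical per-partition analysis: keep the samples at or above a threshold."""
    return [value for value in partition if value >= threshold]


def analyse_partitioned(partitions, threshold, workers=2):
    """Run the analysis on each partition independently, without fusing them first."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input partitions.
        return list(pool.map(lambda p: analyse_partition(p, threshold), partitions))


if __name__ == "__main__":
    parts = [[0.1, 0.9], [0.5], [0.2, 0.7, 0.8]]
    print(analyse_partitioned(parts, 0.5))
```

For an analysis of this kind, concatenating the per-partition results gives the same answer as running on the fused data, which is what makes the partitions independent units of work.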

  12. MayaVi: what the DSI has to do • Capture all delayable calls to methods from a DSL through a proxy layer • A force point is a call which requires an immediate result – in this case to render on screen • A recipe is the set of calls between consecutive force points (in parallel)
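
As a rough sketch of the terminology above, the captured call stream can be cut into recipes at each force point. The call representation `(name, args)`, the function `split_into_recipes` and the choice of `Render` as the only force point are assumptions made for illustration.

```python
def split_into_recipes(calls, is_force_point):
    """Group delayed calls into recipes, closing a recipe at each force point.

    Returns (recipes, pending): each recipe is (delayed_calls, force_call);
    pending holds calls captured after the last force point.
    """
    recipes = []
    current = []
    for call in calls:
        if is_force_point(call):
            recipes.append((current, call))
            current = []
        else:
            current.append(call)
    return recipes, current


if __name__ == "__main__":
    trace = [
        ("SetInput", ("data",)),
        ("SetValue", (0, 1.5)),
        ("Render", ()),          # force point: result needed on screen
        ("SetValue", (0, 2.5)),
        ("Render", ()),
    ]
    recipes, pending = split_into_recipes(trace, lambda call: call[0] == "Render")
    for delayed, force in recipes:
        print(force[0], "forces", [name for name, _ in delayed])
```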

  13. Implementing a generic DSI proxy in Python
      • Self-generating proxy module:
            import vtkpython_real
            from vtkdsi import proxyObject
            for className in dir(vtkpython_real):
                exec("class " + className + "(proxyObject): pass")
      • Proxy implementation:
            class proxyObject:
                def __getattr__(self, callName):
                    # any method not found on the proxy becomes a delayed call
                    return lambda *callArgs: self.proxyCall(callName, callArgs)
                def proxyCall(self, callName, callArgs):
                    # if forcepoint: optimise and apply recipe
                    # else: add call to the current recipe
                    pass
      • Actually, the real implementation generates dummies for all the methods and members as well as the classes
      • So when MayaVi reflects on the module to generate the GUI configuration forms it finds the right stuff
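
A self-contained Python 3 sketch of the same pattern, runnable without VTK. Everything here is a stand-in: `real_lib`, `Contour`, `FORCE_POINTS` and the replay logic are assumptions for illustration, and the optimisation step is left out (the recipe is simply replayed in order).

```python
import types

# Stand-in for the real library module (hypothetical; the slide uses vtkpython_real).
real_lib = types.ModuleType("fakelib_real")


class Contour:
    def __init__(self):
        self.values = {}

    def SetValue(self, index, value):
        self.values[index] = value

    def Render(self):
        return sorted(self.values.values())


real_lib.Contour = Contour

FORCE_POINTS = {"Render"}  # assumption: calls whose result is needed immediately


class ProxyObject:
    _real_class = None  # filled in on each generated subclass

    def __init__(self, *args, **kwargs):
        self._ctor_args = (args, kwargs)
        self._pending = []   # the current recipe
        self._target = None  # real object, created lazily at the first force point

    def __getattr__(self, name):
        # Only reached for names not found normally, i.e. library methods.
        if name.startswith("_"):
            raise AttributeError(name)
        return lambda *args, **kwargs: self._proxy_call(name, args, kwargs)

    def _proxy_call(self, name, args, kwargs):
        if name not in FORCE_POINTS:
            self._pending.append((name, args, kwargs))  # delay the call
            return None
        if self._target is None:
            ctor_args, ctor_kwargs = self._ctor_args
            self._target = self._real_class(*ctor_args, **ctor_kwargs)
        # A real DSI would optimise the recipe here; this sketch just replays it.
        for pending_name, a, kw in self._pending:
            getattr(self._target, pending_name)(*a, **kw)
        self._pending = []
        return getattr(self._target, name)(*args, **kwargs)


# Self-generating proxy module: one proxy class per class in the real module.
proxy_lib = types.ModuleType("fakelib")
for class_name in dir(real_lib):
    attr = getattr(real_lib, class_name)
    if isinstance(attr, type):
        setattr(proxy_lib, class_name,
                type(class_name, (ProxyObject,), {"_real_class": attr}))

if __name__ == "__main__":
    c = proxy_lib.Contour()
    c.SetValue(0, 2.0)   # delayed
    c.SetValue(1, 1.0)   # delayed
    print(c.Render())    # force point: replay the recipe, then render
```

Using `type()` instead of `exec` on a string avoids building source text, and filtering on `isinstance(attr, type)` skips module attributes that are not classes.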

  14. How well does it work? • Benchmark: • Plot isosurfaces for seven pressure values in flow past heated sphere • Each isosurface is several hundred MB • Hardware: • For SMP: Athlon 1600+, dual SMP, 256 KB L2, 1 GB RAM, Linux 2.4 • For distributed-memory: Cluster of 4 Pentium 4 2.8 GHz, 512 KB L2, 1 GB RAM, Linux 2.4

  15. Tiling optimisation yields substantial speedup • Modest further speedup from two-way shared-memory parallel execution • Parallel execution on a four-processor Linux cluster also offers substantial speedup [Chart: isosurface benchmark, cluster of four 2 GHz Pentium 4 PCs]

  16. Further MayaVi DSI optimisations • Caching: • check whether results of this recipe (or part thereof) are available in cache • Multiple frames per second… • Region of Interest (RoI): • Load from disk only those partitions which intersect a cuboid specified by the user • Level of Detail (LoD): • Each dataset is stored in full-resolution form but also in a hierarchy of coarsened, decimated versions • Put together… “Google Earth” for global ocean flow

  17. Further MayaVi DSI optimisations • Caching: • check whether results of this recipe (or part thereof) are available in cache • Multiple frames per second… • Region of Interest (RoI): • Load from disk only those partitions which intersect a cuboid specified by the user • Level of Detail (LoD): • Each dataset is stored in full-resolution form but also in a hierarchy of coarsened, decimated versions • Put together… “Google Earth” for global ocean flow • Large space of possible execution plans for each visualisation task; the DSI must choose: • Appropriate parallelisation • Whether to recalculate or retrieve from (remote, persistent, peer?) cache • Which intermediate results to save to cache • Partition size • Level of detail (eg to satisfy response-time budget) • Whether to decimate surfaces to fit in graphics RAM • Whether to construct (and cache) index for multiple isosurfaces
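
Two of these optimisations are easy to sketch in isolation. The code below is a hedged illustration, not the MayaVi DSI: the partition index format (name mapped to an axis-aligned bounding box), the recipe encoding as hashable `(name, args)` tuples, and the helper names are all assumptions.

```python
def boxes_intersect(a, b):
    """True if two axis-aligned boxes ((xmin, ymin, zmin), (xmax, ymax, zmax)) overlap."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return all(a_lo[k] <= b_hi[k] and b_lo[k] <= a_hi[k] for k in range(3))


def partitions_in_roi(index, roi):
    """Region of Interest: names of the partitions whose box intersects the cuboid."""
    return [name for name, box in index.items() if boxes_intersect(box, roi)]


def longest_cached_prefix(cache, recipe):
    """Caching: find the longest prefix of the recipe whose result is already cached.

    Returns (number_of_calls_covered, cached_result); (0, None) if nothing matches.
    """
    for n in range(len(recipe), 0, -1):
        key = tuple(recipe[:n])
        if key in cache:
            return n, cache[key]
    return 0, None


if __name__ == "__main__":
    index = {
        "p0": ((0, 0, 0), (1, 1, 1)),
        "p1": ((1, 0, 0), (2, 1, 1)),
        "p2": ((2, 0, 0), (3, 1, 1)),
    }
    roi = ((1.5, 0.2, 0.2), (2.5, 0.8, 0.8))
    print(partitions_in_roi(index, roi))

    recipe = [("ReadPartition", ("p1",)), ("Contour", (0.5,)), ("Decimate", (0.9,))]
    cache = {tuple(recipe[:2]): "surface-p1-0.5"}
    print(longest_cached_prefix(cache, recipe))
```

With a prefix hit, only the remaining calls of the recipe need to be executed, starting from the cached intermediate result.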

  18. Back to DSI… • Standard questions: • When is DSI a good idea? • When is it applicable? • How do you implement it? • Show me an example! • DSI is a good idea when: • You can’t analyse the client code • The client code is too complex to analyse statically • The client composes library code dynamically • The overheads are small compared to library functions’ execution time

  19. Back to DSI… • Standard questions: • When is DSI a good idea? • When is it applicable? • How do you implement it? • Show me an example! • DSI is applicable when: • Execution of library code can be delayed • All dependencies between client and library code are explicit in library API • Library data structures are opaque

  20. Back to DSI… • Standard questions: • When is DSI a good idea? • When is it applicable? • How do you implement it? • Show me an example! • Interpose proxy: • Built by hand • Using generic proxy mechanism based on reflection – as shown in Python • Using IDL-based parameter marshalling • Using aspect weaver (but…)

  21. Back to DSI… • Standard questions: • When is DSI a good idea? • When is it applicable? • How do you implement it? • Show me an example! • We have used the DSI trick several times • So have lots of other people… • MayaVi/Python/VTK • Message fusion and scheduling in parallel programming • Loop fusion in a matrix/vector library • Aggregation of Java RMI (correctness issues are tricky)

  22. What makes DSI hard to implement? • Non-opaque return values • Eg vector type is opaque, but dot-product returns a non-opaque scalar • Exceptions • Delayed execution shifts the point where errors are discovered • Unnecessary force-points • Eg property getter methods • Hidden dependencies • Eg we can aggregate remote method calls provided none of them results in a callback that can affect the caller JVM • Antidependencies • Client overwrites operand of delayed call
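
The antidependence hazard is easy to reproduce. In this minimal sketch (the class `DelayedSum` and its `snapshot` flag are hypothetical), a delayed call that keeps only a reference to its operand sees the client's later overwrite; copying the operand at capture time is one simple, if costly, way to avoid that.

```python
import copy


class DelayedSum:
    """Delays sum(operand) calls until force() is called."""

    def __init__(self, snapshot):
        self.snapshot = snapshot
        self.pending = []

    def add(self, operand):
        # With snapshot=True the operand is copied when the call is captured,
        # so later writes by the client cannot change the delayed result.
        self.pending.append(copy.deepcopy(operand) if self.snapshot else operand)

    def force(self):
        results = [sum(operand) for operand in self.pending]
        self.pending = []
        return results


if __name__ == "__main__":
    v = [1, 2, 3]
    naive, safe = DelayedSum(snapshot=False), DelayedSum(snapshot=True)
    naive.add(v)
    safe.add(v)
    v[0] = 100            # client overwrites the operand before the force point
    print(naive.force())  # sees the overwritten operand
    print(safe.force())   # sees the operand as it was at call time
```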

  23. Conclusions/discussion • DSI is not new • But just keeps popping up, solves tricky problems • DSI programs are program generators • Type safety of the recipe derives from type safety of the client (so DSI interpreter could be tagless) • Safety of optimising transformations is another matter… • DSIs can be JITs • Eg our C++ matrix/vector library uses a multistage programming library to generate C loops at runtime (and fuse them) • There is a useful catalogue of techniques to enhance DSI applicability, overheads etc
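
To make the "DSIs can be JITs" point concrete, here is a toy sketch in Python rather than C++: a delayed vector expression is turned into the source of one fused loop at the force point and compiled at runtime. The `Vec` class and the generated function name `fused` are assumptions for illustration; this is not the authors' matrix/vector library, which generates C loops.

```python
class Vec:
    """Delayed vector expression: a leaf holds data, an inner node holds an operator."""

    def __init__(self, data=None, op=None, left=None, right=None):
        self.data, self.op, self.left, self.right = data, op, left, right

    def __add__(self, other):
        return Vec(op="+", left=self, right=other)

    def __mul__(self, other):
        return Vec(op="*", left=self, right=other)

    def _emit(self, leaves):
        """Return the per-element expression text, collecting leaf operands."""
        if self.op is None:
            leaves.append(self.data)
            return "v%d[i]" % (len(leaves) - 1)
        left = self.left._emit(leaves)
        right = self.right._emit(leaves)
        return "(%s %s %s)" % (left, self.op, right)

    def force(self):
        """Generate one fused loop for the whole expression, compile and run it."""
        leaves = []
        body = self._emit(leaves)
        params = ", ".join("v%d" % k for k in range(len(leaves)))
        source = "def fused(n, %s):\n    return [%s for i in range(n)]\n" % (params, body)
        namespace = {}
        exec(source, namespace)  # runtime code generation
        return namespace["fused"](len(leaves[0]), *leaves), source


if __name__ == "__main__":
    a, b, c = Vec([1, 2, 3]), Vec([4, 5, 6]), Vec([10, 10, 10])
    result, source = ((a + b) * c).force()
    print(source)
    print(result)
```

Evaluated eagerly, `(a + b) * c` would take two passes and a temporary vector; the generated function makes a single pass with no temporary.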

  24. Related stuff… • Lazy evaluation – with reflection • Template metaprogramming – encode recipe in type • Proxy interposition trick is common in dynamically-typed languages: • Redefining the lookup function in Common Lisp • The “doesNotUnderstand: hack” in Smalltalk • The idea of converting a call to a message… • Message-Oriented Programming: The Case for First Class Messages (Dave Thomas, JOT 2004) • Tomasulo-style renaming • to prevent antidependences from forcing execution • Compare with explicit recipe construction • workflow systems, command objects, LINQ