1 / 61

The PSciCo Project: P arallel Sci entific Co mputing in ML

The PSciCo Project: P arallel Sci entific Co mputing in ML. Robert Harper Carnegie Mellon University August, 2000. Acknowledgements. This talk represents collaboration with Guy Blelloch (CS) , Gary Miller (CS) , and Noel Walkington (Math) .

dgonzales
Download Presentation

The PSciCo Project: P arallel Sci entific Co mputing in ML

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. The PSciCo Project: Parallel Scientific Computing in ML Robert Harper Carnegie Mellon University August, 2000

  2. Acknowledgements • This talk represents collaboration with Guy Blelloch (CS), Gary Miller (CS), and Noel Walkington (Math). • Much of the work is done by our students: Umut Acar, Hal Burch, Franklin Chen, Perry Cheng, David Garmire, Aleks Nanevski, Leaf Petersen, Seth Porter, and Chris Stone. Robert Harper - PSciCo Project

  3. Project Goal Use high-level languages for scientific computing to achieve ... • Smooth integration of production and prototyping. • Richer data structures and more complex algorithms. • Free exchange of re-usable components. • Expressiveness with acceptable efficiency. Robert Harper - PSciCo Project

  4. Project Thesis We may achieve these goals by exploiting • Functional programming. • Persistent data structures. • Higher-order functions, especially staging (currying) and parameterization. • Implicit coarse-grained parallelism. • Parallel operations on aggregates. • Modular programming. • Interfaces, parameterization, hierarchy. Robert Harper - PSciCo Project

  5. How We Can Win • Use methods that are “unthinkable” in current languages. • persistence, parallelism, higher-order functions, modularity • Change the terms of the debate. • Efficiency matters, but it’s not everything. • Expressiveness, concision, robustness, flexibility matter too. Robert Harper - PSciCo Project

  6. How We Can Win • Eliminate the distinction between • Prototyping languages, for experimenting with new ideas. • Encourage “quick and dirty” hacks. • Poor software engineering properties. • Production languages, for building systems “for real”. • Sacrifice ease of use and maintainability for the sake of efficiency. Robert Harper - PSciCo Project

  7. The State of the Art • Prototyping languages • eg, Mathematica, MatLab • Convenient, especially interactively • Support numeric and symbolic computation • High-level, language-oriented • Packages / toolkits for various domains • Easy to build an application Robert Harper - PSciCo Project

  8. The State of the Art • Production languages • eg, Fortran, C, C++ • Batch-oriented, not interactive • Low-level, machine-oriented • Poor support for symbolic computation • Large collection of libraries, packages • More effort to build a system Robert Harper - PSciCo Project

  9. Starting Point: ML + NESL • ML • Excellent support for complex data structures and symbolic computing • Excellent support for modularity • Poor support for numeric computation • Poor support for parallelism Robert Harper - PSciCo Project

  10. Starting Point: ML + NESL • NESL • First-order functional language. • Excellent support for implicit parallelism. • “scan vector” model of data parallelism • No support for complex data stuctures or symbolic computing. • No support for modularity. Robert Harper - PSciCo Project

  11. Current Work • Shared libraries. • Adaptive precision arithmetic. • General purpose geometry and physics. • Triangulated surfaces (simplicial complexes). • Applications. • Convex hull. • Delaunay triangulation. • n-Body problem. • Terrain modelling. • Finite element method. Robert Harper - PSciCo Project

  12. Current Work • Language technology. • TILT compiler for Standard ML. • typed intermediate languages for efficient data representation • Parallel run-time system with space- and time-efficient thread scheduling. • exploited by data parallel sequence primitives. • Parallel, concurrent, real-time copying GC. • Multiple mutators and collectors on an SMP Robert Harper - PSciCo Project

  13. Shared Libraries • Geometry primitives • Points, vectors, figures. • Line-side test, in-circle test. • Arithmetic. • adaptive precision floating point • interval arithmetic • Simplicial complexes. • Persistent representation of surfaces. Robert Harper - PSciCo Project

  14. Representing Geometry • Single dimension-independent interface: signature GEOMETRY = ... • Several dimension-dependent implementations: structure Geo2D :> GEOMETRY = ... structure Geo3D :> GEOMETRY = ... • Different types for different dimensions! • avoids run-time checking for conformance Robert Harper - PSciCo Project

  15. Representing Geometry signature GEOMETRY = sigstructure Vec : VECstructure Point : POINTstructure HalfSpace : HALF_SPACEstructure Box : BOXsharing Point.Vec = Vecand HalfSpace.Point = Point end Robert Harper - PSciCo Project

  16. Representing Geometry signature VEC = sigstructure Scalar : NUMBERtype tval + : t * t -> tval scale : t * Scalar.t -> tval dot : t * t -> Scalar.tval norm : t -> Scalar.t end Robert Harper - PSciCo Project

  17. Representing Geometry signature POINT = sigstructure Vec : VECstructure Number : NUMBERtype tval origin : tval translate : t * Vec.t -> tval - : t * t -> Vec.t end Robert Harper - PSciCo Project

  18. Representing Geometry signature HALF_SPACE = sigexception Degeneratestructure Pt : POINTstructure Vec : VECtype tval from_pts : Pt.t Seq.t -> tval side_test : t -> Pt.t -> Side.t (* 3-valued *) end Robert Harper - PSciCo Project

  19. An Object-Oriented Approach • CGAL uses OOP techniques. • No common interface for any dimension. • One implementation for all dimensions. Point translate (Point p, Vector v) • Requires run-time conformance checks. • Uses templates for numeric precision (float, double, etc) Robert Harper - PSciCo Project

  20. ML vs OO • In ML conformance is ensured by sharing specifications. • Vectors and points must agree on common types. • Run-time checks are avoided. • ML supports “coherent integration” of abstractions better than do C++ or Java. • Geometry library is a good example of this. Robert Harper - PSciCo Project

  21. Arithmetic • Key questions in geometric algorithms: • Side test: is a point above or below a given line or plane? • Circle test: is a point within a circle or sphere? • These functions convert a floating point number to a boolean. • Check sign of a determinant. Robert Harper - PSciCo Project

  22. Arithmetic • It is essential that these questions be answered consistently and accurately. • Otherwise higher-level results are nonsense (eg, “convex hull” not convex). • It is essential that these be implemented efficiently. • Lie at the heart of just about any algorithm in CG. Robert Harper - PSciCo Project

  23. Arithmetic • Adaptive precision via error analysis (Priest, Shewchuk). • Calculate using single precision. • Calculate “fuzz factor” by forward error analysis. • If reliable, answer, otherwise re-do in double precision, then arbitrary precision. • Key weakness: must derive and implement the error calculation; not a general arithmetic package. Robert Harper - PSciCo Project

  24. Arithmetic • “Lazy” interval arithmetic. • Use a general interval arithmetic package for all arithmetic. • IA determines reliability of result (eg, for positivity test, interval cannot span origin). • Carry a symbolic representation of the “history” of the computation of the point. • Re-do using full precision if unreliable. Robert Harper - PSciCo Project

  25. Arithmetic • Intervals work well for polynomials. • Enough to do CG tests, assuming “virgin data”. • Polynomials are not enough, in general! • eg, projections onto a sphere require square roots • How much can we feasibly support? • algebraic numbers? • transcendentals? Robert Harper - PSciCo Project

  26. Real Numbers • Why not represent all (computable) real numbers? • eg, Edalat and Potts representation as infinite sequences of convergent intervals • Basic limitation: equality and ordering of reals is not decidable. • eg, cannot decide whether x>0 in general • most algorithms assume a decidable equality test Robert Harper - PSciCo Project

  27. Real Numbers • CG algorithms assume that geometric objects are “discrete”. • eg, line-side and in-circle tests • Computationally, this is nonsense! • cannot decide in general if x lies on L or within C • Edalat suggests to treat geometric objects themselves as sequences of intervals. • successive refinement to interior and exterior • But we must re-work CG in this framework. Robert Harper - PSciCo Project

  28. Representing Surfaces • Triangulated surfaces arise frequently. • Finite element method for solving boundary-value problems. • Computation of convex hull of a set of points. • Terrain modeling from altitude data. • Putting clothes on the Toy Story characters. Robert Harper - PSciCo Project

  29. Representing Surfaces • A triangulated surface is a simplicial complex. • An n-complex is a set of n-simplexes that “fit together” well. • Intersection of two n-simplexes is an (n-1)-simplex. • A 1-simplex is a point, a 2-simplex a line, a 3-simplex a triangle, etc. • Vertices are affinely independent when embedded in n-space. Robert Harper - PSciCo Project

  30. Representing Surfaces signature SIMPLEX = sig structure Vertex : VERTEXtype simplexval compare : simplex * simplex -> orderval dim : simplex -> intval vertices : simplex -> Vertex.vertex seqval simplex : Vertex.vertex seq -> simplexval down : simplex -> Vertex.vertex * simplexval join : Vertex.vertex * simplex -> simplexval faces : simplex -> simplex seqval flip : simplex -> simplex end Robert Harper - PSciCo Project

  31. Representing Surfaces signature VERTEX = sig type vertexval compare : vertex * vertex -> ordertype statetype pointval new : state * point -> state * vertexval loc : vertex -> point end Robert Harper - PSciCo Project

  32. Representing Surfaces signature SIMPCOMP = sig structure Simplex : SIMPLEXtype ‘a complexval dim : intval empty : ‘a complexval isempty : ‘a complex -> boolval simplices : ‘a complex -> int -> Simplex.simplex seqval data : ‘a complex -> Simplex.simplex -> ‘a option... end Robert Harper - PSciCo Project

  33. Representing Surfaces signature SIMPCOMP = sig ... val grep : ‘a complex -> int * Simplex.simplex -> Simplex.simplex seq val add : ‘a complex -> Simplex.simplex * ‘a -> ‘a complex ... end Robert Harper - PSciCo Project

  34. Representing Surfaces • Signature of complexes is dimension-independent. • Allows dimension-independent code to manipulate surfaces. • Implementations are dimension-dependent. • Different dimensions have different types. Robert Harper - PSciCo Project

  35. Implementing Surfaces • Usual implementation is imperative. • Doubly-linked edge list. • Constant-time traversal operations. • Ephemeral, not easily parallelizable. • Our implementation is functional. • Simplex is a sequence of vertices. • Logarithmic-time traversal of complex. • Persistent, easily parallelizable. Robert Harper - PSciCo Project

  36. Cost of Persistence? • Lower bound on 3D hull: (n lg n). • Matching upper bounds are known. • eg, Quickhull algorithm. • But they rely on constant-time surface operations! • eg, find all three neighbors of a triangle • Can we match this in the persistent case? Robert Harper - PSciCo Project

  37. Cost of Persistence? • Naively, we’d expect O(n lg2 n) to pay for persistence. • Surface operations are O(lg n) instead of O(1). • Burch and Miller discovered an optimal randomized algorithm for the persistent case! • Touches surface O(n) times, at a cost of O(n lg n). • O(n lg n) floating point operations. • Plus it is persistent and parallelizable! Robert Harper - PSciCo Project

  38. Delaunay Triangulation • Generate a triangulation of a region with boundary (in 2-space). • Must include the boundary segments. • Configuration of triangles must be “good” in several senses (we’ll discuss one). • A triangulation is a 2-complex. • Use simplicial complex package to represent it. Robert Harper - PSciCo Project

  39. Delaunay Triangulation Robert Harper - PSciCo Project

  40. Delaunay Triangulation • Goal: avoid “skinny” triangles. • Lead to ill-conditioned numeric problems. • Method: “improve” a triangulation by adding new vertexes. • Turn a “skinny” triangle into two “fat” ones. • Where to add vertices? • Circumcenter of triangle “usually” works. • Other choices are sometimes forced. Robert Harper - PSciCo Project

  41. skinny triangle new edges remove base new vertex Delaunay Triangulation Robert Harper - PSciCo Project

  42. Rupert’s Algorithm • Adding a vertex may violate the boundary! • New vertex may introduce a new “skinny” triangle somewhere else in the region. • If the new triangle is on the boundary, further splitting may remove a boundary segment! Robert Harper - PSciCo Project

  43. cannot be removed new skinny triangle Rupert’s Algorithm Robert Harper - PSciCo Project

  44. put new vertex here instead Rupert’s Algorithm Robert Harper - PSciCo Project

  45. Rupert’s Algorithm • Typical solution: search whenever a vertex is about to be added. • Lots of ad hoc code for a rare case. • Introduces overhead at each step. • A simpler solution: back out when a boundary violation occurs. • Can be arbitrarily far in the future. Robert Harper - PSciCo Project

  46. Rupert’s Algorithm in ML • Use dynamic exceptions and persistent surface representation. • Associate an exception with each “new” vertex. • If the boundary is violated, raise the exception of some “new” vertex, passing a better location for it. • Handler restarts triangulation from there. Robert Harper - PSciCo Project

  47. Rupert’s Algorithm in ML • For the 2D case this is nice. • simplifies coding • easier to explain • For the 3D case this is a big win. • conditions much harder to state and enforce • persistence helps enormously • first implementation of 3D Ruppert nearly complete Robert Harper - PSciCo Project

  48. N-Body Problem • Problem: given n bodies, compute the potentials between them. • Gravitational potential between stars. • Electrical potential between charged bodies. • Naive solution: O(n2). • For each body, sum potential due to all other bodies in the system Robert Harper - PSciCo Project

  49. N-Body Problem • Approximate potentials by clustering distant bodies. • eg, think of Andromeda as one body relative to the Sun • Several choices: • How to form clusters? • How far is “distant”? • How to approximate potentials? Robert Harper - PSciCo Project

  50. Clustering • Build a tree by recursively dividing space into regions. • Each node is a cluster. • Each leaf is a body. • Two division methods: • Equal-sized regions of space (quaternary). • Split bounding box on longest dimension (binary). Robert Harper - PSciCo Project

More Related