610 likes | 626 Views
The PSciCo Project: P arallel Sci entific Co mputing in ML. Robert Harper Carnegie Mellon University August, 2000. Acknowledgements. This talk represents collaboration with Guy Blelloch (CS) , Gary Miller (CS) , and Noel Walkington (Math) .
E N D
The PSciCo Project: Parallel Scientific Computing in ML Robert Harper Carnegie Mellon University August, 2000
Acknowledgements • This talk represents collaboration with Guy Blelloch (CS), Gary Miller (CS), and Noel Walkington (Math). • Much of the work is done by our students: Umut Acar, Hal Burch, Franklin Chen, Perry Cheng, David Garmire, Aleks Nanevski, Leaf Petersen, Seth Porter, and Chris Stone. Robert Harper - PSciCo Project
Project Goal Use high-level languages for scientific computing to achieve ... • Smooth integration of production and prototyping. • Richer data structures and more complex algorithms. • Free exchange of re-usable components. • Expressiveness with acceptable efficiency. Robert Harper - PSciCo Project
Project Thesis We may achieve these goals by exploiting • Functional programming. • Persistent data structures. • Higher-order functions, especially staging (currying) and parameterization. • Implicit coarse-grained parallelism. • Parallel operations on aggregates. • Modular programming. • Interfaces, parameterization, hierarchy. Robert Harper - PSciCo Project
How We Can Win • Use methods that are “unthinkable” in current languages. • persistence, parallelism, higher-order functions, modularity • Change the terms of the debate. • Efficiency matters, but it’s not everything. • Expressiveness, concision, robustness, flexibility matter too. Robert Harper - PSciCo Project
How We Can Win • Eliminate the distinction between • Prototyping languages, for experimenting with new ideas. • Encourage “quick and dirty” hacks. • Poor software engineering properties. • Production languages, for building systems “for real”. • Sacrifice ease of use and maintainability for the sake of efficiency. Robert Harper - PSciCo Project
The State of the Art • Prototyping languages • eg, Mathematica, MatLab • Convenient, especially interactively • Support numeric and symbolic computation • High-level, language-oriented • Packages / toolkits for various domains • Easy to build an application Robert Harper - PSciCo Project
The State of the Art • Production languages • eg, Fortran, C, C++ • Batch-oriented, not interactive • Low-level, machine-oriented • Poor support for symbolic computation • Large collection of libraries, packages • More effort to build a system Robert Harper - PSciCo Project
Starting Point: ML + NESL • ML • Excellent support for complex data structures and symbolic computing • Excellent support for modularity • Poor support for numeric computation • Poor support for parallelism Robert Harper - PSciCo Project
Starting Point: ML + NESL • NESL • First-order functional language. • Excellent support for implicit parallelism. • “scan vector” model of data parallelism • No support for complex data stuctures or symbolic computing. • No support for modularity. Robert Harper - PSciCo Project
Current Work • Shared libraries. • Adaptive precision arithmetic. • General purpose geometry and physics. • Triangulated surfaces (simplicial complexes). • Applications. • Convex hull. • Delaunay triangulation. • n-Body problem. • Terrain modelling. • Finite element method. Robert Harper - PSciCo Project
Current Work • Language technology. • TILT compiler for Standard ML. • typed intermediate languages for efficient data representation • Parallel run-time system with space- and time-efficient thread scheduling. • exploited by data parallel sequence primitives. • Parallel, concurrent, real-time copying GC. • Multiple mutators and collectors on an SMP Robert Harper - PSciCo Project
Shared Libraries • Geometry primitives • Points, vectors, figures. • Line-side test, in-circle test. • Arithmetic. • adaptive precision floating point • interval arithmetic • Simplicial complexes. • Persistent representation of surfaces. Robert Harper - PSciCo Project
Representing Geometry • Single dimension-independent interface: signature GEOMETRY = ... • Several dimension-dependent implementations: structure Geo2D :> GEOMETRY = ... structure Geo3D :> GEOMETRY = ... • Different types for different dimensions! • avoids run-time checking for conformance Robert Harper - PSciCo Project
Representing Geometry signature GEOMETRY = sigstructure Vec : VECstructure Point : POINTstructure HalfSpace : HALF_SPACEstructure Box : BOXsharing Point.Vec = Vecand HalfSpace.Point = Point end Robert Harper - PSciCo Project
Representing Geometry signature VEC = sigstructure Scalar : NUMBERtype tval + : t * t -> tval scale : t * Scalar.t -> tval dot : t * t -> Scalar.tval norm : t -> Scalar.t end Robert Harper - PSciCo Project
Representing Geometry signature POINT = sigstructure Vec : VECstructure Number : NUMBERtype tval origin : tval translate : t * Vec.t -> tval - : t * t -> Vec.t end Robert Harper - PSciCo Project
Representing Geometry signature HALF_SPACE = sigexception Degeneratestructure Pt : POINTstructure Vec : VECtype tval from_pts : Pt.t Seq.t -> tval side_test : t -> Pt.t -> Side.t (* 3-valued *) end Robert Harper - PSciCo Project
An Object-Oriented Approach • CGAL uses OOP techniques. • No common interface for any dimension. • One implementation for all dimensions. Point translate (Point p, Vector v) • Requires run-time conformance checks. • Uses templates for numeric precision (float, double, etc) Robert Harper - PSciCo Project
ML vs OO • In ML conformance is ensured by sharing specifications. • Vectors and points must agree on common types. • Run-time checks are avoided. • ML supports “coherent integration” of abstractions better than do C++ or Java. • Geometry library is a good example of this. Robert Harper - PSciCo Project
Arithmetic • Key questions in geometric algorithms: • Side test: is a point above or below a given line or plane? • Circle test: is a point within a circle or sphere? • These functions convert a floating point number to a boolean. • Check sign of a determinant. Robert Harper - PSciCo Project
Arithmetic • It is essential that these questions be answered consistently and accurately. • Otherwise higher-level results are nonsense (eg, “convex hull” not convex). • It is essential that these be implemented efficiently. • Lie at the heart of just about any algorithm in CG. Robert Harper - PSciCo Project
Arithmetic • Adaptive precision via error analysis (Priest, Shewchuk). • Calculate using single precision. • Calculate “fuzz factor” by forward error analysis. • If reliable, answer, otherwise re-do in double precision, then arbitrary precision. • Key weakness: must derive and implement the error calculation; not a general arithmetic package. Robert Harper - PSciCo Project
Arithmetic • “Lazy” interval arithmetic. • Use a general interval arithmetic package for all arithmetic. • IA determines reliability of result (eg, for positivity test, interval cannot span origin). • Carry a symbolic representation of the “history” of the computation of the point. • Re-do using full precision if unreliable. Robert Harper - PSciCo Project
Arithmetic • Intervals work well for polynomials. • Enough to do CG tests, assuming “virgin data”. • Polynomials are not enough, in general! • eg, projections onto a sphere require square roots • How much can we feasibly support? • algebraic numbers? • transcendentals? Robert Harper - PSciCo Project
Real Numbers • Why not represent all (computable) real numbers? • eg, Edalat and Potts representation as infinite sequences of convergent intervals • Basic limitation: equality and ordering of reals is not decidable. • eg, cannot decide whether x>0 in general • most algorithms assume a decidable equality test Robert Harper - PSciCo Project
Real Numbers • CG algorithms assume that geometric objects are “discrete”. • eg, line-side and in-circle tests • Computationally, this is nonsense! • cannot decide in general if x lies on L or within C • Edalat suggests to treat geometric objects themselves as sequences of intervals. • successive refinement to interior and exterior • But we must re-work CG in this framework. Robert Harper - PSciCo Project
Representing Surfaces • Triangulated surfaces arise frequently. • Finite element method for solving boundary-value problems. • Computation of convex hull of a set of points. • Terrain modeling from altitude data. • Putting clothes on the Toy Story characters. Robert Harper - PSciCo Project
Representing Surfaces • A triangulated surface is a simplicial complex. • An n-complex is a set of n-simplexes that “fit together” well. • Intersection of two n-simplexes is an (n-1)-simplex. • A 1-simplex is a point, a 2-simplex a line, a 3-simplex a triangle, etc. • Vertices are affinely independent when embedded in n-space. Robert Harper - PSciCo Project
Representing Surfaces signature SIMPLEX = sig structure Vertex : VERTEXtype simplexval compare : simplex * simplex -> orderval dim : simplex -> intval vertices : simplex -> Vertex.vertex seqval simplex : Vertex.vertex seq -> simplexval down : simplex -> Vertex.vertex * simplexval join : Vertex.vertex * simplex -> simplexval faces : simplex -> simplex seqval flip : simplex -> simplex end Robert Harper - PSciCo Project
Representing Surfaces signature VERTEX = sig type vertexval compare : vertex * vertex -> ordertype statetype pointval new : state * point -> state * vertexval loc : vertex -> point end Robert Harper - PSciCo Project
Representing Surfaces signature SIMPCOMP = sig structure Simplex : SIMPLEXtype ‘a complexval dim : intval empty : ‘a complexval isempty : ‘a complex -> boolval simplices : ‘a complex -> int -> Simplex.simplex seqval data : ‘a complex -> Simplex.simplex -> ‘a option... end Robert Harper - PSciCo Project
Representing Surfaces signature SIMPCOMP = sig ... val grep : ‘a complex -> int * Simplex.simplex -> Simplex.simplex seq val add : ‘a complex -> Simplex.simplex * ‘a -> ‘a complex ... end Robert Harper - PSciCo Project
Representing Surfaces • Signature of complexes is dimension-independent. • Allows dimension-independent code to manipulate surfaces. • Implementations are dimension-dependent. • Different dimensions have different types. Robert Harper - PSciCo Project
Implementing Surfaces • Usual implementation is imperative. • Doubly-linked edge list. • Constant-time traversal operations. • Ephemeral, not easily parallelizable. • Our implementation is functional. • Simplex is a sequence of vertices. • Logarithmic-time traversal of complex. • Persistent, easily parallelizable. Robert Harper - PSciCo Project
Cost of Persistence? • Lower bound on 3D hull: (n lg n). • Matching upper bounds are known. • eg, Quickhull algorithm. • But they rely on constant-time surface operations! • eg, find all three neighbors of a triangle • Can we match this in the persistent case? Robert Harper - PSciCo Project
Cost of Persistence? • Naively, we’d expect O(n lg2 n) to pay for persistence. • Surface operations are O(lg n) instead of O(1). • Burch and Miller discovered an optimal randomized algorithm for the persistent case! • Touches surface O(n) times, at a cost of O(n lg n). • O(n lg n) floating point operations. • Plus it is persistent and parallelizable! Robert Harper - PSciCo Project
Delaunay Triangulation • Generate a triangulation of a region with boundary (in 2-space). • Must include the boundary segments. • Configuration of triangles must be “good” in several senses (we’ll discuss one). • A triangulation is a 2-complex. • Use simplicial complex package to represent it. Robert Harper - PSciCo Project
Delaunay Triangulation Robert Harper - PSciCo Project
Delaunay Triangulation • Goal: avoid “skinny” triangles. • Lead to ill-conditioned numeric problems. • Method: “improve” a triangulation by adding new vertexes. • Turn a “skinny” triangle into two “fat” ones. • Where to add vertices? • Circumcenter of triangle “usually” works. • Other choices are sometimes forced. Robert Harper - PSciCo Project
skinny triangle new edges remove base new vertex Delaunay Triangulation Robert Harper - PSciCo Project
Rupert’s Algorithm • Adding a vertex may violate the boundary! • New vertex may introduce a new “skinny” triangle somewhere else in the region. • If the new triangle is on the boundary, further splitting may remove a boundary segment! Robert Harper - PSciCo Project
cannot be removed new skinny triangle Rupert’s Algorithm Robert Harper - PSciCo Project
put new vertex here instead Rupert’s Algorithm Robert Harper - PSciCo Project
Rupert’s Algorithm • Typical solution: search whenever a vertex is about to be added. • Lots of ad hoc code for a rare case. • Introduces overhead at each step. • A simpler solution: back out when a boundary violation occurs. • Can be arbitrarily far in the future. Robert Harper - PSciCo Project
Rupert’s Algorithm in ML • Use dynamic exceptions and persistent surface representation. • Associate an exception with each “new” vertex. • If the boundary is violated, raise the exception of some “new” vertex, passing a better location for it. • Handler restarts triangulation from there. Robert Harper - PSciCo Project
Rupert’s Algorithm in ML • For the 2D case this is nice. • simplifies coding • easier to explain • For the 3D case this is a big win. • conditions much harder to state and enforce • persistence helps enormously • first implementation of 3D Ruppert nearly complete Robert Harper - PSciCo Project
N-Body Problem • Problem: given n bodies, compute the potentials between them. • Gravitational potential between stars. • Electrical potential between charged bodies. • Naive solution: O(n2). • For each body, sum potential due to all other bodies in the system Robert Harper - PSciCo Project
N-Body Problem • Approximate potentials by clustering distant bodies. • eg, think of Andromeda as one body relative to the Sun • Several choices: • How to form clusters? • How far is “distant”? • How to approximate potentials? Robert Harper - PSciCo Project
Clustering • Build a tree by recursively dividing space into regions. • Each node is a cluster. • Each leaf is a body. • Two division methods: • Equal-sized regions of space (quaternary). • Split bounding box on longest dimension (binary). Robert Harper - PSciCo Project