X10: Performance and Productivity at Scale. 李强.
X10 Project status • X10 is developed by the IBM PERCS project as part of the DARPA program on High Productivity Computing Systems (HPCS) • Target markets: Scientific computing, business analytics • X10 2.4.2 release in Feb 2014 • Java-backend • C++-backend
X10 overview • X10 is an instance of the Asynchronous Partitioned Global Address Space (APGAS) model in the Java family • Threads can be dynamically created under programmer control (as opposed to the SPMD execution of MPI, UPC, FORTRAN) • n distinct threads, p distinct memories (n <> p)
X10 Hello world!
// file HelloWorld.x10
public class HelloWorld {
    public static def main(args: Array[String](1)):void {
        x10.io.Console.OUT.println("Hello, World");
    }
}
Slide callouts: class (HelloWorld), generic datatype (Array[String]), method (main), package (x10.io)
X10 Basics • X10 is an object-oriented language based on Java • Base data types • Non-numeric: Boolean, Char and String • Integer: Byte, Short, Int and Long • Floating point: Float, Double and Complex • Top level containers: classes and interfaces, grouped into packages • Objects are instantiated from classes
Differences from Java • var n:Long = 0; • public static val solution = [1,2,3,4,5]; • val N:Long; • Fields may be mutable (var) or immutable (val). • The type of a mutable field must always be specified. • A mutable field may or may not be initialized. • The type of an immutable field may be omitted if the field declaration specifies an initializer. • Functions can be passed as arguments: X10 supports both object-oriented and functional programming.
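The val/var rules and function-valued arguments above can be sketched in a few lines. This is a minimal illustration, not taken from the original slides: the class ValVarDemo and the helper twice are hypothetical names, and the code assumes X10 2.4, where main takes Rail[String] and integer literals default to Long (the slides use the earlier Array[String](1) signature).

```x10
import x10.io.Console;

public class ValVarDemo {
    // Immutable static field: the type is inferred from the initializer.
    static val LIMIT = 10;

    // Hypothetical helper: takes a function value and applies it to x twice.
    static def twice(f:(Long)=>Long, x:Long):Long = f(f(x));

    public static def main(args:Rail[String]):void {
        var n:Long = 0;               // mutable local: may be reassigned
        n = n + LIMIT;
        val inc = (x:Long) => x + 1;  // immutable local holding a closure
        Console.OUT.println("twice(inc, n) = " + twice(inc, n));
    }
}
```

Reassigning inc or LIMIT would be rejected by the compiler, since both are declared with val.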
X10 Parallelism • Parallelism = Activities + Places • Basic parallel constructs (async, at, finish, atomic) • Activities • All X10 programs begin with a single activity executing main in place 0 • Places hold activities and objects
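As a rough sketch of how async, finish and atomic fit together within a single place, the following program spawns ten activities that each increment a shared counter. It is an illustration under assumptions rather than code from the slides: the class name FinishAsyncDemo is hypothetical, the shared state is held in the standard library class x10.lang.Cell, and the main signature assumes X10 2.4.

```x10
import x10.io.Console;

public class FinishAsyncDemo {
    public static def main(args:Rail[String]):void {
        // Heap-allocated cell that all activities in this place can share.
        val counter = new Cell[Long](0);
        finish {
            for (i in 1..10) {
                // Each async spawns a new activity in the current place.
                async {
                    // atomic makes the read-modify-write indivisible.
                    atomic { counter.value = counter.value + 1; }
                }
            }
        }
        // finish has waited for every spawned activity, so all ten
        // increments are visible here.
        Console.OUT.println("counter = " + counter.value);
    }
}
```

Without finish, the println could run before some activities complete; without atomic, concurrent increments could be lost.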
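// Atomic methods, written in the older Java-like X10 syntax
public class TutAtomic2 {
    // boxedInt: helper class with a mutable int field named val
    const boxedInt a = new boxedInt(100);
    const boxedInt b = new boxedInt(100);
    // Each atomic method keeps a + b constant.
    public static atomic void incr_a() { a.val++; b.val--; }
    public static atomic void decr_a() { a.val--; b.val++; }
    public static void main(String args[]) {
        int sum;
        finish {
            // One child activity increments while the parent decrements.
            async for (int i = 1; i <= 10; i++) incr_a();
            for (int i = 1; i <= 10; i++) decr_a();
        }
        atomic sum = a.val + b.val;
        System.out.println("a+b = " + sum);
    } // main()
} // TutAtomic2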
Limitations of using a Single Place • Largest deployment granularity for a single place is a single SMP • Smallest granularity can be a single CPU or even a single hardware thread • A single SMP is inadequate for solving problems with large memory and compute requirements • X10 solution: incorporate multiple places as a core foundation of the X10 programming model • This enables deployment on large-scale clustered machines, with integrated support for intra-place parallelism
Scalable X10: using multiple places
class HelloWholeWorld {
    public static def main(args:Array[String](1)):void {
        for (var i:Int=0; i<Place.MAX_PLACES; i++) {
            val iVal = i;
            async at (Place.places(iVal)) {
                Console.OUT.println("Hello World from place "+here.id);
            }
        }
    }
}
Sample output (order varies between runs):
Hello World from place 0
Hello World from place 2
Hello World from place 3
Hello World from place 1
at – place shift • Shift current activity to a place to evaluate an expression, then return • Copy necessary values from calling place to callee place, discard when done
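The expression form of at can be sketched as follows: the captured value x is copied to each target place, the expression is evaluated there, and the result is copied back to the calling place. This is a minimal illustration under assumptions, not code from the slides: the class name AtDemo is hypothetical, and the code assumes X10 2.4 (Rail[String] main signature, Place.places() returning an iterable group of places).

```x10
import x10.io.Console;

public class AtDemo {
    public static def main(args:Rail[String]):void {
        val x = 20;  // captured by the at body, so it is copied to the target place
        for (p in Place.places()) {
            // Shift to place p, evaluate the expression there (here now refers
            // to p), then return to the calling place with the result.
            val y = at (p) (x * 2 + here.id);
            Console.OUT.println("place " + p.id + " computed " + y
                                + ", printed from place " + here.id);
        }
    }
}
```

Unlike the async at pattern in HelloWholeWorld, this loop is synchronous: each at blocks the calling activity until the remote evaluation returns.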