X10: IBM’s bid into parallel languages Paul B Kohler Kevin S Grimaldi University of Massachusetts Amherst
Introduction • A new language based on Java • IBM's entry in DARPA's PERCS project (Productive, Easy-to-use, Reliable Computing Systems) • Built for NUCCs (Non-Uniform Computing Clusters), where different memory locations incur different access costs
Introduction (cont.) • Will eventually be combined with new tools for Eclipse • Goals: • Safe • Analyzable • Scalable • Flexible
PGAS • Past attempts at parallel languages have maintained the illusion of a single shared memory • This does not reflect the situation on a NUCC • Problems occur when we try to divide memory among processors • X10 uses PGAS to expose the non-uniformity and make the language scalable
PGAS (cont.) • PGAS = Partitioned Global Address Space • Memory is partitioned into places; data is associated with a place and can only be read/changed locally • Provided in X10 through the abstractions of places and activities
Places • Contain a collection of resident mutable data objects and associated activities • Places represent locality boundaries • Very efficient access to resident data • Set of places remains fixed at runtime • Places are virtual • Mapped to physical processors by runtime • Runtime may transparently migrate places
Using Places • Accessible via place.places • First activity runs at place.FIRST_PLACE • Iterate over places with next() and prev() • here represents current place
Activities • Similar to Java threads • Activities are associated with a place • Activities never migrate between places • An activity may only read/modify mutable data that is local to its place • However, immutable data (i.e. final or value) may be accessed by any activity
Activities (cont.) • Activities are GALS (Globally Asynchronous, Locally Synchronous) • Local data accesses are synchronized • Global data accesses are not by default; synchronization can be explicitly forced
Activities: Syntax • It is very simple to spawn new activities: async (place) statement • This runs the specified statement at the specified place • Example: final int result; async (here.next()) { result = a + b; } This adds two numbers at the adjacent place and stores the result (since result is final, it can be accessed by other places)
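X10's async is not available in plain Java, but the pattern on this slide can be sketched with an ordinary thread that computes a + b and publishes the result into a shared slot (AtomicInteger here stands in for the slide's final result variable; the class and method names are illustrative, not X10 API):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Rough Java analogue of X10's `async (place) statement`: a spawned
// "activity" computes a + b, and the parent waits for it to finish.
public class AsyncDemo {
    static int asyncAdd(int a, int b) throws InterruptedException {
        AtomicInteger result = new AtomicInteger();            // stands in for the final `result` slot
        Thread activity = new Thread(() -> result.set(a + b)); // the spawned activity
        activity.start();
        activity.join();                                       // wait for the activity to complete
        return result.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(asyncAdd(2, 3)); // prints 5
    }
}
```

Unlike X10, plain Java has no notion of places, so this analogue captures only the "run this statement on another activity" half of the construct.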
Type System • X10 is strongly typed • Unified type system • Everything is an object; no primitive types • Library supplies boolean, byte, short, char, int, long, float, double, complex, String classes • Borrows Java’s single inheritance combined with interfaces
Reference vs Value Types • Two kinds of objects • Value types are immutable and can be freely copied • Reference types can contain mutable fields but cannot be migrated • Value classes are declared with the value keyword instead of class • Value classes can still contain fields of reference types • Allows them to refer to mutable data • Copying 'bottoms out' on reference fields
Type System (cont) • Objects are either scalar or aggregate • Each of value and reference types can be either scalar or aggregate • Types consist of two parts • Data type – The set of values it can take • Place type – The place at which it resides • No generics (yet)
Variables • Variables must be initialized (can never be observed without a value) • final variables cannot be changed after initialization • Declared by using the final keyword and/or using a variable name that starts with a capital letter
Nullable Types • The designers view the ability to hold a null value as orthogonal to the value vs. reference distinction • Either reference or value types can be preceded by nullable • Adds a null value to the type • Multiple nullables are collapsed (i.e. nullable nullable T = nullable T) • Can cast between T and nullable T • (nullable T) v always succeeds • (T) null throws an exception if T is not nullable
Rooted Exceptions • What should happen when a thread/activity terminates abnormally? • In Java it's unclear, since the spawning thread may have already terminated • X10 uses a rooted exception model: all uncaught exceptions get passed to the calling activity • A new blocking statement, finish s, is introduced. It waits for all activities spawned in s to terminate before proceeding
Exceptions (cont) • Finish allows exceptions to travel back towards the root activity and possibly be caught and handled along the way. • Example: try{ finish async(here.next()){ throw new Exception(); } } catch(Exception e){ }
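Plain Java has no finish, but a loose analogue of the rooted model is Future.get(): it blocks like finish, and an exception thrown in the child activity is re-raised in the parent, wrapped in ExecutionException (class and method names below are illustrative):

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of X10's rooted exceptions in Java: the parent blocks on the
// child (like `finish`) and sees the child's uncaught exception.
public class FinishDemo {
    static boolean childExceptionReachesParent() throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<?> child = pool.submit(
            (Runnable) () -> { throw new RuntimeException("boom"); });
        try {
            child.get();              // blocks like `finish`, then rethrows
            return false;             // not reached: the child always throws
        } catch (ExecutionException e) {
            return "boom".equals(e.getCause().getMessage());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(childExceptionReachesParent()); // true
    }
}
```

The difference from Java threads that the slide highlights remains: with a bare Thread, an uncaught exception is simply lost to the spawner, whereas finish (like get() here) guarantees it travels back toward the root.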
Arrays • X10 features an array sub-language similar to ZPL. • Arrays have: • Regions • Distributions • Arrays are operated on by: • for • foreach • ateach • And more!
Even more arrays • Arrays may be value (immutable) or reference (mutable) • The keyword unsafe allows arrays that will play nice with Java code • Arrays can run code as an initialization step
Arrays: Regions • As in ZPL, a region is a set of index points • Regions and distributions are first-class constructs • Regions can be specified like this: • [0:128, 0:256] creates a 129×257 region (region bounds are inclusive)
Regions (cont.) • Regions can be modified by operations such as union (||), intersection (&&), and set difference (-) • Predefined region types can be constructed using factories: region R2 = region.factory.upperTriangular(25) • In the future, users may be able to define their own regions
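A toy way to see what the region operators compute is to model a 1-D region as a sorted set of indices; union, intersection, and difference then map directly onto set operations (this is my own illustration, not X10's region implementation):

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Toy model of 1-D regions as sorted index sets, illustrating the
// slide's union (||), intersection (&&), and difference (-) operators.
public class RegionDemo {
    static Set<Integer> region(int lo, int hi) {   // inclusive bounds, as in [lo:hi]
        return IntStream.rangeClosed(lo, hi).boxed()
                        .collect(Collectors.toCollection(TreeSet::new));
    }
    static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<>(a); r.addAll(b); return r;
    }
    static Set<Integer> intersection(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<>(a); r.retainAll(b); return r;
    }
    static Set<Integer> difference(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<>(a); r.removeAll(b); return r;
    }

    public static void main(String[] args) {
        // [0:5] && [3:8] keeps only the overlapping indices
        System.out.println(intersection(region(0, 5), region(3, 8))); // [3, 4, 5]
    }
}
```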
Arrays: Distributions • Every array has a distribution • A distribution is a mapping of array elements to places • Distributions are defined over a particular region • Arrays are typed by their distribution
Distributions (cont.) • Currently must use predefined distributions (unique, block, cyclic, etc.) • Distributions have set operations like regions • A distribution can be used as a function: for a point p and distribution d, d[p] is the place to which p maps (i.e. where the p'th element "lives")
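What d[p] computes for the two common distributions can be sketched as a pair of index-to-place functions (place numbers and formulas are my illustration of the usual block/cyclic layouts, not X10 library code):

```java
// Sketch of what a distribution maps: an element index to a place number.
public class DistDemo {
    // cyclic: element i lives at place i mod P
    static int cyclic(int i, int places) {
        return i % places;
    }
    // block: split n elements into P contiguous chunks of ceil(n/P)
    static int block(int i, int n, int places) {
        int chunk = (n + places - 1) / places;  // ceiling division
        return i / chunk;
    }

    public static void main(String[] args) {
        // 8 elements over 4 places
        for (int i = 0; i < 8; i++)
            System.out.println(i + " -> cyclic place " + cyclic(i, 4)
                                 + ", block place " + block(i, 8, 4));
    }
}
```

Cyclic spreads neighbouring elements across places; block keeps neighbours together, which is why the choice of distribution matters for locality on a NUCC.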
Subarrays • Use various boolean operations on distributions to create subdistributions • To get the portion of a block distribution that is located here: block([1:100]) && [1:100]->here • a | D1 is the portion of array a corresponding to the subdistribution D1
Array construction • Here is an example of array initialization: float [.] data = new[factory.cyclic([0:200, 50:250])] (point [i, j]) { return i + j; }; • [0:200, 50:250] specifies a 201×201 region (bounds are inclusive) • factory.cyclic(...) specifies a cyclic distribution over that region • The trailing (point [i, j]) { return i + j; } initializes each element to the sum of its i, j coordinates
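In plain Java (which has no distributions) the pointwise-initializer part of this slide can be approximated by building a 2-D array from a function of the indices (class and method names are illustrative):

```java
import java.util.function.IntBinaryOperator;

// Java sketch of X10's pointwise array initializer: element (i, j) = f(i, j).
public class ArrayInit {
    static int[][] build(int rows, int cols, IntBinaryOperator f) {
        int[][] a = new int[rows][cols];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                a[i][j] = f.applyAsInt(i, j);   // run the initializer at each point
        return a;
    }

    public static void main(String[] args) {
        int[][] data = build(3, 3, (i, j) -> i + j);  // the slide's i + j initializer
        System.out.println(data[2][1]); // 3
    }
}
```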
Array iteration • Once you have an array, what can you do with it? • Array iterators: for, foreach, ateach • for: sequentially iterates over a supplied region; at each point it binds the point to a variable and executes the accompanying statement • foreach: as with for, but operations are done in parallel. That is, it spawns a new activity for each point • ateach: takes a distribution instead of a region; performs operations in parallel at the place specified by the distribution
Iteration example • Example: for (point p : A) { A[p] = A[p] * A[p]; }
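The same squaring loop can be written in plain Java both sequentially (the slide's for) and in parallel over the index space, with a parallel stream loosely playing the role of foreach (this is my analogue, not X10 output):

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// The slide's squaring loop: sequential `for` vs a parallel analogue of `foreach`.
public class IterDemo {
    static int[] squareSeq(int[] a) {
        for (int p = 0; p < a.length; p++)
            a[p] = a[p] * a[p];                 // one point at a time
        return a;
    }
    static int[] squarePar(int[] a) {
        IntStream.range(0, a.length).parallel()
                 .forEach(p -> a[p] = a[p] * a[p]); // each index touched by its own task
        return a;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(squareSeq(new int[]{1, 2, 3}))); // [1, 4, 9]
        System.out.println(Arrays.toString(squarePar(new int[]{1, 2, 3}))); // [1, 4, 9]
    }
}
```

The parallel version is safe here because each task writes a distinct index; unlike ateach, plain Java gives no control over which place (processor) a given index runs on.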
More array ops • lift: takes a binary function and two arrays with the same distribution; produces a new array formed by pointwise application of the function to the two arrays • reduce: as in MPI, applies a binary function to every element to produce a single value • scan: creates a new array in which the i'th element is the result of reducing the first i elements
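The three operations have simple sequential definitions, sketched below with plain loops (in X10 they would run over distributed arrays; this just pins down what each one computes):

```java
import java.util.Arrays;

// Sequential sketches of lift (pointwise combine), reduce (fold to one
// value), and scan (array of prefix reductions).
public class ArrayOps {
    interface Op { int apply(int x, int y); }

    static int[] lift(Op f, int[] a, int[] b) {
        int[] out = new int[a.length];
        for (int i = 0; i < a.length; i++) out[i] = f.apply(a[i], b[i]);
        return out;
    }
    static int reduce(Op f, int identity, int[] a) {
        int acc = identity;
        for (int x : a) acc = f.apply(acc, x);
        return acc;
    }
    static int[] scan(Op f, int identity, int[] a) {
        int[] out = new int[a.length];
        int acc = identity;
        for (int i = 0; i < a.length; i++) { acc = f.apply(acc, a[i]); out[i] = acc; }
        return out;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3}, b = {10, 20, 30};
        Op plus = (x, y) -> x + y;
        System.out.println(Arrays.toString(lift(plus, a, b))); // [11, 22, 33]
        System.out.println(reduce(plus, 0, a));                // 6
        System.out.println(Arrays.toString(scan(plus, 0, a))); // [1, 3, 6]
    }
}
```

Note that scan's output ends with the full reduction: scan's last element equals reduce over the whole array.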
Atomic Blocks • X10 allows you to define atomic blocks • The contents of a block are guaranteed to execute as a single atomic event, but only with respect to other activities in the same place • While execution is guaranteed to be atomic, the details are implementation-specific • Syntax: atomic S
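Java's closest built-in to atomic S within a single place is a synchronized block (or java.util.concurrent.atomic). The sketch below has several threads increment a shared counter inside a synchronized section, so no update is lost (names are illustrative):

```java
// Analogue of `atomic S`: the synchronized block makes the read-modify-write
// of the counter a single event with respect to the other threads.
public class AtomicDemo {
    private static int counter = 0;
    private static final Object lock = new Object();

    static int countTo(int threads, int perThread) throws InterruptedException {
        counter = 0;
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    synchronized (lock) { counter++; }  // the "atomic" section
                }
            });
            ts[t].start();
        }
        for (Thread t : ts) t.join();
        return counter;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countTo(4, 1000)); // 4000: no increments lost
    }
}
```

Without the synchronized block, counter++ is a non-atomic read-modify-write and the final count could come up short, which is exactly the hazard atomic S removes.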
Conditional Atomic Blocks • Also provides: when (Cond) S • This blocks until Cond is true and then executes S atomically • This allows the creation of a number of synchronization mechanisms • Dangerous! If Cond is never true, or if there is a cyclic dependency, deadlock occurs
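In Java, when (Cond) S is usually hand-rolled with a monitor: wait in a loop until the condition holds, then run the body while still holding the lock (a sketch with illustrative names, not X10 semantics in full):

```java
// Hand-rolled analogue of X10's `when (Cond) S`: wait until the condition
// holds, then run the body atomically with respect to the same lock.
public class WhenDemo {
    private static final Object lock = new Object();
    private static boolean ready = false;
    private static int observed = -1;

    static int runWhenReady() throws InterruptedException {
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                try {
                    while (!ready) lock.wait();   // block until Cond is true
                } catch (InterruptedException e) { return; }
                observed = 42;                    // the body S, under the lock
            }
        });
        waiter.start();
        synchronized (lock) { ready = true; lock.notifyAll(); }  // make Cond true
        waiter.join();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runWhenReady()); // 42
    }
}
```

The slide's warning applies verbatim here: if nothing ever sets ready, the waiter blocks forever.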
Future and Force • As discussed before, futures allow the asynchronous computation of a value that may be used later • Futures return an object of type Future<T> • force is a blocking call that waits for a particular future to finish
Futures (cont.) • Futures can only access final variables, which prevents side effects • Syntax: future (p) e • Example: Future<float> blah = future (here.next()) { sqrt(a*a + b*b) };
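The slide's example maps almost directly onto java.util.concurrent: submitting a Callable stands in for future (p) e, and Future.get() plays the role of force (minus the place argument, which plain Java lacks):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// The slide's hypotenuse future in plain Java: submit() is `future`,
// get() is `force`.
public class FutureDemo {
    static double hypot(double a, double b) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<Double> blah = pool.submit(() -> Math.sqrt(a * a + b * b)); // the future
        double v = blah.get();   // force: block until the value is ready
        pool.shutdown();
        return v;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(hypot(3, 4)); // 5.0
    }
}
```

The no-side-effects restriction on X10 futures has no Java enforcement; here it is simply good practice that the Callable only reads its (effectively final) captures.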
Clocks • Act as barriers • Much more flexible • Guarantee no deadlock • Dynamically associated with different sets of activities
Clock Semantics • Activities register with zero or more clocks • Can register/unregister at any time • Clocks are always in some phase • A clock does not advance until all currently registered activities quiesce • Activities quiesce with the next operation • Indicates they are ready for all their clocks to advance • Suspends until all their clocks have advanced • This makes deadlock impossible
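A close Java analogue of a clock is java.util.concurrent.Phaser: parties register dynamically, and arriveAndAwaitAdvance() behaves like next, quiescing until every registered party arrives and the phase advances (the demo class is my sketch; only Phaser itself is standard API):

```java
import java.util.concurrent.Phaser;

// Phaser as a clock analogue: each thread does `phases` rounds of
// arriveAndAwaitAdvance(), the counterpart of X10's `next`.
public class ClockDemo {
    static int phasesCompleted(int parties, int phases) throws InterruptedException {
        Phaser clock = new Phaser(parties);   // register all parties up front
        Thread[] ts = new Thread[parties];
        for (int t = 0; t < parties; t++) {
            ts[t] = new Thread(() -> {
                for (int p = 0; p < phases; p++)
                    clock.arriveAndAwaitAdvance();  // "next": quiesce, await phase advance
            });
            ts[t].start();
        }
        for (Thread t : ts) t.join();
        return clock.getPhase();              // number of completed phases
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(phasesCompleted(3, 5)); // 5
    }
}
```

As on the slide, no thread can race ahead: the phase only advances once all registered parties have arrived.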
Status • IBM has reportedly built a single-VM reference implementation • The language is still under heavy revision • A GPL'ed X10-XTC compiler is available • Doesn't conform to the current language spec • Uses what will possibly become version 0.5 • Speculatively contains support for operator overloading and generics • Currently very poor performance
Conclusion • So is X10 the answer to all our parallel programming woes? • In my opinion, probably not • Parallelism is still very explicit; there are still opportunities for deadlock, race conditions, etc. • It takes an "…and the kitchen sink" approach, which makes learning the syntax a chore • It's not FORTRAN. Will people bother to use it?