A Framework for Reasoning About Inherent Parallelism in Modern Object-Oriented Languages
Presented by A. Craik (5-Jan-12)
Research supported by funding from Microsoft Research and the Queensland State Government
Introduction
[Diagram; labels: Semantic Analysis; Dependency Analysis; Procedural Algorithm; Parallel Algorithm; Sequential Implementation; Explicitly Parallel Implementation; Procedural Algorithm; Sequential Implementation w/ Injected Parallelism]
Introduction
• Inherent Parallelism:
    a = 1; b = 2; c = a + b;
• Three steps for finding & exploiting:
  • Find the inherent parallelism in the program
  • Decide which inherent parallelism is worth exploiting
  • Choose an implementation technology to expose the selected parallelism
    for (int i = 0; i < max; ++i) a[i] = a[i] + 1;
Introduction
• Dependencies impose ordering constraints
• Sequential consistency required
• Two forms
  • Control – which statements will run
  • Data – reads & writes of shared state
• Control is well studied and easier to handle inter-procedurally
  • Example: Java checked exceptions
Data Dependencies
• Flow Dependence (Read-After-Write)
    int a = 1;
    a = 2;
    int b = a + 1;
• Output Dependence (Write-After-Write)
    int a = 1;
    a = 4;
    a = 5;
• Anti-Dependence (Write-After-Read)
    int a = 1;
    int b = a + 1;
    a = 2;
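The three kinds of data dependence can be told apart mechanically from the read and write sets of two statements. The following is a minimal sketch using the standard definitions (flow = read-after-write, anti = write-after-read, output = write-after-write). The function name `classify_dependencies` and the representation of a statement as a `(reads, writes)` pair are assumptions made for illustration, not part of the presented system.

```python
def classify_dependencies(first, second):
    """Return the kinds of data dependence from `first` to `second`.

    Each statement is a (reads, writes) pair of variable-name sets, and
    `first` executes before `second` in the sequential program.
    """
    reads1, writes1 = first
    reads2, writes2 = second
    found = set()
    if writes1 & reads2:
        found.add("flow")    # second reads what first wrote
    if reads1 & writes2:
        found.add("anti")    # second overwrites what first read
    if writes1 & writes2:
        found.add("output")  # both write the same location
    return found


# a = 2;  int b = a + 1;
flow = classify_dependencies((set(), {"a"}), ({"a"}, {"b"}))
# int b = a + 1;  a = 2;
anti = classify_dependencies(({"a"}, {"b"}), (set(), {"a"}))
# a = 4;  a = 5;
output = classify_dependencies((set(), {"a"}), (set(), {"a"}))
# a = 1;  b = 2;  (no shared state, so the order does not matter)
independent = classify_dependencies((set(), {"a"}), (set(), {"b"}))
```

An empty result is what makes two statements candidates for reordering or parallel execution.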
Traditional Approach
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < i + 1; ++j) {
            a[i,j] = b[i,j] + c[i,j];
            b[i,j] = a[i,j+1];
        }
    }
• Pair-wise analysis of statements and expressions
• Can a, b or c refer to the same array?
Traditional Approach
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < i + 1; ++j) {
            a[i,j] = b[i,j] + c[i,j];
            b[i,j] = a.readsIandJInc(i,j);
        }
    }
• What does a.readsIandJInc(i,j) do?
• Examine ALL possible implementations!
Side-Effects
    class Holder { public static int value; }
    class Array {
        public int readsIandJInc(int i, int j) {
            return this[i,j+1];
        }
    }
Side-Effects
    class Holder { public static int value; }
    class Array {
        public int readsIandJInc(int i, int j) {
            this[0,0] = i + j;
            return this[i,j];
        }
    }
Side-Effects
    class Holder { public static int value; }
    class Array {
        public int readsIandJInc(int i, int j) {
            Holder.value++;
            return this[i,j];
        }
    }
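The three implementations above share one signature but have different side effects, which is why a caller cannot reason about the call without looking inside it. As a rough illustration, the sketch below records what each variant actually touches. `TracedArray`, the `variant_*` functions and `observed_effects` are hypothetical names introduced here; the slides contain no such instrumentation.

```python
class Holder:
    value = 0  # stands in for the public static field


class TracedArray:
    """2-D array that records which cells are read and written."""

    def __init__(self, rows, cols):
        self.cells = [[0] * cols for _ in range(rows)]
        self.reads = set()
        self.writes = set()

    def read(self, i, j):
        self.reads.add((i, j))
        return self.cells[i][j]

    def write(self, i, j, v):
        self.writes.add((i, j))
        self.cells[i][j] = v


def variant_read_only(arr, i, j):
    return arr.read(i, j + 1)


def variant_writes_array(arr, i, j):
    arr.write(0, 0, i + j)      # hidden write to a different cell
    return arr.read(i, j)


def variant_writes_global(arr, i, j):
    Holder.value += 1           # hidden write to global state
    return arr.read(i, j)


def observed_effects(variant, i, j):
    """Return (cells read, cells written, whether Holder.value changed)."""
    arr = TracedArray(3, 3)
    before = Holder.value
    variant(arr, i, j)
    return arr.reads, arr.writes, Holder.value != before


effects_a = observed_effects(variant_read_only, 1, 1)
effects_b = observed_effects(variant_writes_array, 1, 1)
effects_c = observed_effects(variant_writes_global, 1, 1)
```

Only the first variant is safe to call from parallel loop iterations without further analysis. The other two write state that every iteration shares.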
Limitations of Current Techniques
• Traditional:
  • Focused on analyzing complex tight loops
  • Poor abstraction and composition
  • Too complex for programmers to use without tool support
The Idea
• Goal:
  • Simplify inter-procedural dependency analysis
• Idea:
  • Ensure safety
  • Make reasoning modular and composable
The Idea
• Specify effects on method signature:
    public int getReads() reads<> writes<>
• What goes in the angle brackets?
  • Abstract effect description
  • Composable descriptions
  • Verifiable
Object-Orientation
• Encapsulation → representation hierarchy
[Diagram; labels: Person; Company; name; dateOfBirth; employer; Date; String]
Safe Parallelism
    Block 1 { ... } reads <a,b> writes <c,d>
    Block 2 { ... } reads <w,x> writes <y,z>
• Can 2 arbitrary pieces of code execute in parallel safely?
• Type rules specify computation of effect sets
• Look for overlaps in the read & write effect sets to find possible data dependencies
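The overlap test on effect sets can be sketched in a few lines. This version treats effect names as flat, unrelated labels, which is a simplifying assumption: in the presented system the names are contexts that can be nested. `Effects` and `may_conflict` are illustrative names, not the system's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Effects:
    reads: frozenset
    writes: frozenset


def may_conflict(a, b):
    """True if running a and b in parallel could violate a data dependence.

    Two blocks conflict when one writes something the other reads or writes.
    Read/read overlap is harmless.
    """
    return bool(a.writes & (b.reads | b.writes) or b.writes & a.reads)


block1 = Effects(reads=frozenset({"a", "b"}), writes=frozenset({"c", "d"}))
block2 = Effects(reads=frozenset({"w", "x"}), writes=frozenset({"y", "z"}))
block3 = Effects(reads=frozenset({"c"}), writes=frozenset())

safe_pair = not may_conflict(block1, block2)   # no overlap at all
unsafe_pair = may_conflict(block1, block3)     # block3 reads c, block1 writes c
```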
Dependencies using Effect Sets
• A dependency exists where two triangles of representation overlap
• Triangles can only be nested:
  • Overlap becomes a check for a parent-child relationship; disjointness means no dependency
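Because representation triangles can only nest, two contexts overlap exactly when one is an ancestor of the other (or they are the same context). A minimal sketch of that check follows, assuming each context stores a reference to its owner. The `Context` class and the helper names are hypothetical, chosen for this example only.

```python
class Context:
    def __init__(self, name, owner=None):
        self.name = name
        self.owner = owner  # parent context; None only for the root

    def ancestors(self):
        """Yield this context and then every context that encloses it."""
        node = self
        while node is not None:
            yield node
            node = node.owner


def is_within(inner, outer):
    """True if `inner` is `outer` or is nested somewhere inside it."""
    return any(node is outer for node in inner.ancestors())


def disjoint(a, b):
    """Contexts in a tree are disjoint unless one encloses the other."""
    return not is_within(a, b) and not is_within(b, a)


world = Context("world")
company = Context("company", owner=world)
person = Context("person", owner=world)
person_name = Context("person.name", owner=person)
```

Here `disjoint(company, person)` holds because neither encloses the other, while `disjoint(person, person_name)` does not, since the name lies inside the person's representation.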
Types of Parallelism
• Task Parallelism
  • Run 2+ separate operations at the same time
• Loop Parallelism
  • Execute loop iterations in parallel
• Pipeline Parallelism
  • Stage loop body execution so that iteration execution overlaps safely
Task Parallelism
    class Demo {
        void op1() reads<a,b> writes<c,d> {…}
        void op2() reads<w,x> writes<y,z> {…}
    }
• Can we execute calls to op1 and op2 in parallel?
• Determine the overlap in the effect sets; no overlap means no data dependencies
• Realization using one-way calls or futures
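The "futures" realization might look roughly like the sketch below, which overlaps two operations only when the caller has already established that their effects do not conflict. `run_pair`, the `independent` flag and the shared `state` dictionary are assumptions for illustration. In standard CPython the threads show the structure rather than a CPU-bound speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pair(op1, op2, independent):
    """Run op1 and op2, overlapping them only when `independent` is true."""
    if not independent:
        return op1(), op2()          # keep the sequential order
    with ThreadPoolExecutor(max_workers=2) as pool:
        future1 = pool.submit(op1)
        future2 = pool.submit(op2)
        return future1.result(), future2.result()


state = {"a": 1, "b": 2, "c": 0, "w": 3, "x": 4, "y": 0}


def op1():  # reads a, b; writes c
    state["c"] = state["a"] + state["b"]
    return state["c"]


def op2():  # reads w, x; writes y
    state["y"] = state["w"] * state["x"]
    return state["y"]


# The effect sets {a, b, c} and {w, x, y} do not overlap, so the calls may overlap.
results = run_pair(op1, op2, independent=True)
```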
Loop Parallelism Conditions
• Data parallel loops are a major source of parallelism in imperative programs
• Start with a simple data parallel loop in the form of a foreach loop:
    foreach (T element in collection)
        element.operation();
Foreach Loop Conditions
• Condition 1: Areas holding the representations of the objects returned by the enumerator are all disjoint from one another
Foreach Loop Conditions
• Condition 2: The operation only mutates the representation of its “own” element and does not read the state owned by any of the other elements
Foreach Loop Conditions
• Condition 3: There are no control dependencies which would prevent loop parallelization
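A loop that meets the three conditions can be run with its iterations in parallel. The sketch below is one way to picture this, under loud assumptions: `Counter` is a made-up element type whose operation touches only its own state (Condition 2), the identity check is a crude stand-in for the disjointness required by Condition 1, and the loop body has no early exits (Condition 3).

```python
from concurrent.futures import ThreadPoolExecutor


class Counter:
    def __init__(self, start):
        self.value = start   # state owned by this element only

    def operation(self):
        self.value += 1      # mutates only its own representation


def parallel_foreach(collection):
    # Stand-in for Condition 1: the same object must not appear twice.
    if len({id(e) for e in collection}) != len(collection):
        raise ValueError("elements are not disjoint")
    with ThreadPoolExecutor() as pool:
        # list(...) forces every iteration to finish before returning.
        list(pool.map(lambda e: e.operation(), collection))


counters = [Counter(n) for n in range(5)]
parallel_foreach(counters)
values = [c.value for c in counters]

# An aliased collection violates Condition 1 and is rejected.
shared = Counter(0)
try:
    parallel_foreach([shared, shared])
    aliased_rejected = False
except ValueError:
    aliased_rejected = True
```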
Arbitrary Loop Bodies
• So far we have looked at:
    foreach (T element in collection)
        element.operation();
• Question: How do we generalize this to an arbitrary loop body?
    foreach (T element in collection) {
        // sequence of statements
        // including local variable definitions
        // and a read of a context r
    }
Loop Body Rewriting
• Loop becomes:
    foreach (T elem in collection)
        elem.loopBody(this);
• Where loopBody is:
    class T {
        void loopBody(Foo me) {
            // same sequence of statements
            // replace all elem by this
            // and all this by me
        }
    }
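To make the rewriting concrete, here is a small sketch in which the same loop is written inline and in rewritten form. `Invoice`, `Item` and the tax computation are invented for this example. The point is only that moving the body into a method on the element type, and passing the enclosing object as `me`, preserves behaviour.

```python
class Item:
    def __init__(self, price):
        self.price = price
        self.total = 0.0

    def loop_body(self, me):
        # Same statements as the inline loop body, with the element
        # now reached through `self` and the enclosing object through `me`.
        rate = me.tax_rate
        self.total = self.price * (1 + rate)


class Invoice:
    def __init__(self, tax_rate, items):
        self.tax_rate = tax_rate
        self.items = items

    def apply_tax_inline(self):
        for elem in self.items:
            rate = self.tax_rate                  # read of the enclosing object
            elem.total = elem.price * (1 + rate)

    def apply_tax_rewritten(self):
        for elem in self.items:
            elem.loop_body(self)                  # now in single-call form


inline = Invoice(0.5, [Item(10), Item(20)])
inline.apply_tax_inline()
rewritten = Invoice(0.5, [Item(10), Item(20)])
rewritten.apply_tax_rewritten()
inline_totals = [item.total for item in inline.items]
rewritten_totals = [item.total for item in rewritten.items]
```

Once the loop has the single-call shape, the foreach conditions stated for `element.operation()` can be applied to `loop_body` directly.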
Object-Orientation
• Encapsulation → representation hierarchy
[Diagram; labels: Person; Company; name; dateOfBirth; employer; Date; String]
Ownership Types
• Designed to enforce encapsulation
• Adapted to validate encapsulation
• Type parameters to capture memory referencing permissions
    class Person[o,c] {
        private String|this| Name;
        private Date|this| DateOfBirth;
        private Company|c| Employer;
        …
    }
Ownerships & Effects
    class Company[o] {
        public string name;
        …
    }
    class Person[o,c] {
        private Company|c| Employer;
        public string employerName() reads<this,c> writes<> {
            return Employer.name;
        }
        …
    }
Contexts and Dependencies
• Analyze & apply sufficient conditions
• All pairs of context relations need to be known
• Need some basis for believing that the relationships between contexts hold
Reasons for a Runtime System
• Statically know some relationships
  • The owner of an object is a parent of the object’s this context
  • The world context is a parent of all contexts
• Relationship may only be known dynamically
• Optionally track at runtime to allow runtime conditions
Conditional Parallelism
• Source loop:
    for (T<c> e in collection) { e.operation(arguments); }
• disjoint(r,c) always true:
    parallel for (T<c> e in collection) { e.operation(arguments); }
• disjoint(r,c) unknown:
    if (disjoint(r,c)) { parallel version } else { sequential version }
• disjoint(r,c) always false:
    serial for (T<c> e in collection) { e.operation(arguments); }
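The three outcomes can be pictured as a single dispatch routine. In the sketch below the `disjoint` argument is either a boolean, for a relation settled at compile time, or a zero-argument callable standing in for a runtime ownership query. `conditional_for` and its return strings are assumptions made for this example and do not reflect the code Zal generates.

```python
from concurrent.futures import ThreadPoolExecutor


def conditional_for(collection, operation, disjoint):
    """Run `operation` on every element and report the strategy used.

    `disjoint` is True or False when the context relation is known
    statically, or a callable that is evaluated at run time when it is not.
    """
    if callable(disjoint):
        disjoint = disjoint()            # the runtime disjointness test
    if disjoint:
        with ThreadPoolExecutor() as pool:
            list(pool.map(operation, collection))
        return "parallel"
    for element in collection:           # fall back to the original order
        operation(element)
    return "sequential"


def bump(box):
    box[0] += 1                          # touches only its own element


boxes = [[0], [0], [0]]
always_true = conditional_for(boxes, bump, True)
unknown = conditional_for(boxes, bump, lambda: False)
always_false = conditional_for(boxes, bump, False)
```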
Reasons for a Runtime System
• We do not know the relationships between all contexts at compile time
• May vary from one object or method invocation to another
• Reasons:
  • Separate Compilation
  • Dynamic Linking
  • Complex Data Flows
Reasons for a Runtime System
• Type system provides support for specifying context relationships that the programmer asserts must be true
    void oper1[r]() reads<r,c…> writes<…> where r # c {
        …
        foreach (T|c| elem in collection) {…}
        …
    }
Runtime System Implementation • Naïve implementation – each object keeps a pointer to its owner
[Diagram of formal results; node labels as extracted: Well Formed Heap; Subject Reduction; Progress; Owner Invariance; AFJO Soundness; Effect Soundness; Contexts form a Tree; Cast Safety; Effect Completeness; Static Context Relations; Context Parameters do not survive; Disjointness Test Correct; Context Disjointness Implies Effect Disjointness; Disjoint effects imply no data dependencies; Update; Dependency Preservation Sufficient for Parallelization; Sequential Consistency; Task Parallelism Sufficient Conditions; Data Parallelism Sufficient Conditions; Pipeline Parallelism Sufficient Conditions]
Implementation – Zal
• Added my system to C# 3.5
• Extended GPC# compiler
• Added infrastructure to support arbitrary type parameters
• Implemented runtime ownership tracking system (~1,000 lines)
Implementation – Zal
[Diagram; labels: Zal source; Zal Compiler; C# source; Microsoft C# Compiler; CIL Program w/ Ownership Tracking; Runtime Ownership Libraries; Executing Program with Automatic Parallelization]
Implementation – Zal
[Diagram of the compiler pipeline. Legend: C# compilation step; Zal compilation step; I/O. Data passed between stages: Tokens, then AST.]
• Scanner (generated by GPLex): Scanner.scan() reads a stream of characters and processes them into tokens
• Parser (generated by Coco/R): Parser.parse() converts the stream of tokens into an Abstract Syntax Tree
• Type Checker: TypeCheck() resolves all TypeRefs to TypeDefs & checks type correctness
• Effect Computation: computeEffects() and LocalEffects() compute heap & stack effects for AST nodes
• Parallelization: Parallelize() checks sufficient conditions for parallelism and implements them
• Ownership Implementation: BuildOwnershipImplementation() implements Zal features in C# by modifying the AST
• Code Generation: Output() and Emit generate the C# or CIL implementation of the AST
Validation
• Have applied my system to a number of realistic applications
• Overall, annotation requires modification to 20% of the source
• Ownership tracking overhead:
  • Execution time: 10% to 20%
  • Memory usage: 15% to 30%
• Implementation not fully optimized
Related Work – Prog. Langs.
• Focus on providing tools to express parallelism
• No support for validating correctness of parallelization
• Assumed programmer knowledge of parallel programming constructs
• Examples: Fortress, Chapel, X10
Related Work – Ownership
• Have proposed effect systems, but only suggested application to parallelism
• Data race and deadlock detection for locking – very different reasoning
• Deterministic Parallel Java (late 2009)
  • Modified ownerships
  • Focused on kernels
  • Lost composition & abstraction to do so
Contributions
• Abstract and composable system for reasoning about effects based on Ownership Types
• Effect and reasoning systems applied to a real language and real program examples
• Real parallelism detected and exploited automatically
Contributions
• Developed and proved sufficient conditions for a number of different forms of parallelism
• Runtime system to support static reasoning
Publications
• A. Craik and W. Kelly. Using Ownership to Reason About Inherent Parallelism in Imperative Object-Oriented Programs. International Conference on Compiler Construction, ed. R. Gupta, LNCS 6011, pp. 145-164, Springer-Verlag Berlin Heidelberg, 2010.
• W. Reid, W. Kelly, and A. Craik. Reasoning about Parallelism in Modern Object-Oriented Languages. Australasian Computer Science Conference, 2008.
• +3 technical reports on various versions of the reasoning system in e-prints
Conclusion
• System for reasoning about data dependencies and parallelism
  • Abstract & composable
  • Usable by both programmers & automated tools
• Question of when & how to exploit still open
• Demonstrated with a prototype that this automated reasoning is possible
Ownership & The Stack
• Ownerships traditionally for encapsulation
• Stack not considered by these works
• Stack & stack referencing models vary from language to language
• I consider a restricted stack model:
  • Stack and heap are disjoint
  • Stack locations can be differentiated by name