This paper presents a concurrent programming model, Atomic-Delayed Execution, for incomplete graph-based computations. It addresses challenges in big data and graph analytics, cyber-security, large network systems, and social networks. The model allows for efficient computations with massive concurrency and little data locality, frequent synchronization, and dynamic and imbalanced workloads. The paper includes examples and experiments to demonstrate the effectiveness of the model.
Atomic-Delayed Execution: A Concurrent Programming Model for Incomplete Graph-based Computations
Pedro C. Diniz
Information Sciences Institute
Viterbi School of Engineering
Motivation
• Big-Data and Graph Analytics
• Cyber-Security
• Large Network Systems
• Social Networks
• Combination of the above
• Challenges
  • Tons of bytes (not tons of flops)
  • Massive Concurrency but Little Data Locality
  • Low Computation-to-Communication Ratio
  • Frequent Synchronization
  • Work tends to be Dynamic and Imbalanced
  • Data may even become unavailable
• Programming for this Application Domain is Non-Trivial
Example: Minimum Distance to Root Node
• Simple Pointer-based Acyclic Graph Computation
• Compute for each node the Minimal Distance to a “root” Node
• Store Value of Distance in Node
• Save Selected Nodes in Set
[Figure: example graph, nodes annotated with distance to root]
Example: Minimum Distance to Root Node
• Because the Graph is Potentially Very Big
  • Cannot Do It Sequentially
  • Limited in Time
  • Need to Tolerate “incorrect” Answers
• Exploit Concurrency
  • Atomic Updates to Distance in Node
  • Skip if Value is Already Lower than Argument
[Figure: example graph, nodes annotated with distance to root]
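The "atomic update, skip if already lower" step can be sketched in standard C++ with a compare-and-swap loop. This is only an illustration under assumptions: the slides use an `atomic { }` block in an extended language, whereas here `std::atomic<int>` stands in for it, and the names `Node`, `try_lower` and `traverse` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <climits>

// Hypothetical node type: depth starts "infinitely far" from the root.
struct Node {
    std::atomic<int> depth{INT_MAX};
    Node* left = nullptr;
    Node* right = nullptr;
};

// Atomically lower n.depth to val. Returns true only if this call
// improved the stored distance; false means a distance at least as
// small is already recorded, so the caller can skip this node.
bool try_lower(Node& n, int val) {
    assert(val >= 0);
    int cur = n.depth.load(std::memory_order_relaxed);
    while (val < cur) {
        // On failure, compare_exchange_weak reloads cur with the latest value.
        if (n.depth.compare_exchange_weak(cur, val, std::memory_order_relaxed)) {
            return true;
        }
    }
    return false;
}

// Sequential driver for illustration; a concurrent version would run
// the two recursive calls as separate tasks.
void traverse(Node* n, int val) {
    if (n == nullptr || !try_lower(*n, val)) return;
    traverse(n->left, val + 1);
    traverse(n->right, val + 1);
}
```

Because the update only ever lowers the stored value, concurrent visits may race on which one writes first, but the final distance at a node is the minimum over all visits that reached it.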
Example: Concurrent Traversal
• Create a Thread at Each Invocation
• Visit Nodes and Check Distance against Argument
• Update Distance Atomically and Proceed
• Yes, we may do more work than sequential
[Figure: example graph, nodes annotated with distance to root as the concurrent traversal proceeds]
Example: Code

void node::traversal(int val) @ { time(T) } {
  atomic {
    if (depth > val) { depth = val; }
  }
  par {
    if (left != NULL) left->traversal(val+1);
    if (right != NULL) right->traversal(val+1);
  } exception {
    error.memory:  { continue; }
    timer.expired: { return; }
  }
}

class node { int depth; node *left, *right; };
Example: Code

void node::traversal(int val) @ { time(T) } {
  atomic {
    if (depth > val) { depth = val; }
    else { return; }
  }
  par {
    if (left != NULL) left->traversal(val+1);
    if (right != NULL) right->traversal(val+1);
  } exception {
    error.memory:  { continue; }
    timer.expired: { return; }
  }
}

class node { int depth; node *left, *right; };
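The `atomic`, `par` and `@ { time(T) }` constructs are language extensions, so the code above does not compile as plain C++. A rough approximation in standard C++ is sketched here under stated assumptions: a shared deadline stands in for the timer, `std::async` stands in for `par`, and a compare-and-swap loop stands in for `atomic`. The `error.memory` handler has no portable equivalent and is omitted. `Clock` and the free function `traversal` are hypothetical names, and spawning one task per node is only practical for small graphs.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <climits>
#include <future>

using Clock = std::chrono::steady_clock;

struct Node {
    std::atomic<int> depth{INT_MAX};
    Node* left = nullptr;
    Node* right = nullptr;
};

// Stand-in for "@ { time(T) }": every invocation checks a shared deadline
// and returns early once it has passed (the "timer.expired" handler).
void traversal(Node* n, int val, Clock::time_point deadline) {
    assert(val >= 0);
    if (n == nullptr || Clock::now() >= deadline) return;

    // Stand-in for the atomic block: lower depth, or stop at this node.
    int cur = n->depth.load();
    do {
        if (cur <= val) return;  // the "else { return; }" branch
    } while (!n->depth.compare_exchange_weak(cur, val));

    // Stand-in for "par": left subtree in a new task, right subtree inline.
    std::future<void> left;
    if (n->left != nullptr) {
        left = std::async(std::launch::async, traversal,
                          n->left, val + 1, deadline);
    }
    traversal(n->right, val + 1, deadline);
    if (left.valid()) left.wait();
}
```

A call such as `traversal(&root, 0, Clock::now() + std::chrono::milliseconds(50))` returns once the graph is covered or the deadline passes, whichever comes first; nodes not reached in time keep their initial `INT_MAX` distance, which is the "incomplete" result the model tolerates.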
Example: Delayed Execution

exception {
  timer.expired: {
    delayed @ time(T) {
      par {
        if (left != NULL) left->traversal(val+1);
        if (right != NULL) right->traversal(val+1);
      }
    }
  }
}

• When Time Expires:
  • Return Control
  • Continue for another Time Quantum
  • Separate Thread
  • Updates Objects Atomically
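One way to read delayed execution is as a resumable work list: when the quantum ends, the unfinished visits are kept rather than dropped, and a later quantum picks them up. The sketch below illustrates that reading only. It is single-threaded for clarity, whereas the slides run the delayed work in a separate thread with atomic object updates, and `Pending` and `run_quantum` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <climits>
#include <deque>
#include <utility>

using Clock = std::chrono::steady_clock;

struct Node {
    int depth = INT_MAX;
    Node* left = nullptr;
    Node* right = nullptr;
};

// Work cut off by an expired timer: (node, distance argument) pairs.
using Pending = std::deque<std::pair<Node*, int>>;

// Runs pending visits until the quantum ends. Visits not reached in time
// stay in the queue so that a later call can continue them.
// Returns true when no work is left.
bool run_quantum(Pending& pending, Clock::duration quantum) {
    assert(quantum >= Clock::duration::zero());
    const Clock::time_point deadline = Clock::now() + quantum;
    while (!pending.empty()) {
        if (Clock::now() >= deadline) return false;  // timer expired: delay the rest
        Node* n = pending.front().first;
        int val = pending.front().second;
        pending.pop_front();
        if (n == nullptr || n->depth <= val) continue;  // skip: already lower
        n->depth = val;
        pending.emplace_back(n->left, val + 1);
        pending.emplace_back(n->right, val + 1);
    }
    return true;
}
```

The caller regains control after each quantum and can decide whether the partial answer is good enough or whether to call `run_quantum` again with the same queue.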
Concepts: Objects, Concurrency and Atomic
• Objects and Methods
  • Data Encapsulation
• Separability (key):
  • Decouple Updates to Object from Concurrent Invocations
  • Uses only symbolically constant object data and arguments
• Atomicity:
  • Avoids Race but not indeterminism
  • Facilitates Reasoning
  • In Principle could have Many Atomic Sections
• Concurrency
Experiments: Concurrency Environment
• Using pthreads
  • Master thread and N Workers
  • Work stealing at a work-pool
  • Exception flag is checked when attempting to steal work
• Objects in C share a Pool of Mutex Locks
  • Some possible false contention
• Timed and Delayed Execution
  • Sharing two global Timers (for simplicity)
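The shared lock pool and the exception flag can be illustrated with a small sketch. It rests on several assumptions: the experiments use pthreads and work stealing, while this version uses `std::thread`, `std::mutex` and a single shared work index for brevity; the pool size of 64 and the address-hashing scheme are made up; and `lock_for`, `update_depth` and `run_workers` are hypothetical names.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

constexpr std::size_t kPoolSize = 64;  // assumed size; the slides give none
std::array<std::mutex, kPoolSize> lock_pool;
std::atomic<bool> exception_flag{false};  // set on timer expiry or fault

// Map an object address to one lock in the shared pool. Distinct objects
// can land on the same lock, which is the "false contention" case.
std::mutex& lock_for(const void* obj) {
    auto addr = reinterpret_cast<std::uintptr_t>(obj);
    return lock_pool[(addr >> 4) % kPoolSize];
}

struct Node { int depth = 1 << 30; };

// Lower the node's depth while holding the pool lock for that node.
void update_depth(Node& n, int val) {
    std::lock_guard<std::mutex> guard(lock_for(&n));
    if (n.depth > val) n.depth = val;
}

// Each worker takes the next item from a shared pool, checking the
// exception flag before every attempt to take work.
void run_workers(std::vector<Node*>& pool, unsigned n_workers, int val) {
    assert(n_workers > 0);
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < n_workers; ++w) {
        workers.emplace_back([&] {
            while (!exception_flag.load()) {
                std::size_t i = next.fetch_add(1);
                if (i >= pool.size()) return;
                update_depth(*pool[i], val);
            }
        });
    }
    for (auto& t : workers) t.join();
}
```

Sharing a fixed pool keeps per-object memory overhead at zero, at the cost of occasional contention between unrelated objects that hash to the same lock.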
Experiments: Graph Computation
• Search Image Feature in Graph
  • Nodes represent people and have 1 image
  • Edges represent associations
• Collect from a given “root” node
  • Nodes at distance greater than 2
  • Share the same features (computationally intensive)
• Graphs Synthetically Generated with the R-MAT algorithm
• Experiments:
  • Timed Executions
  • Faults in Node Edges
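For readers unfamiliar with R-MAT, the generator places each edge by recursively choosing one quadrant of the adjacency matrix with probabilities a, b, c and d = 1 - a - b - c. The sketch below is a minimal version of that idea, not the generator used in the experiments; the slides do not give parameters, so any values passed in (for example 0.57, 0.19, 0.19, as used by the Graph500 benchmark) are assumptions, and `rmat_edges` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

using Edge = std::pair<std::uint32_t, std::uint32_t>;

// Generate n_edges directed edges over 2^scale vertices.
std::vector<Edge> rmat_edges(unsigned scale, std::size_t n_edges,
                             double a, double b, double c,
                             std::uint32_t seed) {
    assert(scale < 32 && a >= 0 && b >= 0 && c >= 0 && a + b + c <= 1.0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    std::vector<Edge> edges;
    edges.reserve(n_edges);
    for (std::size_t e = 0; e < n_edges; ++e) {
        std::uint32_t src = 0, dst = 0;
        for (unsigned level = 0; level < scale; ++level) {
            const double r = uni(rng);
            // Quadrants: a = top-left, b = top-right, c = bottom-left,
            // d = 1 - a - b - c = bottom-right.
            const bool down = r >= a + b;                              // row bit
            const bool right = (r >= a && r < a + b) || r >= a + b + c;  // column bit
            src = (src << 1) | (down ? 1u : 0u);
            dst = (dst << 1) | (right ? 1u : 0u);
        }
        edges.emplace_back(src, dst);
    }
    return edges;
}
```

Skewed quadrant probabilities concentrate edges on a few vertices, which produces the imbalanced workloads that the motivation slide describes.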
Summary
• Object-based programming model with timed and delayed executions
• Geared towards computations in very large data sets where the data cannot be traversed in useful time or is simply unavailable due to uncorrected memory errors
• Presented experimental results for a concurrent incomplete graph-based computation to deliver feasible results in strict time bounds and in the presence of memory errors
• Foresee the need to allow programmers to specify time limits for the computation so that systems can make progress with limited, and incomplete, data
Acknowledgements
• Partial support for this work was provided by the US Army Research Office (Award W911NF-13-1-0219)
• Partial support for this work was provided by the US Department of Energy (DoE) Office of Science, Advanced Scientific Computing Research, through the SciDAC-3 SUPER Research Institute (Contract Number DE-SC0006844)
Pedro Diniz pedro@isi.edu