Java-Based Parallel Computing on the Internet: Javelin 2.0 & Beyond. Michael Neary & Peter Cappello, Computer Science, UCSB
Introduction: Goals • Service parallel applications that are: • Large: too big for a cluster • Coarse-grain: to hide communication latency • Simplicity of use • Design focus: decomposition [composition] of computation. • Scalable high performance • despite large communication latency • Fault-tolerance • 1000s of hosts, each dynamically [dis]associates.
Introduction: Some Applications • Search for extra-terrestrial life • Computer-generated animation • Computer modeling of drugs for: • Influenza • Cancer • Reducing chemotherapy’s side-effects • Financial modeling • Storing nuclear waste
Outline • Architecture • Model of Computation • API • Scalable Computation • Experimental Results • Conclusions & Future Work
Architecture: Basic Components • Clients • Brokers • Hosts
Architecture: Broker Discovery [Figure: animation of a host (H) locating a broker (B) via the Broker Naming System; one frame is labeled “PING (BID?)”.]
Architecture: Network of Broker-Managed Host Trees • Each broker manages a tree of hosts • Brokers form a network • Client contacts broker • Client gets host trees
Scalable Computation: Deterministic Work-Stealing Scheduler [Figure: each HOST holds a task container with the operations addTask( task ), getTask( ), and stealTask( ).]
Scalable Computation: Deterministic Work-Stealing Scheduler
Task getWork( ) {
    if ( my deque has a task ) return task;
    else if ( any child has a task ) return child’s task;
    else return parent.getWork( );
}
[Figure labels: CLIENT, HOSTS]
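The getWork( ) pseudocode above can be sketched in plain Java. This is a minimal illustration, not Javelin's implementation: the class name TreeHost, the use of strings as tasks, and the choice to steal the oldest task are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical host in a tree of hosts; names are illustrative, not Javelin's API.
class TreeHost {
    private final TreeHost parent;                            // null at the root
    private final List<TreeHost> children = new ArrayList<>();
    private final Deque<String> deque = new ArrayDeque<>();   // tasks are plain strings here

    TreeHost(TreeHost parent) {
        this.parent = parent;
        if (parent != null) parent.children.add(this);
    }

    // The owner adds and takes tasks at the front of its deque.
    synchronized void addTask(String task) { deque.addFirst(task); }
    synchronized String getTask() { return deque.pollFirst(); }

    // A thief takes the oldest task from the back (an assumption of this sketch).
    synchronized String stealTask() { return deque.pollLast(); }

    // Fixed search order: own deque, then children in creation order, then the parent.
    String getWork() {
        String task = getTask();
        if (task != null) return task;
        for (TreeHost child : children) {
            task = child.stealTask();
            if (task != null) return task;
        }
        return parent == null ? null : parent.getWork();
    }
}
```

Because the search order is fixed, an idle host with an empty subtree always climbs toward the root, where work from sibling subtrees becomes reachable.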
Models of Computation • Master-slave • AFAIK all proposed commercial applications • Branch-&-bound optimization • A generalization of master-slave.
Models of Computation: Branch & Bound [Figure: animated search tree; node labels as extracted: 0 7 2 3 8 6 10 4 3 8 7 12 10 9 10. Frames, in order: UPPER blank, LOWER = 0 (visited 0); LOWER = 2 (visited 0 2); LOWER = 3 (visited 0 2 3); UPPER = 4, LOWER = 4 (visited 0 2 3 4); UPPER = 3, LOWER = 3 (visited 0 2 3 4 3); UPPER = 3, LOWER = 6 (visited 0 2 3 6 4 3); UPPER = 3, LOWER = 7 (visited 0 7 2 3 6 4 3).]
Models of Computation: Branch & Bound [Figure: explored part of the search tree, node labels 0 7 2 3 6 4 3.] • Tasks created dynamically • Upper bound is shared • To detect termination: scheduler detects tasks that have been: • Completed • Killed (“bounded”)
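One simple way to picture termination detection with dynamically created tasks is a counter of open tasks. The sketch below is an assumption for illustration only: the slides do not say how Javelin's scheduler records completed and killed tasks, and the class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counting scheme: the computation is over when no task remains open.
class TerminationTracker {
    private final AtomicLong open = new AtomicLong(1); // the root task starts open

    // A task was branched: the parent closes and its children open.
    void branched(int children) { open.addAndGet(children - 1); }

    // An atomic task finished its search.
    void completed() { open.decrementAndGet(); }

    // A task was discarded because its lower bound reached the upper bound.
    void killed() { open.decrementAndGet(); }

    boolean terminated() { return open.get() == 0; }
}
```

Every created task is accounted for exactly once, whether it is branched, completed, or killed, so the counter reaches zero only when the whole tree has been handled.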
API
public class Host implements Runnable {
    . . .
    public void run() {
        // Take work until the scheduler has none left.
        while ( (node = jDM.getWork()) != null ) {
            if ( isAtomic() )
                compute(); // search space; return result
            else {
                child = node.branch(); // put children in child array
                for (int i = 0; i < node.numChildren; i++)
                    if ( child[i].setLowerBound() < UpperBound )
                        jDM.addWork( child[i] );
                    // else child is killed implicitly
            }
        }
    }
API
    private void compute() {
        . . .
        boolean newBest = false;
        while ( (node = stack.pop()) != null ) {
            if ( node.isComplete() ) {
                // A complete node is a candidate solution: keep it only if it improves the bound.
                if ( node.getCost() < UpperBound ) {
                    newBest = true;
                    UpperBound = node.getCost();
                    jDM.propagateValue( UpperBound );
                    best = new Node( node ); // copy the best complete node
                }
            } else {
                child = node.branch();
                for (int i = 0; i < node.numChildren; i++)
                    if ( child[i].setLowerBound() < UpperBound )
                        stack.push( child[i] );
                    // else child is killed implicitly
            }
        }
        if ( newBest ) jDM.returnResult( best );
    }
}
Scalable Computation: Weak Shared Memory Model • Slow propagation of bound affects performance, not correctness. [Figure label: Propagate bound]
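The claim that a stale bound costs time but not correctness can be illustrated with a per-host copy of the upper bound that only ever decreases. This is a sketch under assumptions: the class SharedBound and its methods are hypothetical and stand in for whatever jDM.propagateValue( ) does internally.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-host copy of the shared upper bound (minimization problem).
class SharedBound {
    private final AtomicInteger upper = new AtomicInteger(Integer.MAX_VALUE);

    // Apply a locally found or remotely propagated cost.
    // Returns true if it improved this host's copy of the bound.
    boolean offer(int cost) {
        int previous = upper.getAndAccumulate(cost, Math::min);
        return cost < previous;
    }

    int get() { return upper.get(); }

    // Same test as in the API slides: a child survives only if its lower bound < upper bound.
    boolean shouldPrune(int lowerBound) { return lowerBound >= upper.get(); }
}
```

A host whose copy is out of date holds a bound that is too large, so it prunes less and explores nodes it could have skipped. It never prunes a node that a host with the current bound would keep.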
Scalable Computation: Fault Tolerance via Eager Scheduling When: • All tasks have been assigned • Some results have not been reported • A host wants a new task Re-assign a task! • Eager scheduling tolerates faults & balances the load. • Computation completes, if at least 1 host communicates with client.
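The re-assignment rule above can be sketched as a small scheduler. This is an illustrative assumption, not Javelin's code: tasks are integer ids, the class EagerScheduler is hypothetical, and the round-robin choice of which outstanding task to re-assign is one of several reasonable policies.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical eager scheduler over task ids 0..taskCount-1.
class EagerScheduler {
    private final int total;
    private final Deque<Integer> unassigned = new ArrayDeque<>();
    private final Deque<Integer> assigned = new ArrayDeque<>(); // handed out at least once
    private final Set<Integer> done = new HashSet<>();

    EagerScheduler(int taskCount) {
        total = taskCount;
        for (int i = 0; i < taskCount; i++) unassigned.add(i);
    }

    // Called when a host wants a new task; returns null once every result is in.
    synchronized Integer nextTask() {
        Integer id = unassigned.poll();
        if (id == null) {
            // All tasks are assigned: re-assign one whose result is still missing.
            while ((id = assigned.poll()) != null && done.contains(id)) {
                // drop ids whose results have already been reported
            }
            if (id == null) return null;
        }
        assigned.add(id); // rotate, so re-assignment cycles through outstanding tasks
        return id;
    }

    // Returns true for the first result of a task; later duplicates are ignored.
    synchronized boolean reportResult(int id) { return done.add(id); }

    synchronized boolean finished() { return done.size() == total; }
}
```

A slow or failed host never blocks progress in this sketch: any host that keeps asking for work eventually receives every outstanding task, which matches the slide's claim that one communicating host is enough to finish.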
Scalable Computation: Fault Tolerance via Eager Scheduling [Figure: explored part of the search tree, node labels 0 7 2 3 6 4 3.] • Scheduler must know which: • Tasks have completed • Nodes have been killed • Performance balance • Centralized schedule info • Decentralized computation
Experimental Results: Example of a “bad” graph [Figure: search tree, node labels as extracted: 0 7 2 3 8 6 10 4 3 8 7 12 10 9 10.]
Conclusions • Javelin 2 relieves the designer/programmer of managing a set of [Inter-]networked processors that is: • Dynamic • Faulty • A wide set of applications is covered by: • Master-slave model • Branch & bound model • Weak shared memory performs well. • Use multicast (?) for: • Code distribution • Propagating values
Future Work • Improve support for long-lived computation: • Do not require that the client run continuously. • A dag model of computation • with limited weak shared memory.
Future Work: Jini/JavaSpaces Technology • “Continuously” disperse Tasks among brokers via a physics model [Figure: a TaskManager (aka Broker) surrounded by hosts (H).]
Future Work: Jini/JavaSpaces Technology • TaskManager uses persistent JavaSpace • Host management: trivial • Eager scheduling: simple • No single point of failure • Fat tree topology
Future Work: Advanced Issues • Privacy of data & algorithm • Algorithms • New computation-communication complexity model • N-body problem, … • Accounting: Associate specific work with specific host • Correctness • Compensation (how to quantify?) • Create open source organization • System infrastructure • Application codes