


Presentation Transcript


  1. Java-Based Parallel Computing on the Internet: Javelin 2.0 & Beyond — Michael Neary & Peter Cappello, Computer Science, UCSB

  2. Introduction: Goals
     • Service parallel applications that are:
       • Large: too big for a cluster
       • Coarse-grain: to hide communication latency
     • Simplicity of use
     • Design focus: decomposition [composition] of computation
     • Scalable high performance, despite large communication latency
     • Fault tolerance: 1000s of hosts, each dynamically [dis]associating

  3. Introduction: Some Related Work

  4. Introduction: Some Applications
     • Search for extra-terrestrial life
     • Computer-generated animation
     • Computer modeling of drugs for:
       • Influenza
       • Cancer
       • Reducing chemotherapy’s side-effects
     • Financial modeling
     • Storing nuclear waste

  5. Outline
     • Architecture
     • Model of Computation
     • API
     • Scalable Computation
     • Experimental Results
     • Conclusions & Future Work

  6. Architecture: Basic Components
     • Clients
     • Brokers
     • Hosts

  7.–11. Architecture: Broker Discovery
     [diagram, animated over slides 7–11: a host H queries the Broker Naming System, then pings brokers in the broker network (“PING (BID?)”) before attaching to one]

  12.–15. Architecture: Network of Broker-Managed Host Trees
     • Each broker manages a tree of hosts
     • Brokers form a network
     • Client contacts broker
     • Client gets host trees

  16. Scalable Computation: Deterministic Work-Stealing Scheduler
     [diagram: each HOST holds a task container with operations addTask( task ), getTask( ), and stealTask( )]

  17. Scalable Computation: Deterministic Work-Stealing Scheduler
     [diagram: the CLIENT sits at the root of the tree of HOSTS]

     Task getWork( ) {
         if ( my deque has a task ) return task;
         else if ( any child has a task ) return child’s task;
         else return parent.getWork( );
     }
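The two scheduler slides above can be sketched in plain Java. This is an illustrative reconstruction, not Javelin's actual code: the class and method names beyond addTask/getTask/stealTask/getWork are assumptions, and tasks are reduced to strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of the per-host task container (slide 16) and the
// tree-structured getWork() (slide 17). SchedulerNode is an invented name.
class SchedulerNode {
    private final Deque<String> deque = new ArrayDeque<>(); // local tasks, newest first
    private final List<SchedulerNode> children = new ArrayList<>();
    private SchedulerNode parent;                            // null at the client/root

    public void addChild(SchedulerNode c) { c.parent = this; children.add(c); }

    public void addTask(String t) { deque.addFirst(t); }       // local host adds work
    public String getTask()       { return deque.pollFirst(); } // local host takes newest
    public String stealTask()     { return deque.pollLast(); }  // others steal oldest

    // Deterministic work stealing: local deque first, then steal from a
    // child, then ask the parent (ultimately reaching the client).
    public String getWork() {
        String t = getTask();
        if (t != null) return t;
        for (SchedulerNode c : children) {
            t = c.stealTask();
            if (t != null) return t;
        }
        return parent == null ? null : parent.getWork();
    }
}
```

Taking from the newest end locally while stealing from the oldest end keeps big, early-generated subproblems as the ones that migrate, which suits the coarse-grain tasks the system targets.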

  18. Models of Computation
     • Master-slave
       • AFAIK all proposed commercial applications
     • Branch-&-bound optimization
       • A generalization of master-slave

  19.–25. Models of Computation: Branch & Bound
     [diagram, animated over slides 19–25: a search tree is explored depth-first; LOWER rises (0, 2, 3, 4, …) as nodes are expanded, UPPER falls from ∞ to 4 and then to 3 as complete solutions are found, and any subtree whose lower bound reaches UPPER is killed]

  26. Models of Computation: Branch & Bound
     • Tasks created dynamically
     • Upper bound is shared
     • To detect termination, the scheduler detects tasks that have been:
       • Completed
       • Killed (“bounded”)
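A minimal, self-contained sketch of the branch-&-bound loop from the slides: depth-first search with a shared upper bound, killing any node whose lower bound is not better. The tree encoding and its costs are illustrative, not the figure from the slides.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy branch & bound: cost[i] is the lower bound at node i; the children
// of node i are 2i+1 and 2i+2; leaves are complete solutions.
class BranchAndBound {
    static final int[] cost = {0, 2, 7, 3, 6, 8, 10};

    public static int solve() {
        int upper = Integer.MAX_VALUE;           // UPPER: best complete solution so far
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(0);
        while (!stack.isEmpty()) {
            int node = stack.pop();
            if (cost[node] >= upper) continue;   // killed ("bounded")
            if (2 * node + 1 >= cost.length) {   // leaf: a complete solution
                upper = cost[node];              // strictly better (checked above)
                continue;
            }
            stack.push(2 * node + 2);            // branch: push children
            stack.push(2 * node + 1);
        }
        return upper;
    }
}
```

On this toy tree the left subtree yields a solution of cost 3, after which the nodes with bounds 6 and 7 are bounded without expansion, mirroring the pruning shown in the animation.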

  27. API

     public class Host implements Runnable {
         . . .
         public void run() {
             while ( (node = jDM.getWork()) != null ) {
                 if ( isAtomic() )
                     compute(); // search space; return result
                 else {
                     child = node.branch(); // put children in child array
                     for ( int i = 0; i < node.numChildren; i++ )
                         if ( child[i].setLowerBound() < UpperBound )
                             jDM.addWork( child[i] );
                         // else child is killed implicitly
                 }
             }
         }
     }

  28. API

     private void compute() {
         . . .
         boolean newBest = false;
         while ( (node = stack.pop()) != null ) {
             if ( node.isComplete() ) {
                 if ( node.getCost() < UpperBound ) {
                     newBest = true;
                     UpperBound = node.getCost();
                     jDM.propagateValue( UpperBound );
                     best = node;
                 }
             } else {
                 child = node.branch();
                 for ( int i = 0; i < node.numChildren; i++ )
                     if ( child[i].setLowerBound() < UpperBound )
                         stack.push( child[i] );
                     // else child is killed implicitly
             }
         }
         if ( newBest )
             jDM.returnResult( best );
     }

  29.–33. Scalable Computation: Weak Shared Memory Model
     • Slow propagation of the bound affects performance, not correctness.
     [diagram, animated over slides 29–33: the new bound propagates through the host tree]
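The weak shared memory idea can be made concrete with a few lines of Java. This is a sketch under assumed names (SharedBound is not a Javelin class): each host keeps a local copy of the upper bound, and propagation takes the minimum, so late or reordered updates can only cost extra pruning work, never a wrong answer.

```java
// Sketch of a host's local view of the shared upper bound (slides 29-33).
// The update is monotone (min), so stale values are merely conservative.
class SharedBound {
    private int localUpper = Integer.MAX_VALUE;

    // Called when a bound arrives from elsewhere in the tree; updates may
    // arrive late or out of order without affecting correctness.
    public void propagate(int upper) { localUpper = Math.min(localUpper, upper); }

    // Pruning only compares against a bound that some host really achieved,
    // so a node is killed only if it genuinely cannot improve the solution.
    public boolean shouldPrune(int lowerBound) { return lowerBound >= localUpper; }

    public int get() { return localUpper; }
}
```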

  34. Scalable Computation: Fault Tolerance via Eager Scheduling
     When:
     • All tasks have been assigned
     • Some results have not been reported
     • A host wants a new task
     Re-assign a task!
     • Eager scheduling tolerates faults & balances the load.
     • The computation completes if at least 1 host communicates with the client.
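The re-assignment rule above can be sketched as a small scheduler. This is a hypothetical illustration (EagerScheduler and its methods are invented names): tasks are handed out once each, and then idle hosts are given still-unfinished tasks again, so the first result for each task wins.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of eager scheduling (slide 34): once every task has been assigned,
// a request for work re-assigns some task whose result has not come back.
// A task may run on several hosts; the computation finishes as long as one
// host keeps reporting results.
class EagerScheduler {
    private final List<String> tasks = new ArrayList<>();
    private final Set<String> done = new HashSet<>();
    private int next = 0;                       // round-robin cursor

    public void addTask(String t) { tasks.add(t); }
    public void reportResult(String t) { done.add(t); }

    // Cycle through the task list, skipping completed tasks; a task left
    // on a crashed host is simply handed out again on a later pass.
    public String assign() {
        for (int i = 0; i < tasks.size(); i++) {
            String t = tasks.get(next);
            next = (next + 1) % tasks.size();
            if (!done.contains(t)) return t;    // possibly a re-assignment
        }
        return null;                            // all results reported
    }
}
```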

  35. Scalable Computation: Fault Tolerance via Eager Scheduling
     • Scheduler must know which:
       • Tasks have completed
       • Nodes have been killed
     • Performance ⇒ balance:
       • Centralized schedule info
       • Decentralized computation

  36. Experimental Results

  37. Experimental Results
     [diagram: example of a “bad” graph]

  38. Conclusions
     • Javelin 2 relieves the designer/programmer of managing a set of [Inter-]networked processors that is:
       • Dynamic
       • Faulty
     • A wide set of applications is covered by:
       • Master-slave model
       • Branch & bound model
     • Weak shared memory performs well.
     • Use multicast (?) for:
       • Code distribution
       • Propagating values

  39. Future Work
     • Improve support for long-lived computation:
       • Do not require that the client run continuously.
     • A DAG model of computation
       • with limited weak shared memory

  40. Future Work: Jini/JavaSpaces Technology
     “Continuously” disperse Tasks among brokers via a physics model
     [diagram: hosts H attached to a TaskManager (aka Broker)]

  41. Future Work: Jini/JavaSpaces Technology
     • TaskManager uses a persistent JavaSpace
     • Host management: trivial
     • Eager scheduling: simple
     • No single point of failure
     • Fat-tree topology

  42. Future Work: Advanced Issues
     • Privacy of data & algorithm
     • Algorithms
       • New computation-communication complexity model
       • N-body problem, …
     • Accounting: associate specific work with a specific host
       • Correctness
       • Compensation (how to quantify?)
     • Create an open source organization
       • System infrastructure
       • Application codes
