Cilk NOW Based on a paper by Robert D. Blumofe & Philip A. Lisiecki
Organization • Introduction • The Cilk language & work-stealing scheduler • Cilk-NOW job architecture • Adaptive parallelism • Fault tolerance • Cilk-NOW macro-scheduling • Conclusion
Introduction: Cilk-NOW features • Ease of use: Standard command-line interface for running Cilk-NOW programs. • Adaptive parallelism: Users are oblivious to workstations joining & retreating from a job. • Fault tolerance: Cilk programs are oblivious to: • Check-pointing • Failure detection & recovery
Introduction: Cilk-NOW features … • Flexibility • Sovereignty of the workstation’s owner is preserved: the owner defines “idle”. • Security • Customary Unix user security. Users must have a Unix login on the system. • Guaranteed performance: Uses Cilk’s thread scheduler: • Work-stealing • Provably efficient, predictable performance.
Introduction: Cilk-NOW features … • No distributed shared memory • No fault tolerance for I/O • All workstations share a file system. • Work focuses on: • Adaptive parallelism • Fault tolerance
Cilk language & work-stealing scheduler • Both are the same as in Cilk. • The standard Fibonacci example follows.
Compute the nth Fibonacci Number

    thread fib ( cont int k, int n )
    {
        if ( n < 2 )
            send_argument ( k, n );
        else
        {
            cont int x, y;
            spawn_next sum ( k, ?x, ?y );
            spawn fib ( x, n - 1 );
            spawn fib ( y, n - 2 );
        }
    }

    thread sum ( cont int k, int x, int y )
    {
        send_argument ( k, x + y );
    }
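To make the continuation-passing style of the example concrete, here is a minimal single-worker simulation in Python. It is only a sketch under stated assumptions: the names `Closure`, `EMPTY`, `ready`, `sum_thread`, and `run_fib` are hypothetical, a continuation is modeled as a `(closure, slot)` pair, and the real Cilk runtime is not structured this way.

```python
from collections import deque

EMPTY = object()   # marks an argument slot that has not been filled yet
ready = deque()    # ready closures of one simulated worker (assumption)

class Closure:
    """A thread plus its argument slots; ready once no slot is EMPTY."""
    def __init__(self, thread, args):
        self.thread = thread
        self.args = list(args)
        self.missing = sum(a is EMPTY for a in self.args)

def spawn(thread, *args):
    """Create a child closure with all arguments present and make it ready."""
    ready.append(Closure(thread, args))

def spawn_next(thread, *args):
    """Create a successor closure that may wait for EMPTY slots to be filled."""
    closure = Closure(thread, args)
    if closure.missing == 0:
        ready.append(closure)
    return closure

def send_argument(cont, value):
    """Fill the slot named by the continuation (closure, slot index)."""
    closure, slot = cont
    closure.args[slot] = value
    closure.missing -= 1
    if closure.missing == 0:
        ready.append(closure)

def fib(k, n):
    if n < 2:
        send_argument(k, n)
    else:
        s = spawn_next(sum_thread, k, EMPTY, EMPTY)
        spawn(fib, (s, 1), n - 1)   # (s, 1) plays the role of ?x
        spawn(fib, (s, 2), n - 2)   # (s, 2) plays the role of ?y

def sum_thread(k, x, y):
    send_argument(k, x + y)

def run_fib(n):
    """Run fib(n) to completion and return the value sent to the root."""
    root = Closure(lambda value: None, [EMPTY])   # sink for the final result
    spawn(fib, (root, 0), n)
    while ready:
        closure = ready.pop()
        closure.thread(*closure.args)
    return root.args[0]
```

Calling `run_fib(10)` drives the ready queue until it is empty and returns the value delivered to the root closure. Each thread runs to completion without blocking; waiting is expressed only through closures with unfilled slots.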
Cilk-NOW job architecture • A Cilk-NOW job consists of: • A clearinghouse process • 1 or more worker processes • Begin a job by typing the command CilkChouse -- pfold 3 7 • This starts a worker, which forks a clearinghouse process that: • Sends the job description to the macro-scheduler • Waits for messages from its workers.
(b) An idle machine joins the job • Another machine’s node manager detects that the machine is “idle” • It sends a job request to the macro-scheduler • The macro-scheduler returns the pfold job • The node manager forks a new worker with no associated clearinghouse • The worker registers with the pfold clearinghouse • The clearinghouse gives the worker: • Its name (worker names are integers, starting from 0) • A list of the other workers on this job • The worker steals a closure from another worker.
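The join steps above can be sketched in a few lines of Python. This is an illustration, not the Cilk-NOW implementation: the `Clearinghouse` and `Worker` classes, the `(host, port)` addresses, the in-process `workers_by_name` table standing in for the network, and the random choice of victim are all assumptions made for the example.

```python
import random
from collections import deque

class Clearinghouse:
    """Hands out integer worker names, starting from 0, and tracks workers."""
    def __init__(self):
        self.next_name = 0
        self.workers = {}            # name -> address

    def register(self, address):
        """Return the new worker's name and the workers already on the job."""
        name = self.next_name
        self.next_name += 1
        others = dict(self.workers)  # snapshot taken before adding the newcomer
        self.workers[name] = address
        return name, others

class Worker:
    def __init__(self, clearinghouse, address):
        self.closures = deque()      # closures this worker currently holds
        self.name, self.peers = clearinghouse.register(address)

    def steal(self, workers_by_name, rng=random):
        """Take one closure from a randomly chosen peer that has any.

        workers_by_name is a hypothetical stand-in for sending a steal
        request over the network. Returns True if a closure was stolen.
        """
        candidates = [n for n in self.peers if workers_by_name[n].closures]
        if not candidates:
            return False
        victim = workers_by_name[rng.choice(candidates)]
        self.closures.append(victim.closures.popleft())
        return True
```

A first worker registered this way receives name 0 and an empty peer list; a second receives name 1 and a list containing worker 0, from which it can then steal.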
(c) A no-longer idle machine retreats • The machine’s owner touches the keyboard • The node manager sends a kill signal to its worker • The worker catches the signal: • Offloads its closures to other workers • Unregisters from the clearinghouse • Terminates
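The retreat sequence can be sketched as a single function that a signal handler would call. The slide does not say how closures are divided among the remaining workers, so the round-robin policy below is an assumption, as are the function name `retreat` and the plain dictionaries and sets used in place of network messages.

```python
def retreat(name, closures, peer_queues, registered):
    """Offload closures, unregister, and report how many were moved.

    name        -- this worker's integer name
    closures    -- list of closures held by the retreating worker
    peer_queues -- dict: worker name -> that worker's closure list
    registered  -- set of names the clearinghouse currently knows about
    """
    targets = sorted(n for n in peer_queues if n != name)
    if closures and not targets:
        raise RuntimeError("no other worker can accept the closures")
    moved = 0
    for i, closure in enumerate(closures):
        # Round-robin distribution: an assumption, not the paper's policy.
        peer_queues[targets[i % len(targets)]].append(closure)
        moved += 1
    closures.clear()
    registered.discard(name)   # stands in for the unregister message
    return moved
```

After the call returns, the worker holds no closures and is no longer on the clearinghouse's list, so the process can terminate without losing work.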
Maintaining the worker lists • Each worker checks in with the clearinghouse every 2 seconds. • If a worker’s “lease” expires (no check-in for 30 seconds), the clearinghouse removes it from its list. • At each check-in, the clearinghouse returns a list of revisions: • workers to add to & delete from the worker’s local list.
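A minimal sketch of the lease bookkeeping follows. The 2-second and 30-second values come from the slide; everything else is an assumption for illustration, including the `LeaseTable` class, passing the current time explicitly, and having the worker send its known peer set so that revisions can be computed as a set difference.

```python
CHECK_IN_INTERVAL = 2.0   # seconds between check-ins (from the slide)
LEASE = 30.0              # seconds without a check-in before removal

class LeaseTable:
    """Clearinghouse-side record of when each worker last checked in."""
    def __init__(self):
        self.last_seen = {}   # worker name -> time of last check-in

    def check_in(self, name, now, known):
        """Record a check-in and return (to_add, to_delete) for the caller.

        known is the set of peer names the calling worker currently has
        in its local list (a hypothetical part of the check-in message).
        """
        self.last_seen[name] = now
        expired = [w for w, t in self.last_seen.items() if now - t > LEASE]
        for w in expired:
            del self.last_seen[w]          # lease ran out: drop the worker
        live = set(self.last_seen) - {name}
        return sorted(live - known), sorted(known - live)
```

For example, if worker 0 last checked in at time 0 and worker 1 checks in at time 31 while still listing worker 0, the returned revisions tell worker 1 to delete worker 0.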
UDP • UDP between: • Workers • Clearinghouse & worker • Faster than TCP for the common case. • No pretense of reliability when none exists.
Adaptive parallelism • What happens when a waiting closure gets offloaded to another worker? • How do send_argument invocations get their info to the moved waiting closure? • The paper describes a notion of sub-computation, and uses this notion to handle this situation. • To be continued …
A simple way? • Have the waiting closure’s unfilled arguments refer back to the continuations that refer to them. • When the waiting closure is offloaded to a new worker, it informs its continuations of its new address. • For this to work, the waiting closure must be informed whenever a continuation is passed to another closure. • This may be a lot of work. • To be continued …
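The back-reference idea on this slide can be sketched as follows. This illustrates only the "simple way" discussed here, not the sub-computation mechanism from the paper; the class names, the `worker` field used as an address, and the direct object references standing in for network messages are all assumptions.

```python
class Continuation:
    """Names one argument slot of a waiting closure, plus where it lives."""
    def __init__(self, closure, slot):
        self.closure = closure
        self.slot = slot
        self.worker = closure.worker   # last known address of the closure

    def copy(self):
        """Pass the continuation on; the waiting closure is informed."""
        return self.closure.make_continuation(self.slot)

    def send_argument(self, value):
        """Deliver value to the slot; return the worker it was routed to."""
        self.closure.slots[self.slot] = value
        return self.worker

class WaitingClosure:
    def __init__(self, worker, n_slots):
        self.worker = worker
        self.slots = [None] * n_slots
        self.conts = []                # every continuation referring to us

    def make_continuation(self, slot):
        cont = Continuation(self, slot)
        self.conts.append(cont)        # back-reference kept for migration
        return cont

    def migrate(self, new_worker):
        """Move to new_worker and notify each continuation of the address."""
        self.worker = new_worker
        for cont in self.conts:
            cont.worker = new_worker   # stands in for one network message
        return len(self.conts)         # number of notifications sent
```

The return value of `migrate` makes the cost visible: one notification per outstanding continuation, including every copy, which is why the slide warns that this may be a lot of work.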
Fault tolerance • To be continued, based on a fuller understanding of closure migration under worker retreat.