VOLPEX
BOINC Workshop 10
Enabling interprocess communication for BOINC applications
Hien Nguyen, Eshwar Rohit (University of Houston)
Supervisors: Dr. Jaspal Subhlok (University of Houston), Dr. David P. Anderson (SSL, U.C. Berkeley)

RESEARCH GOAL
• Enable BOINC to efficiently support applications that require interprocess communication.
• Goals:
  • Easier programming for communicating applications
  • Reduce the execution time of a job (rather than increase throughput)

Example Applications
• REMD (replica exchange molecular dynamics) protein folding application
• Each process runs a standard molecular simulation at a different temperature; temperatures are exchanged between processes from step to step:

          P1    P2    P3    P4    P5    P6    P7    P8
  STEP-1  270   280   290   300   310   320   330   340
  STEP-2  280   270   290   300   320   310   340   330
  STEP-3  280   290   270   300   320   340   310   330
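
The exchange pattern in the table above can be reproduced with a short sketch. It assumes that exchange rounds alternate between the pairs (P1,P2), (P3,P4), ... and the pairs (P2,P3), (P4,P5), ..., which is consistent with the three rows shown. The `accept` callback is a hypothetical stand-in for the real acceptance test, which in REMD depends on the replicas' energies; here it is hard-coded to match the slide.

```python
def exchange_step(temps, step, accept):
    """Return the temperature assignment after one exchange round.

    temps  -- temps[i] is the temperature of process P(i+1)
    step   -- 0-based round; even rounds pair (P1,P2), (P3,P4), ...,
              odd rounds pair (P2,P3), (P4,P5), ...
    accept -- accept(i, j) -> bool, stand-in for the real acceptance test
    """
    new = list(temps)
    for i in range(step % 2, len(new) - 1, 2):
        if accept(i, i + 1):
            new[i], new[i + 1] = new[i + 1], new[i]
    return new


step1 = [270, 280, 290, 300, 310, 320, 330, 340]
# Accepted pairs (0-based indices) are chosen only to match the slide.
step2 = exchange_step(step1, 0, lambda i, j: (i, j) != (2, 3))
step3 = exchange_step(step2, 1, lambda i, j: (i, j) != (3, 4))
```

Each exchange is a synchronization point between two processes, which is why this application needs interprocess communication.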

Example Applications
• Many other applications also qualify:
  • Differential equation solvers (grid) (synchronous)
  • Game playing with alpha/beta pruning (asynchronous)
  • Search applications
  • …
• Suitable applications: those with a moderate amount of communication.
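
DIFFICULTIES
[Figure: job execution timeline with fast hosts, slow hosts, hosts marked X, and synchronization points]
• Slow hosts and hosts marked X hold back every other host at each synchronization point, which slows down overall execution.
• The problem gets worse as the number of hosts increases.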

OUTLINE
• Volpex Dataspace Overview
  • IPC for volunteer environment
• Integration With BOINC
  • Process management
  • Host selection
• Future and Related Work

Volpex Dataspace
• Dataspace: a global shared space that processes can use to exchange information without temporal or spatial coupling.
[Figure: one process calls Put(ABC, 800) on the Volpex Dataspace Server; another calls Get(ABC, ?) and receives 800]
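
As a rough illustration of the Put/Get model, here is a minimal in-memory sketch. The class name `Dataspace`, the method signatures and the blocking behaviour of `get` are assumptions made for this example; the real Volpex Dataspace Server is a networked service and its API may differ.

```python
import threading


class Dataspace:
    """Hypothetical single-machine stand-in for a dataspace server."""

    def __init__(self):
        self._items = {}
        self._cond = threading.Condition()

    def put(self, key, value):
        """Store a value under a key and wake up any waiting readers."""
        with self._cond:
            self._items[key] = value
            self._cond.notify_all()

    def get(self, key, timeout=None):
        """Block until `key` exists, then return its value (None on timeout)."""
        with self._cond:
            found = self._cond.wait_for(lambda: key in self._items, timeout)
            return self._items[key] if found else None


ds = Dataspace()
ds.put("ABC", 800)
value = ds.get("ABC")  # the reader never needs to know who wrote the item
```

Because the reader only names the key, it does not need to know which process wrote the item (no spatial coupling), and because `get` waits for the key to appear, writer and reader do not need to run at the same moment (no temporal coupling).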

Volpex Dataspace – Fault Tolerance
[Figure: replicated processes issue Put(ABC, 800) and Get(ABC, ?), which returns 800; one replica is marked X]
• Volpex DSS is designed to support redundant Put/Get operations, unlike Linda and its variants.
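
One possible way to make Put/Get safe under replication is to tag every operation with the logical process ID and a per-process operation counter, so that replicas of the same process issue identical tags. The sketch below is an assumption made for illustration, not the actual Volpex DSS protocol: the first Put with a given tag is applied and later duplicates are ignored, and the value returned for a tagged Get is logged so that a slower replica sees the same result.

```python
class RedundantDataspace:
    """Hypothetical dataspace that tolerates duplicate operations from replicas."""

    def __init__(self):
        self._items = {}
        self._seen_puts = set()  # tags of puts already applied
        self._get_log = {}       # tag -> value returned to the first replica

    def put(self, proc_id, op_seq, key, value):
        """Apply the put once; return False for a duplicate from another replica."""
        tag = (proc_id, op_seq)
        if tag in self._seen_puts:
            return False
        self._seen_puts.add(tag)
        self._items[key] = value
        return True

    def get(self, proc_id, op_seq, key):
        """Return the same value to every replica issuing the same logical get."""
        tag = (proc_id, op_seq)
        if tag not in self._get_log:
            self._get_log[tag] = self._items[key]  # raises KeyError if absent
        return self._get_log[tag]
```

For brevity this `get` does not block and the logs grow without bound; a real server would need both blocking reads and log cleanup.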

Volpex Dataspace
• Why centralized? A central server raises scalability issues.
• But: volunteer hosts are often behind firewalls and accept no incoming connections.

INTEGRATION WITH BOINC
• Mechanics: process management
  • Simultaneous process starting
  • Fault tolerance
  • Checkpoint/restart
• Policy: host selection

Job execution scheme
[Figure: volunteer hosts get work from the BOINC Scheduler and talk to the Volpex Dataspace Server through four operations: Put data item, Get data item, Put checkpoint, Get checkpoint. One host is marked X.]

PROCESS MANAGEMENT
• Simultaneous process starting:
  • All processes start computation together. This reduces wasted resources, because processes that started early would otherwise have to wait for each other.
  • Volpex jobs have the highest (infinite) priority: they cannot be interrupted by other jobs.
  • While waiting for all processes of a Volpex job to be ready, a host can run other, finite-priority volunteer jobs.
  • Uses boinc_temporary_exit()

PROCESS MANAGEMENT
• Fault tolerance
  • A dead instance is detected by a heartbeat mechanism: process instances regularly send heartbeats to the Volpex DSS.
  • The BOINC scheduler recruits a new volunteer host to replace the dead one.
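
The heartbeat check can be sketched as follows. The class name, the timeout parameter and the injectable clock are assumptions made for this example; the slides do not state the heartbeat interval or how the Volpex DSS decides that an instance is dead.

```python
import time


class HeartbeatMonitor:
    """Hypothetical tracker of the last heartbeat time per process instance."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock      # injectable so the logic can be tested
        self._last_seen = {}

    def heartbeat(self, proc_id):
        """Record that `proc_id` has just reported in."""
        self._last_seen[proc_id] = self._clock()

    def dead_instances(self):
        """Return the instances whose last heartbeat is older than the timeout."""
        now = self._clock()
        return sorted(p for p, seen in self._last_seen.items()
                      if now - seen > self.timeout_s)
```

The scheduler side would then poll `dead_instances()` and request a replacement host for each entry.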

PROCESS MANAGEMENT
• Hot spare policy:
[Figure: when a host fails (X), the BOINC Scheduler takes a fast replacement from the hot spare group, which is connected to the Volpex Dataspace Server]

PROCESS MANAGEMENT
• Checkpointing:
  • A process instance commits and uploads checkpoints to the Volpex DSS, which stores only the latest checkpoint for each process.
  • A restarted process instance requests its checkpoint from the Volpex DSS.
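
A "latest checkpoint only" store might look like the sketch below. The sequence number used to reject stale uploads is an assumption added for the example; the slides only say that one checkpoint is kept per process.

```python
class CheckpointStore:
    """Hypothetical store that keeps only the newest checkpoint per process."""

    def __init__(self):
        self._latest = {}  # proc_id -> (sequence number, checkpoint bytes)

    def commit(self, proc_id, seq, data):
        """Replace the stored checkpoint if `seq` is newer; report whether it was kept."""
        current = self._latest.get(proc_id)
        if current is None or seq > current[0]:
            self._latest[proc_id] = (seq, bytes(data))
            return True
        return False  # stale upload, e.g. from a slow replica

    def restore(self, proc_id):
        """Return (seq, data) for a restarted instance, or None if nothing is stored."""
        return self._latest.get(proc_id)
```

Keeping a single checkpoint per process bounds server storage to one entry per process, at the cost of not being able to roll back further than the last commit.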

HOST SELECTION
• Volpex job: a set of communicating processes submitted together by a scientist.
• Has requirements on:
  • Deadline
  • Number of processes
• Has estimates of:
  • Total flops per process
  • Flops between 2 consecutive checkpoints
  • Memory usage, disk usage

HOST SELECTION POLICY
• Criteria for selecting volunteer hosts to assign to a Volpex job: speed and availability.
• Availability: the interval during which a host is available without interruption (the BOINC client is allowed to compute).

HOST SELECTION POLICY
• Job's minimum requirements:
  • Minimum speed: fast enough to meet the job's deadline
    • Min speed = Total flops / Deadline
  • Minimum expected availability: the host is continuously available for x hours, long enough to commit at least 1 checkpoint.
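
The two minimum requirements can be written as a small eligibility check. The function names are hypothetical, and deriving x from the estimated flops between two checkpoints divided by the host's speed is an assumption; the slides do not say how x is computed.

```python
def min_speed_flops(total_flops, deadline_s):
    """Min speed = Total flops / Deadline, in flops per second."""
    if deadline_s <= 0:
        raise ValueError("deadline must be positive")
    return total_flops / deadline_s


def min_availability_hours(flops_per_checkpoint, host_flops):
    """Assumed definition of x: hours this host needs to reach one checkpoint."""
    return flops_per_checkpoint / host_flops / 3600.0


def is_eligible(host_flops, predicted_avail_hours,
                total_flops, deadline_s, flops_per_checkpoint):
    """Check both minimum requirements for one host."""
    fast_enough = host_flops >= min_speed_flops(total_flops, deadline_s)
    long_enough = predicted_avail_hours >= min_availability_hours(
        flops_per_checkpoint, host_flops)
    return fast_enough and long_enough
```

For example, a process estimated at 8.64e13 flops with a 24-hour deadline (86400 s) needs a host that sustains about 1e9 flops per second. If checkpoints are 7.2e12 flops apart, a 2e9 flops/s host needs about one hour of uninterrupted availability to commit one.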

Evaluate Host's Availability
• We want to predict the length of a host's availability interval.
• Method based on: "Exploiting Non-Dedicated Resources for Cloud Computing", Artur Andrzejak, Derrick Kondo, David P. Anderson (NOMS 2010).

Evaluate Host's Availability
• Last value predictor: a simple predictor that takes the availability value in the last hourly interval before the prediction as the predicted availability for the next x-hour interval.
• Combined with ranking hosts by predictability: the number of availability changes per week.
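
A minimal sketch of the two ideas above, assuming each host's history is a list of hourly booleans covering one week. The trace format and function names are assumptions; the cited paper's exact method may differ.

```python
def last_value_prediction(hourly_availability):
    """Predict the next interval's availability as the last observed hourly value."""
    return hourly_availability[-1]


def availability_changes(hourly_availability):
    """Count transitions between available and unavailable in the trace."""
    return sum(1 for a, b in zip(hourly_availability, hourly_availability[1:])
               if a != b)


def rank_hosts(traces):
    """Keep hosts predicted to be available, most predictable (fewest changes) first."""
    candidates = [h for h, t in traces.items() if last_value_prediction(t)]
    return sorted(candidates, key=lambda h: (availability_changes(traces[h]), h))


traces = {
    "a": [True] * 168,              # always on
    "b": [True, False] * 84,        # flips every hour, currently off
    "c": [False] * 167 + [True],    # just came online
}
ranking = rank_hosts(traces)
```

Host "b" is dropped because its last observed value is unavailable, and "a" ranks ahead of "c" because it changed state less often during the week.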

Evaluate Host's Availability
• In essence: select hosts whose availability changes very rarely.
• A process assigned to a host with high predictability does not necessarily need to be replicated.

IMPLEMENTATION STATUS
• Volpex utilities: let scientists submit, abort, or query the status of a Volpex job.
• Modified BOINC scheduler: includes host selection for Volpex jobs.
• Modified Volpex DSS: handles the new types of requests.
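
IMPLEMENTATION STATUS
[Figure: system architecture. The scientist submits job specs to the BOINC Server, which creates the job and WU in the database. The BOINC Scheduler dynamically creates results from the WU in response to scheduler requests and returns scheduler replies. Hosts send heartbeats and checkpoints to the Volpex Dataspace Server and get their procID from it. When an instance fails (X), a replacement from the hot spare group requests a procID and receives the procID of the failed instance.]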

FUTURE WORK
• Experiment and evaluate with different degrees of freedom:
  • Number of processes: 10 to 1M
  • Communication pattern (local/global, synchronous/asynchronous)
  • Size and frequency of communication
• Goal: study (via live experiment or simulation) the performance of Volpex/BOINC over this space
• Applications: Eratosthenes, REMD protein folding
• Enhance the host selection policy

OTHER WORK
• Volpex MPI:
  • An MPI library designed for executing parallel applications in a volunteer environment.
  • Direct communication between processes.
• Key features:
  • Controlled redundancy
  • Receiver-based direct communication
  • Distributed sender-based logging
• More detail: "VolpexMPI: an MPI Library for Execution of Parallel Applications on Volatile Nodes" by Troy LeBlanc, Rakhi Anand, Edgar Gabriel, and Jaspal Subhlok.

Do you have an application?
• We would be happy to cooperate with you.
• Our team contacts:
  • Dr. Jaspal Subhlok: jaspal@uh.edu
  • Dr. David Anderson: davea@ssl.berkeley.edu
  • Dr. Edgar Gabriel: gabriel@cs.uh.edu
  • Hien Nguyen: hien.nguyen.nx@gmail.com
  • Eshwar Rohit: eshwar.rohit@gmail.com
  • Rakhi Anand: rakhi@cs.uh.edu
• Our website: http://www2.cs.uh.edu/~jsteach/volpex/index.htm
VOLPEX Thanks for listening!