Lightweight Monitoring of the Progress of Remotely Executing Computations
Shuo Yang, Ali R. Butt, Y. Charlie Hu, Samuel P. Midkiff
Purdue University
Harvesting Unused Resources • Typical workloads are bursty • Periods of little or no processing • Periods of insufficient CPU resources • Idle cycles cannot be saved for later use • Exploit the value of otherwise wasted idle resources • Gain additional processing capability for "free" or at low cost • "Smooth out" the workload
The Need for Remote Monitoring • Centralized cycle sharing • SETI@Home, Genome@Home, IBM (with United Devices), etc. • Condor, Microsoft (with GridIron), etc. • P2P-based cycle sharing (Butt et al. [VM'04]) • Individual nodes can utilize the system (more incentive) • Nodes can be spread across administrative domains (more available resources) • Remote execution motivates remote monitoring • Unreliable resources • Untrusted resources
Review of GridCop [Yang et al., PPoPP'05]
[Slide diagram: the submitted job (H-code) with its reporting module runs in a sandboxed JVM on the host machine and sends progress and partial-computation messages to the processing module (S-code) running in the submitter's JVM.]
Our New Contribution: Key Differences from GridCop • Uses probabilistic code instrumentation • Prevents replay attacks (like GridCop) • Requires no recomputation, reducing network traffic and submitter-machine overhead • Ties the progress information closely to the program structure • Makes spoofing more difficult • PC values reflect the internal structure of the program binary
Outline • Overview • Design of Lightweight Monitoring Mechanism • Experimental Results • Related Research and Conclusions
System Overview: Code Generation
[Slide diagram: the code generation system takes the original code and emits two versions: host-code, executed on the host, which emits progress information ("beacons") during the computation; and submitter-code, executed on the submitter, which processes the beacons.]
System Overview
[Slide diagram: the submitted job (H-code) with its reporting module runs on the host machine and sends beacons to the beacon processing module (S-code) on the submitter.]
Basic Idea of FSA Tracking • Beacons are placed at significant execution points along the CFG • Beacons can be viewed as states in an FSA • They can be placed at any site satisfying the compiler's instrumentation criteria, e.g. MPI call sites in this paper • The host emits beacon messages at significant execution points • An FSA emitting transition symbols • The submitter processes beacon messages • A mirror FSA recognizing legal transitions
An FSA Example

main() {
    ...
    mpi_irecv(...);    // S1
    ...
    if (predicate) {
        mpi_send(...); // S2
    }
    ...
    mpi_wait();        // S3
    ...
}

[FSA diagram: states S1, S2, S3; transitions S1 -> S2 -> S3, with S1 -> S3 when the predicate is false]
Binary File Location Beacon (BLB) • BLB values are the virtual addresses of instructions in the virtual memory of a process; they serve as the states of the FSA
[Slide diagram: process address-space layout (stack, heap, bss, initialized data, code segment); the code segment contains the BLB call sites, e.g. 804a641: call mpi_irecv, 804a679: call mpi_send, 804a69b: call mpi_wait]
PC Values: Labels Driving the Transitions in the FSA • The compiler inserts a getPC() call in front of each BLB • getPC() returns the address of the next instruction

main() {
    ...
    pc = getPC();
    mpi_irecv(...);    // 0x804a641
    deposit_beacon(pc);
    if (predicate) {
        pc = getPC();
        mpi_send(...); // 0x804a679
        deposit_beacon(pc);
    }
    pc = getPC();
    mpi_wait();        // 0x804a69b
    deposit_beacon(pc);
    ...
}

[FSA diagram: states @804a641, @804a679, @804a69b; transitions labeled by the corresponding PC values "804a641", "804a679", "804a69b"]
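A minimal sketch of one way getPC() could be implemented (assuming a GCC-style compiler; not necessarily the system's actual implementation): a small non-inlined helper whose return address is the instruction immediately following its call site.

/* Sketch only (an assumed implementation, not the authors' code): the
   return address of this call is the address of the instruction right
   after "call getPC" in the caller, i.e. at the upcoming BLB call site.
   noinline keeps the call, and hence a distinct return address, from
   being optimized away. */
__attribute__((noinline)) void *getPC(void)
{
    return __builtin_return_address(0);
}

In practice the returned value may sit a few bytes before the call mpi_* instruction itself (the store of the result comes in between), so the exact mapping from beacon values to BLB addresses would be fixed at code-generation time rather than assumed to be byte-exact.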
Tracking the Progress of an MPI Program

main() {
    ...
    pc = getPC();
    mpi_irecv(...);    // 0x804a641
    deposit_beacon(pc);
    if (predicate) {
        pc = getPC();
        mpi_send(...); // 0x804a679
        deposit_beacon(pc);
    }
    pc = getPC();
    mpi_wait();        // 0x804a69b
    deposit_beacon(pc);
    ...
}

[FSA diagram: the submitter's mirror FSA steps through states @804a641, @804a679, @804a69b as the corresponding beacon values arrive]
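On the submitter side, the mirror FSA only needs to check whether each incoming beacon value is a legal transition from the current state. A minimal sketch for the example above; the table layout and function name are illustrative assumptions, not the actual generated S-code.

#define MAX_NEXT 4   /* max outgoing transitions per state in this sketch */

struct fsa_state {
    unsigned long pc;               /* BLB address this state represents */
    unsigned long next[MAX_NEXT];   /* legal next beacon values, 0-terminated */
};

/* Mirror FSA for the example: from @804a641 the next beacon may be
   804a679 (predicate true) or 804a69b (predicate false); from @804a679
   the only legal next beacon is 804a69b. */
static struct fsa_state fsa[] = {
    { 0x804a641, { 0x804a679, 0x804a69b, 0 } },
    { 0x804a679, { 0x804a69b, 0 } },
    { 0x804a69b, { 0 } },
};

static int legal_transition(unsigned long cur, unsigned long beacon)
{
    for (unsigned i = 0; i < sizeof(fsa) / sizeof(fsa[0]); i++) {
        if (fsa[i].pc != cur)
            continue;
        for (int j = 0; j < MAX_NEXT && fsa[i].next[j] != 0; j++)
            if (fsa[i].next[j] == beacon)
                return 1;   /* expected progress */
    }
    return 0;               /* illegal transition: flag the beacon stream */
}

The real system also has to handle loops (repeated visits to the same state) and the end of the beacon stream; the point of the sketch is that checking a beacon is a small table lookup, which keeps the submitter-side cost low.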
Attacks on the FSA Mechanism • Susceptible to replay attacks • Record the stream of beacons from a previous run • Replay the stream in a future run (cheating to gain undeserved compensation) • Reverse engineering of the binary executable • Understand the control flow graph • Expensive: NP-hard in the worst case ([Wang, PhD thesis, University of Virginia])
Probabilistic BLB • Each MPI function call site is a BLB candidate but not necessarily a BLB site • A candidate is used as a BLB site with probability P_B in (0, 1) • Effect: an individual MPI function call site may be a BLB in the FSA produced by one code generation but not in the next (see the sketch below)
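A minimal sketch of the per-site decision the code generator could make; the helper name and the use of the C library PRNG are assumptions for illustration, not the actual code-generation system.

#include <stdlib.h>

/* Sketch only: decide whether a candidate MPI call site becomes a BLB.
   p_b is the instrumentation probability in (0, 1). If the code
   generator reseeds on every submission, the chosen BLB set (and hence
   the FSA and its legal beacon stream) differs from run to run. */
static int choose_as_blb(double p_b)
{
    return ((double)rand() / RAND_MAX) < p_b;
}

The code generator would call this once per candidate site while emitting the host code, inserting the getPC()/deposit_beacon() pair only when it returns 1.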
Probabilistic BLBs Guard Against Attacks • The same job can have a different FSA each time it is submitted to the host • This leads to a different legal beacon value stream • Defeats the replay attack by making it detectable • Reverse engineering by binary analysis must be repeated by a cheating host on every run • Break once, spoof only once: too expensive for the attacker!
One FSA with Probabilistic BLB

main() {
    ...
    pc = getPC();
    mpi_irecv(...);    // 0x804a641
    deposit_beacon(pc);
    if (predicate) {
        pc = getPC();
        mpi_send(...); // 0x804a679
        deposit_beacon(pc);
    }
    pc = getPC();
    mpi_wait();        // 0x804a69b
    deposit_beacon(pc);
    ...
}

[FSA diagram: all three call sites chosen as BLBs; states @804a641, @804a679, @804a69b with transition labels "804a641", "804a679", "804a69b"]
Another FSA with Probabilistic BLB

main() {
    ...
    pc = getPC();
    mpi_irecv(...);    // 0x804a641
    deposit_beacon(pc);
    if (predicate) {
        mpi_send(...); // 0x804a679  (not chosen as a BLB this time)
    }
    pc = getPC();
    mpi_wait();        // 0x804a69b
    deposit_beacon(pc);
    ...
}

[FSA diagram: only two BLBs this time; states @804a641 and @804a69b with transition labels "804a641" and "804a69b"]
Outline • Overview • Design of Lightweight Monitoring Mechanism • Experimental Results • Related Research and Conclusions
Experimental Setup • Submitter machine @UIUC (thanks to Josep Torrellas) • Intel 3 GHz Xeon / 512 KB cache, 1 GB main memory • Running a Linux 2.4.20 kernel • Host machine @Purdue • A cluster of 8 Pentium IV machines (each node with 512 KB cache and 512 MB main memory), interconnected by Fast Ethernet • Running FreeBSD 4.7, MPICH 1.2.5 • Network access • Both machines connected to their campus networks via Ethernet • UIUC to Purdue: a typical scenario of cycle sharing across a WAN
Benchmarks & Evaluation Metrics • Used the NAS Parallel Benchmarks (NPB) 3.2 • A benchmark suite for evaluating the performance of parallel computing resources • Metrics: • Run-time computation overhead • Network traffic overhead (network resources are not "free") • Beacon distribution over time (capability to track progress incrementally)
Host-Side Computation Overhead for Different Numbers of Nodes • Overhead = (T_monitoring - T_original) / T_original * 100% • Lower bars are better • The overhead does not increase monotonically with the number of processes
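For instance (illustrative numbers only, not measured results): if T_original = 100 s and T_monitoring = 102 s, the overhead is (102 - 100) / 100 * 100% = 2%.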
Host-Side Computation Overhead for Different Input Sizes • Overhead = (T_monitoring - T_original) / T_original * 100% • Lower bars are better • Overhead is lower for larger problem sizes
Submitter-Side Computation Cost • Overhead = time(submitter code) / execution time • An imperfect metric: the number depends on the submitter's hardware, the submitter's workload, the host's speed, etc.
Network Traffic Incurred by Monitoring • Bytes sent over the network between the host and submitter machines, divided by the total execution time • Low bandwidth usage
Beacon Distribution over Time • A uniform distribution enables incremental progress tracking
Outline • Overview • Design of Lightweight Monitoring Mechanism • Experimental Results • Related Research and Conclusions
Related Research • L. F. Sarmenta [CCGrid'01], W. Du et al. [ICDCS'04] • A host performs the same computation on different inputs • Needs a central manager • Yang et al. [PPoPP'05] • Partially duplicates the computation • Incurs more network traffic due to the recomputation • Hofmeyr et al. [J. of Computer Security'98], Chen and Wagner [CCS'02] • Use system call sequences to detect intrusions • Approaches aimed at host security
Conclusions • Lightweight monitoring over a WAN/the Internet is possible • No changes to the host-side system are required • Instrumentation can be performed automatically
Host-Side Overhead Details (Slide 22) Overhead = (T_monitoring - T_original) / T_original • Does not increase monotonically with the number of processes (N_process) • When N_process increases: • The denominator, T_original, decreases • The numerator, the difference between T_monitoring and T_original, also decreases (the number of MPI calls decreases, decreasing the overhead of BLB message generation) • Synchronization: always one extra thread per process, no matter how many processes are running
Host-Side Overhead Details (Slide 23) Overhead = (T_monitoring - T_original) / T_original • Results in lower overhead for larger problem sizes • When the problem size increases: • The denominator (T_original) increases • The numerator (T_monitoring - T_original) stays similar, since the number of MPI calls is similar