Lightweight Monitoring of the Progress of Remotely Executing Computations
Shuo Yang, Ali R. Butt, Y. Charlie Hu, Samuel P. Midkiff
Purdue University
Harvesting Unused Resources • Typical workloads are bursty • Periods of little or no processing • Periods of insufficient CPU resources • Idle cycles cannot be saved for later use • Exploit the value of otherwise wasted idle resources • Gain additional processing capability for "free" or at low cost • "Smooth out" the workload
The Need for Remote Monitoring • Centralized cycle sharing • SETI@Home, Genome@Home, IBM (with United Devices), etc. • Condor, Microsoft (with GridIron), etc. • P2P-based cycle sharing (Butt et al. [VM'04]) • Individual nodes can utilize the system (more incentive) • Nodes can be spread across administrative domains (more available resources) • Remote execution motivates remote monitoring • Unreliable resources • Untrusted resources
Review of GridCop [Yang et al., PPoPP'05]
[Slide diagram: the submitted job (H-code) with its reporting module runs in a sandboxed JVM on the host machine and sends progress and partial-computation messages to the processing module (S-code) running in the submitter's JVM.]
Our New Contribution: Key Differences from GridCop • Uses probabilistic code instrumentation • Prevents replay attacks (like GridCop) • Requires no recomputation, reducing network traffic and submitter-machine overhead • Ties the progress information closely to the program structure • Makes spoofing more difficult • PC values reflect the internal structure of the program binary
Outline • Overview • Design of Lightweight Monitoring Mechanism • Experimental Results • Related Research and Conclusions
System Overview: Code Generation
[Slide diagram: the code generation system takes the original code and emits two versions: host-code, executed on the host, which emits progress information ("beacons") during the computation; and submitter-code, executed on the submitter, which processes the beacons.]
System Overview
[Slide diagram: the submitted job (H-code) with its reporting module runs on the host machine and sends beacons to the beacon processing module (S-code) on the submitter.]
Basic Idea of FSA Tracking • Beacons are placed at significant execution points along the CFG • Beacons can be viewed as states in an FSA • They can be placed at any site satisfying the compiler's instrumentation criteria, e.g. MPI call sites in this paper • The host emits beacon messages at significant execution points • An FSA emitting transition symbols • The submitter processes beacon messages • A mirror FSA recognizing legal transitions
An FSA Example

main() {
    ...
    mpi_irecv(...);    // S1
    ...
    if (predicate) {
        mpi_send(...); // S2
    }
    ...
    mpi_wait();        // S3
    ...
}

[FSA diagram: states S1, S2, S3; transitions S1 -> S2 -> S3, with S1 -> S3 when the predicate is false]
Binary File Location Beacon (BLB) • BLB values are the virtual addresses of instructions in the virtual memory of a process; they serve as the states of the FSA
[Slide diagram: process address-space layout (stack, heap, bss, initialized data, code segment); the code segment contains the BLB call sites, e.g. 804a641: call mpi_irecv, 804a679: call mpi_send, 804a69b: call mpi_wait]
PC Values: Labels Driving the Transitions in the FSA • The compiler inserts a getPC() call in front of each BLB • getPC() returns the address of the next instruction

main() {
    ...
    pc = getPC();
    mpi_irecv(...);    // 0x804a641
    deposit_beacon(pc);
    if (predicate) {
        pc = getPC();
        mpi_send(...); // 0x804a679
        deposit_beacon(pc);
    }
    pc = getPC();
    mpi_wait();        // 0x804a69b
    deposit_beacon(pc);
    ...
}

[FSA diagram: states @804a641, @804a679, @804a69b; transitions labeled by the corresponding PC values "804a641", "804a679", "804a69b"]
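A minimal sketch of one way getPC() could be implemented (assuming a GCC-style compiler; not necessarily the system's actual implementation): a small non-inlined helper whose return address is the instruction immediately following its call site.

/* Sketch only (an assumed implementation, not the authors' code): the
   return address of this call is the address of the instruction right
   after "call getPC" in the caller, i.e. at the upcoming BLB call site.
   noinline keeps the call, and hence a distinct return address, from
   being optimized away. */
__attribute__((noinline)) void *getPC(void)
{
    return __builtin_return_address(0);
}

In practice the returned value may sit a few bytes before the call mpi_* instruction itself (the store of the result comes in between), so the exact mapping from beacon values to BLB addresses would be fixed at code-generation time rather than assumed to be byte-exact.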
Tracking the Progress of an MPI Program

main() {
    ...
    pc = getPC();
    mpi_irecv(...);    // 0x804a641
    deposit_beacon(pc);
    if (predicate) {
        pc = getPC();
        mpi_send(...); // 0x804a679
        deposit_beacon(pc);
    }
    pc = getPC();
    mpi_wait();        // 0x804a69b
    deposit_beacon(pc);
    ...
}

[FSA diagram: the submitter's mirror FSA steps through states @804a641, @804a679, @804a69b as the corresponding beacon values arrive]
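On the submitter side, the mirror FSA only needs to check whether each incoming beacon value is a legal transition from the current state. A minimal sketch for the example above; the table layout and function name are illustrative assumptions, not the actual generated S-code.

#define MAX_NEXT 4   /* max outgoing transitions per state in this sketch */

struct fsa_state {
    unsigned long pc;               /* BLB address this state represents */
    unsigned long next[MAX_NEXT];   /* legal next beacon values, 0-terminated */
};

/* Mirror FSA for the example: from @804a641 the next beacon may be
   804a679 (predicate true) or 804a69b (predicate false); from @804a679
   the only legal next beacon is 804a69b. */
static struct fsa_state fsa[] = {
    { 0x804a641, { 0x804a679, 0x804a69b, 0 } },
    { 0x804a679, { 0x804a69b, 0 } },
    { 0x804a69b, { 0 } },
};

static int legal_transition(unsigned long cur, unsigned long beacon)
{
    for (unsigned i = 0; i < sizeof(fsa) / sizeof(fsa[0]); i++) {
        if (fsa[i].pc != cur)
            continue;
        for (int j = 0; j < MAX_NEXT && fsa[i].next[j] != 0; j++)
            if (fsa[i].next[j] == beacon)
                return 1;   /* expected progress */
    }
    return 0;               /* illegal transition: flag the beacon stream */
}

The real system also has to handle loops (repeated visits to the same state) and the end of the beacon stream; the point of the sketch is that checking a beacon is a small table lookup, which keeps the submitter-side cost low.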
Attacks on the FSA Mechanism • Susceptible to replay attacks • Record the stream of beacons from a previous run • Replay the stream in a future run (cheating to gain undeserved compensation) • Reverse engineering of the binary executable • Understand the control flow graph • Expensive: NP-hard in the worst case ([Wang, PhD thesis, University of Virginia])
Probabilistic BLB • Each MPI function call site is a BLB candidate but not necessarily a BLB site • A candidate is used as a BLB site with probability P_B in (0, 1) • Effect: an individual MPI function call site may be a BLB in the FSA produced by one code generation but not in the next (see the sketch below)
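A minimal sketch of the per-site decision the code generator could make; the helper name and the use of the C library PRNG are assumptions for illustration, not the actual code-generation system.

#include <stdlib.h>

/* Sketch only: decide whether a candidate MPI call site becomes a BLB.
   p_b is the instrumentation probability in (0, 1). If the code
   generator reseeds on every submission, the chosen BLB set (and hence
   the FSA and its legal beacon stream) differs from run to run. */
static int choose_as_blb(double p_b)
{
    return ((double)rand() / RAND_MAX) < p_b;
}

The code generator would call this once per candidate site while emitting the host code, inserting the getPC()/deposit_beacon() pair only when it returns 1.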
Probabilistic BLBs Guard Against Attacks • The same job can have a different FSA each time it is submitted to the host • This leads to a different legal beacon value stream • Defeats the replay attack by making it detectable • Reverse engineering by binary analysis must be repeated by a cheating host on every run • Break once, spoof only once: too expensive for the attacker!
One FSA with Probabilistic BLB

main() {
    ...
    pc = getPC();
    mpi_irecv(...);    // 0x804a641
    deposit_beacon(pc);
    if (predicate) {
        pc = getPC();
        mpi_send(...); // 0x804a679
        deposit_beacon(pc);
    }
    pc = getPC();
    mpi_wait();        // 0x804a69b
    deposit_beacon(pc);
    ...
}

[FSA diagram: all three call sites chosen as BLBs; states @804a641, @804a679, @804a69b with transition labels "804a641", "804a679", "804a69b"]
Another FSA with Probabilistic BLB

main() {
    ...
    pc = getPC();
    mpi_irecv(...);    // 0x804a641
    deposit_beacon(pc);
    if (predicate) {
        mpi_send(...); // 0x804a679  (not chosen as a BLB this time)
    }
    pc = getPC();
    mpi_wait();        // 0x804a69b
    deposit_beacon(pc);
    ...
}

[FSA diagram: only two BLBs this time; states @804a641 and @804a69b with transition labels "804a641" and "804a69b"]
Outline • Overview • Design of Lightweight Monitoring Mechanism • Experimental Results • Related Research and Conclusions
Experimental Setup • Submitter machine @UIUC (thanks to Josep Torrellas) • Intel 3 GHz Xeon / 512 KB cache, 1 GB main memory • Running a Linux 2.4.20 kernel • Host machine @Purdue • A cluster of 8 Pentium IV machines (each node with 512 KB cache and 512 MB main memory), interconnected by Fast Ethernet • Running FreeBSD 4.7, MPICH 1.2.5 • Network access • Both machines connected to their campus networks via Ethernet • UIUC to Purdue: a typical scenario of cycle sharing across a WAN
Benchmarks & Evaluation Metrics • Used the NAS Parallel Benchmarks (NPB) 3.2 • A benchmark suite for evaluating the performance of parallel computing resources • Metrics: • Run-time computation overhead • Network traffic overhead (network resources are not "free") • Beacon distribution over time (capability to track progress incrementally)
Host-Side Computation Overhead for Different Numbers of Nodes • Overhead = (T_monitoring - T_original) / T_original * 100% • Lower bars are better • The overhead does not increase monotonically with the number of processes
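For instance (illustrative numbers only, not measured results): if T_original = 100 s and T_monitoring = 102 s, the overhead is (102 - 100) / 100 * 100% = 2%.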
Host-Side Computation Overhead for Different Input Sizes • Overhead = (T_monitoring - T_original) / T_original * 100% • Lower bars are better • Overhead is lower for larger problem sizes
Submitter-Side Computation Cost • Overhead = time(submitter code) / execution time • An imperfect metric: the number depends on the submitter's hardware, the submitter's workload, the host's speed, etc.
Network Traffic Incurred by Monitoring • Bytes sent over the network between the host and submitter machines, divided by the total execution time • Low bandwidth usage
Beacon Distribution over Time • A uniform distribution enables incremental progress tracking
Outline • Overview • Design of Lightweight Monitoring Mechanism • Experimental Results • Related Research and Conclusions
Related Research • L. F. Sarmenta [CCGrid'01], W. Du et al. [ICDCS'04] • A host performs the same computation on different inputs • Needs a central manager • Yang et al. [PPoPP'05] • Partially duplicates the computation • Incurs more network traffic due to the recomputation • Hofmeyr et al. [J. of Computer Security'98], Chen and Wagner [CCS'02] • Use system call sequences to detect intrusions • Approaches aimed at host security
Conclusions • Lightweight monitoring over a WAN/the Internet is possible • No changes to the host-side system are required • Instrumentation can be performed automatically
Host-Side Overhead Details (Slide 22) Overhead = (T_monitoring - T_original) / T_original • Does not increase monotonically with the number of processes (N_process) • When N_process increases: • The denominator, T_original, decreases • The numerator, the difference between T_monitoring and T_original, also decreases (the number of MPI calls decreases, decreasing the overhead of BLB message generation) • Synchronization: always one extra thread per process, no matter how many processes are running
Host-Side Overhead Details (Slide 23) Overhead = (T_monitoring - T_original) / T_original • Results in lower overhead for larger problem sizes • When the problem size increases: • The denominator (T_original) increases • The numerator (T_monitoring - T_original) stays similar, since the number of MPI calls is similar