Efficient Checkpointing of Java Software using Context-Sensitive Capture and Replay

Guoqing Xu, Atanas Rountev, Yan Tang, Feng Qin Ohio State University ESEC/FSE 07 Efficient Checkpointing of Java Software using Context-Sensitive Capture and Replay

Outline Motivation Challenges for checkpointing/replaying Java software Summary of our approach Contributions Static analyses Multiple execution regions Experimental evaluation Conclusions

Motivation Checkpointing/replaying has been used for a variety of purposes at system level Originally designed to support fault tolerance Debugging of OS and of parallel and distributed software Checkpointing can benefit a number of software engineering tasks Reduce the cost of manual debugging and testing Support for automated techniques for debugging and testing: e.g., dynamic slicing and delta-debugging Inspired by both system-level checkpointing [Pan-PDD88, Dunlap-OSDI02, King-USENIX05] and “saving-and-restoring” software engineering techniques [Saff-ASE05, Orso-WODA05,Orso-WODA06, Elbaum-FSE06]

Challenges Ease of use and deployment Application-level checkpointing: no JVM/runtime support, just code analysis and instrumentation Challenge: no direct access to the call stack; no control over thread scheduling or external resources (files, etc.) Reduce the size of the recorded state Dumping the entire heap may be prohibitively expensive, especially for large programs Challenge: static analyses to prune redundant state Static and dynamic overhead Static analysis cost is amortized over multiple runs Approach is intended for long-running applications

Summary of Our Approach Tool input: program + checkpoint definition Performs static analyses and code instrumentation Tool output: two program versions First, an augmentedcheckpointing version is executed once to record (parts of) the run-time program states At the checkpoint: heap objects, static fields, locals At certain points along the call chain leading to the checkpoint Next, a prunedreplaying version is executed multiple times Restore variables saved at the checkpoint Restore variables saved at points along the call chain How do we resume execution from the checkpoint? Step 1: control flow quickly reaches the checkpoint Step 2: recover state at checkpoint Step 3: incrementally recover state after call sites along the call chain leading to the checkpoint

Definitions Crosscut call chain (CC-chain) A programmer-specified call chain that leads to the method that contains the checkpoint E.g. main(44) -> run(28) Decision points A call site on the CC-chain (e.g. m.run) – due to polymorphism A predicate on which a decision point or the checkpoint is control-dependent At a decision point, the checkpointing version records the control-flow outcome The replaying version uses this info to force the control flow to reach the checkpoint

Replaying, Step 1: Recover the Call Stack Predicate decision point: recover boolean value Call site decision point o.m(a1…, an) Recover the run-time type of the receiver object; instantiated during replaying using sun.misc.Unsafe

Checkpointing Version void run(String[] args) { processCmdLine(args); loadNecessaryClasses(); Set wp_packs = getWpacks(); Set body_packs = getBpacks(); boolean b = Options.v().whole_jimple(); =>save(b); if (b){// DP getPack("cg").apply(); // --- checkpoint --- => save(…); getPack("wjtp").apply(); getPack("wjop").apply(); getPack("wjap").apply(); } retrieveAllBodies(); … } ... } static void main(String[] args) { Main m = new Main(); boolean b = args.length !=0; =>save(b); if (b) // DP => save(type_of(m)); m.run(args); // DP }

Replaying Version void run(String[] args) { processCmdLine(args); loadNecessaryClasses(); Set wp_packs = getWpacks(); Set body_packs = getBpacks(); boolean b = Options.v().whole_jimple(); =>read(b); if (b){// DP getPack("cg").apply(); // --- checkpoint --- =>read(…); getPack("wjtp").apply(); getPack("wjop").apply(); getPack("wjap").apply(); } retrieveAllBodies(); … } static void main(String[] args) { Main m = new Main(); boolean b = args.length !=0; =>read(b); if (b) // DP => read(type_of(m)); => unsafe.allocate(m); => args = null; m.run(args); // DP }

Step 2: Recover at the Checkpoint Our static analysis selects locals for recording(for checkpointing)/recovering(for replaying) when They are written before the checkpoint They are read after the checkpoint Record primitive-typed values or entire object graphs on the heap (all reachable objects) Static fields are selected based on the same idea void run(String[] args) { processCmdLine(args); loadNecessaryClasses(); Set wp_packs = getWpacks(); Setbody_packs= getBpacks(); if (Options.v().whole_jimple()) { getPack("cg").apply(); // --- checkpoint --- getPack("wjtp").apply(); getPack("wjop").apply(); getPack("wjap").apply(); } retrieveAllBodies(); for (Iterator i = body_packs.iterator(); i.hasNext();) { … }… } body_packs

Selection of Static Fields A whole program Mod/Use analysis A static field is “written” if its value is changed, or any heap object reachable from it is mutated A static field is “read” if its value is directly read Analysis algorithm Context-sensitive and flow-insensitive; uses the points-to solution and the call graph from Spark [Lhotak CC-03] Bottom-up traversal of the SCC-DAG of the call graph For each method m, a set Cmis maintained to contain all objects from which a mutated object can be reached Propagate backwards the objects in Cmthat escape a callee method to its callers Select a static field fld if PointsToSet(fld)∩ Cm ≠ ∅

Step 3: Recover after the Checkpoint Replaying only at decision points and the checkpoint is not enough to guarantee correct execution after the checkpoint Additionally record/recover local variables that will be read after each call site in CC-chain void main(){ Set hs = new HashSet(); B b = new B(hs); //-- reco/rest // (type_of(b)) b.m(); //-- extra reco/rest (hs) if(hs == b.s){ … } } class B{ Set s; void m(){ B r0 = this; r0.s = new HashSet(); //-- checkpoint //-- reco/rest (r0) r0.s.add(“”); } } hsuninitialized

Additional Issues A checkpoint can have multiple run-time instances If a method in CC-chain has callers that are not in the chain, it has to be replicated Currently do not support multi-threaded programs Our technique does not guarantee the correctness of the execution, when the post-checkpoint part of the program Depends on external resources, such as files, databases Depends on unique-per-execution values, such as clock Is modified with new cross-checkpoint dependencies Multiple execution regions Designated by a starting point and an ending point Specified by two CC-chains

Study 1: Static Analysis Program #R #IP compress 1 6 socksproxy 3 11 socksecho 3 14 raytrace 3 10 soot-2.2.3 10 35 muffine 3 20 sablecc 4 11 jess 3 8 violet 4 9 javacup 4 9 jtar-1.21 2 4 db 2 5 jflex 2 8 jb-6.1 3 5 jlex-1.2.6 3 8

Static Analysis: Locals Reduction

Static Analysis: Static Fields Reduction

Static Analysis: Removed/Inserted Statements

Static Analysis Cost Phase 1: Soot infrastructure cost Between 1.64ms and 30.6ms per thousand Jimple statements On average, 11.1ms/1000 statements Phase 2: Our analysis cost Between 1.67ms and 26.6ms per thousand Jimple statements On average, 9.4ms/1000 statements This should be amortized across multiple runs of the replaying version

Study 2: Run-Time Performance (compress) Original program: compressing and decompressing 5 big tar files several times Evaluated for five checkpoint definitions One checkpoint, close to the beginning of the program Two regions of compression and decompression A region containing the process of compression A region containing the process of decompression One checkpoint, close to the end of the program

compress Performance • Normalized running times • Normalized size of captured program state

Study 2: Run-Time Performance (soot) Input: soot-2.2.3 itself containing 2227333 methods Phases Enabling cg.spark, wjtp, wjop.ji, wjap.uft,jtp, jop.cp Evaluated for six checkpoint definitions Before whole-program packs After cg After wjtp After wjop After wjap After body packs

soot Performance • Normalized running times • Normalized captured program state

Study 2: Run-Time Performance (jflex-1.4.1) Input: a .flex grammar file corresponding to a DFA containing 21769 states Evaluated for four checkpoint definitions After NFA is generated After DFA is generated to DFA After minimization After emission

jflex Performance • Normalized running time • Normalized size of capture state

Summary of Evaluation Static analysis successfully reduces the size of program state recorded and recovered It is more meaningful to checkpoint/replay long-running programs Checkpoints are better taken after a phase of long time computation with (relatively) small output state √compress: small program state, short running time √ soot: large program state, but very long computation time X jflex: large program state, short running time

Conclusions A static-analysis-based checkpointing/replaying technique An implementation and an evaluation that shows our technique can be an interesting candidate for testing, debugging, and dynamic slicing of long-running programs Future work Language-level checkpointing/replaying multi-threaded programs More precise static analyses could be employed to reduce the size of program state to be captured The run-time support for object reading and writing could be improved

Questions?

compress Run #Objects Space %Heap Timec(s) (%wio) Timer(s) (%rio) 1 31 471by 0.17% 4.19 (0.74%) 4.14 (0.38%) 2 545 89.7M 28.8% 5.22 (10.4%) 3.19 (11.8%) 3 22 89.7by 28.9% 5.38 (9.0%) 2.17 (12.8%) 4 578 89M 26.7% 4.70 (12.3%) 1.39 (24.7%) 5 31 296by 0.008% 4.17 (8.1%) 47 (34.0%) Original running time: 4.05s

soot Run #Objects Space %Heap Timec(s) (%wio) Timer(s) (%rio) 1 461058 36.2M 36.3% 4695.3 (0.4%) 4643.5 (0.5%) 2 65648481 745M 73.2% 4712.2 (7.2%) 4410.5 (9.1%) 3 65648481 745M 73.2% 4688.4 (6.9%) 4387.3 (8.7%) 4 77739391 806.4M 79.0% 4770.1 (8.0%) 511.5 (95.2%) 5 77767256 806.5M 63.5% 4972.8 (8.0%) 533.1 (97.8%) 6 75668735 795.3M 72.8% 4661.6 (8.0%) 411.5 (96.5%) Original running time: 4665.7s

jflex Run #Objects Space %Heap Timec(s) (%wio) Timer(s) (%rio) 1 6606489 259.8M 86.1% 64.9 (8.0%) 68.8 (18.3%) 2 6695173 385.1M 68.1% 65.2 (12.3%) 55.6 (26.1%) 3 6695172 385.1M 68.1% 63.9 (12.1%) 55.4 (26.0%) 4 21 2K 0.0003% 56.2 (0.14%) 0.063 (50.8%) Original running time: 52.6s

Efficient Checkpointing of Java Software using Context-Sensitive Capture and Replay