Dependence-Cache Slicing: A Slicing Method Using Lightweight Dynamic Information

Tomonori Takada, Fumiaki Ohata, Katsuro Inoue
Osaka University

Background of Research
• Software systems are becoming larger and more complex
• The costs of debugging, testing, and maintenance are increasing
• To reduce development costs, techniques that make these activities more efficient are essential

Comprehension
• Comprehending large source programs is difficult
• If developers could select specific portions of the source program and concentrate only on those portions, these activities would become more efficient

Program Slicing
• A technique for extracting all program statements that affect the value of a variable
• Slicing: the extraction
• Slice: the collection of extracted statements
• Developers can concentrate on the extracted statements

Experiment Using Program Slice
• Goal: evaluate the validity of slices
• Two independent groups
• Measured bug detection time
  • with slice: 122 minutes
  • without slice: 165 minutes
Kusumoto, S., Nishimatsu, A., Nishie, K. and Inoue, K.: "Experimental Evaluation of Program Slicing for Fault Localization", Empirical Software Engineering, Vol. 7, No. 1, pp. 49-76 (2002).

Static Slicing
• The program is analysed statically (without execution)
• All possible input data sets are assumed
• Extracts all statements that may affect the value of the focused statement
• A Program Dependence Graph (PDG) is used
• Static slices are extracted by traversing edges in the PDG

Program Dependence Graph
A PDG shows the dependence relations between statements in a source program.
• nodes
  • statements
  • conditional predicates
• edges
  • control dependence edges
  • data dependence edges

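Dependences
• Control Dependence (CD)
  • Statement s2 is control dependent on statement s1 if the result of s1 decides whether s2 is executed.
  • Example (CD edge s1 -> s2):
      s1: if a=0 then
      s2:   b := 1;
• Data Dependence (DD)
  • Def-use relation: one statement defines a variable and another statement uses that value.
  • Example (DD edge s3 -> s4, labelled a):
      s3: a := 1;
      s4: writeln(a);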

Example of PDG
    program test(input, output);
    var a : array [0..9] of integer;
    var b, i, c : integer;
    begin
      writeln("input array : ");
      for i:=0 to 9 do
        a[i] := i * i;
      writeln("input number : ");
      readln(b);
      if b < 10 then
        c := a[b]
      else
        c := -1;
      writeln(c)
    end.
[Figure: PDG of the program. Nodes: writeln("inp.., for i:=0 to 9, a[i] := i * i, writeln("inp.., readln(b), if b<10, c := a[b], c := -1, writeln(c). DD edges are labelled with the variables i, a[], b, and c. The legend distinguishes DD edges from CD edges.]

Example of static slice
Program: test (listed under Example of PDG)
Slicing criterion: writeln(c)
Static slice:
    for i:=0 to 9 do
      a[i] := i * i;
    readln(b);
    if b < 10 then
      c := a[b]
    else
      c := -1;
[Figure: PDG of the program, with writeln(c) marked as the slicing criterion.]
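The traversal that yields this static slice can be sketched in Python. This is a minimal sketch, not the OSS implementation: the dictionary encoding of the PDG, the node names, and the function backward_slice are assumptions made for illustration, and the edges are the ones implied by the example program.

```python
from collections import deque

# Hypothetical encoding of the example PDG: each node maps to the nodes it
# depends on, i.e. the sources of its incoming CD and DD edges.
PDG = {
    "writeln_array": [],
    "for": [],
    "a[i]:=i*i": ["for"],                               # CD on the loop, DD on i
    "writeln_number": [],
    "readln(b)": [],
    "if b<10": ["readln(b)"],                           # DD on b
    "c:=a[b]": ["if b<10", "readln(b)", "a[i]:=i*i"],   # CD, DD on b, DD on a[]
    "c:=-1": ["if b<10"],                               # CD
    "writeln(c)": ["c:=a[b]", "c:=-1"],                 # DD on c
}


def backward_slice(pdg, criterion):
    """Return every node reachable from the criterion along reversed edges."""
    seen = {criterion}
    work = deque([criterion])
    while work:
        node = work.popleft()
        for dep in pdg[node]:
            if dep not in seen:
                seen.add(dep)
                work.append(dep)
    return seen


static_slice = backward_slice(PDG, "writeln(c)")
```

With writeln(c) as the criterion, the two prompt statements are the only nodes left out, which matches the slice on the slide.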

Dynamic Slicing
• The program is analysed dynamically (executed with particular input data)
• Extracts the statements that actually affect the value at the slicing criterion
• An execution trace is recorded
• A Dynamic Dependence Graph (DDG) is constructed from the execution trace
• Dynamic slices are extracted by traversing edges in the DDG

Example of dynamic slice
Program: test (listed under Example of PDG)
Slicing criterion: writeln(c); input: b=5
Execution trace:
    writeln("input array : ");
    for i:=0 to 9 do a[0] := 0 * 0;
    for i:=0 to 9 do a[1] := 1 * 1;
    for i:=0 to 9 do a[2] := 2 * 2;
    for i:=0 to 9 do a[3] := 3 * 3;
    for i:=0 to 9 do a[4] := 4 * 4;
    for i:=0 to 9 do a[5] := 5 * 5;
    for i:=0 to 9 do a[6] := 6 * 6;
    for i:=0 to 9 do a[7] := 7 * 7;
    for i:=0 to 9 do a[8] := 8 * 8;
    for i:=0 to 9 do a[9] := 9 * 9;
    writeln("input number : ");
    readln(b);
    if b < 10 then c := a[b]
    writeln(c)
Dynamic slice:
    for i:=0 to 9 do
      a[i] := i * i;
    readln(b);
    if b < 10 then
      c := a[b]
[Figure: DDG built from the trace, with one node per executed statement instance (the loop iterations a[0] := 0*0 through a[9] := 9*9, readln(b), if b<10, c := a[5], writeln(c)). DD edges are labelled i, a[], b, a[5], and c.]

Static and Dynamic Slicing
• Analysis cost: static < dynamic
  • Recording the execution trace is costly
  • Determining data dependence and control dependence on the execution trace is expensive
• Slice size: static > dynamic
  • Static slicing considers all possible flows
  • Dynamic slicing considers only one trace
• Goal: unify the two for efficient and effective slicing

Unified Slicing Methods
• Focusing on dynamic control-flow information
  • Hybrid Slicing (Gupta, 1997)
    • Collects all traces between breakpoints and procedure calls
    • Breakpoints must be specified, and the trace can be huge
  • Call-Mark Slicing (Nishimatsu, 1999; our group)
    • Dynamically sets call-marks (flags that show whether a caller statement was executed)
    • Eliminates non-executed statements from the PDG by using call-marks and execution dependence relations
• Focusing on dynamic data-flow information
  • Reduced DDG Method (Agrawal, 1990)
    • Identical sub-structures of the DDG are shared as one structure
    • Run-time overhead is serious
  • Dependence-Cache Slicing

Dependence-Cache Slicing
Dependence-Cache Slicing (DC slicing): a slicing method focused on dynamic data-flow information
• Control dependence
  • Easily obtained by syntax analysis
• Data dependence
  • Static analysis is difficult
Computation Step 1: Pre-Execution Analysis
  Statically compute the control dependence relations and construct a PDG that has nodes and control dependence edges.
Computation Step 2: Execution-Time Analysis
  Collect the dynamic data dependence relations by using caches, and add data dependence edges to the PDG.

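Data Dependence Collection
Each cache holds the statement where the variable was most recently defined.
Input: b=0
    s1: a[0]:=0;
    s2: a[1]:=3;
    s3: readln(b);
    s4: a[b]:=2;
    s5: c:=a[0]+4;
    s6: writeln(c);
Value of cache after each executed statement, and the data dependence edge found:
    executed   a[0]   a[1]   b     c     data dependence edge
    s1         s1     -      -     -     -
    s2         s1     s2     -     -     -
    s3         s1     s2     s3    -     -
    s4         s4     s2     s3    -     s3 -> s4 (b)
    s5         s4     s2     s3    s5    s4 -> s5 (a[0])
    s6         s4     s2     s3    s5    s5 -> s6 (c)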
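The cache update rule can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the OSS code: the trace format (statement, defined variable, used variables) and the function name collect_data_dependences are hypothetical, and array references are assumed to be already resolved to concrete elements such as a[0], as an interpreter would do at run time.

```python
def collect_data_dependences(trace):
    """Replay a trace and collect dynamic data dependence edges.

    trace: list of (stmt, defined_var_or_None, used_vars) in execution order.
    Returns the final cache and the set of (definer, variable, user) edges.
    """
    cache = {}         # variable -> statement that most recently defined it
    dd_edges = set()
    for stmt, defined, used in trace:
        # Look up uses first, so that "i := i + 1" reads the previous definer.
        for var in used:
            if var in cache:
                dd_edges.add((cache[var], var, stmt))
        if defined is not None:
            cache[defined] = stmt
    return cache, dd_edges


# The six-statement example with input b = 0, so a[b] resolves to a[0].
trace = [
    ("s1", "a[0]", []),
    ("s2", "a[1]", []),
    ("s3", "b", []),
    ("s4", "a[0]", ["b"]),
    ("s5", "c", ["a[0]"]),
    ("s6", None, ["c"]),
]
cache, dd_edges = collect_data_dependences(trace)
```

The memory needed is one cache entry per variable or array element, independent of the trace length, which is why the dynamic information is called lightweight.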

Example of DC slice
Program: test (listed under Example of PDG)
Slicing criterion: writeln(c); input: b=5
DC slice:
    for i:=0 to 9 do
      a[i] := i * i;
    readln(b);
    if b < 10 then
      c := a[b]
[Figure: PDG of the program with the DD edges collected at run time, labelled i, b, a[5], and c. No DD edge leaves c := -1, which was not executed.]

Experiment
• Measured several metrics on our slicing system, the Osaka Slicing System (OSS)
  • OSS already implemented extraction of static, call-mark, and dynamic slices
  • Added a function to compute DC slices
• Three sample Pascal programs
  • P1: calendar program (85 lines)
  • P2: wholesaler program (387 lines)
  • P3: wholesaler program 2 (871 lines)
• Slicing criteria were chosen randomly

Slice Size
[Bar chart: slice size in lines (axis 0 to 200) for P1, P2, and P3 under static, call-mark, dependence-cache, and dynamic slicing. Data labels: 187, 182, 166, 162, 61, 21, 17, 16, 15, 8, 5, 5.]
• static > call-mark >> DC > dynamic
• DC and dynamic slicing can analyse actual dependences
• P2 and P3 use array variables

Pre-Execution Analysis Time
[Bar chart: pre-execution analysis time in ms (axis 0 to 800) for P1, P2, and P3 under static, call-mark, dependence-cache, and dynamic slicing. Data labels: 710, 698, 213, 215, 48, 19, 14, 11, 5, and N/A for three bars.]
• static ≈ call-mark > DC
• DC slicing analyses only control dependence relations

Execution Time
[Bar chart: execution time in ms (axis 0 to 6000) for P1, P2, and P3 under static, call-mark, dependence-cache, and dynamic slicing. Data labels: 206464 (beyond the axis range), 4731, 4834, 4700, 4540, 174, 51, 47, 47, 43, 43, 45.]
• The execution time shown for static slicing is the execution time of the original program
• static ≈ call-mark ≈ DC << dynamic
• DC slices can be computed with a small increase in execution overhead

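Slice Computation Time
[Bar chart: slice computation time in ms (axis 0 to 10) for P1, P2, and P3 under static, call-mark, dependence-cache, and dynamic slicing. Data labels: 76, 101, 24969 (beyond the axis range), 3.0, 3.0, 1.9, 1.8, 1.2, 0.7, 0.6, 0.4, 0.3.]
• DC < static ≈ call-mark << dynamic
• DC slicing uses a PDG that has fewer DD edges than the PDG used by static slicing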

Discussion
• Analysis cost: static ≤ DC << dynamic
  • Dynamic data dependence relations are collected by a simple method
• Slice size: static ≥ DC ≥ dynamic
  • Only "actual" data dependence relations are added to the PDG
• Reasonable slice results with reasonable analysis time
• A promising approach to effective program localization

Limit of DC slicing
• A DC slice is less accurate than a dynamic slice
  • DC slicing analyses dependence relations between statements, not between statement instances in the execution trace
  • For the program below, DC slicing cannot distinguish between the first and second executions of s5 (dynamic slicing can)
    s1: a[0] := 0;
    s2: a[1] := 1;
    s3: i := 0;
    s4: while i<2 do begin
    s5:   b := a[i];
    s6:   i := i + 1
        end;
    s7: writeln(b);
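The difference can be reproduced with a small Python sketch that replays the trace of this program twice: once with one node per statement (DC style) and once with one node per execution instance (dynamic style). The trace encoding and the function slice_from_trace are assumptions made for illustration. As a simplification, control dependence is taken from the executed trace in both modes, which gives the same CD edges as static analysis here because every statement is executed.

```python
from collections import defaultdict

# Executed statements: (stmt, defined var, used vars, controlling stmt or None).
TRACE = [
    ("s1", "a[0]", [], None),
    ("s2", "a[1]", [], None),
    ("s3", "i", [], None),
    ("s4", None, ["i"], None),
    ("s5", "b", ["a[0]", "i"], "s4"),   # first iteration, i = 0
    ("s6", "i", ["i"], "s4"),
    ("s4", None, ["i"], None),
    ("s5", "b", ["a[1]", "i"], "s4"),   # second iteration, i = 1
    ("s6", "i", ["i"], "s4"),
    ("s4", None, ["i"], None),
    ("s7", None, ["b"], None),
]


def slice_from_trace(trace, per_instance):
    """Backward slice on the last trace entry.

    per_instance=False: nodes are statements (DC style).
    per_instance=True:  nodes are execution instances (dynamic style).
    """
    deps = defaultdict(set)
    last_def = {}     # variable -> node that most recently defined it
    last_exec = {}    # statement -> most recent node of that statement
    node = None
    for index, (stmt, defined, used, controller) in enumerate(trace):
        node = (stmt, index) if per_instance else stmt
        for var in used:
            if var in last_def:
                deps[node].add(last_def[var])
        if controller is not None:
            deps[node].add(last_exec[controller])
        if defined is not None:
            last_def[defined] = node
        last_exec[stmt] = node
    seen, work = {node}, [node]
    while work:
        for dep in deps[work.pop()]:
            if dep not in seen:
                seen.add(dep)
                work.append(dep)
    # Map instances back to statement names for reporting.
    return sorted({n[0] if per_instance else n for n in seen})


dc_slice = slice_from_trace(TRACE, per_instance=False)
dynamic_slice = slice_from_trace(TRACE, per_instance=True)
```

For the criterion s7, the DC-style slice contains s1 through s7, while the instance-level slice omits s1: only the second execution of s5, which reads a[1], reaches writeln(b).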

Applications
• We have applied DC slicing to several language environments
  • Pascal (interpreter)
    • The OSS mentioned earlier
  • Java source code (preprocessor)
    • Translates the program so that it collects dynamic data dependence relations
  • Java bytecode (compiler, VM)
    • The virtual machine collects dynamic data dependence relations
    • Most Java libraries are provided as bytecode

Conclusions and Future Work
• Proposed dependence-cache slicing
  • A practical and efficient approach to obtaining reasonable slices
  • Confirmed its validity through an experiment
  • Applicable to various environments
• Future work
  • Evaluation through user testing
  • Application to other language environments
