Precise Dynamic Slicing Algorithms
Xiangyu Zhang, Rajiv Gupta and Youtao Zhang
Presented By: Krishna Balasubramanian
Slicing Techniques? • Static Slicing • Isolates all possible statements that may compute a particular variable • Criteria: <v, n> • Dynamic Slicing • Isolates only the statements that actually computed the variable for a given input • Criteria: <i, v, n>
Example – Data dependences • Static Slice <10, z> = {1, 2, 3, 4, 7, 8, 9, 10} • Dynamic Slice <input, variable, execution point> • <N=1, z, 10^1> = {3, 4, 9, 10}
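The dynamic slice above can be computed with a single backward walk over the execution trace, following data dependences only. The sketch below is illustrative, not the paper's implementation: the trace format (statement id, defined variable, used variables) and the statement ids are hypothetical, and control dependences and memory addresses are ignored for simplicity.

```python
# Minimal sketch of dynamic slicing over a recorded execution trace.
# Each trace entry records the executed statement, the variable it
# defines, and the variables it uses (data dependences only).

def dynamic_slice(trace, var):
    """Return the set of statement ids in the dynamic slice of `var`
    at the end of the trace, following data dependences backward."""
    wanted = {var}        # variables whose latest definitions we still need
    slice_stmts = set()
    for stmt, defined, used in reversed(trace):
        if defined in wanted:
            wanted.discard(defined)
            wanted.update(used)
            slice_stmts.add(stmt)
    return slice_stmts

# Hypothetical trace mirroring the N=1 example (ids chosen to match):
trace = [
    (1, "a", []), (2, "b", ["a"]),
    (3, "x", []), (4, "y", []),
    (9, "z", ["x", "y"]), (10, "out", ["z"]),
]
print(sorted(dynamic_slice(trace, "out")))   # [3, 4, 9, 10]
```

Statements 1 and 2 are skipped because no definition they produce reaches `z` in this execution, which is exactly why the dynamic slice is smaller than the static one.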
Slice Sizes: Static vs Dynamic • Static slicing gives huge slices • On average, static slices are much larger than dynamic slices
Precise Dynamic Slicing • Data dependences exercised during program execution are captured precisely and saved • Only dependences occurring in a specific execution of the program are considered • Dynamic slices constructed upon user requests by traversing the captured dynamic dependence information • Limitation: costly to compute
Imprecise Dynamic Slicing • Reduces cost of slicing • Found to greatly increase slice sizes • Reduces effectiveness • Worthwhile to use precise algorithms instead
Precise vs Imprecise: Slice Size • Implemented two imprecise algorithms: Algorithm I and Algorithm II • Imprecision increases slice size • Algorithm II is better than Algorithm I
Precise Dynamic Slicing - Approach • Program executed • Execution trace collected • PDS involves: • Preprocessing: • Builds dependence graph by recovering dynamic dependences from program’s execution trace • Slicing • Computes slices for given slicing requests by traversing dynamic dependence graph
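The two PDS phases can be sketched in a few lines: preprocessing recovers dynamic data dependences from the trace into a graph over (statement, instance) nodes, and slicing is backward reachability over that graph. This is a simplified illustration, assuming a trace of (statement, defined variable, used variables) entries; control dependences and address-level details are omitted.

```python
from collections import defaultdict

def build_ddg(trace):
    """Preprocessing sketch: recover dynamic data dependences from the
    whole execution trace into a graph keyed by (stmt, instance) nodes."""
    last_def = {}               # var -> (stmt, instance) of its latest definition
    count = defaultdict(int)    # per-statement execution counter
    edges = defaultdict(set)    # node -> nodes it depends on
    for stmt, defined, used in trace:
        count[stmt] += 1
        node = (stmt, count[stmt])
        for var in used:
            if var in last_def:
                edges[node].add(last_def[var])
        last_def[defined] = node
    return edges

def slice_from(edges, node):
    """Slicing sketch: statements reachable backward from `node`."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(edges.get(n, ()))
    return {stmt for stmt, _ in seen}

# Hypothetical trace: slice for the 1st instance of statement 10.
trace = [(3, "x", []), (4, "y", []), (9, "z", ["x", "y"]), (10, "out", ["z"])]
print(sorted(slice_from(build_ddg(trace), (10, 1))))   # [3, 4, 9, 10]
```

Building the full graph up front is essentially the FP algorithm described next; NP and LP differ only in when and how these edges are recovered.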
3 Algorithms Proposed • Full preprocessing (FP) – Builds entire dependence graph before slicing • No preprocessing (NP) • No preprocessing performed • Does demand driven analysis during slicing • Caches the recovered dependencies • Limited preprocessing (LP) • Adds summary info to execution trace • Uses demand driven analysis to recover dynamic dependences from compacted execution trace What do you think is better and why?
Comparison • FP algorithm impractical for real programs • Runs out of memory during preprocessing phase • Dynamic dependence graphs extremely large • NP algorithm does not run out of memory but is slow • LP algorithm is practical • Never runs out of memory • Fast
1)Full Preprocessing • Edges corresponding to data dependences extracted from execution trace • Added to statement level control flow graph • Execution instances labeled on graph • Uses instance labels during slicing • Only relevant edges traversed
FP - Example • Load to store edge on left labeled (1,1) • Load to store edge on right labeled (2,1) • 1st/2nd instance of load’s execution gets value from 1st instance of execution of store on the left/right • When load included in dynamic slice, not necessary to include both stores in dynamic slice. Instance Labels
FP - Example • Dynamic data dependence edges shown • Edges labeled with execution instances of statements involved in data dependences • Data dependence edges traversed during the slice computation for z used in the only execution of statement 16: (16^1, 14^3), (14^3, 13^2), (13^2, 12^2), (13^2, 15^3), (15^3, 3^1), (15^3, 15^2), (15^2, 3^1), (15^2, 15^1), (15^1, 3^1), (15^1, 4^1) • Precise dynamic slice computed: DS<x=6, z, 16^1> = {16, 14, 13, 12, 4, 15, 3} • Compute the slice corresponding to the value of x used during the first execution of statement 15? • DS<x=6, x, 15^1> = {4, 15}
2) No Preprocessing • Demand driven analysis to recover dynamic dependences • Requires less storage compared to FP • Takes more time • Caching used to avoid repetitive computations • Cost of maintaining cache vs repeated recovery of same dependences from trace
NP Example • No dynamic data dependence edges present initially • To compute the slice for z at the only execution of statement 16: • a single backward traversal of the trace extracts (16^1, 14^3), (14^3, 13^2), (13^2, 12^2), (13^2, 15^3), (15^3, 3^1), (15^3, 15^2), (15^2, 3^1), (15^2, 15^1), (15^1, 3^1), (15^1, 4^1)
NP with Cache • Data dependence edges added to the program flow graph as they are recovered • Compute slice for use of x in the 3rd instance of statement 14 • All dependences required are already present in the graph • Trace not reexamined • Compute slice for use of x by the 2nd instance of statement 10 • Trace traversed again • Additional dynamic data dependences extracted
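The demand-driven NP-with-cache idea can be sketched as follows: dependence edges are recovered from the trace only when a slice request needs them, then cached so later requests that hit the same nodes avoid re-scanning the trace. The class name, trace format (statement, instance, defined variable, used variables), and linear backward scan are all illustrative assumptions, not the paper's implementation.

```python
class DemandDrivenSlicer:
    """NP-with-cache sketch: recover dependence edges lazily and cache them."""

    def __init__(self, trace):
        # trace entries: (stmt, instance, defined_var, used_vars)
        self.trace = trace
        self.cache = {}   # (stmt, instance) -> set of predecessor nodes

    def deps(self, node):
        if node in self.cache:          # cache hit: no trace traversal
            return self.cache[node]
        # locate this statement instance in the trace
        idx = next(i for i, e in enumerate(self.trace)
                   if (e[0], e[1]) == node)
        wanted = set(self.trace[idx][3])
        preds = set()
        # backward scan for the latest definition of each used variable
        for s, inst, d, _u in reversed(self.trace[:idx]):
            if d in wanted:
                wanted.discard(d)
                preds.add((s, inst))
        self.cache[node] = preds
        return preds

    def slice(self, node):
        seen, stack = set(), [node]
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(self.deps(n))
        return {s for s, _ in seen}
```

A second slice request touching the same nodes is answered from `self.cache`, which is exactly the trade-off the slide describes: cache maintenance cost versus repeated recovery of the same dependences.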
3) Limited Preprocessing • LP strikes a balance between preprocessing & slicing costs • Limited preprocessing of trace • Augments trace with summary information • Faster traversal of augmented trace • Demand driven analysis to compute slice using augmented trace • Addresses • Space problems of FP • Time problems of NP
LP – Approach • Trace divided into trace blocks • Each trace block of fixed size • Store a summary of all downward-exposed definitions of variable names & memory addresses for each block • During backward traversal, look for the variable's definition in a block's summary • If a definition is found, traverse the trace block to locate it • Else, use the block's size information to skip to the start of the trace block
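The block-skipping step above can be sketched as follows. The block size, the trace format (statement, defined variable, used variables), and the summary (here simply every variable defined in the block, a superset of the true downward-exposed set) are simplifying assumptions for illustration.

```python
BLOCK = 4   # fixed trace-block size (assumed; in practice this is tuned)

def make_blocks(trace, size=BLOCK):
    """Limited preprocessing: split the trace into fixed-size blocks and
    summarize the variables each block defines."""
    blocks = []
    for start in range(0, len(trace), size):
        chunk = trace[start:start + size]
        summary = {d for _s, d, _u in chunk}   # vars defined in this block
        blocks.append((summary, chunk))
    return blocks

def latest_def(blocks, var, upto_block, upto_pos):
    """Find the statement holding the latest definition of `var` before
    position `upto_pos` of block `upto_block`, skipping whole blocks
    whose summary shows they never define `var`."""
    for bi in range(upto_block, -1, -1):
        summary, chunk = blocks[bi]
        if var not in summary:
            continue                     # skip the block without scanning it
        end = upto_pos if bi == upto_block else len(chunk)
        for s, d, _u in reversed(chunk[:end]):
            if d == var:
                return s
    return None
```

Every skipped block is a chunk of trace that is never scanned, which is where LP's speedup over NP comes from.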
Evaluation • Execution traces collected on 3 different input sets for each benchmark • 25 different slices computed for each execution trace • Slices computed with respect to the end of the program's execution (@ End) • 25 additional slices computed at a midpoint of the program's execution (@ Midpoint) for the 1st input
Results – Slice sizes (Figure: PDS sizes for additional inputs) • PDS sizes for the 2nd & 3rd program inputs @ End are shown • Number of statements in a dynamic slice is a small fraction of the statements executed • Different inputs give similar observations • Thus, dynamic slicing is effective across different inputs
Evaluation - Slice computation times • Compared FP, NPwoC, NPwC, and LP • Cumulative execution time in seconds as slices are computed one by one is shown • Graphs include both preprocessing times & slice computation times
Execution Times (figure) • X-axis: number of slices • Y-axis: cumulative execution time (s)
Observations • FP rarely runs to completion • Mostly runs out of memory • NPwoC, NPwC and LP successful • Makes computation of PDS feasible • NPwoC shows linear increase in cumulative exec time with no. of slices • LP cumulative exec time rises much more slowly than NPwoC and NPwC
Observations • Execution times of LP are 1.13 to 3.43 times lower than NP • Due to the percentage of trace blocks skipped by LP • Shows that limited preprocessing does pay off (Figures: Cumulative times, NP vs LP; Trace blocks skipped by LP)
LP (Precise) vs Algorithm II (Imprecise) • Slice Sizes: • Slices computed by LP are 1.2 to 17.33 times smaller than the imprecise data slices of Algorithm II • Relative performance was similar across inputs • Execution Times: • @ End, total time taken by LP is 0.55 to 2.02 times that of Algorithm II • @ Midpoint, total time taken by LP is 0.51 to 1.86 times that of Algorithm II
Results • Neither algorithm has memory problems • Smaller slice sizes for LP • For large slices, LP's execution time is greater than the imprecise algorithm's • For small slices, LP's execution time is less than the imprecise algorithm's
Summary • Precise LP algorithm performs the best • Imprecise dynamic slicing algorithms are too imprecise, hence not an attractive option • LP algorithm is practical • Provides Precise Dynamic Slices at reasonable space and time costs