Program Slicing by Mark Weiser, 5th International Conference on Software Engineering, San Diego, 1982

Program Slicing
• Program slicing is a method that reduces a program to the portion that is of “interest.” The reasons for the “interest” may be:
• the desire to fix a bug
• the desire to make a modification
• So, there is a desire to understand the behavior of a portion of the program.

General Concepts
• Program slicing may be viewed as another way to decompose a program, according to some criterion.
• The criterion is defined by specifying a set of program variables at some set of statements.
• Usually it is one statement and one variable of interest.
• Criterion = <i, V>, where i = a statement number and V = the set of variables of interest (often a single variable).
• The set of all statements that could “influence” the values of the variables in V at statement i is the program slice.

Some Examples from Weiser
• Original Program:
1. Begin
2. Read (x, y)
3. Total = 0.0
4. Sum = 0.0
5. If (x <= 1)
6. then Sum = y
7. else Begin
8. Read (z)
9. Total = x * y
10. End
11. Write (Total, Sum)
12. End
• If Criterion = <12, {z}>:
1. Begin
2. Read (x, y)
5. If (x <= 1)
6. then Sum = y
7. else Begin
8. Read (z)
12. End
• or, if Criterion = <12, {z}>:
1. Begin
8. Read (z)
12. End
• If Criterion = <9, {x}>:
1. Begin
2. Read (x, y)
12. End
• Why not?
• If Criterion = <12, {Total}>:
1. Begin
2. Read (x, y)
3. Total = 0.0
5. If (x <= 1)
6. then Sum = y
7. else Begin
9. Total = x * y
12. End

Desirable Properties of a Program Slice
1. The slice must be obtained from the original program by deleting statements.
• To make sure that this works, we must delete in such a way that the result is still “meaningful”: no remaining statement may end up with more immediate successors than it had before the deletion. (So be careful with a “branch” statement, which has multiple successor paths.)

Desirable Properties of a Program Slice
2. The behavior of the slice must be the same as that of the original program, as observed through the slicing criterion.
• This is a reasonable requirement, but it must be relaxed to apply only to terminating executions. For the program below, we cannot guarantee the second property if x = 0:
1. Begin
2. Read (x)
3. If x = 0
4. then Begin (some infinite loop that does not change x)
5. x = 1
6. End
7. else x = 2
8. End
• The problem is statement 5, which is never executed when x = 0. If the criterion is <8, {x}>, the slice must include statement 5 but need not include the loop, because the loop does not change x. When x = 0 the original program never reaches statement 8 while the slice does, so the slice and the original program cannot be compared on inputs for which the original does not terminate.
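Property 2 can be spot-checked, though never proven, by running the original program and a slice side by side on terminating inputs: testing can reveal a mismatch but cannot establish equivalence. The sketch below hand-translates Weiser's example program and two of the slices listed earlier into Python functions that read their input from a sequence, then compares the criterion variable as it reaches statement 12. The function names (`original`, `slice_total`, `slice_z`, `agrees`) and the test inputs are hypothetical.

```python
from itertools import product

def original(inputs):
    """Weiser's example program; returns the state that reaches statement 12."""
    stream = iter(inputs)
    x, y = next(stream), next(stream)  # 2. Read (x, y)
    total = 0.0                        # 3. Total = 0.0
    sum_ = 0.0                         # 4. Sum = 0.0
    z = None                           # z has no value until it is read
    if x <= 1:                         # 5. If (x <= 1)
        sum_ = y                       # 6. then Sum = y
    else:
        z = next(stream)               # 8. Read (z)
        total = x * y                  # 9. Total = x * y
    return {"Total": total, "Sum": sum_, "z": z}

def slice_total(inputs):
    """Slice for <12, {Total}>: statements 2, 3, 5, 6, 7, 9 as listed on the slide."""
    stream = iter(inputs)
    x, y = next(stream), next(stream)
    total = 0.0
    if x <= 1:
        sum_ = y                       # kept because the slide's listing keeps it
    else:
        total = x * y
    return {"Total": total}

def slice_z(inputs):
    """Slice for <12, {z}>: statements 2, 5, 6, 7, 8 as listed on the slide."""
    stream = iter(inputs)
    x, y = next(stream), next(stream)
    z = None
    if x <= 1:
        sum_ = y
    else:
        z = next(stream)
    return {"z": z}

def agrees(prog, sliced, var, cases):
    """True if prog and sliced give the same value of var on every test case."""
    return all(prog(case)[var] == sliced(case)[var] for case in cases)

# Hypothetical inputs (x, y, value later read into z); all of them terminate.
cases = list(product([0, 1, 2, 5], [-3, 0, 4], [7, 9]))
print(agrees(original, slice_total, "Total", cases))  # True
print(agrees(original, slice_z, "z", cases))          # True
```

Modeling input as a single stream matters: a slice that dropped `Read (x, y)` but kept `Read (z)` would read a different input value into z.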

Another Theoretical Boundary
• Because we cannot, in general, decide whether two programs behave “equally” (the question runs into the halting problem, as the non-terminating example shows), we cannot find the “smallest” slice by checking whether a candidate behaves the same as another slice.
• So, finding a “minimal” slice is not computable in general.

Guide to More Formal Definitions Using a Dataflow Algorithm
• A clarification first: the dataflow algorithm alluded to here is not the DFD (data flow diagram) that some of you may be thinking of. Dataflow here refers to the path from a variable's use back to the statement that defines that variable's value.
• Let C = criterion = <i, V>.
• Let R[0,C](n) = the variables at statement n that can directly affect what is expressed as of interest through C = <i, V>. We will use these directly relevant variables as the guide for choosing the statements to include in the slice.

Detailed Definition of R[0,C](n)
• This is a recursive definition. Given C = <i, V>, R[0,C](n) = all variables v such that either:
1) n = i and v is in V, OR
2) n is an immediate predecessor of a statement m such that either:
a. v is in REF(n) and there is a variable w in both DEF(n) and R[0,C](m), OR
b. v is not in DEF(n) and v is in R[0,C](m)
• where DEF(n) = the set of variables whose values are altered by statement n, and REF(n) = the set of variables that are referenced in statement n.
• Intuitively, rule 2 a) says that if n assigns a relevant variable, the variables n uses become relevant; rule 2 b) says that a relevant variable n does not overwrite stays relevant before n.

Simple Example of R[0,C](n)
• Sample program:
1. Begin
2. Read (x,y)
3. If ( x > y )
4. then z = x – y
5. else z = y – x
6. Write (z)
• Let the criterion be C = <6, {z}>. Trace through the R[0,C]'s:
1. R[0,C](6) = {z}, because of definition part 1)
2. R[0,C](5) = {y, x}, because of definition part 2 a)
3. R[0,C](4) = {y, x}, because of definition part 2 a) (like statement 5, statement 4 is an immediate predecessor of statement 6)
4. R[0,C](3) = {y, x}, because of definition part 2 b)
5. R[0,C](2) = { } (Read acts like a call to an input routine: it defines x and y but references no variables)
6. R[0,C](1) = { }
• So for C = <6, {z}>, R[0,C] (the union of R[0,C](n) over all statements n) = {z, y, x}
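The recursive definition can be evaluated by starting from rule 1) and repeatedly applying rules 2 a) and 2 b) backwards along the control-flow edges until no set changes. The sketch below does this for the sample program above. The dictionaries `SUCC`, `DEF`, `REF` and the function `relevant_vars` are illustrative names, not from Weiser's paper, and the control-flow edges (3 → 4, 3 → 5, 4 → 6, 5 → 6) are our own encoding of the If statement.

```python
Graph = dict[int, list[int]]
VarSets = dict[int, set[str]]

# Sample program: statement -> immediate successors, variables defined, variables referenced.
SUCC: Graph = {1: [2], 2: [3], 3: [4, 5], 4: [6], 5: [6], 6: []}
DEF: VarSets = {1: set(), 2: {"x", "y"}, 3: set(), 4: {"z"}, 5: {"z"}, 6: set()}
REF: VarSets = {1: set(), 2: set(), 3: {"x", "y"}, 4: {"x", "y"}, 5: {"x", "y"}, 6: {"z"}}

def relevant_vars(succ: Graph, defs: VarSets, refs: VarSets, i: int, V: set[str]) -> VarSets:
    """R[0,C](n) for C = <i, V>, iterated until no set changes."""
    R: VarSets = {n: set() for n in succ}
    R[i] = set(V)                         # rule 1): n = i and v in V
    changed = True
    while changed:
        changed = False
        for n, successors in succ.items():
            new = set(R[n])
            for m in successors:          # n is an immediate predecessor of m
                if defs[n] & R[m]:        # rule 2 a): n defines a relevant variable
                    new |= refs[n]
                new |= R[m] - defs[n]     # rule 2 b): relevant and not redefined by n
            if new != R[n]:
                R[n], changed = new, True
    return R

R0 = relevant_vars(SUCC, DEF, REF, 6, {"z"})
for n in sorted(R0):
    print(n, sorted(R0[n]))
print(sorted(set().union(*R0.values())))  # ['x', 'y', 'z']
```

The printed sets match the trace line by line; statement 4 gets {x, y} through rule 2 a) because its immediate successor is statement 6.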

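Getting to the Slice Statements from R[0,C](n)
• Given the criterion <i, V> and the resulting R[0,C], the statements that should be included in the slice are those that change any of the variables in R[0,C].
• More formally, let S[0,C] be the set of statements that should be included in the slice based on R[0,C]:
S[0,C] = every statement x such that DEF(x) ∩ R[0,C] ≠ Ø
• (Weiser's paper states this per statement: DEF(x) is intersected with R[0,C](m) for the immediate successors m of x. Using the union of all the R[0,C](n), as above, is a simplification that can add extra statements, such as an assignment to a relevant variable made after that variable's last relevant use, but it gives the same result for the example that follows.)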

Continuing the Example to Get the Slice
• Sample program:
1. Begin
2. Read (x,y)
3. If ( x > y )
4. then z = x – y
5. else z = y – x
6. Write (z)
• Let the criterion be C = <6, {z}>.
• Given: for C = <6, {z}>, R[0,C] = {z, y, x}
• S[0,C] = every statement x such that DEF(x) ∩ R[0,C] ≠ Ø
• Statement 2: {x, y} ∩ {z, y, x} ≠ Ø
• Statement 4: {z} ∩ {z, y, x} ≠ Ø
• Statement 5: {z} ∩ {z, y, x} ≠ Ø
• So, S[0,C] = {statements 2, 4, 5}, and they should be included in the slice.
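The step from R[0,C] to S[0,C] is a simple filter over the statements. This sketch takes the R[0,C](n) values from the trace of the sample program as given and computes S[0,C] two ways: `s0_union` follows the slide's definition (intersect DEF(x) with the union of all R[0,C](n)), while `s0_successor` intersects DEF(x) with R[0,C](m) for each immediate successor m of x, which is how Weiser's paper states it. Both function names are hypothetical.

```python
# Sample program: successors and DEF sets; R[0,C](n) for C = <6, {z}> copied from the trace.
SUCC = {1: [2], 2: [3], 3: [4, 5], 4: [6], 5: [6], 6: []}
DEF = {1: set(), 2: {"x", "y"}, 3: set(), 4: {"z"}, 5: {"z"}, 6: set()}
R0 = {1: set(), 2: set(), 3: {"x", "y"}, 4: {"x", "y"}, 5: {"x", "y"}, 6: {"z"}}

def s0_union(defs, r0):
    """Statements whose DEF meets the union of all R[0,C](n) (the slide's form)."""
    all_relevant = set().union(*r0.values())
    return {x for x in defs if defs[x] & all_relevant}

def s0_successor(defs, succ, r0):
    """Statements x whose DEF meets R[0,C](m) for some immediate successor m."""
    return {x for x in defs if any(defs[x] & r0[m] for m in succ[x])}

print(sorted(s0_union(DEF, R0)))            # [2, 4, 5]
print(sorted(s0_successor(DEF, SUCC, R0)))  # [2, 4, 5]
```

The two forms agree here. On a program such as `x = 1; y = x; x = 5; Write (y)`, the union form would also pick up the useless `x = 5`, while the per-successor form would not.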

Any Other Statements?
• Intuitively, the branch statement in our example, statement 3, which decides which path to take, seems to “influence” C = <6, {z}>. Shouldn't we include the branch statement in the slice?
• First, a few more definitions:
A) In a program graph, a statement x is said to dominate statement y if x is on every path from the Begin statement to y.
B) An inverse dominator, D(n), of a statement n is a statement x that is on every path from n to the end of the program.
[Figure: a program graph with nodes B, R, P, S, X, Y. A) X is a dominator of Y. B) X and Y are inverse dominators of B.]
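Inverse dominators are what compiler texts usually call postdominators, and they can be computed with a standard iterative dataflow pass: the inverse dominators of a statement are the statement itself plus those common to all of its immediate successors. The sketch below applies this to the sample program of the running example, assuming a single exit statement (statement 6) that every statement can reach. The helper names `inverse_dominators` and `nearest_inverse_dominator` are hypothetical.

```python
from typing import Optional

def inverse_dominators(succ: dict[int, list[int]], exit_node: int) -> dict[int, set[int]]:
    """For each statement n, the statements on every path from n to exit_node (n included)."""
    nodes = set(succ)
    pdom = {n: set(nodes) for n in nodes}   # start from "all statements" and shrink
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            common = set.intersection(*(pdom[m] for m in succ[n]))
            new = {n} | common
            if new != pdom[n]:
                pdom[n], changed = new, True
    return pdom

def nearest_inverse_dominator(pdom: dict[int, set[int]], n: int) -> Optional[int]:
    """The strict inverse dominator d of n such that every other strict one also inverse-dominates d."""
    strict = pdom[n] - {n}
    for d in strict:
        if pdom[d] == strict:
            return d
    return None

SUCC = {1: [2], 2: [3], 3: [4, 5], 4: [6], 5: [6], 6: []}
pdom = inverse_dominators(SUCC, exit_node=6)
print(sorted(pdom[3]))                      # [3, 6]
print(nearest_inverse_dominator(pdom, 3))   # 6
```

The assumption of a single reachable exit matters: `set.intersection` needs at least one successor, and a statement trapped in an infinite loop would keep its initial "all statements" set.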

Indirect Influence
• It would seem that if the branch statement influences which path to take, and one of the paths contains a statement in S[0,C], then that branch statement should be included in the slice as an “indirect” influencer.
• More formally: let ND(Branch) be the set of statements that are on a path from the Branch statement to its nearest inverse dominator, x, excluding Branch and x themselves. Then the Branch statement has indirect influence if S[0,C] ∩ ND(Branch) ≠ Ø.

Continuing the Example to Include the “Indirect” Influencer
• Sample program:
1. Begin
2. Read (x,y)
3. If ( x > y )
4. then z = x – y
5. else z = y – x
6. Write (z)
• Let the criterion be C = <6, {z}>.
[Figure: flowchart of the sample program; the test x > y ? branches to z = x - y and to z = y - x, and both paths join at Write (z).]
• Statement 3 is the Branch statement.
• Statement 6 is the inverse dominator of statement 3.
• Statements 4 and 5 are in ND(Branch).
• Also recall that S[0,C] = {statements 2, 4, 5}.
• {statements 4, 5} ∩ {statements 2, 4, 5} = {statements 4, 5}
• Thus ND(Branch) ∩ S[0,C] ≠ Ø, so we should include statement 3 in the slice.
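The ND(Branch) test can be sketched as a depth-first walk from the branch's successors that stops at the nearest inverse dominator. To keep it short, the sketch takes statement 6 as the nearest inverse dominator of statement 3 and S[0,C] = {2, 4, 5}, both as stated above. The names `nd` and `indirect_influencers` are hypothetical, and a branch is assumed to be any statement with more than one immediate successor.

```python
SUCC = {1: [2], 2: [3], 3: [4, 5], 4: [6], 5: [6], 6: []}
NEAREST_INV_DOM = {3: 6}     # branch -> nearest inverse dominator (from the slide)
S0 = {2, 4, 5}               # S[0,C] for C = <6, {z}>

def nd(succ, branch, stop):
    """Statements on some path from branch to stop, excluding branch and stop."""
    seen, stack = set(), list(succ[branch])
    while stack:
        n = stack.pop()
        if n in (branch, stop) or n in seen:
            continue
        seen.add(n)
        stack.extend(succ[n])
    return seen

def indirect_influencers(succ, nearest, s0):
    """Branch statements b whose ND(b) intersects S[0,C]."""
    branches = [n for n in succ if len(succ[n]) > 1]
    return {b for b in branches if nd(succ, b, nearest[b]) & s0}

print(sorted(nd(SUCC, 3, NEAREST_INV_DOM[3])))                 # [4, 5]
print(sorted(indirect_influencers(SUCC, NEAREST_INV_DOM, S0)))  # [3]
```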

A Few More Definitions & Concepts
• Let CS(S[0,C]) = the set of statements, including the branches, that influence the statements in S[0,C].
• Let B[0,C] = the branch statements that are in CS(S[0,C]), that is, every branch statement b with ND(b) ∩ S[0,C] ≠ Ø.
• Now, what about the statements that might influence the branch statements in B[0,C]?

Include the Influencers of B[0,C]
• Consider a slicing criterion at a branch statement, <b, REF(b)>, where b is the branch statement and REF(b) is the set of variables that determine which path is taken from b.
• Let BC(b) denote this branch criterion, <b, REF(b)>.
• Then the variables that directly influence the branch statement b are given by R[0,BC(b)](n).
• The next level of influencing variables (direct and indirect), R[1,C](n), is defined as:
R[1,C](n) = R[0,C](n) ∪ ( ∪ of R[0,BC(b)](n) over all b in B[0,C] )
• The next level of influencing statements includes all statements that directly or indirectly influence the criterion:
S[1,C] = {statements x : DEF(x) ∩ R[1,C](n) ≠ Ø for some immediate successor n of x, or x is in B[0,C]}

Generalizing this Recursive Definition for Slicing
• Because a program can have multiple levels of branches and paths, producing a program slice for a criterion C may require recursively processing the direct and indirect influencing variables and their associated statements, as follows:
• R[i+1,C](n) = R[i,C](n) ∪ ( ∪ of R[0,BC(b)](n) over all b in B[i,C] )
• B[i+1,C] = the branch statements in CS(S[i+1,C])
• S[i+1,C] = {statements x : DEF(x) ∩ R[i+1,C](n) ≠ Ø for some immediate successor n of x, or x is in B[i,C]}
• The sets only grow from one level to the next and are bounded by the program's variables and statements, so the iteration eventually stops changing; the final S[i,C] is the slice.
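Putting the pieces together gives an iterative slicer: compute R[0,C] and S[0,C], then keep adding the branch criteria BC(b) for every branch whose ND(b) meets the current S until nothing changes. The sketch below is a minimal Python rendering of that loop under stated assumptions: statements are numbered nodes of a control-flow graph with a single exit that every statement reaches, Begin/End and then/else keywords are not modeled as separate slice members, and every function name is hypothetical.

```python
from typing import Optional

Graph = dict[int, list[int]]
VarSets = dict[int, set[str]]

def relevant_vars(succ: Graph, defs: VarSets, refs: VarSets, i: int, V: set[str]) -> VarSets:
    """R[0,C](n) for C = <i, V> (rules 1, 2 a and 2 b iterated to a fixpoint)."""
    R: VarSets = {n: set() for n in succ}
    R[i] = set(V)
    changed = True
    while changed:
        changed = False
        for n, successors in succ.items():
            new = set(R[n])
            for m in successors:
                if defs[n] & R[m]:
                    new |= refs[n]
                new |= R[m] - defs[n]
            if new != R[n]:
                R[n], changed = new, True
    return R

def nearest_inverse_dominators(succ: Graph, exit_node: int) -> dict[int, Optional[int]]:
    """Nearest inverse dominator of each statement (None for the exit)."""
    nodes = set(succ)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            new = {n} | set.intersection(*(pdom[m] for m in succ[n]))
            if new != pdom[n]:
                pdom[n], changed = new, True
    return {n: next((d for d in pdom[n] - {n} if pdom[d] == pdom[n] - {n}), None)
            for n in nodes}

def nd(succ: Graph, branch: int, stop: Optional[int]) -> set[int]:
    """ND(branch): statements on a path from branch to stop, excluding both."""
    seen, stack = set(), list(succ[branch])
    while stack:
        n = stack.pop()
        if n in (branch, stop) or n in seen:
            continue
        seen.add(n)
        stack.extend(succ[n])
    return seen

def weiser_slice(succ: Graph, defs: VarSets, refs: VarSets,
                 exit_node: int, i: int, V: set[str]) -> set[int]:
    """Iterate R[k,C], B[k,C], S[k,C] until nothing changes; return the final S."""
    nearest = nearest_inverse_dominators(succ, exit_node)
    regions = {b: nd(succ, b, nearest[b]) for b in succ if len(succ[b]) > 1}
    R = relevant_vars(succ, defs, refs, i, V)
    S = {x for x in succ if any(defs[x] & R[m] for m in succ[x])}   # S[0,C]
    while True:
        B = {b for b, region in regions.items() if region & S}       # B[k,C]
        new_R = {n: set(R[n]) for n in succ}
        for b in B:                                                  # add R[0,BC(b)]
            for n, vs in relevant_vars(succ, defs, refs, b, refs[b]).items():
                new_R[n] |= vs
        new_S = B | {x for x in succ if any(defs[x] & new_R[m] for m in succ[x])}
        if new_R == R and new_S == S:
            return S
        R, S = new_R, new_S

# Sample program of the running example, criterion C = <6, {z}>.
SUCC = {1: [2], 2: [3], 3: [4, 5], 4: [6], 5: [6], 6: []}
DEF = {1: set(), 2: {"x", "y"}, 3: set(), 4: {"z"}, 5: {"z"}, 6: set()}
REF = {1: set(), 2: set(), 3: {"x", "y"}, 4: {"x", "y"}, 5: {"x", "y"}, 6: {"z"}}
print(sorted(weiser_slice(SUCC, DEF, REF, exit_node=6, i=6, V={"z"})))  # [2, 3, 4, 5]
```

The result is S[0,C] = {2, 4, 5} plus the indirectly influencing branch, statement 3, matching the worked example.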

Experimentation with Slicing
• People and experiment:
• 21 experienced programmers
• 3 programs, each with a bug, were given to be debugged
• Program slices were then shown:
• 2 local, adjacent code slices
• 3 non-adjacent code slices (including slices relevant and not relevant to the debugging)
• Result:
• The programmers “remembered” the 2 adjacent code slices better than the non-adjacent code slices, except for the non-adjacent code slice that contained the bug.
• For the non-adjacent code slice that was relevant to the bug, the number of times it was “remembered” was about as high as for the adjacent code slices.
• Some automatic slicers were also built, with some success.

Some Metrics based on Program Slicing
• Slice Coverage = mean slice length / program length. (It shows how the mean length of the slices compares with the whole program, with the suspicion that the smaller this metric is, the more “independent” parts the program contains, and thus the less cohesive it may be.)
• Overlap = the number of statements found only in a given slice versus those that are not unique to it. (The smaller this number is, the more interdependencies there are among the code.)
• Clustering = the number of statements in a slice that are adjacent to one another in the program text, relative to the total number of statements in the slice. (A low number may mean the code is spread out and intertwined like spaghetti code.)
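These three metrics can be computed once the slices are known. The sketch below treats each slice as a set of statement numbers and uses the three slice listings from Weiser's 12-statement example as sample data. The exact formulas are assumptions, since the slide gives only informal wording: Overlap is taken as the fraction of a slice's statements that appear in no other slice, and Clustering as the fraction of a slice's statements whose textual neighbour (n - 1 or n + 1) is also in the slice.

```python
from statistics import mean

# Slices from Weiser's example (statement numbers as listed on the slides); 12 statements in all.
PROGRAM_LENGTH = 12
SLICES = {
    "<12, {z}>": {1, 2, 5, 6, 7, 8, 12},
    "<9, {x}>": {1, 2, 12},
    "<12, {Total}>": {1, 2, 3, 5, 6, 7, 9, 12},
}

def coverage(slices, program_length):
    """Mean slice length divided by program length."""
    return mean(len(s) for s in slices.values()) / program_length

def overlap(slices):
    """Per slice: fraction of its statements found in no other slice (assumed formula)."""
    result = {}
    for name, s in slices.items():
        others = set().union(*(t for other, t in slices.items() if other != name))
        result[name] = len(s - others) / len(s)
    return result

def clustering(slices):
    """Per slice: fraction of statements with a textual neighbour in the slice (assumed formula)."""
    return {name: sum(1 for n in s if n - 1 in s or n + 1 in s) / len(s)
            for name, s in slices.items()}

print(coverage(SLICES, PROGRAM_LENGTH))  # 0.5
print(overlap(SLICES))
print(clustering(SLICES))
```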

Some Metrics based on Program Slicing
• Parallelism = the number of slices that have few statements in common. (If there are many slices with no statements in common, those slices might be executed in parallel to save computation time?)
• Tightness = the number of statements that are in every slice. (If this number is high across the subroutines, then perhaps the subroutines should be combined into one?)
• In the spirit of REFACTORING, can these metrics be used as guidelines to improve design and code?
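Tightness and Parallelism can be sketched the same way, again using the three slice listings from Weiser's example as sample data. Tightness is the count of statements common to every slice, as the slide states. Parallelism is taken, as an assumption since the slide leaves "few" open, to be the number of slices that share at most `max_shared` statements with every other slice; `max_shared` is a hypothetical parameter.

```python
SLICES = {
    "<12, {z}>": {1, 2, 5, 6, 7, 8, 12},
    "<9, {x}>": {1, 2, 12},
    "<12, {Total}>": {1, 2, 3, 5, 6, 7, 9, 12},
}

def tightness(slices):
    """Number of statements that appear in every slice."""
    return len(set.intersection(*slices.values()))

def parallelism(slices, max_shared):
    """Number of slices sharing at most max_shared statements with each other slice (assumed formula)."""
    return sum(
        1
        for name, s in slices.items()
        if all(len(s & t) <= max_shared for other, t in slices.items() if other != name)
    )

print(tightness(SLICES))                  # 3
print(parallelism(SLICES, max_shared=0))  # 0
print(parallelism(SLICES, max_shared=3))  # 1
```

Here the three common statements are Begin, Read (x, y), and End; since statements such as Begin and End appear in every slice, it may make sense to drop them before computing these metrics.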
