Co-Slicing for Program Comprehension and Reuse
March 2008
Ran Ettinger, Software Asset Management Group
In Advanced SW Tools Seminar, TAU
Agenda
• Introduction to Program Slicing
  • Debugging aid, program comprehension tool, and much more
  • PAINLESS demo
• Co-Slicing for Reuse: Program Sliding
  • A provably-correct code-motion untangling transformation of slice extraction
• Co-Slicing for Program Comprehension
  • Problem of large slices
  • Novel solution: Interactive program exploration with the slice-inclusion relation and co-slicing
  • A Co-Slicing Algorithm
• Back to Sliding
• Related Work
  • CodeSurfer’s single-step slice browsing, thin slicing, Dijkstra’s projections, and method-extraction algorithms
• Further Challenges
Introduction to Program Slicing
• Slicing is the study of meaningful subprograms
• “When debugging unfamiliar programs programmers use program pieces called slices which are sets of statements related by their flow of data. The statements in a slice are not necessarily textually contiguous, but may be scattered through a program” [Mark Weiser, CACM82]
• Given a program and a variable (at a point) of interest, a slice of the program on that variable is a subprogram that preserves the original behavior, with respect to that variable
• Demo 1: Slicing HLASM code (Program Analysis INfrastructure for Legacy Enterprise Software Systems project [PAINLESS] at IBM HRL)
• A wide variety of potential applications
  • Debugging, program comprehension, testing, refactoring, componentization, parallelization, and more
Co-Slicing for Reuse: Program Sliding
Extract the computation of profit

    i = 0;
    while (i<days)
        cin >> sale[i++];
    if (shouldProcess) {
        i = 0;
        totalSale = 0;
        while (i < days) {
            totalSale = totalSale+sale[i];
            i = i+1;
        }
        profit = 0.9*totalSale-cost;
    }
    if (shouldProcess) {
        i = 0;
        totalPay = 0;
        while (i < days) {
            totalPay = totalPay+0.1*sale[i];
            if (sale[i]>1000)
                totalPay = totalPay+50;
            i = i+1;
        }
        pay = totalPay/days+100;
    }

The extracted slice:

    i = 0;
    while (i<days)
        cin >> sale[i++];
    if (shouldProcess) {
        i = 0;
        totalSale = 0;
        while (i < days) {
            totalSale = totalSale+sale[i];
            i = i+1;
        }
        profit = 0.9*totalSale-cost;
    }

The complement (no unnecessary duplication of the loop for reading sales; hence, no need to reject!):

    if (shouldProcess) {
        i = 0;
        totalPay = 0;
        while (i < days) {
            totalPay = totalPay+0.1*sale[i];
            if (sale[i]>1000)
                totalPay = totalPay+50;
            i = i+1;
        }
        pay = totalPay/days+100;
    }

Example source: Lakhotia and Deprez [IST98]
Co-Slicing for Reuse: Program Sliding
• A provably-correct code-motion untangling transformation of slice extraction
  • My doctoral thesis “Refactoring via Program Slicing and Sliding” [Ett]
• Automated slice extraction
  • Combines statement reordering with code duplication
  • A sequential composition of a slice with its complement (i.e. co-slice)
  • Adding some compensatory code, for correctness
• Enables automation of advanced refactorings
  • Split/Merge Loops
  • Separate Query from Modifier
    • Command/Query separation
  • Arbitrary Method Extraction
    • Advanced versions of Extract Method
    • Sliding thesis [Ett] and Raghavan Komondoor’s thesis [Kom]
  • Replace Temp with Query
    • By slice extraction
• Demo 2: Nate, an Eclipse plugin, prototype slice-extraction refactoring for a small subset of Java, developed by Mathieu Verbaere and myself at Oxford, 2003/2004, supported by an Eclipse Innovation Grant by IBM
Co-Slicing for Program Comprehension: Problem of Large Slices
• Slices (especially from the end) tend to grow too large to be effective
• Why is the typical end-slice so large?
  • The slice must produce correct values
    • It hence includes all statements that may contribute to the value of any used variable, at any point in the slice (i.e., it follows all data-flow dependences)
    • Demo 3: indirect (i.e. base register) data dependence
  • The slice must be executable
    • It hence includes all statements with conditions and jumps, if those control whether to execute (or not) any other statement in the slice
    • Demo 4: following control dependences
Problem of Large Slices: End-Slice of Variable pay

    i = 0;
    while (i<days)
        cin >> sale[i++];
    if (shouldProcess) {
        i = 0;
        totalSale = 0;
        totalPay = 0;
        while (i < days) {
            totalSale = totalSale+sale[i];
            totalPay = totalPay+0.1*sale[i];
            if (sale[i]>1000)
                totalPay = totalPay+50;
            i = i+1;
        }
        pay = totalPay/days+100;
        profit = 0.9*totalSale-cost;
    }

[Diagram nodes: pay, totalPay, sale, i, cin]
Novel Solution: A Slice-Inclusion Relation and Co-Slicing
• Interactive program exploration
• Guided by a slice-inclusion diagram
  • First introduced by Gallagher and Lyle [ToSE91]
  • For software maintenance, supporting a process of change, avoiding the need for regression testing
• The diagram is a directed graph, representing a given program (or subprogram) S, and including:
  • A node for each (defined) variable x
    • Stands for the slice of S on x, from the end
  • A directed edge from x to y whenever
    • The slice of x is fully included in that of y, and
    • There is no other variable z whose slice both includes the slice of x and is included in the slice of y
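The edge rule above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: slices are given as sets of statement numbers, "fully included" is read as proper set inclusion, and the helper names (`properSubset`, `sliceInclusionEdges`, `exampleSlices`) and the statement numbering of the running example are hypothetical, not taken from the talk.

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Slice = std::set<int>;  // statement numbers (hypothetical numbering)

// True when a is a proper subset of b (assumed reading of "fully included").
bool properSubset(const Slice& a, const Slice& b) {
    return a.size() < b.size() &&
           std::includes(b.begin(), b.end(), a.begin(), a.end());
}

// Edge x -> y when slice(x) is properly included in slice(y) and no other
// variable z has a slice strictly between the two.
std::vector<std::pair<std::string, std::string>>
sliceInclusionEdges(const std::map<std::string, Slice>& slices) {
    std::vector<std::pair<std::string, std::string>> edges;
    for (const auto& [x, sx] : slices) {
        for (const auto& [y, sy] : slices) {
            if (!properSubset(sx, sy)) continue;
            bool direct = true;
            for (const auto& [z, sz] : slices) {
                if (properSubset(sx, sz) && properSubset(sz, sy)) {
                    direct = false;
                    break;
                }
            }
            if (direct) edges.emplace_back(x, y);
        }
    }
    return edges;
}

// Assumed end-slices for the running example, numbering its 15 statements
// top to bottom (1: i = 0; ... 14: pay = ...; 15: profit = ...).
std::map<std::string, Slice> exampleSlices() {
    return {
        {"sale",      {1, 2, 3}},
        {"i",         {1, 2, 3, 4, 5, 8, 13}},
        {"totalSale", {1, 2, 3, 4, 5, 6, 8, 9, 13}},
        {"totalPay",  {1, 2, 3, 4, 5, 7, 8, 10, 11, 12, 13}},
        {"profit",    {1, 2, 3, 4, 5, 6, 8, 9, 13, 15}},
        {"pay",       {1, 2, 3, 4, 5, 7, 8, 10, 11, 12, 13, 14}},
    };
}
```

Calling `sliceInclusionEdges(exampleSlices())` on these assumed sets yields only direct edges, for instance from `sale` to `i` but not from `sale` to `pay`. Variables with identical slices are left unrelated by this sketch; how the diagram should treat such ties is not specified here.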
Example: Slice-Inclusion Diagram

    i = 0;
    while (i<days)
        cin >> sale[i++];
    if (shouldProcess) {
        i = 0;
        totalSale = 0;
        totalPay = 0;
        while (i < days) {
            totalSale = totalSale+sale[i];
            totalPay = totalPay+0.1*sale[i];
            if (sale[i]>1000)
                totalPay = totalPay+50;
            i = i+1;
        }
        pay = totalPay/days+100;
        profit = 0.9*totalSale-cost;
    }

[Slice-inclusion diagram nodes: pay, profit, totalPay, totalSale, sale, i, cin]
Interactive Program Exploration: An On-Demand Approach

    i = 0;
    while (i<days)
        cin >> sale[i++];
    if (shouldProcess) {
        i = 0;
        totalSale = 0;
        totalPay = 0;
        while (i < days) {
            totalSale = totalSale+sale[i];
            totalPay = totalPay+0.1*sale[i];
            if (sale[i]>1000)
                totalPay = totalPay+50;
            i = i+1;
        }
        pay = totalPay/days+100;
        profit = 0.9*totalSale-cost;
    }

[Slice-inclusion diagram nodes: pay, profit, totalPay, totalSale, sale, i, cin]
On-Demand Exploration

    i = 0;
    while (i<days)
        cin >> sale[i++];
    if (shouldProcess) {
        i = 0;
        totalSale = 0;
        totalPay = 0;
        while (i < days) {
            totalSale = totalSale+sale[i];
            totalPay = totalPay+0.1*sale[i];
            if (sale[i]>1000)
                totalPay = totalPay+50;
            i = i+1;
        }
        pay = totalPay/days+100;
        profit = 0.9*totalSale-cost;
    }

[Slice-inclusion diagram nodes: pay, profit, totalPay, totalSale, sale, i, cin]
On-Demand Exploration: Back to End-Slice of profit
… the problem of large slices is not yet solved

    i = 0;
    while (i<days)
        cin >> sale[i++];
    if (shouldProcess) {
        i = 0;
        totalSale = 0;
        totalPay = 0;
        while (i < days) {
            totalSale = totalSale+sale[i];
            totalPay = totalPay+0.1*sale[i];
            if (sale[i]>1000)
                totalPay = totalPay+50;
            i = i+1;
        }
        pay = totalPay/days+100;
        profit = 0.9*totalSale-cost;
    }

[Slice-inclusion diagram nodes: pay, profit, totalPay, totalSale, sale, i, cin]
Interactive Program Exploration with Slices and Co-Slices: A Bottom-Up Approach

    i = 0;
    while (i<days)
        cin >> sale[i++];
    if (shouldProcess) {
        i = 0;
        totalSale = 0;
        totalPay = 0;
        while (i < days) {
            totalSale = totalSale+sale[i];
            totalPay = totalPay+0.1*sale[i];
            if (sale[i]>1000)
                totalPay = totalPay+50;
            i = i+1;
        }
        pay = totalPay/days+100;
        profit = 0.9*totalSale-cost;
    }

[Slice-inclusion diagram nodes: pay, profit, totalPay, totalSale, sale, i, cin]
Bottom-Up Exploration with Slices and Co-Slices

    i = 0;
    while (i<days)
        cin >> sale[i++];
    if (shouldProcess) {
        i = 0;
        totalSale = 0;
        totalPay = 0;
        while (i < days) {
            totalSale = totalSale+sale[i];
            totalPay = totalPay+0.1*sale[i];
            if (sale[i]>1000)
                totalPay = totalPay+50;
            i = i+1;
        }
        pay = totalPay/days+100;
        profit = 0.9*totalSale-cost;
    }

[Slice-inclusion diagram nodes: pay, profit, totalPay, totalSale, sale, i, cin]
What’s in a Co-Slice then?
Is it the complementary set of statements? [too small!]

    i = 0;
    while (i<days)
        cin >> sale[i++];
    if (shouldProcess) {
        i = 0;
        totalSale = 0;
        totalPay = 0;
        while (i < days) {
            totalSale = totalSale+sale[i];
            totalPay = totalPay+0.1*sale[i];
            if (sale[i]>1000)
                totalPay = totalPay+50;
            i = i+1;
        }
        pay = totalPay/days+100;
        profit = 0.9*totalSale-cost;
    }

[Slice-inclusion diagram nodes: pay, profit, totalPay, totalSale, sale, i, cin]
What’s in a Co-Slice then?
Is it the union of slices of all remaining variables? [too large!]

    i = 0;
    while (i<days)
        cin >> sale[i++];
    if (shouldProcess) {
        i = 0;
        totalSale = 0;
        totalPay = 0;
        while (i < days) {
            totalSale = totalSale+sale[i];
            totalPay = totalPay+0.1*sale[i];
            if (sale[i]>1000)
                totalPay = totalPay+50;
            i = i+1;
        }
        pay = totalPay/days+100;
        profit = 0.9*totalSale-cost;
    }

[Slice-inclusion diagram nodes: pay, profit, totalPay, totalSale, sale, i, cin]
What’s in a Co-Slice then?
Assume results of selected variables are available and reuse them

    i = 0;
    while (i<days)
        cin >> sale[i++];
    if (shouldProcess) {
        i = 0;
        totalSale = 0;
        totalPay = 0;
        while (i < days) {
            totalSale = totalSale+sale[i];
            totalPay = totalPay+0.1*sale[i];
            if (sale[i]>1000)
                totalPay = totalPay+50;
            i = i+1;
        }
        pay = totalPay/days+100;
        profit = 0.9*totalSale-cost;
    }

[Slice-inclusion diagram nodes: pay, profit, totalPay, totalSale, sale, i, cin]
Illustration of a Co-Slicing Algorithm (1)
Assume results of selected variables are available and reuse them

    i = 0;
    while (i<days)
        cin >> sale[i++];
    if (shouldProcess) {
        i = 0;
        totalSale = 0;
        totalPay = 0;
        while (i < days) {
            totalSale = totalSale+fSale[i];
            totalPay = totalPay+0.1*fSale[i];
            if (fSale[i]>1000)
                totalPay = totalPay+50;
            i = i+1;
        }
        pay = fTotalPay/days+100;
        profit = 0.9*fTotalSale-cost;
    }

[Slice-inclusion diagram nodes: pay, profit, totalPay, totalSale, sale, i, cin]
Illustration of a Co-Slicing Algorithm (2)
Slice now… and get a smaller slice

    i = 0;
    while (i<days)
        cin >> sale[i++];
    if (shouldProcess) {
        i = 0;
        totalSale = 0;
        totalPay = 0;
        while (i < days) {
            totalSale = totalSale+fSale[i];
            totalPay = totalPay+0.1*fSale[i];
            if (fSale[i]>1000)
                totalPay = totalPay+50;
            i = i+1;
        }
        pay = fTotalPay/days+100;
        profit = 0.9*fTotalSale-cost;
    }

[Slice-inclusion diagram nodes: pay, profit, totalPay, totalSale, sale, i, cin]
Illustration of a Co-Slicing Algorithm (3)
Undo the variable renaming… wherever possible

    i = 0;
    while (i<days)
        cin >> sale[i++];
    if (shouldProcess) {
        i = 0;
        totalSale = 0;
        totalPay = 0;
        while (i < days) {
            totalSale = totalSale+fSale[i];
            totalPay = totalPay+0.1*fSale[i];
            if (fSale[i]>1000)
                totalPay = totalPay+50;
            i = i+1;
        }
        pay = fTotalPay/days+100;
        profit = 0.9*totalSale-cost;
    }

[Slice-inclusion diagram nodes: pay, profit, totalPay, totalSale, sale, i, cin]
Back to Sliding: Separate non-maximal from maximal

    i = 0;
    while (i<days)
        cin >> sale[i++];
    if (shouldProcess) {
        i = 0;
        totalSale = 0;
        totalPay = 0;
        while (i < days) {
            totalSale = totalSale+sale[i];
            totalPay = totalPay+0.1*sale[i];
            if (sale[i]>1000)
                totalPay = totalPay+50;
            i = i+1;
        }
    }
    if (shouldProcess) {
        pay = totalPay/days+100;
        profit = 0.9*totalSale-cost;
    }

[Slice-inclusion diagram nodes: pay, profit, totalPay, totalSale, sale, i, cin]
Another Sliding Example: Separate profit and all included variables

    i = 0;
    while (i<days)
        cin >> sale[i++];
    if (shouldProcess) {
        i = 0;
        totalSale = 0;
        while (i < days) {
            totalSale = totalSale+sale[i];
            i = i+1;
        }
        profit = 0.9*totalSale-cost;
    }
    if (shouldProcess) {
        i = 0;
        totalPay = 0;
        while (i < days) {
            totalPay = totalPay+0.1*sale[i];
            if (sale[i]>1000)
                totalPay = totalPay+50;
            i = i+1;
        }
        pay = totalPay/days+100;
    }

[Slice-inclusion diagram nodes: pay, profit, totalPay, totalSale, sale, i, cin]
The Promise of Sliding
• Mainly good for:
  • Enhancing reusability of tangled (non-contiguous) existing code
  • Refactoring (e.g. Replace Temp with Query)
  • Componentization
  • Parallelization
• Particularly strong in:
  • Correctness, i.e. behavior preservation
  • Maximizing reuse (of extracted computation’s results, in the complement)
  • Minimizing code duplication, i.e. yielding a small complement
  • Minimizing the necessary compensation, i.e. fewer backup variables
  • Improving applicability, i.e. fewer reasons to reject a request
Related Work: Enhance Reuse by Method Extraction
• Slice extraction
  • Tucking by Lakhotia and Deprez [IST98]
    • Complement is union of slices from all non-extracted points
    • No data flow from slice to complement
  • Block-based slicing by Maruyama [SSR01]
    • A rudimentary approach (no proof of correctness)
  • Untangling: A Slice Extraction Refactoring [AOSD04]
• Arbitrary method extraction: Extract any selection of (not-necessarily contiguous) code
  • Tucking [IST98]
  • Procedure extraction by Komondoor and Horwitz [POPL00, IWPC03, Kom]
    • Allows data flow from extracted code to the complement
    • Inspired invention of co-slices
    • However, does not support duplication of assignments
    • Hence, no untangling of loops; instead, may extract more code than actually selected
Related Work: Program Comprehension
• Thin slicing (by Sridharan, Fink and Bodik [PLDI07])
  • Focus on direct (value, not pointer) data dependences
    • Ignore control dependences
    • Ignore data dependences carrying pointers (base and index registers, in the context of HLASM)
  • A thin slice can be expanded with other thin slices
    • Yielding the full traditional slice, in the limit
• CodeSurfer’s slice browsing [CodeSurfer]
  • One step at a time, jumping forward or backward in a slice, following data or control dependences
• Dijkstra’s projections (in his Smoothsort article [SoCP82])
  • Explaining an algorithm stepwise, one variable/projection at a time
Some Further Challenges
• Implement the co-slicing and sliding algorithms
• Extend to “real” languages
• Collect empirical results on length and usefulness of co-slices
• Extend the slice-inclusion diagram to slices from internal program points
• Apply sliding to more refactorings (e.g. “Separate Query from Modifier” [Fow], arbitrary method extraction)
• Apply the sliding-related refactorings in bigger reengineering challenges (e.g. Convert Procedural Design to Objects [Fow], componentization, conversion to SOA)
• Sliding beyond refactoring (e.g. in optimizing compilers, code obfuscation)
References
• [CACM82] Programmers use slices when debugging, M. Weiser, 1982
• [SoCP82] Smoothsort, an Alternative for Sorting In Situ, E. W. Dijkstra, 1982
• [ToSE91] Using Program Slicing in Software Maintenance, Gallagher and Lyle, 1991
• [IST98] Restructuring programs by tucking statements into functions, A. Lakhotia and J.-C. Deprez, 1998
• [Fow] Refactoring: Improving the Design of Existing Code, M. Fowler, 2000
• [POPL00] Semantics-preserving procedure extraction, R. Komondoor and S. Horwitz, 2000
• [SSR01] Automated method-extraction refactoring by using block-based slicing, K. Maruyama, 2001
• [IWPC03] Effective automatic procedure extraction, R. Komondoor and S. Horwitz, 2003
• [Kom] Automated Duplicated-Code Detection and Procedure Extraction, R. Komondoor, PhD thesis, University of Wisconsin-Madison, 2003
• [AOSD04] Untangling: a slice extraction refactoring, R. Ettinger and M. Verbaere, 2004
• [Ett] Refactoring via Program Slicing and Sliding, R. Ettinger, DPhil thesis, 2006
  http://progtools.comlab.ox.ac.uk/members/rani/sliding_thesis.pdf
• [PLDI07] Thin Slicing, M. Sridharan, S. J. Fink, R. Bodik, 2007
• [CodeSurfer] CodeSurfer from GrammaTech
  http://www.grammatech.com/products/codesurfer/
• [PAINLESS] The Program Analysis INfrastructure for Legacy Enterprise Software Systems project
  http://www.haifa.il.ibm.com/projects/services/painless
A Definition of (Slices and) Co-Slices
• Definition of a slice:
  • Let S be a given statement and let V be a set of variables of interest.
  • A statement S’ is a slice of S on V, if for any input on which S terminates, S’ will terminate too, and with the same result held in all program variables V.
• In a similar manner, the novel concept of a co-slice can be defined as follows:
  • Let S be a given statement and let V be a set of variables of NO interest.
    • That is, the final value of each variable in V, and the code for computing it in S, can be removed, provided it does not contribute to any other result.
  • A statement S’ is a co-slice of S on V, if for any input on which S terminates, S’ will terminate too, and with the same result held in all program variables outside V.
  • Moreover, suppose the result of V is available for reuse through the corresponding set of fresh variables fV. A co-slice S’ on V with fV is free to use the final value of variables in V through the corresponding elements of fV (or even directly from V, if only such final-value references are present in S’).
A Co-Slicing Algorithm: Rationale
• The goal of the algorithm is to maximize reuse of the available final values before slicing for the complementary set of variables
• However, a simplistic approach of substituting all uses of co-sliced variables will not do
  • Some of the uses make reference to intermediate (i.e. non-final) values
  • The final-value references must be identified and substituted, ahead of slicing
• Finally, after slicing, some substitutions may be undone
Final-Use Substitution
• A final use of a variable x is a reference to x at a program point p at which x is guaranteed to hold its final value
  • That is, no path from p to the exit, in terms of control flow, includes a definition of x
  • Or, equivalently, an assertion of the form assert x == fx, where fx is a fresh variable, can be correctly propagated backwards (against the flow of control) from the exit to the program point p
• A definition of final-use substitution:
  • Given a program statement S, a set of variables X, and a corresponding set of fresh variables fX, the final-use substitution of X with fX, on statement S, yields a new statement S’ by replacing all final-use references of each member of X with a reference to the corresponding member of fX
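The path-based characterization of a final use lends itself to a small reachability check. The sketch below assumes a hand-built control-flow graph in which each node records the variables it defines and uses; the `Node` layout, the function names and the five-node example are illustrative assumptions. A use at node n is treated as final when neither n itself nor any node reachable from n defines the variable (the use in `x = x + 1` reads x before that statement redefines it, so it is not final).

```cpp
#include <set>
#include <string>
#include <vector>

// A control-flow graph node (hypothetical representation).
struct Node {
    std::set<std::string> defs;  // variables defined at this node
    std::set<std::string> uses;  // variables referenced at this node
    std::vector<int> succs;      // indices of successor nodes
};

// Is the reference to x at node n a final use? It is when no node on any
// path from n to the exit (n included) defines x.
bool isFinalUse(const std::vector<Node>& cfg, int n, const std::string& x) {
    std::vector<bool> seen(cfg.size(), false);
    std::vector<int> work{n};
    seen[n] = true;
    while (!work.empty()) {
        int cur = work.back();
        work.pop_back();
        if (cfg[cur].defs.count(x)) return false;  // x is redefined later
        for (int s : cfg[cur].succs) {
            if (!seen[s]) {
                seen[s] = true;
                work.push_back(s);
            }
        }
    }
    return true;
}

// Part of the running example, as an assumed CFG:
//   0: totalSale = 0;
//   1: while (i < days)
//   2:     totalSale = totalSale + sale[i];
//   3:     i = i + 1;
//   4: profit = 0.9 * totalSale - cost;
std::vector<Node> exampleCfg() {
    return {
        {{"totalSale"}, {}, {1}},
        {{}, {"i", "days"}, {2, 4}},
        {{"totalSale"}, {"totalSale", "sale", "i"}, {3}},
        {{"i"}, {"i"}, {1}},
        {{"profit"}, {"totalSale", "cost"}, {}},
    };
}
```

On this graph the reference to `totalSale` in node 4 is final while the one in node 2 is not, and the reference to `sale` inside the loop is final, which is consistent with the renaming to `fTotalSale` and `fSale` shown on the illustration slides.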
A Co-Slicing Algorithm
• Given a statement S, a set of variables of no-interest V, and a corresponding set of final values fV, compute the co-slice of S on V with fV as follows:
  • Reuse fV wherever possible
    • Let S’ be S with final-use substitution of V by fV
  • Slice for all remaining variables
    • Determine the complementary set of variables, coV, as all possibly-modified variables in S that are not in V
    • Let S’’ be the slice of S’ on coV
  • Undo the earlier substitutions wherever possible
    • Let V1 be the set of all variables in V that are not referenced (i.e. neither used nor defined) in S’’
    • Let fV1 be the subset of fV corresponding to the subset V1 of V
    • Let S’’’ be the statement S’’ after normal substitution of all variables fV1 with the corresponding program variables V1
  • Return S’’’ as the co-slice of S on V with fV
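The three steps can be tried out on a deliberately restricted setting. The following sketch handles only straight-line programs (a sequence of assignments, no branches or loops), where both final-use detection and backward slicing reduce to a single backward pass. The `Stmt` representation, the `f_` prefix for fresh variables and the function names are assumptions of this illustration; it is not the algorithm's implementation for full programs.

```cpp
#include <set>
#include <string>
#include <vector>

// One assignment "def = f(uses...)" of a straight-line program.
struct Stmt {
    std::string def;
    std::vector<std::string> uses;
};
using Program = std::vector<Stmt>;

// Hypothetical naming scheme for the fresh variable fx of x
// (assumed not to clash with any program variable).
std::string fresh(const std::string& x) { return "f_" + x; }

Program coSlice(const Program& s, const std::set<std::string>& v) {
    // Step 1: final-use substitution of V by fV. A use in statement k is
    // final when no statement at position k or later defines the variable.
    Program s1 = s;
    std::set<std::string> definedFromHere;
    for (int k = static_cast<int>(s1.size()) - 1; k >= 0; --k) {
        definedFromHere.insert(s1[k].def);
        for (std::string& u : s1[k].uses)
            if (v.count(u) && !definedFromHere.count(u)) u = fresh(u);
    }

    // Step 2: slice S' on coV, the defined variables of S that are not in V.
    std::set<std::string> needed;
    for (const Stmt& st : s)
        if (!v.count(st.def)) needed.insert(st.def);
    Program s2;
    for (int k = static_cast<int>(s1.size()) - 1; k >= 0; --k) {
        if (needed.erase(s1[k].def) > 0) {  // this statement is relevant
            s2.insert(s2.begin(), s1[k]);
            needed.insert(s1[k].uses.begin(), s1[k].uses.end());
        }
    }

    // Step 3: undo the renaming for variables of V no longer referenced.
    std::set<std::string> referenced;
    for (const Stmt& st : s2) {
        referenced.insert(st.def);
        referenced.insert(st.uses.begin(), st.uses.end());
    }
    for (Stmt& st : s2)
        for (std::string& u : st.uses)
            for (const std::string& x : v)
                if (!referenced.count(x) && u == fresh(x)) u = x;
    return s2;
}
```

For the four-statement program `t = a + b; t = t * 2; p = t - c; q = t + p` with V = {t}, this sketch returns `p = t - c; q = t + p`: both assignments to `t` are sliced away and the renaming is undone. If a variable outside V reads an intermediate value of `t`, the defining statements stay in the slice and the corresponding `f_t` references are kept.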