On Refactoring Support Based on Code Clone Dependency Relation
Norihiro Yoshida (1), Yoshiki Higo (1), Toshihiro Kamiya (2), Shinji Kusumoto (1), Katsuro Inoue (1)
(1) Osaka University
(2) National Institute of Advanced Industrial Science and Technology
METRICS 2005

Background (1): What is a code clone?
• A set of code fragments that are identical or similar to each other
• Introduced into source programs for various reasons, such as reusing code by copy-and-paste
• Makes software maintenance more difficult
[Figure: a code clone]

Background (2): Refactoring on Code Clones
• Refactoring [1] is one way to deal with the code clone problem.
  • Refactoring is a technique for restructuring existing code.
  • It alters the software's internal structure without changing its external behavior [2].
  • It improves the maintainability of the software.
• Number one in the stink parade is duplicate code [1].
[Figure: duplicated fragments are replaced by call statements to a new method]
[1] M. Fowler, Refactoring: Improving the Design of Existing Code, Addison-Wesley, 1999.
[2] http://www.refactoring.com

Motivation
• There are dependency relations between methods that belong to different code clones.
[Figure: code clone A (methods a1, a2), code clone B (methods b1, b2), and code clone C (methods c1, c2) are merged by three separate refactorings into methods a, b, and c]

Motivation (continued)
• Merging all of the code clones at once is more effective.
[Figure: methods a1, a2, b1, b2, c1, and c2 are merged in a single step into methods a, b, and c]

Research Goal
• Define a set of code clones that have dependency relations as a chained clone
• Propose an effective refactoring support method for chained clones
[Figure: a chained clone formed by methods a1, a2, b1, b2, c1, and c2]

Definition of chained clone (1)
• Chained Method
  • A set of methods that hold dependency relations
• Chained Method Graph
  • A node represents a method
  • An edge represents a dependency relation
  • Three types of labels for the dependency relation
    • "Call": calling methods
    • "Ai": sharing variable i in terms of assignment
    • "Rj": sharing variable j in terms of reference
[Figure: a chained method and its chained method graph, with two Call edges, one Rx edge, and one Ax edge]

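The graph definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tool: the function name and input format are hypothetical, and it assumes that an "Ai" edge joins two methods that both assign variable i and an "Rj" edge joins two methods that both reference variable j.

```python
from itertools import combinations


def chained_method_graph(calls, assigned, referenced):
    """Build the labeled edge set of a chained method graph.

    calls:      iterable of (caller, callee) method-name pairs
    assigned:   dict mapping a method name to the set of variables it assigns
    referenced: dict mapping a method name to the set of variables it reads
    Returns a set of (method, method, label) triples.
    """
    # "Call" edges are directed from caller to callee.
    edges = {(caller, callee, "Call") for caller, callee in calls}
    # Shared-variable edges (assumed undirected) are stored with sorted endpoints.
    for prefix, usage in (("A", assigned), ("R", referenced)):
        for m1, m2 in combinations(sorted(usage), 2):
            for var in usage[m1] & usage[m2]:
                edges.add((m1, m2, prefix + var))
    return edges


# Hypothetical chained method: a1 calls b1, b1 calls c1, and they share variable x.
graph = chained_method_graph(
    calls=[("a1", "b1"), ("b1", "c1")],
    assigned={"b1": {"x"}, "c1": {"x"}},
    referenced={"a1": {"x"}, "c1": {"x"}},
)
```

The resulting graph has two Call edges, one Ax edge, and one Rx edge, the same mix of labels as the slide's example figure.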
Definition of chained clone (2)
• Chained Clone
  • Given two chained methods CM1 and CM2, we transform them into chained method graphs G1 and G2.
  • If G1 and G2 satisfy the following three conditions, we call the pair of CM1 and CM2 a chained clone.
    • G1 and G2 are isomorphic.
    • Each pair of corresponding nodes in G1 and G2 holds a clone relation.
    • The labels of corresponding edges in G1 and G2 are identical.
• Chained Clone Set
  • An equivalence class of chained clones
[Figure: CM1 and CM2 with their graphs G1 and G2, each with two Call edges, one Rx edge, and one Ax edge; a pair of nodes filled with the same color is a code clone]

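The three conditions can be checked by brute force for small graphs. The sketch below is one possible reading, not the authors' implementation: it tries every node bijection, and it assumes that each method carries the id of the code clone it belongs to (the hypothetical `clone_id` mapping) and that shared-variable edges are undirected.

```python
from itertools import permutations


def _norm(src, dst, label):
    # Call edges are directed; shared-variable edges are treated as undirected.
    return (src, dst, label) if label == "Call" else (*sorted((src, dst)), label)


def is_chained_clone(nodes1, edges1, nodes2, edges2, clone_id):
    """Return True if (nodes1, edges1) and (nodes2, edges2) form a chained clone."""
    target = {_norm(*e) for e in edges2}
    if len(nodes1) != len(nodes2) or len(edges1) != len(target):
        return False
    for perm in permutations(nodes2):
        mapping = dict(zip(nodes1, perm))
        # Condition 2: corresponding nodes must hold a clone relation.
        if any(clone_id[m] != clone_id[mapping[m]] for m in nodes1):
            continue
        # Conditions 1 and 3: same structure and identical edge labels.
        mapped = {_norm(mapping[s], mapping[d], label) for s, d, label in edges1}
        if mapped == target:
            return True
    return False


clone_id = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C", "c2": "C"}
g1 = {("a1", "b1", "Call"), ("b1", "c1", "Call"), ("b1", "c1", "Ax"), ("a1", "c1", "Rx")}
g2 = {("a2", "b2", "Call"), ("b2", "c2", "Call"), ("b2", "c2", "Ax"), ("a2", "c2", "Rx")}
g3 = {("a2", "b2", "Call"), ("b2", "c2", "Call"), ("b2", "c2", "Ay"), ("a2", "c2", "Rx")}

same = is_chained_clone(["a1", "b1", "c1"], g1, ["a2", "b2", "c2"], g2, clone_id)
different = is_chained_clone(["a1", "b1", "c1"], g1, ["a2", "b2", "c2"], g3, clone_id)
```

Trying all permutations is factorial in the number of methods, which is acceptable only because chained methods are typically small; a real tool would prune candidates by clone id first.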
Applicable Refactorings for Chained Clones
• The following refactorings [1] can be applied to merge chained clones.
  • Pull Up Method Refactoring
  • Extract Method Refactoring
  • Extract Superclass Refactoring
• We suggest an appropriate refactoring for each chained clone according to its characteristics.
[1] M. Fowler, Refactoring: Improving the Design of Existing Code, Addison-Wesley, 1999.

Typical Chained Clones, Case 1: Extract Method Refactoring
• All the methods in a chained clone are contained in a single class.
[Figure: before refactoring, class A contains the chained clone with methods a11, a12, a21, and a22; after refactoring, class A contains methods a1 and a2]
• All methods can be merged into two new methods in class A ("Extract Method" refactoring).

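As a rough illustration of Case 1, the sketch below shows a chained clone inside one class and the result of merging each code clone into a new method of the same class. The class and method names are hypothetical and do not come from the evaluated software.

```python
class ReportBefore:
    """Hypothetical class holding two chained methods that are clones of each other."""

    # Chained method 1: header() calls pad_header().
    def header(self, text):
        return "# " + self.pad_header(text)

    def pad_header(self, text):
        return text.ljust(10, ".")

    # Chained method 2: a copy-and-paste variant of chained method 1.
    def footer(self, text):
        return "- " + self.pad_footer(text)

    def pad_footer(self, text):
        return text.ljust(10, ".")


class ReportAfter:
    """Same behavior after merging each code clone into one new method."""

    def line(self, prefix, text):  # merges the bodies of header() and footer()
        return prefix + self.pad(text)

    def pad(self, text):  # merges pad_header() and pad_footer()
        return text.ljust(10, ".")

    def header(self, text):
        return self.line("# ", text)

    def footer(self, text):
        return self.line("- ", text)
```

Because the call relation between the clones is preserved (`line` calls `pad`), both code clones are merged in one step instead of being refactored one at a time.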
Typical Chained Clones, Case 2: Pull Up Method Refactoring
• All methods in a chained clone belong to classes that have a common parent class.
• All methods of each chained method are in the same class.
[Figure: before refactoring, class A (methods a1, a2) and class B (methods b1, b2) under a common superclass hold the chained clone; after refactoring, the superclass holds method 1 and method 2]
• All methods of each code clone can be merged into a new method in the parent class ("Pull Up Method" refactoring).

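A small sketch of Case 2 follows, with hypothetical exporter classes. Each subclass holds one chained method (`export` calls `join_row`), the two chained methods are clones that differ only in the separator, and both code clones are pulled up into the existing parent class together.

```python
class ExporterBefore:
    """Common parent class before refactoring (holds no shared logic yet)."""


class CsvBefore(ExporterBefore):
    def export(self, rows):
        return "\n".join(self.join_row(row) for row in rows)

    def join_row(self, row):
        return ",".join(row)


class TsvBefore(ExporterBefore):
    def export(self, rows):  # clone of CsvBefore.export
        return "\n".join(self.join_row(row) for row in rows)

    def join_row(self, row):  # clone of CsvBefore.join_row, different separator
        return "\t".join(row)


class Exporter:
    """Parent class after pulling up both methods of the chained clone."""

    separator = ","

    def export(self, rows):
        return "\n".join(self.join_row(row) for row in rows)

    def join_row(self, row):
        return self.separator.join(row)


class Csv(Exporter):
    separator = ","


class Tsv(Exporter):
    separator = "\t"


rows = [["a", "b"], ["c", "d"]]
```

The only difference between the clones becomes a class attribute, so the subclasses keep their behavior while the duplicated call chain exists once.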
Typical Chained Clones, Case 3: Extract Superclass Refactoring
• Some methods in a chained clone belong to classes that have no common parent class.
• All methods of each chained method are in the same class.
[Figure: before refactoring, class A (methods a1, a2) and class B (methods b1, b2) hold the chained clone and share no parent class; after refactoring, a new superclass of class A and class B holds method 1 and method 2]
• All methods of each code clone can be merged into a new method in the new superclass ("Extract Superclass" refactoring).

Typical Chained Clones, Case 4: difficult to apply refactoring
• Chained methods exist in different classes.
[Figure: a chained clone whose methods a, b, c, and d lie in classes A, B, C, and D under the superclasses S1 and S2]
• It is difficult to apply refactoring to all methods at one time. (The "Pull Up Method" refactoring can still be applied to each code clone separately.)

Categorization of Chained Clones
• We propose a method to classify chained clones by using two metrics.
• Two method groups for classifying chained clones
  • G1: the group of methods having clone relations
  • G2: the group of methods having dependency relations
• The metrics evaluate the distance and relative position in the class hierarchy of the methods belonging to each of these two groups.
  • R1: All methods belong to the same class.
  • R2: All methods belong to classes that have a common parent class.
  • R3: Some methods belong to classes that have no common parent class.

The metric DCH(S) (Dispersion in the Class Hierarchy)
• DCH(S) represents the dispersion in the class hierarchy among the methods in S.
• If there are classes that have no common parent class, the value of DCH(S) is undefined.
[Figure: a class hierarchy with classes S1, S2, S3 and A to E holding methods a to e, showing one method set with DCH(S) = 1 and one with DCH(S) = 2]

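The slide does not spell out the formula, so the following sketch assumes that DCH(S) is the largest number of inheritance steps from the owner class of any method in S up to their lowest common superclass, with `None` standing for "undefined". The hierarchy in `PARENT` is hypothetical and only loosely modeled on the figure.

```python
def ancestors(cls, parent):
    """Return [cls, parent of cls, grandparent, ...] up to the root."""
    chain = [cls]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def dch(classes, parent):
    """Dispersion in the class hierarchy of the classes that own the methods in S.

    classes: owner class of each method in S
    parent:  dict mapping a class to its direct superclass (single inheritance)
    Returns None when the classes have no common parent class (undefined).
    """
    chains = [ancestors(c, parent) for c in classes]
    common = set(chains[0]).intersection(*chains[1:])
    if not common:
        return None
    # The first common entry on any chain is the lowest common superclass.
    lowest = next(c for c in chains[0] if c in common)
    return max(chain.index(lowest) for chain in chains)


# Hypothetical hierarchy: A and B extend S1; C and D extend S2; S2 and E extend S3.
PARENT = {"A": "S1", "B": "S1", "C": "S2", "D": "S2", "S2": "S3", "E": "S3"}

same_class = dch(["A", "A"], PARENT)
siblings = dch(["A", "B"], PARENT)
two_levels = dch(["C", "D", "E"], PARENT)
unrelated = dch(["A", "C"], PARENT)
```

Under this reading, methods in one class give 0, methods in sibling classes give 1, and the value grows as the common superclass moves further up the hierarchy.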
Metrics to classify chained clone sets (1)
• DCHS: evaluates the dispersion in the class hierarchy of the methods belonging to G1 (the group of methods having clone relations)
  • Calculate DCH(S) for the methods of each clone relation.
  • Select the maximum of these values as DCHS.
• DCHD: evaluates the dispersion in the class hierarchy of the methods belonging to G2 (the group of methods having dependency relations)
  • Calculate DCH(S) for the methods of each chained method.
  • Select the maximum of these values as DCHD.
[Figure: a chained clone with methods a1, a2, b1, b2, c1, and c2]

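The two maxima can be sketched as follows. The layout is an assumption made for illustration: a chained clone set is given as rows of owner classes, one row per chained method, with column k holding the methods of the k-th clone relation. The `dch` argument is any function that implements DCH(S); the `flat_dch` stand-in below only handles a one-level hierarchy. Treating an undefined value (`None`) as dominating the maximum is also an assumption.

```python
def combine(values):
    """Maximum of DCH values; undefined (None) if any single value is undefined."""
    values = list(values)
    return None if None in values else max(values)


def dchs_dchd(chained_methods, dch):
    """Return (DCHS, DCHD) for a chained clone set.

    chained_methods: list of rows; row i lists the owner class of each method
                     of chained method i, aligned so that column k is clone k
    dch:             function mapping a sequence of classes to DCH(S) or None
    """
    dchd = combine(dch(row) for row in chained_methods)           # dependency groups
    dchs = combine(dch(col) for col in zip(*chained_methods))     # clone groups
    return dchs, dchd


# Stand-in DCH for a hypothetical one-level hierarchy: A and B extend Super.
PARENT = {"A": "Super", "B": "Super"}


def flat_dch(classes):
    unique = set(classes)
    if len(unique) == 1:
        return 0
    parents = {PARENT.get(c) for c in unique}
    return 1 if len(parents) == 1 and None not in parents else None


pull_up_case = dchs_dchd([["A", "A"], ["B", "B"]], flat_dch)
no_parent_case = dchs_dchd([["C", "C"], ["D", "D"]], flat_dch)
```

In the first example every chained method stays inside one class (DCHD = 0) while each clone relation spans two sibling classes (DCHS = 1); in the second, the clone relations span classes with no common parent, so DCHS is undefined.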
Metrics to classify chained clone sets (2)
• Using the two metrics, we classify chained clones into 9 categories.
• Extract Method Refactoring
  • All the methods in a chained clone are contained in a single class.
• Pull Up Method Refactoring
  • All methods in a chained clone belong to classes that have a common parent class.
  • All methods of each chained method are in the same class.
• Extract Superclass Refactoring
  • Some methods in a chained clone belong to classes that have no common parent class.
  • All methods of each chained method are in the same class.
• Difficult to apply refactoring to all methods at one time
  • Chained methods exist in different classes.

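The mapping from metric values to a suggested refactoring can be sketched as a small lookup. The table of nine categories is not reproduced in the text, so this sketch relies only on the four cases listed above and on two assumptions: a metric value of 0 means "same class", a positive value means "common parent class", and `None` means "no common parent class"; and any chained clone whose chained methods leave a single class falls into the "difficult" group.

```python
def dispersion_range(value):
    """Map a DCH-based metric value to range 1, 2, or 3 (assumed encoding)."""
    if value is None:      # undefined: no common parent class
        return 3
    return 1 if value == 0 else 2


def suggest_refactoring(dchs, dchd):
    """Suggest a refactoring from DCHS (clone groups) and DCHD (dependency groups)."""
    if dispersion_range(dchd) != 1:
        # Chained methods are spread over different classes.
        return "difficult to refactor all methods at once"
    return {
        1: "Extract Method",
        2: "Pull Up Method",
        3: "Extract Superclass",
    }[dispersion_range(dchs)]


suggestions = [
    suggest_refactoring(0, 0),
    suggest_refactoring(1, 0),
    suggest_refactoring(None, 0),
    suggest_refactoring(1, 1),
]
```

Three ranges for each of the two metrics give the nine categories; only the three combinations with DCHD in the first range map to a refactoring that merges the whole chained clone at once.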
Evaluation: Overview
• Objective
  • How many chained clone sets exist in actual Java programs?
  • Is it possible to classify chained clone sets by using the proposed metrics and to apply the suggested refactorings to them?
• Target software
  • Open source software
    • ANTLR 2.7.4 (47,000 LOC, 285 classes): compiler-compiler (Java, C++, C#)
    • JBoss 3.2.6 (640,000 LOC, 3,364 classes): J2EE application server
  • Commercial software
    • X (70,000 LOC, 309 classes)
    • Y (81,000 LOC, 290 classes)
• We used CCFinder [1] to detect code clones.
[1] T. Kamiya et al., CCFinder: A multi-linguistic token-based code clone detection system for large scale source code, IEEE TSE, vol. 28, no. 7, pp. 654-670, Jul. 2002.

Evaluation: Detected chained clone sets
• ANTLR 2.7.4
  • In category 21, the maximum number of methods is very large.
  • Similar functionalities for Java, C#, and C++
• Software X
  • The number of chained clone sets in category 31 is large.
  • Two packages have similar utility classes.

Evaluation: Refactoring for Category 31 (ANTLR)
• We applied the suggested refactorings to chained clone sets in ANTLR.
[Figure: before refactoring, JavaCharFormatter and CSharpCharFormatter each define escapeString, which calls escapeChar; after the Extract Superclass refactoring, escapeString and escapeChar are defined once in the new superclass GeneralCharFormatter]

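The shape of this refactoring can be sketched with the class and method names from the figure. ANTLR itself is written in Java and the method bodies here are invented placeholders, so this shows only the structure after the refactoring, not ANTLR's actual code.

```python
class GeneralCharFormatter:
    """New superclass that holds the merged chained clone."""

    def escapeString(self, text):
        # The Call edge of the chained method graph: escapeString -> escapeChar.
        return "".join(self.escapeChar(c) for c in text)

    def escapeChar(self, c):
        # Placeholder logic, not ANTLR's real escaping rules.
        return {"\n": "\\n", "\t": "\\t", '"': '\\"'}.get(c, c)


class JavaCharFormatter(GeneralCharFormatter):
    """Formerly defined its own escapeString and escapeChar."""


class CSharpCharFormatter(GeneralCharFormatter):
    """Formerly defined clones of the two methods above."""


escaped = JavaCharFormatter().escapeString('say "hi"\n')
```

Both methods move together, so the call from `escapeString` to `escapeChar` stays inside one class and neither subclass needs to keep a duplicate.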
Conclusion
• We focused on refactoring for chained clones, which consist of sets of methods with dependency relations.
  • Defined the chained clone
  • Proposed two metrics to classify chained clones according to their applicable refactorings
  • Conducted case studies to show the usefulness of the proposed metrics
• Future Work
  • Provide information about the internal structure of chained clones
  • Apply the proposed method to other Java programs