Extracting Code Clones for Refactoring Using Combinations of Clone Metrics

This study proposes a method to extract code clones for refactoring using a combination of clone metrics. The goal is to validate the feasibility of using combined clone metrics to extract code clones for refactoring in industrial Java software.

Presentation Transcript


  1. Extracting Code Clones for Refactoring Using Combinations of Clone Metrics • Eunjong Choi†, Norihiro Yoshida‡, Takashi Ishio†, Katsuro Inoue†, and Tateki Sano* • †Osaka University, Japan • ‡Nara Institute of Science and Technology, Japan • *NEC Corporation, Japan

  2. Background: Clone Set • A set of code clones that are similar or identical to each other • Example (diagram): five code clones, linked by "similar" or "identical" relations, form two clone sets: S1 = {Code Clone 1, Code Clone 3} and S2 = {Code Clone 2, Code Clone 4, Code Clone 5}

  3. Background: Refactoring Code Clone • Merge code clones into a single program unit • [Diagram: code clones before and after refactoring]

  4. Background: Language-dependent Code Clone • It unavoidably exists in source code because of features of the programming language used • Example of a language-dependent code clone (consecutive setter invocations): /* Code Clone A */ replacement.setTaskType(taskType); replacement.setTaskName(taskName); replacement.setLocation(location); replacement.setOwningTarget(target); replacement.setRuntime (wrapper); wrapper.setProxy(replacement); /* … */ /* Code Clone B */ def.setName(name); def.setClassName(classname); def.setClass(cl); def.setAdapterClass(adapterClass); def.setAdaptToClass(adaptToClass); def.setClassLoader(al); /* … */

  5. Background: Clone Metrics [Higo2007] • Quantitative information on clone sets • E.g., LEN(S), RNR(S), POP(S) • Purposes • To check features of code clones in software • To extract code clones for several purposes • E.g., refactoring, defect-prone code clones [Higo2007] Yoshiki Higo, Toshihiro Kamiya, Shinji Kusumoto, Katsuro Inoue, "Method and Implementation for Investigating Code Clones in a Software System", Information and Software Technology, pp. 985-998 (2007-9)

  6. Clone Metrics: LEN(S) • The average length of the token sequences of the code clones in a clone set S • Example: the token sequence [c c*] is detected as a code clone in the token sequence <c c* c* a b>, so LEN(S) = 2 • A superscript * indicates that the token is in a repeated token sequence

  7. Clone Metrics: RNR(S) • The ratio of non-repeated token sequences of the code clones in a clone set S • Example: the token sequence [c c*] is detected as a code clone in the token sequence <c c* c* a b> • $\mathrm{RNR}(S) = \frac{\text{length of non-repeated token sequence}}{\text{length of whole token sequence}} \times 100 = \frac{1}{2} \times 100 = 50$

  8. Clone Metrics: POP(S) • The number of code clones in a clone set S • Example (diagram): a clone set S that contains six code clones, numbered 1 to 6, has POP(S) = 6
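
The three metric definitions on the slides above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the implementation used by the authors or by CCFinder: the class and method names (`CloneMetricsSketch`, `Token`, `CodeClone`, `len`, `rnr`, `pop`) are hypothetical, and a code clone is modeled simply as a list of tokens, each flagged as repeated or not (the * mark on the slides).

```java
import java.util.List;

/** Minimal sketch of LEN(S), RNR(S) and POP(S); all names here are hypothetical. */
class CloneMetricsSketch {

    /** A token plus a flag saying whether it lies in a repeated token sequence (the * mark). */
    static final class Token {
        final String text;
        final boolean repeated;

        Token(String text, boolean repeated) {
            this.text = text;
            this.repeated = repeated;
        }
    }

    /** A code clone, modeled only by its token sequence. */
    static final class CodeClone {
        final List<Token> tokens;

        CodeClone(List<Token> tokens) {
            this.tokens = tokens;
        }
    }

    /** POP(S): the number of code clones in the clone set. */
    static int pop(List<CodeClone> cloneSet) {
        return cloneSet.size();
    }

    /** LEN(S): the average token-sequence length over the clones in the set. */
    static double len(List<CodeClone> cloneSet) {
        return cloneSet.stream().mapToInt(c -> c.tokens.size()).average().orElse(0.0);
    }

    /** RNR(S): non-repeated tokens divided by all tokens of the set, times 100. */
    static double rnr(List<CodeClone> cloneSet) {
        long total = cloneSet.stream().mapToLong(c -> c.tokens.size()).sum();
        long nonRepeated = cloneSet.stream()
                .flatMap(c -> c.tokens.stream())
                .filter(t -> !t.repeated)
                .count();
        return total == 0 ? 0.0 : 100.0 * nonRepeated / total;
    }

    public static void main(String[] args) {
        // The clone [c c*] from the slides: one non-repeated and one repeated token.
        CodeClone clone = new CodeClone(List.of(new Token("c", false), new Token("c", true)));
        // Assumption for the demo: the clone set holds two such clones.
        List<CodeClone> s = List.of(clone, clone);
        System.out.println("LEN(S) = " + len(s)); // LEN(S) = 2.0
        System.out.println("RNR(S) = " + rnr(s)); // RNR(S) = 50.0
        System.out.println("POP(S) = " + pop(s)); // POP(S) = 2
    }
}
```

Summing token counts over the whole clone set before dividing matches the worked RNR examples later in the deck, where numerator and denominator are sums over all clones in the set.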

  9. Single Clone Metric (1/2) • Clone sets whose RNR(S) is high • Their code clones do not form a single semantic unit • Semantic unit: a group of instructions that together implement a single functionality /* Code Clone in a clone set whose RNR(S) is the second highest in Ant 1.7.0 */ else { // is the zip file in the cache ZipFile zipFile = (ZipFile) zipFiles.get(file); if (zipFile == null) { zipFile = new ZipFile(file); zipFiles.put(file, zipFile); } ZipEntry entry = zipFile.getEntry(resourceName); if (entry != null) { • Not appropriate for refactoring: the clone is only a part of a semantic unit

  10. Single Clone Metric (2/2) • Clone sets whose POP(S) is high • They include many language-dependent code clones /* Code Clone in the clone set whose POP(S) is the highest in Ant 1.7.0 */ out.println("\">"); out.println(""); out.print("<!ELEMENT project (target | "); out.print(TASKS); out.print(" | "); out.print(TYPES); • Not appropriate for refactoring

  11. Key Idea • In our experience, a single clone metric is not sufficient to extract refactorable code clones • We propose a method based on combined clone metrics • To address the weaknesses of single-metric-based extraction

  12. Combined Clone Metrics • Clone sets whose RNR(S) and POP(S) are both high • Each code clone forms a single semantic unit /* Code Clone in a clone set whose RNR(S) and POP(S) are higher than others */ if (ifProperty != null && p.getProperty(ifProperty) == null) { return false; } else if (unlessProperty != null && p.getProperty(unlessProperty) != null) { return false; } return true; } • Appropriate for refactoring
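
To illustrate what merging such a clone into a single program unit could look like, here is a hedged sketch. It is not the refactoring performed in Ant or in the case study: the class `ConditionCheckSketch`, the helper `conditionsHold`, and the two caller methods are hypothetical, and `java.util.Properties` stands in for whatever object `p` is in the original code.

```java
import java.util.Properties;

/** Hypothetical sketch: a duplicated if/unless check pulled into one shared method. */
class ConditionCheckSketch {

    /**
     * Returns true when the "if" property (when one is named) is set and the
     * "unless" property (when one is named) is not set.
     */
    static boolean conditionsHold(Properties p, String ifProperty, String unlessProperty) {
        if (ifProperty != null && p.getProperty(ifProperty) == null) {
            return false;
        } else if (unlessProperty != null && p.getProperty(unlessProperty) != null) {
            return false;
        }
        return true;
    }

    // Two former clone sites (names invented for the demo) now delegate to the shared method.
    static boolean shouldRunTarget(Properties p, String ifProp, String unlessProp) {
        return conditionsHold(p, ifProp, unlessProp);
    }

    static boolean shouldIncludeFile(Properties p, String ifProp, String unlessProp) {
        return conditionsHold(p, ifProp, unlessProp);
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("debug", "true");
        System.out.println(shouldRunTarget(p, "debug", null));   // true: the "if" property is set
        System.out.println(shouldIncludeFile(p, null, "debug")); // false: the "unless" property is set
    }
}
```

Because the clone already forms a complete semantic unit (it ends in `return true;`), it can be extracted as a method without splitting any surrounding logic.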

  13. Case Study (1/2) • Goal: validate our key idea • Using combined clone metrics is a feasible method to extract code clones for refactoring • Target System • Industrial Java software developed by NEC • 110 KLOC, 736 clone sets

  14. Case Study (2/2) • Experimental Steps • Selected 62 clone sets from CCFinder's output using clone metrics • Conducted a survey about these clone sets and got feedback from a developer • Workflow (diagram): Source files → CCFinder → Clone sets → selection using clone metrics → Survey → Feedback

  15. Subject Code Clones (1/2) • Clone sets for which a single clone metric value is high • Clone sets whose LEN(S) value is among the top 10 • Clone sets whose RNR(S) value is among the top 10 • Clone sets whose POP(S) value is among the top 10

  16. Subject Code Clones (2/2) • Clone sets whose combined clone metric values are high • 15 clone sets whose LEN(S) and RNR(S) values both rank in the top 15 • 7 clone sets whose LEN(S) and POP(S) values both rank in the top 15 • 18 clone sets whose RNR(S) and POP(S) values both rank in the top 15 • 1 clone set whose LEN(S), RNR(S) and POP(S) values all rank in the top 15
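
One way to read the combined selection above is as an intersection of per-metric rankings: rank the clone sets by each metric, take the top k for each, and keep the clone sets that appear in every list. The sketch below assumes that reading, which the slides do not spell out. The class `CombinedMetricSelectionSketch`, its methods, and the four clone sets with their metric values are invented for the demo and are not data from the case study. Ties at the cutoff are not given any special handling.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch: select clone sets that rank in the top k for two metrics at once. */
class CombinedMetricSelectionSketch {

    /** A clone set with precomputed metric values (the values used below are made up). */
    static final class CloneSet {
        final String id;
        final double len;
        final double rnr;
        final int pop;

        CloneSet(String id, double len, double rnr, int pop) {
            this.id = id;
            this.len = len;
            this.rnr = rnr;
            this.pop = pop;
        }
    }

    /** Ids of the k clone sets with the highest value of one metric. */
    static Set<String> topK(List<CloneSet> sets, ToDoubleFunction<CloneSet> metric, int k) {
        List<CloneSet> sorted = new ArrayList<>(sets);
        Comparator<CloneSet> byMetric = Comparator.comparingDouble(metric);
        sorted.sort(byMetric.reversed());
        Set<String> ids = new HashSet<>();
        for (CloneSet s : sorted.subList(0, Math.min(k, sorted.size()))) {
            ids.add(s.id);
        }
        return ids;
    }

    /** Ids of the clone sets that are in the top k for both metrics. */
    static Set<String> topKForBoth(List<CloneSet> sets,
                                   ToDoubleFunction<CloneSet> first,
                                   ToDoubleFunction<CloneSet> second,
                                   int k) {
        Set<String> result = new TreeSet<>(topK(sets, first, k));
        result.retainAll(topK(sets, second, k));
        return result;
    }

    public static void main(String[] args) {
        List<CloneSet> sets = List.of(
                new CloneSet("A", 120, 90, 2),
                new CloneSet("B", 40, 95, 9),
                new CloneSet("C", 35, 30, 12),
                new CloneSet("D", 110, 20, 3));
        // Top 2 by RNR: B, A. Top 2 by POP: C, B. Top 2 by LEN: A, D.
        System.out.println(topKForBoth(sets, s -> s.rnr, s -> s.pop, 2)); // [B]
        System.out.println(topKForBoth(sets, s -> s.len, s -> s.rnr, 2)); // [A]
    }
}
```

With k = 15 and the LEN/RNR, LEN/POP and RNR/POP pairs, the same two methods would give a selection in the spirit of the slide; a three-metric variant would intersect one more top-k set.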

  17. Results of Case Study (1/2) • #Selected Clone Sets: The number of selected clone sets • #Refactoring: The number of clone sets marked as “Perform refactoring” in the survey

  18. Results of Case Study (2/2) • Precision: “How many refactoring candidates were accepted by a developer?” • $\mathrm{Precision} = \frac{\#\mathrm{Refactoring}}{\#\mathrm{Selected\ Clone\ Sets}}$ • Clone sets selected using combined clone metrics were accepted as refactoring candidates by the developer more often
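
The precision ratio above is a plain division, sketched here for completeness. The class and method names are hypothetical, and the counts in `main` are made-up demo values, not results from the case study.

```java
/** Hypothetical sketch of the precision ratio defined on the slide. */
class PrecisionSketch {

    /** Precision = #Refactoring / #Selected Clone Sets. */
    static double precision(int refactoringCount, int selectedCount) {
        if (selectedCount <= 0) {
            throw new IllegalArgumentException("selectedCount must be positive");
        }
        if (refactoringCount < 0 || refactoringCount > selectedCount) {
            throw new IllegalArgumentException("refactoringCount must be between 0 and selectedCount");
        }
        return (double) refactoringCount / selectedCount;
    }

    public static void main(String[] args) {
        // Made-up counts: 3 of 10 selected clone sets marked "Perform refactoring".
        System.out.println(precision(3, 10)); // 0.3
    }
}
```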

  19. Summary and Future Work • Summary • Our industrial case study supports our key idea • Future Work • Investigate recall • Conduct case studies on open source software • Suggest a new metric

  20. Thank You

  21. Clone sets whose RNR(S) is higher than others • Each code clone in a clone set S consists of more non-repeated token sequences /* Code Clone in a clone set whose RNR(S) is the second highest in Ant 1.7.0 */ else { // is the zip file in the cache ZipFile zipFile = (ZipFile) zipFiles.get(file); if (zipFile == null) { zipFile = new ZipFile(file); zipFiles.put(file, zipFile); } ZipEntry entry = zipFile.getEntry(resourceName); if (entry != null) { /* … */

  22. Clone sets whose RNR(S) is lower than others • They consist of more repeated token sequences • They tend to be language-dependent code clones • Example (consecutive variable declarations): /* Code Clone in a clone set whose RNR(S) is the lowest in Ant 1.7.0 */ String sosCmdDir = null; /* … code skipped … */ private String filename = null; private boolean noCompress = false; private boolean noCache = false; private boolean recursive = false; private boolean verbose = false; /* … */

  23. Survey Format: About Clone Set XXX • (1) Do you think that this clone set needs a practice? [] Yes [] No (→ Jump to next clone set) • (2) If you marked “Yes” in your answer to (1), what practice is appropriate for this clone set? [] Refactoring [] Write comments about the code clones, but don’t perform refactoring. [] Change nothing. [] Others. ( ) • (3) Write the reason for your answer to (2). Reason:

  24. Results, and Precision of each clone set in the survey

  25. Clone metric: RNR(S) (1/2) • Files: • F1: a b c a b • F2: c c* c* a b • F3: d a b, e f • F4: c c* d e f • A superscript * indicates that the token is in a repeated token sequence • Clone set S1 = {a b, a b, a b, a b} • $\mathrm{RNR}(S_1) = \frac{2 + 2 + 2 + 2}{2 + 2 + 2 + 2} \times 100 = 100$

  26. Clone metric: RNR(S) (2/2) • Files: • F1: a b c a b • F2: c c* c* a b • F3: d a b, e f • F4: c c* d e f • A superscript * indicates that the token is in a repeated token sequence • Clone set S2 = {c c*, c* c*, c c*} • $\mathrm{RNR}(S_2) = \frac{1 + 0 + 1}{2 + 2 + 2} \times 100 = 33.3$

  27. Subject Code Clones • 62 clone sets • Clone sets for which a single clone metric value is high • S_LEN: clone sets whose LEN(S) value is among the top 10 • S_RNR: clone sets whose RNR(S) value is among the top 10 • S_POP: clone sets whose POP(S) value is among the top 10 • Clone sets whose combined clone metric values are high • S_LEN∙RNR: 15 clone sets whose LEN(S) and RNR(S) values both rank in the top 15 • S_LEN∙POP: 7 clone sets whose LEN(S) and POP(S) values both rank in the top 15 • S_RNR∙POP: 18 clone sets whose RNR(S) and POP(S) values both rank in the top 15 • S_LEN∙RNR∙POP: 1 clone set whose LEN(S), RNR(S) and POP(S) values all rank in the top 15

  28. The Number of Duplicate Clone Sets • |S_RNR ∩ S_POP ∩ S_RNR∙POP| = 1 • |S_RNR ∩ S_RNR∙POP| = 2 • |S_POP ∩ S_RNR∙POP| = 2 • |S_LEN∙RNR ∩ S_LEN∙POP ∩ S_RNR∙POP ∩ S_LEN∙RNR∙POP| = 1 • (Slide footer: CS Seminar, 2010/12/01)

  29. Example of a clone set that was not selected… • It is too short to form a semantic unit • The RNR metric sometimes extracts unintended code clones • E.g., language-dependent code clones boolean isEqual(final DeweyDecimal other) { final int max = Math.max(other.components.length, components.length); for (int i = 0; i < max; i++) { final int component1 = (i < components.length) ? components[i] : 0; final int component2 = (i < other.components.length) ? other.components[i] : 0; if (
