
WCRE 1999 / 2009

WCRE 1999 / 2009. Experiments with clustering as a software remodularization method Nicolas Anquetil Timothy C. Lethbridge. Forewarning. Nicolas: After this research I became suspicious of the usefulness of clustering for remodularization. I still am. You have been warned




  1. WCRE 1999 / 2009 Experiments with clustering as a software remodularization method Nicolas Anquetil Timothy C. Lethbridge

  2. Forewarning Nicolas: • After this research I became suspicious of the usefulness of clustering for remodularization. I still am.

  3. You have been warned (although note that Tim has a less gloomy view)

  4. Agenda • Background of the research • Overview of the paper • From then until now • And now what? • An analogy • Another analogy

  5. Background of the research Context: • KBRE group, U. of Ottawa, Canada • CSER project (Consortium for Software Engineering Research) • Pairs: university/company (U. of Ottawa/Telecom. company) • Focus on real problems and/or real situations

  6. Background of the research The project: One company's PBX • 2+ MLOC • 2+ K files • 10+ possible configurations • 10+ years old (in 1999) • 2 proprietary languages • 1 directory • 0 packages

  7. Background of the research Company situation: • High turnover (18 months) • High entry barrier (6+ months to be productive) • Aging software (and languages) • Configuration management difficulties

  8. Agenda • Background of the research • Overview of the paper • From then until now • And now what? • An analogy • Another analogy

  9. Overview of the paper ”providing solutions to help software engineers understand, restructure or migrate old software towards more modern architecture and/or languages”

  10. Overview of the paper Possible solution: ”Clustering is used to gather software components into modules significant to the software engineers.”

  11. Overview of the paper • Seminal paper by Theo Wiggerts, “Using Clustering Algorithms in Legacy Systems Remodularization”, WCRE'97 • Summary of the literature on clustering • Lists all the possible choices • Lists some advantages and drawbacks of these choices

  12. Overview of the paper ”Clustering is a sophisticated research domain with many methods [...] Reverse engineering is a young domain [...] Clustering has been used with no deep understanding of all the issues involved.”

  13. Overview of the paper ”Conclusions of Wiggerts' paper are those of the literature which may not entirely hold for reverse engineering.”

  14. Overview of the paper • For example: • Living things naturally fit in an evolution tree (more or less) • Not so with software modularization • This must impact the tools we use and how we use them

  15. Overview of the paper • Three issues • What clustering algorithms to use? • How to compute cohesion? • How to describe entities? • Plus one open question: how to evaluate the results?

  16. Overview of the paper • Algorithms • We tested mainly hierarchical agglomerative algorithms • Some tests with hill-climbing algorithms (”Bunch” tool: Mancoridis)
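To make "hierarchical agglomerative" concrete, here is a minimal sketch in Python. It is not the authors' implementation: the single-linkage rule, the Jaccard similarity, the stopping threshold, and the file and feature names are all assumptions chosen for illustration.

```python
from itertools import combinations

def jaccard(a, b):
    """Similarity of two feature sets: shared features / all features present."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def agglomerate(entities, threshold):
    """Single-linkage agglomerative clustering (illustrative only).

    entities: dict mapping a file name to its set of features (hypothetical data).
    Repeatedly merges the two most similar clusters until no pair reaches
    `threshold`. Returns a list of sets of file names.
    """
    clusters = [frozenset([name]) for name in entities]

    def linkage(c1, c2):
        # Single linkage: similarity of the closest pair of members.
        return max(jaccard(entities[x], entities[y]) for x in c1 for y in c2)

    while len(clusters) > 1:
        best = max(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        if linkage(*best) < threshold:
            break
        clusters = [c for c in clusters if c not in best] + [best[0] | best[1]]
    return [set(c) for c in clusters]

# Hypothetical files, each described by the identifiers it contains.
files = {
    "call.src": {"dial", "hangup", "line_t"},
    "ring.src": {"dial", "line_t", "tone"},
    "log.src": {"write_log", "fmt"},
    "trace.src": {"write_log", "fmt", "level"},
}
clusters = agglomerate(files, threshold=0.3)
```

With this toy data the two telephony files end up in one cluster and the two logging files in another. Stopping at a threshold is one way to cut the hierarchy; keeping the full merge history would give the dendrogram instead.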

  17. Overview of the paper • Entities • We clustered files (into packages) • Description • Elements contained in the files: • Types, variables, routines, macros, comments, identifiers

  18. Overview of the paper Reminder: ”Clustering algorithms do not discover some hidden structure in a system, but impose a structure on the set of entities they are given.”

  19. Overview of the paper: Some results • Redundancies among description schemes: • File, routine, variable, macro, type • Comments, identifiers

  20. Overview of the paper: Some results • Combining features (routine + variable + ...) improves the results
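One simple way to combine description schemes is to merge the per-kind feature sets into a single set, tagging each feature with its kind. The sketch below assumes that representation; the `combine` helper and the sample file description are hypothetical and not taken from the paper.

```python
def combine(description, kinds):
    """Merge several description schemes into one feature set.

    Each feature is tagged with its kind so that a routine and a variable
    that happen to share a name remain distinct features.
    """
    return {(kind, name) for kind in kinds for name in description.get(kind, ())}

# Hypothetical description of one file, split by kind of feature.
file_a = {
    "routine": {"dial", "reset"},
    "variable": {"line", "reset"},
    "type": {"line_t"},
}
combined = combine(file_a, ["routine", "variable"])
```

The combined set can then be fed to any set-based similarity measure in place of a single-kind description.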

  21. Overview of the paper: Some results • Direct/sibling links • Sibling more used and better
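The distinction can be sketched as follows, assuming that a direct link means one file refers to the other, while a sibling link means both files refer to the same third parties. The reference data and function names are hypothetical.

```python
# Hypothetical reference data: which files each file refers to.
references = {
    "a.src": {"util.src", "io.src"},
    "b.src": {"util.src", "io.src"},
    "util.src": set(),
    "io.src": set(),
}

def direct_link(x, y):
    """1 if one file refers to the other, else 0."""
    return int(y in references[x] or x in references[y])

def sibling_link(x, y):
    """Jaccard overlap between the sets of things the two files refer to."""
    union = references[x] | references[y]
    return len(references[x] & references[y]) / len(union) if union else 0.0
```

Here `a.src` and `b.src` never refer to each other, so their direct link is 0, yet they use exactly the same files, so their sibling link is 1.0. Conversely, `a.src` has a direct link to `util.src` but shares nothing with it as a sibling.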

  22. Overview of the paper: Some results • Avoid “sparse” descriptive features • Avoid similarity metrics that consider absence of a feature as significant
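A small sketch shows why counting joint absence is a problem with sparse descriptions. It compares the simple matching coefficient (which counts features absent from both entities as agreement) with the Jaccard coefficient (which ignores them). The choice of these two metrics and the 100-feature universe are assumptions made for illustration.

```python
def simple_matching(a, b, universe):
    """Fraction of features on which a and b agree, counting joint absence."""
    agree = sum((f in a) == (f in b) for f in universe)
    return agree / len(universe)

def jaccard(a, b):
    """Shared features divided by features present in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# 100 possible features; each entity uses only two, and they share none.
universe = {f"f{i}" for i in range(100)}
x = {"f0", "f1"}
y = {"f2", "f3"}

sm = simple_matching(x, y, universe)  # 96 joint absences out of 100 features
jc = jaccard(x, y)                    # no shared feature at all
```

The two entities have nothing in common, yet simple matching rates them 0.96 similar because both lack the same 96 features, while Jaccard gives 0.0.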

  23. Agenda • Background of the research • Overview of the paper • From then until now • And now what? • An analogy • Another analogy

  24. From then until now • Raw numbers • What extensions?

  25. From then until now: References (volume) [data from Google scholar]

  26. From then until now: References (authors) • P. Tonella (8), F. Ricca (7), C. Girardi (5), E. Pianta (5) • O. Maqbool (7), H.A. Babri (6) • C. Tjortjis (5) • N. Anquetil (5) • S. Ducasse (5) • K. Sartipi (4) [data from Google scholar]

  27. From then until now: References (venue) • Thesis = 11 • CSMR = 6 • IWPC = 6 • WCRE = 5 • J.Soft.Maint.Evol. = 4 • J.Syst.Soft. = 4 • ICSM = 3 • ICSE = 2 • Trans.Syst.Eng. = 2 [data from Google scholar]

  28. From then until now: Some extensions • Clustering, how? • New/improved algorithms • New/improved distance metrics • Clustering what? • New entities (and/or description) • Clustering, why? • Other extensions

  29. From then until now: New algorithm • Genetic algorithm • [Mahdavi] • “Combined algorithm” • [Saeed, Maqbool, Babri, Hassan, Sarwar]

  30. From then until now: New distance metric • Minimization of information loss • [Andritsos, Tzerpos]

  31. From then until now: New entities • Static web pages • [Di Lucca, Fasolino, Tramontana] • [Tonella, Ricca, Pianta, Girardi] • Association rules • [Maqbool, Babri] • Data vs. Control • [Davey, Burd], [Sartipi, Kontogiannis] • Dynamic data • [Stroulia, Systä] • Co-change records

  32. From then until now: Other extensions • Evaluations / comparisons • [Tonella], [Wu, Holt], [Parsa, Bushehrian] • Framework

  33. From then until now: Other extensions • Needs of maintainers? • [Tjortjis, Layzell] • Input for visualization tools • [Ducasse] • Naming clusters • [Tzerpos], [Maqbool, Babri]

  34. Agenda • Background of the research • Overview of the paper • From then until now • And now what? • An analogy • Another analogy

  35. And now what? • Back to paper's results • Wild ideas in clustering • Related topics

  36. And now what? Paper's results • Choice of (traditional) algorithm matters little • It will give a result • Not significantly better or worse than the others

  37. And now what? Paper's results • Choice of similarity metric matters little • As long as the metric does not consider absence of a feature as a sign of similarity

  38. And now what? Paper's results • Choice of description scheme for entity matters a bit more • May be source of short term progress? • Using dynamic information?

  39. And now what? Wild ideas • Consider new entities? • Individual instructions? • Non code: requirements, model elements, tests, … ? • Process-wise modularization? • Clustering requirements, model elements, ...

  40. And now what? Related topics • Problem without solution? • Software modularization is highly subjective • Packages are not mutually exclusive • Decisions must be made that are always wrong (and always correct)

  41. And now what? Related topics • Modularization is a logical (virtual) decomposition based on semantics • High cohesion, low coupling may only be an (imperfect) by-product of pre-chosen modularization • Cohesion/coupling not a driving force but a secondary goal? • Other forces, e.g. packages of “comparable” sizes

  42. And now what? Related topics • Typical example: Utility package • Low cohesion, high coupling • java.util • BitSet, Calendar, Currency, Dictionary, EventListenerProxy, Formatter, Observable, Random, ResourceBundle, Scanner, UUID, TimeZone, ...

  43. And now what? Related topics • How to evaluate results? • Open question in the paper • Cohesion/coupling • Normally useless because it is the function optimized by the algorithms • Gold standard • Manually: expensive, not precise • Automatically: biased
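As an illustration of the gold-standard option, one common way to compare a clustering with a reference decomposition is to look at pairs of entities: a pair counts as a hit when both partitions put its two members together. The sketch below assumes that pairwise measure; it is not claimed to be the exact measure used in the paper, and the partitions are made up.

```python
from itertools import combinations

def intra_pairs(partition):
    """All unordered pairs of entities placed in the same cluster."""
    return {
        frozenset(pair)
        for cluster in partition
        for pair in combinations(sorted(cluster), 2)
    }

def precision_recall(found, gold):
    """Pairwise precision and recall of `found` against the `gold` partition."""
    found_pairs, gold_pairs = intra_pairs(found), intra_pairs(gold)
    hits = len(found_pairs & gold_pairs)
    precision = hits / len(found_pairs) if found_pairs else 0.0
    recall = hits / len(gold_pairs) if gold_pairs else 0.0
    return precision, recall

# Hypothetical gold standard and clustering result over five entities.
gold = [{"a", "b", "c"}, {"d", "e"}]
found = [{"a", "b"}, {"c", "d", "e"}]
precision, recall = precision_recall(found, gold)
```

In this example the clustering produces four intra-cluster pairs, two of which also appear in the gold standard, so both precision and recall are 0.5. The measure is only as trustworthy as the gold standard itself, which is the concern raised on the slide.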

  44. And now what? Related topics • How to evaluate results? • Other metrics, e.g. Stability, Non-extremity [Wu]

  45. Agenda • Background of the research • Overview of the paper • From then until now • And now what? • An analogy • Another analogy

  46. And now what? Paper's results • ”The fact that all six algorithms are ranked low on authoritativeness suggests that they may not be mature enough for use in production on large systems undergoing evolutionary change. However ...” [Wu, Holt, 2005]

  47. An analogy • A short story of Belo Horizonte: • In 1893 a new capital is planned in the state of Minas Gerais (Brazil) • The architects/urbanists get inspiration from Washington D.C.

  48. An analogy • The initial architecture: • Planned Belo Horizonte

  49. An analogy • The city grew (2.5 Mhab., area=5.1 Mh.)

  50. An analogy • The city grew (2.5 Mhab.)
