ACDC: An Algorithm for Comprehension-Driven Clustering. Vassilios Tzerpos, R.C. Holt
Highest cohesion clustering (figure: subsystems SS1 to SS12)
Essential comprehension features • Effective cluster naming • Bounded cluster cardinality • Familiarity • Comprehension as pattern recognition • Certain subsystem patterns often emerge in manual decompositions of software systems
Source file pattern (figure: procedures Proc1 to Proc6 and variables Var1 to Var3, grouped into source files File1 and File2)
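A minimal sketch of the source file pattern, assuming dependencies are extracted at the procedure/variable level and each entity is known to belong to exactly one file. The function names `source_file_clusters` and `lift_to_files`, and the assignment of Proc1 to Proc6 and Var1 to Var3 to files, are hypothetical illustrations, not taken from ACDC.

```python
from collections import defaultdict


def source_file_clusters(container):
    """Group procedures and variables into one cluster per source file."""
    clusters = defaultdict(set)
    for entity, source_file in container.items():
        clusters[source_file].add(entity)
    return dict(clusters)


def lift_to_files(deps, container):
    """Turn entity-level dependencies into file-level dependencies.

    Edges between two entities of the same file are absorbed by that
    file's cluster; every other edge becomes a (file, file) edge.
    """
    file_edges = set()
    for src, dst in deps:
        src_file, dst_file = container[src], container[dst]
        if src_file != dst_file:
            file_edges.add((src_file, dst_file))
    return file_edges


# Hypothetical assignment of the figure's entities to files.
container = {
    "Proc1": "File1", "Proc2": "File1", "Proc3": "File1", "Var1": "File1",
    "Proc4": "File2", "Proc5": "File2", "Proc6": "File2",
    "Var2": "File2", "Var3": "File2",
}
deps = [("Proc1", "Proc2"), ("Proc2", "Var1"), ("Proc3", "Proc4"), ("Proc5", "Var3")]

clusters = source_file_clusters(container)
file_edges = lift_to_files(deps, container)
print(sorted(clusters["File1"]))  # ['Proc1', 'Proc2', 'Proc3', 'Var1']
print(file_edges)                 # {('File1', 'File2')}
```

Only the Proc3 to Proc4 edge crosses a file boundary, so the file-level graph keeps a single edge.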
Directory structure pattern (figure: directories Dir1 and Dir2 containing files File1 to File9)
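The directory structure pattern can be sketched as grouping files by the directory that directly contains them. The function name and the file layout below are hypothetical (the slide does not say which file sits in which directory), and this sketch flattens nested directories into separate clusters rather than nesting them.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def directory_clusters(paths):
    """Group files by the directory that directly contains them."""
    clusters = defaultdict(list)
    for path in paths:
        parent = str(PurePosixPath(path).parent)
        clusters[parent].append(path)
    return dict(clusters)


# Hypothetical layout for illustration only.
paths = ["Dir1/File1.c", "Dir1/File2.c", "Dir1/File3.c", "Dir2/File4.c", "Dir2/File5.c"]
clusters = directory_clusters(paths)
print(clusters["Dir2"])  # ['Dir2/File4.c', 'Dir2/File5.c']
```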
Body-header pattern (figure: bob.c paired with bob.h, alice.c paired with alice.h)
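A small sketch of body-header conglomeration, assuming a C body and its header are recognized purely by sharing the same path minus extension; whether ACDC also checks dependencies between the two files is not stated on the slide. `body_header_pairs` is a hypothetical name.

```python
from pathlib import PurePosixPath


def body_header_pairs(paths):
    """Pair each .c body with the .h header that has the same path minus extension."""
    by_base = {}
    for path in paths:
        p = PurePosixPath(path)
        base = str(p.with_suffix(""))
        by_base.setdefault(base, {})[p.suffix] = path
    # Keep only bases that have both a body and a header.
    return {
        base: (parts[".c"], parts[".h"])
        for base, parts in by_base.items()
        if ".c" in parts and ".h" in parts
    }


paths = ["bob.c", "bob.h", "alice.c", "alice.h", "util.c"]
pairs = body_header_pairs(paths)
print(pairs)  # {'bob': ('bob.c', 'bob.h'), 'alice': ('alice.c', 'alice.h')}
```

Keying on the full path rather than the bare file name keeps `src/bob.c` from being paired with an unrelated `include/bob.h`; a real tool might need a looser rule.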
Leaf collection pattern (figure: sin.c, cos.c, tan.c)
Support library pattern (figure: busy.c, tired.c, weary.c)
Central dispatcher pattern (figure: dispatcher.c)
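The leaf, support library, and central dispatcher patterns can be approximated from fan-in and fan-out counts on a file dependency graph. This is a sketch under stated assumptions: leaves are taken to be files that depend on nothing, support library files those with high fan-in, and dispatchers those with high fan-out. The thresholds (`support_fan_in`, `dispatcher_fan_out`) and the rule that a support file is not also reported as a leaf are hypothetical choices, not ACDC's actual criteria.

```python
from collections import Counter


def classify_by_degree(nodes, edges, support_fan_in=3, dispatcher_fan_out=3):
    """Flag leaf, support-library, and dispatcher candidates by fan-in/fan-out.

    The thresholds are hypothetical defaults, not values from ACDC.
    """
    fan_in = Counter(dst for _, dst in edges)
    fan_out = Counter(src for src, _ in edges)
    support = {n for n in nodes if fan_in[n] >= support_fan_in}
    dispatchers = {n for n in nodes if fan_out[n] >= dispatcher_fan_out}
    # Leaves call nothing; heavily used files are reported as support instead.
    leaves = {n for n in nodes if fan_out[n] == 0} - support
    return leaves, support, dispatchers


nodes = ["dispatcher.c", "a.c", "b.c", "c.c", "busy.c", "sin.c", "cos.c"]
edges = [
    ("dispatcher.c", "a.c"), ("dispatcher.c", "b.c"), ("dispatcher.c", "c.c"),
    ("a.c", "busy.c"), ("b.c", "busy.c"), ("c.c", "busy.c"),
    ("a.c", "sin.c"), ("b.c", "cos.c"),
]
leaves, support, dispatchers = classify_by_degree(nodes, edges)
print(sorted(leaves), sorted(support), sorted(dispatchers))
# ['cos.c', 'sin.c'] ['busy.c'] ['dispatcher.c']
```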
Subgraph dominator pattern (figure: dominator.c with files a.c to g.c, plus z.c)
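One way to sketch the subgraph dominator pattern, assuming "dominated" means: reachable from the dominator, with every incoming edge coming either from the dominator or from another dominated node. The sketch starts from everything reachable and repeatedly removes nodes that can also be entered from outside. The cardinality bound `max_size=20` and the `<name>.ss` naming (borrowed from the "support.ss" convention) are hypothetical stand-ins for ACDC's "limited" domination and subsystem naming; the edges are invented for illustration.

```python
from collections import defaultdict, deque


def dominated_set(dominator, edges):
    """Nodes reachable from `dominator` whose every incoming edge comes from
    the dominator itself or from another node in the set."""
    succ, pred = defaultdict(set), defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
        pred[b].add(a)

    # Start from everything reachable from the candidate dominator.
    members, queue = set(), deque([dominator])
    while queue:
        node = queue.popleft()
        for nxt in succ[node]:
            if nxt != dominator and nxt not in members:
                members.add(nxt)
                queue.append(nxt)

    # Drop nodes that can also be entered from outside, until nothing changes.
    changed = True
    while changed:
        changed = False
        for node in list(members):
            if any(p != dominator and p not in members for p in pred[node]):
                members.discard(node)
                changed = True
    return members


def dominator_subsystem(dominator, edges, max_size=20):
    """Build a named subsystem, or None if the dominated set is empty or too big."""
    members = dominated_set(dominator, edges)
    if not members or len(members) > max_size:
        return None
    name = dominator.rsplit(".", 1)[0] + ".ss"
    return name, {dominator} | members


edges = [
    ("dominator.c", "a.c"), ("dominator.c", "b.c"),
    ("a.c", "c.c"), ("a.c", "d.c"), ("b.c", "e.c"),
    ("e.c", "f.c"), ("e.c", "g.c"),
    ("z.c", "g.c"),  # g.c can also be reached from outside, so it is not dominated
]
print(sorted(dominated_set("dominator.c", edges)))
# ['a.c', 'b.c', 'c.c', 'd.c', 'e.c', 'f.c']
```

The pruning loop also removes anything reachable only through a removed node, because that node's successor then has a predecessor outside the set.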
The ACDC algorithm • Two stages: • Using a pattern-driven approach, a “skeleton” of the final decomposition is created. Subsystems are named appropriately. • The decomposition is completed by applying an extended version of the Orphan Adoption algorithm
Skeleton construction • Source file clusters • Body-header conglomeration • Leaf collection and support library identification • Ordered and limited subgraph domination • Creation of “support.ss”
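The last skeleton steps can be sketched as combining pattern results, assuming source file clusters and body-header pairs have already been collapsed into single nodes. Two rules here are assumptions of this sketch, not confirmed ACDC behavior: dominator subsystems are tried in a given order and a file claimed by an earlier subsystem is not reassigned, and files identified as support library are kept out of dominator subsystems and gathered into "support.ss". `assemble_skeleton` and the file names are hypothetical.

```python
def assemble_skeleton(dominator_subsystems, support_files):
    """Combine pattern results into a skeleton decomposition.

    dominator_subsystems: (name, members) pairs in the order they are tried.
    support_files: files already identified as support library.
    """
    support = set(support_files)
    skeleton, assigned = {}, set()
    for name, members in dominator_subsystems:
        # Earlier subsystems win; support-library files are kept out.
        fresh = set(members) - assigned - support
        if fresh:
            skeleton[name] = fresh
            assigned |= fresh
    # Final step: the support-library files form their own subsystem.
    if support:
        skeleton["support.ss"] = support
    return skeleton


skeleton = assemble_skeleton(
    [("net.ss", {"net.c", "tcp.c", "busy.c"}), ("io.ss", {"io.c", "tcp.c"})],
    ["busy.c"],
)
print({name: sorted(files) for name, files in skeleton.items()})
# {'net.ss': ['net.c', 'tcp.c'], 'io.ss': ['io.c'], 'support.ss': ['busy.c']}
```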
Orphan Adoption • Incremental clustering technique • Orphan: a resource newly introduced into a software system • Each orphan is adopted by the subsystem that interacts most with it • Provided the first stage has built a substantial skeleton, the same technique can place the remaining resources
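A basic majority-vote form of orphan adoption can be sketched as follows, assuming "interacts most" means the largest number of dependency edges (in either direction) between the orphan and a subsystem's members. This is not the extended Orphan Adoption algorithm ACDC uses; `adopt_orphans`, the example skeleton, and the edges are hypothetical. Ties are broken by whichever subsystem `Counter.most_common` returns first.

```python
from collections import Counter


def adopt_orphans(skeleton, orphans, edges):
    """Attach each orphan to the subsystem it shares the most edges with.

    skeleton: dict mapping subsystem name -> set of files.
    Returns (new_skeleton, orphans_with_no_connections).
    """
    result = {name: set(files) for name, files in skeleton.items()}
    member_of = {f: name for name, files in result.items() for f in files}
    unplaced = []
    for orphan in orphans:
        votes = Counter()
        for a, b in edges:
            # Count dependencies in both directions.
            if a == orphan and b in member_of:
                votes[member_of[b]] += 1
            elif b == orphan and a in member_of:
                votes[member_of[a]] += 1
        if votes:
            winner, _ = votes.most_common(1)[0]
            result[winner].add(orphan)
            member_of[orphan] = winner  # later orphans can count this one
        else:
            unplaced.append(orphan)
    return result, unplaced


skeleton = {"net.ss": {"socket.c", "tcp.c"}, "fs.ss": {"inode.c"}}
edges = [("udp.c", "socket.c"), ("tcp.c", "udp.c"), ("udp.c", "inode.c")]
result, unplaced = adopt_orphans(skeleton, ["udp.c", "lonely.c"], edges)
print(sorted(result["net.ss"]), unplaced)
# ['socket.c', 'tcp.c', 'udp.c'] ['lonely.c']
```

Because adopted orphans immediately count as members, the result can depend on the order in which orphans are processed.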
ACDC properties • Subsystems have familiar or intuitive names • The cardinality of the subsystems is bounded • The final decomposition is nested and unbalanced • Limited use of the directory pattern • Magic numbers not important
Algorithm validation • We experimented with two software systems, TOBEY and Linux. • We measured performance, stability, skeleton size, and quality: • Performance: 54 sec (TOBEY), 84 sec (Linux) • Stability: 81.3% (TOBEY), 69.4% (Linux) • Skeleton size: 64.3% (TOBEY), 51.1% (Linux) • Quality: 64.2% (TOBEY), 55.7% (Linux)
Conclusions • Clustering approaches should focus on comprehension • The pattern-driven approach appears to perform satisfactorily • The impact of ACDC’s features on comprehension remains to be determined