Information Theory in Software Metrics (Assessment and Issues) Steve Counsell, (Brunel University and CREST)
Introduction • Coupling: • Well-understood • Excessive coupling should be avoided • Empirically, excessive coupling has been associated with fault-proneness (in C++ at least) • The Coupling Between Objects (CBO) metric of Chidamber and Kemerer has dominated the area • CBO is a simple count of the number of unique classes to which a given class is coupled (in whatever way)
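A minimal sketch of the CBO count described above. The input format (a map from class name to the classes it uses), the class names, and the choice to treat coupling as symmetric are assumptions made for illustration; they are not taken from the slides.

```python
def cbo(dependencies):
    """Return {class: number of unique other classes it is coupled to}.

    `dependencies` maps a class name to the class names it uses.
    Assumption: coupling counts in both directions (uses or is used by).
    """
    coupled = {name: set() for name in dependencies}
    for src, targets in dependencies.items():
        for dst in targets:
            if dst == src:
                continue  # a class is not coupled to itself
            coupled.setdefault(src, set()).add(dst)
            coupled.setdefault(dst, set()).add(src)
    return {name: len(others) for name, others in coupled.items()}


# Hypothetical system: repeated uses of the same class count once.
deps = {
    "Order": ["Customer", "Item", "Item"],
    "Customer": ["Address"],
    "Item": [],
    "Address": [],
}
print(cbo(deps))  # {'Order': 2, 'Customer': 2, 'Item': 1, 'Address': 1}
```

Because the count only records how many classes are involved, two systems with very different connection patterns can receive the same CBO values.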
Introduction (cont.) • Theoretical properties also well understood • Coupling of a modular system is non-negative • Merging two modules can’t increase system coupling • Based on modelling a modular system as a graph of nodes and the ‘edges’ connecting those nodes
Information Theoretic metrics (for coupling) • Pioneered by Allen and Khoshgoftaar (A&K) • First appeared based on Allen’s PhD work, c.1996 • METRICS paper in 1999 • At the time created a bit of a stir • Metrics community re-think • Could be applied to both OO and procedural • Appealed to the cross-disciplinary ethos
Roadmap • Explain A&K’s metric for system coupling • Based on a modular system graph • Demonstrate its usefulness • and drawbacks • Identify open issues • Research paths in evaluating/modifying the metric • Other applications
A modular system Source: Allen and Khoshgoftaar, 1999
Inter-module coupling Source: Allen and Khoshgoftaar, 1999
Representation Source: Allen and Khoshgoftaar, 1999
Entropy • The average information per node • Always non-negative • Defined as:
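A hedged reconstruction, since the slide's formula is not reproduced here: assuming the standard Shannon form taken over the distribution of distinct node labels. The symbols $p_l$ and $n_S$ are assumptions introduced for this sketch.

```latex
H(S) = \sum_{l=1}^{n_S} p_l \left( -\log_2 p_l \right)
```

Here $p_l$ would be the proportion of nodes in S carrying the l-th distinct label and $n_S$ the number of distinct labels. Each term is non-negative because $0 < p_l \le 1$, which is consistent with the 'always non-negative' bullet.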
Entropy (cont.) • All logs are base 2 • Unit of measure is the bit • The graph selected has entropy H(S) of 2.46
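A minimal sketch of the entropy calculation, assuming a node's label is its row in a nodes-by-edges incidence table (1 where the node is an end point of the edge). The example graph from the slides is not reproduced here, so the graph below is hypothetical and does not yield 2.46.

```python
from collections import Counter
from math import log2


def graph_entropy(num_nodes, edges):
    """Entropy in bits of the distribution of node labels.

    Nodes are numbered 0..num_nodes-1; each edge is a pair of node numbers.
    A node's label is its row of 0/1 flags, one flag per edge.
    """
    rows = [tuple(int(node in edge) for edge in edges) for node in range(num_nodes)]
    counts = Counter(rows)
    return sum((c / num_nodes) * -log2(c / num_nodes) for c in counts.values())


# Hypothetical graph: node 0 is unconnected, nodes 1-2-3-4 form a path.
path_edges = [(1, 2), (2, 3), (3, 4)]
print(f"{graph_entropy(5, path_edges):.2f}")  # 2.32, i.e. log2(5): all five labels differ
```

Entropy is highest when every node has a distinct label and drops to zero when all nodes look alike, for instance when the graph has no edges.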
Sub-graph analysis • Consider the subgraph Si consisting of all the nodes in S and the edges of S that have the ith node as an end point • Disconnected nodes included in the sub-graph • Calculate the same probability distribution as we did previously
For node 2 Source: Allen and Khoshgoftaar, 1999
Entropy (for distribution of node labels) • Defined as:
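A hedged reconstruction, since the slide's formula is not reproduced here: assuming the same Shannon form applied to each sub-graph $S_i$. The symbols $p_{l,i}$ and $n_{S_i}$ are assumptions introduced for this sketch.

```latex
H(S_i) = \sum_{l=1}^{n_{S_i}} p_{l,i} \left( -\log_2 p_{l,i} \right)
```

Here $p_{l,i}$ would be the proportion of nodes of $S_i$ carrying the l-th distinct label, with disconnected nodes included in the count.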
Entropy (cont.) • Summing H(Si) over all sub-graphs (i : 0..14) gives a total of 6.28
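A minimal sketch of the sub-graph step, assuming each Si keeps every node of S but only the edges that touch node i, and that a node's label is its row in the nodes-by-edges incidence table. The graph is hypothetical (the slides' example graph is not reproduced here), so the total differs from 6.28.

```python
from collections import Counter
from math import log2


def label_entropy(num_nodes, edges):
    """Entropy in bits of the node-label distribution for the given edges."""
    rows = [tuple(int(node in edge) for edge in edges) for node in range(num_nodes)]
    counts = Counter(rows)
    return sum((c / num_nodes) * -log2(c / num_nodes) for c in counts.values())


def subgraph_entropies(num_nodes, edges):
    """H(Si) for every node i: all nodes kept, only edges incident to i kept."""
    return [
        label_entropy(num_nodes, [e for e in edges if i in e])
        for i in range(num_nodes)
    ]


# Hypothetical graph: node 0 is unconnected, nodes 1-2-3-4 form a path.
h_sub = subgraph_entropies(5, [(1, 2), (2, 3), (3, 4)])
print([f"{h:.2f}" for h in h_sub])  # ['0.00', '0.97', '1.92', '1.92', '0.97']
print(f"{sum(h_sub):.2f}")          # 5.79
```

The unconnected node contributes zero, since its sub-graph has no edges and every node then carries the same label.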
Ethos of the coupling metric • The entropy of the modular system taken as a whole is less than or equal to the sum of entropies of the individual components • H(S) <= sum H(Si) • The difference between these values represents the true coupling relationships or ‘excess entropy’
Excess entropy C(S) • C(S) = sum H(Si) – H(S) • Where: C(S) = 6.28 – 2.46 = 3.82
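Coupling in a modular system (MS) • Coupling(MS) = (n+1) C(S), with n+1 = 15 nodes • Reported value: 57.28 (the rounded figures give 15 * 3.82 = 57.30, so the reported value presumably uses unrounded entropies)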
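A sketch of the whole calculation under stated assumptions: a node's label is its row in the nodes-by-edges incidence table, each sub-graph Si keeps all nodes but only the edges touching node i, and node 0 stands for an unconnected extra node. The graphs are hypothetical small cases, not the 15-node example from the slides, and the function names are illustrative.

```python
from collections import Counter
from math import log2


def label_entropy(num_nodes, edges):
    """Entropy in bits of the node-label distribution for the given edges."""
    rows = [tuple(int(node in edge) for edge in edges) for node in range(num_nodes)]
    counts = Counter(rows)
    return sum((c / num_nodes) * -log2(c / num_nodes) for c in counts.values())


def coupling_ms(num_nodes, edges):
    """Coupling(MS) = (number of nodes) * excess entropy C(S)."""
    h_s = label_entropy(num_nodes, edges)
    h_sub_total = sum(
        label_entropy(num_nodes, [e for e in edges if i in e])
        for i in range(num_nodes)
    )
    excess = h_sub_total - h_s  # C(S) = sum H(Si) - H(S)
    return num_nodes * excess


# Hypothetical graphs; node 0 has no edges in each case.
path3 = [(1, 2), (2, 3)]             # 3 connected nodes in a line
triangle = [(1, 2), (2, 3), (1, 3)]  # 3 connected nodes in a cycle
path4 = [(1, 2), (2, 3), (3, 4)]     # 4 connected nodes in a line

print(f"{coupling_ms(4, path3):.2f}")     # 8.00
print(f"{coupling_ms(4, triangle):.2f}")  # 16.00
print(f"{coupling_ms(5, path4):.2f}")     # 17.32
```

The triangle and the four-node path both have three edges, yet the values differ, which illustrates the sensitivity to connection patterns. These three results also coincide with values listed in the Coupling(MS) comparison, although the graphs behind that comparison are not shown here, so the match is only suggestive.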
‘A metric sensitive to patterns of connections. This is attractive, because software engineers recognize patterns as well’ (Allen and Khoshgoftaar, 1999)
Source: Allen and Khoshgoftaar, 1999 • Coupling(MS) for graphs a. to j.: • a. 2.76 • b. 8.00 • c. 16.00 • d. 17.32 • e. 24.07 • f. 26.83 • g. 30.83 • h. 34.83 • i. 22.04 • j. 27.78
Source: Allen and Khoshgoftaar, 1999 • Coupling (CBO) and Coupling(MS) for graphs a. to j.:
Graph   CBO   Coupling(MS)
a.      2     2.76
b.      4     8.00
c.      6     16.00
d.      6     17.32
e.      8     24.07
f.      8     26.83
g.      10    30.83
h.      12    34.83
i.      8     22.04
j.      8     27.78
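A short sketch that groups the tabulated values by CBO to show where the two metrics disagree. The numbers are those listed above; the labels a. to e. are inferred from the order in which the values appear, and the variable names are illustrative.

```python
from collections import defaultdict

# graph label -> (CBO, Coupling(MS)), as listed in the comparison
values = {
    "a": (2, 2.76), "b": (4, 8.00), "c": (6, 16.00), "d": (6, 17.32),
    "e": (8, 24.07), "f": (8, 26.83), "g": (10, 30.83), "h": (12, 34.83),
    "i": (8, 22.04), "j": (8, 27.78),
}

# Collect the graphs that share a CBO value.
by_cbo = defaultdict(list)
for graph, (cbo_value, ms_value) in values.items():
    by_cbo[cbo_value].append((graph, ms_value))

# Keep only CBO values shared by more than one graph.
ties = {c: sorted(g, key=lambda item: item[1]) for c, g in by_cbo.items() if len(g) > 1}
for cbo_value, graphs in sorted(ties.items()):
    print(cbo_value, graphs)
# 6 [('c', 16.0), ('d', 17.32)]
# 8 [('i', 22.04), ('e', 24.07), ('f', 26.83), ('j', 27.78)]
```

Four graphs share a CBO of 8 but span Coupling(MS) values from 22.04 to 27.78, so CBO alone cannot tell them apart.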
Issues • Computes system coupling • Most coupling studies use a class coupling basis • Need a ‘class-based’ entropy measure (NHD) • Comparison between i. and j. • Coupling(MS) is lower for i. (22.04) than for j. (27.78), which suggests that i. is ‘better’ than j. • OO people might disagree with an inheritance structure being ‘better’ • Maintaining the root node would be highly problematic • Do developers really look for patterns? • Does not take into account the ‘type’ of coupling • Cannot be gleaned from a UML class diagram
Potential studies • Fault analysis • Which of the two (Coupling(MS) or CBO) correlates more strongly with faults? • Larger-scale study • The effect of refactoring on the values of both Coupling(MS) and CBO • Hamming distance for coupling? • A final word on cohesion…
Cohesion • A key advantage of CBO (and, to some extent, of Coupling(MS)), and the reason for its popularity, is that there is no argument about its interpretation; it is an objective measure • The same cannot be said about cohesion, because it is subjective