Easier Said than Done: An Empirical Investigation of Software Design and Quality in Open Source Software Development, by C.A. Conley and L. Sproull. Proceedings of the 42nd Hawaii International Conference on System Sciences, 2009. Following PowerPoint Slides and Interpretations
Easier Said than Done: An Empirical Investigation of Software Design and Quality in Open Source Software Development, by C.A. Conley and L. Sproull. Proceedings of the 42nd Hawaii International Conference on System Sciences, 2009. Following PowerPoint Slides and Interpretations by Frank Tsui (Southern Polytechnic State University)
Software Design “R” Software Quality • For years in software engineering it has been said by various experts such as D. Parnas, V. Basili, etc. that: • “Modularity” decreases “Complexity” • Decreasing “Complexity” improves “Quality” • Thus “modularity” should improve “quality” • In this paper by Conley and Sproull the above relationship, R, is examined empirically using data from Open Source Software
Modularity Concept • Modularity ≡ the degree to which components within software are independent of, or “loosely coupled” to, one another. • Coupling between two components x and y is the characteristic that a change in either x or y requires a change in the other. (This is a limited view: there is also “central data” coupling among many components, where changing the meaning of shared data affects every component that uses it --- a much more subtle coupling.) • Coupling between x and y also adds an extra item, the “interface” between x and y, that must be designed. • Thus the less the components are coupled, the fewer cross-component concerns there are; there should therefore be less work and less opportunity to make coupling-related mistakes.
Modularity and Quality • Less work and less opportunity of making a mistake should imply that the quality improves • In addition, increased modularity (less coupling) also implies more independent components; thus pin-pointing any problem to an individual, independent component is easier and faster. Thus problem resolution should be faster and end-product quality should also improve.
Some Metric Definitions for Modularity • Modularity is measured via “function calls” • For Java code, use the package as the unit, not the class. • Interested in “calls” from a class in package A to some other class in package B • Afferent (converging towards) coupling (AC) of a package X is the number of other packages in the software that “call” something in package X (sometimes known as “fan-in,” or the number of things depending on X). • Efferent (conveying outwards) coupling (EC) of a package X is the number of other packages that package X “calls” (sometimes known as “fan-out,” or the number of things X depends on). • Define Instability of package X as: I = EC/(EC + AC) • Define Abstractness of package X as: A = Abstr/Classes • Abstr = # of abstract classes in X (a Java “interface” counts as an abstract class; abstract classes are more “stable” --- by definition, less likely to change) • Classes = # of concrete classes + # of abstract classes in X
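The definitions above can be sketched in a few lines of Python. This is only an illustration, not the tooling used in the paper: the package names, the `DEPS` map, and the helper names are hypothetical, and returning 0.0 for a package with no couplings or no classes is an assumption the slides do not address.

```python
from typing import Dict, Set, Tuple


def coupling_counts(deps: Dict[str, Set[str]], package: str) -> Tuple[int, int]:
    """Return (AC, EC) for `package`.

    `deps` maps each package to the set of packages it calls into.
    """
    # EC: number of *other* packages that this package calls.
    efferent = len(deps.get(package, set()) - {package})
    # AC: number of *other* packages that call into this package.
    afferent = sum(
        1 for other, targets in deps.items()
        if other != package and package in targets
    )
    return afferent, efferent


def instability(afferent: int, efferent: int) -> float:
    """I = EC / (EC + AC). Assumption: an uncoupled package gets 0.0."""
    total = afferent + efferent
    return efferent / total if total else 0.0


def abstractness(abstract_classes: int, total_classes: int) -> float:
    """A = Abstr / Classes. Assumption: an empty package gets 0.0."""
    return abstract_classes / total_classes if total_classes else 0.0


# Hypothetical three-package system: ui calls core and util, core calls util.
DEPS = {"ui": {"core", "util"}, "core": {"util"}, "util": set()}

for name in DEPS:
    ac, ec = coupling_counts(DEPS, name)
    print(name, "AC =", ac, "EC =", ec, "I =", instability(ac, ec))
```

In this toy system the package that only calls others (`ui`) comes out fully unstable, and the package that is only called (`util`) comes out fully stable.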
Distance from Abstractness and Instability • Note that I varies from 0 to 1: I = 0 indicates maximum stability (EC = 0), a large AC relative to EC pushes I toward 0, and I = 1 indicates total instability (the package has only efferent couplings). • A = 0 means the package has no abstract classes; A = 1 means the package consists entirely of abstract classes (an abstract class implies more stability because implementation details can change without changing the abstraction itself --- e.g., an “interface” in Java).
[Figure: A = abstractness (vertical axis) plotted against I = instability (horizontal axis), with the “idealized line” (our goal?) marked. Labeled points, as (I, A): (0,1) A=1, very abstract, I=0, very stable • (1,1) A=1, very abstract, I=1, very unstable • (.5,.5) A=.5, somewhat abstract, I=.5, somewhat stable • (0,0) A=0, very concrete, I=0, very stable • (1,0) A=0, very concrete, I=1, very unstable]
Metric for Degree of Modularity: “Distance” • We are interested in those packages that are close to the middle of this “idealized” line. (Do you agree?) • Define a distance metric: • D = | ((A+I) – 1)/2 | or use just D = (A + I) – 1 • The smaller this D is, the better; note that when A=.5 and I=.5, D=0, which implies a balanced design that is somewhat abstract and somewhat stable. • Further incorporate the “lines of code” (size) factor of the package by multiplying by log(loc), giving D * log(loc) • For the whole software release: Dr = 1 - [ (∑ Di * log(loci)) / ∑ log(loci) ] • Dr is the distance-based score for the complete software release (a size-weighted average of the package distances, subtracted from 1, so a larger Dr means smaller distances) • Di is the distance of each package i, and loci is its lines of code
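As a rough sketch of the distance formulas above, the following Python computes D for one package and Dr for a release. It uses the first form given on the slide, D = |((A+I) – 1)/2|. The function names and the sample package tuples are hypothetical, and the natural logarithm is an assumption (the slide does not name a base, and the base cancels out of the ratio anyway).

```python
import math
from typing import Iterable, Tuple


def distance(a: float, i: float) -> float:
    """D = |((A + I) - 1) / 2| for a single package."""
    return abs((a + i - 1) / 2)


def release_score(packages: Iterable[Tuple[float, float, int]]) -> float:
    """Dr = 1 - sum(Di * log(loc_i)) / sum(log(loc_i)).

    `packages` yields (abstractness, instability, lines_of_code) tuples.
    """
    rows = list(packages)
    # Assumption: natural log; any base gives the same ratio.
    weights = [math.log(loc) for _, _, loc in rows]
    total_weight = sum(weights)
    if total_weight == 0:
        # Happens when there are no packages or every package has loc == 1.
        raise ValueError("need at least one package with more than 1 line")
    weighted = sum(distance(a, i) * w for (a, i, _), w in zip(rows, weights))
    return 1 - weighted / total_weight


# Hypothetical release: one balanced package and one concrete, stable package.
sample = [(0.5, 0.5, 100), (0.0, 0.0, 100)]
print(distance(0.5, 0.5))     # balanced package, D is zero
print(release_score(sample))
```

Because the log weights grow slowly, a very large package counts for more than a small one but does not dominate the release score outright.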
Metric for Quality: Intrinsic Quality • # of Static Bugs found in source code (pre-release time) • Use McCabe’s Cyclomatic complexity number for complexity metric: • It describes the complexity of control flow • This measures the number of linearly independent paths in the package • Cyclomatic number = (# of binary predicates) + 1
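To make “(# of binary predicates) + 1” concrete, here is a minimal sketch that counts decision points with Python's standard `ast` module. It analyzes Python source purely for illustration (the study looked at Java packages), the function name and sample snippet are hypothetical, and it deliberately ignores some branching constructs such as exception handlers and comprehension filters.

```python
import ast


def cyclomatic_number(source: str) -> int:
    """Approximate McCabe's number as (decision points) + 1."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.While, ast.For, ast.IfExp)):
            # Each branch or loop test is one binary predicate.
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" holds two extra short-circuit decisions.
            decisions += len(node.values) - 1
    return decisions + 1


SAMPLE = '''
def classify(n):
    if n < 0 and n != -1:
        return "negative"
    while n > 10:
        n -= 1
    return "small"
'''

# One `if`, one `and`, one `while`: three predicates, so the result is 4.
print(cyclomatic_number(SAMPLE))
```

Straight-line code with no predicates gets the minimum value of 1, matching the single path through it.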
Metrics for Quality: Customer Satisfaction • Number of user-reported problems • The larger this number is, the lower the customer satisfaction • Percentage of problems “closed” (resolved) • The larger this is, the more satisfied the customer is • Time to problem closure • The smaller this is, the faster the problem is resolved, and thus the more satisfied the customer should be • In my (Tsui) days we used (# of problems reported/user-months) to get around the issue that more users cause more problems to be found
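The customer-satisfaction measures listed above could be computed from a problem log along these lines. This is a sketch under stated assumptions: the `Problem` record, its `days_to_close` field, and the sample values are hypothetical and are not data from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class Problem:
    # Days from report to closure; None while the problem is still open.
    days_to_close: Optional[float]


def satisfaction_metrics(
    problems: Sequence[Problem], user_months: float
) -> Dict[str, Optional[float]]:
    """Summarize reported problems, closure rate, closure time, and usage-normalized rate."""
    if user_months <= 0:
        raise ValueError("user_months must be positive")
    closed = [p.days_to_close for p in problems if p.days_to_close is not None]
    return {
        "reported": len(problems),
        "percent_closed": 100 * len(closed) / len(problems) if problems else 0.0,
        "mean_days_to_close": sum(closed) / len(closed) if closed else None,
        # Normalizing by usage, as in the (# of problems reported/user-months) ratio.
        "problems_per_user_month": len(problems) / user_months,
    }


# Hypothetical log: two closed problems and two still open, over 8 user-months.
log = [Problem(2.0), Problem(4.0), Problem(None), Problem(None)]
print(satisfaction_metrics(log, user_months=8))
```

Dividing by user-months keeps a widely used project from looking worse than a rarely used one simply because more people are around to find problems.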
Statistical Analysis Using OSS Projects • Performed “Statistical Correlation” of the following information: • Computed Degree of Modularity for the sampled source code • Counted static bugs and computed the cyclomatic number for Intrinsic Quality • Counted bugs reported, time to closure, and percentage of bugs closed for the Customer Satisfaction part of Quality
Looked at 4 “models” of Degrees of modularity and Quality • Do projects differ in project Quality? (unconditional means of quality #’s) • Does Modularity explain differences in Quality across projects? (regression analysis) • Does Modularity explain differences in Quality within projects? (random coefficient) • What role does Modularity play after accounting for other variables? (best fit)
Surprising Experimental Results! • The common belief that “an increase in Modularity should increase Quality” could not be confirmed with this set of empirical data, using the authors’ definitions of the modularity and quality metrics! • Did find that: • As Modularity increased, Complexity decreased (not always, but most of the time) • As Modularity increased, the number of static bugs sometimes also increased instead of decreasing • As Modularity increased, the percentage of bugs closed sometimes decreased instead of increasing • So ---- perhaps the definitions and metrics may be WRONG?!