ECE 453 – CS 447 – SE 465 Software Testing & Quality Assurance
Case Studies
Instructor: Paulo Alencar
Overview
• Introduces a new cohesion metric called Conceptual Cohesion of Classes (C3) and uses this metric for fault prediction
• Compares the new cohesion metric with an extensive set of existing metrics
• Presents a large case study on three open source software systems (e.g., Mozilla v.1.6)
• Marcus, A., Poshyvanyk, D., Ferenc, R., Using the Conceptual Cohesion of Classes for Fault Prediction in Object-Oriented Systems, IEEE Transactions on Software Engineering, vol. 34, no. 2, March/April 2008.
Main Results
• Cohesion is usually measured using structural information extracted solely from the source code (e.g., attribute references and method calls)
• A new measure for class cohesion is proposed (Conceptual Cohesion of Classes – C3), which captures the conceptual aspects of class cohesion
• C3 is based on the analysis of textual information in the source code, expressed in comments and identifiers
• An Information Retrieval (IR) technique is used to extract, represent, and analyze the textual information from source code
• C3 can be interpreted as a measure of the textual coherence of a class within the context of the system
Main Results
• Metrics used in comparison: Lack of Cohesion in Methods (LCOM)
• LCOM4 – number of connected components, where two methods are connected if they access the same class variable; LCOM4 = 1 (good), LCOM4 >= 2 (bad, consider splitting the class)
• LCOM1 – P = number of pairs of methods that access disjoint sets of instance variables (no attributes in common); Q = number of pairs of methods that share at least one instance variable; LCOM1 = P – Q if P > Q, and 0 otherwise (0 good; > 1 may suggest a split)
• LCOM2 – m = # methods, a = # variables (attributes), mA = # methods that access attribute A, sum(mA) = sum of mA over all attributes; LCOM2 = 1 – sum(mA) / (m * a) (0 is good)
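The three LCOM variants above can be sketched in a few lines. This is only an illustration, not the tooling used in the study: the class is assumed to be given as a hypothetical mapping from method name to the set of attributes that method accesses, LCOM1 is taken as P – Q when P exceeds Q and 0 otherwise (the Chidamber and Kemerer form), and the connected-components count only links methods through shared attributes.

```python
from itertools import combinations


def lcom1(access):
    """P - Q if P > Q, else 0 (P: disjoint pairs, Q: pairs sharing an attribute)."""
    p = q = 0
    for attrs_a, attrs_b in combinations(access.values(), 2):
        if attrs_a & attrs_b:
            q += 1
        else:
            p += 1
    return p - q if p > q else 0


def lcom2(access, attributes):
    """1 - sum(mA) / (m * a), where mA counts the methods that access attribute A."""
    m, a = len(access), len(attributes)
    if m == 0 or a == 0:
        return 0.0  # assumption: treat an empty class as trivially cohesive
    total = sum(
        sum(1 for used in access.values() if attr in used) for attr in attributes
    )
    return 1 - total / (m * a)


def lcom4(access):
    """Number of connected components; methods are linked by a shared attribute."""
    seen = set()
    components = 0
    for start in access:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            current = stack.pop()
            for other in access:
                if other not in seen and access[current] & access[other]:
                    seen.add(other)
                    stack.append(other)
    return components


# Hypothetical class: two file-handling methods and two logging methods.
example = {
    "open": {"path", "handle"},
    "close": {"handle"},
    "log": {"logger"},
    "set_level": {"logger"},
}
example_attributes = {"path", "handle", "logger"}

print(lcom1(example))                      # 4 disjoint pairs, 2 sharing pairs
print(lcom2(example, example_attributes))  # 1 - 5/12
print(lcom4(example))                      # file group and logging group
```

In this toy class the file-handling and logging methods never touch the same attribute, so the component count is 2, which is the "consider splitting" signal described above.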
Main Results
• LCOM3 = (m – sum(mA) / a) / (m – 1) (0 good; 1 to 2 bad, consider splitting)
• LCOM5 = # of connected components (methods are connected if they access the same variable or call each other)
• TCC (tight class cohesion)
 - NP = maximum number of possible connections = N * (N – 1) / 2, where N = # methods
 - NDC = # direct connections (# edges in the connection graph); two methods are directly connected if they access the same class variable, or if the call trees starting at the two methods access the same variables
 - NID = # indirect connections (e.g., A-B-C, where A is indirectly connected to C via B)
 - TCC = NDC / NP (1 good)
Main Results
• LCC (loose class cohesion) – LCC = (NDC + NID) / NP (1 good)
• Coh – cohesion metric: Coh = sum(v(A)) / (m * a), where v(A) = # of methods that reference attribute A and sum(v(A)) = sum of v(A) over all attributes (1 good)
• ICH (information-flow-based cohesion) – depends on the number of information flows (i.e., accesses or reads) between parameter lists and data structures caused by the procedures in the interface (higher is better)
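The TCC, LCC, and Coh ratios can be worked through on a small example. The sketch below makes simplifying assumptions: the class is a hypothetical mapping from method name to accessed attributes, two methods count as directly connected only when they share an attribute (call trees are ignored), and an indirect connection is any pair that is reachable through a chain of direct connections without being directly connected.

```python
from itertools import combinations


def connection_counts(access):
    """Return (NDC, NID, NP) for a method -> accessed-attributes mapping."""
    methods = list(access)
    direct = {
        frozenset((a, b))
        for a, b in combinations(methods, 2)
        if access[a] & access[b]
    }
    neighbours = {m: set() for m in methods}
    for pair in direct:
        a, b = pair
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Collect every pair of methods joined by some path of direct connections.
    reachable = set()
    for start in methods:
        seen = {start}
        stack = [start]
        while stack:
            current = stack.pop()
            for nxt in neighbours[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        for other in seen - {start}:
            reachable.add(frozenset((start, other)))

    n = len(methods)
    return len(direct), len(reachable - direct), n * (n - 1) // 2


def tcc(access):
    ndc, _, np_ = connection_counts(access)
    return ndc / np_ if np_ else 0.0  # assumption: 0.0 when fewer than two methods


def lcc(access):
    ndc, nid, np_ = connection_counts(access)
    return (ndc + nid) / np_ if np_ else 0.0


def coh(access, attributes):
    """sum(v(A)) / (m * a), where v(A) counts methods that reference attribute A."""
    m, a = len(access), len(attributes)
    if m == 0 or a == 0:
        return 0.0
    total = sum(len(used & attributes) for used in access.values())
    return total / (m * a)


# Hypothetical class: a-b share x, b-c share y, d is isolated.
example = {"a": {"x"}, "b": {"x", "y"}, "c": {"y"}, "d": {"z"}}
example_attributes = {"x", "y", "z"}

print(connection_counts(example))  # (2, 1, 6)
print(tcc(example), lcc(example), coh(example, example_attributes))
```

Here a and c are only indirectly connected through b, which is why LCC (3 of 6 possible connections) comes out higher than TCC (2 of 6).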
Main Results
• LSI – Latent Semantic Indexing
 - An advanced statistical Information Retrieval method
 - Central concept: the information about the contexts in which a particular word appears or does not appear provides a way to determine the similarity of meaning of sets of words to each other
 - It uses a matrix (word × context) based on knowledge in the particular domain of interest
Main Results
• Conceptual Cohesion of Classes (C3)
 - Similarity measure for textual coherence of comments and identifiers extracted from source code
 - C3 for a class C measures conceptual cohesion, the degree to which the methods of a class belong together conceptually
 - C3 is in [0,1]
 - C3 close to 1 is better: the class most likely implements a single concept or a very small number of related concepts (related in the context of the software system)
 - C3 close to 0: the methods in the class most likely implement different concepts
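A rough sketch of how such a score can be computed follows. In the paper, C3 is the average pairwise similarity between the methods of a class, clamped at 0, with similarities taken between LSI vectors. The code below is not that implementation: it swaps in plain term-frequency vectors and cosine similarity in place of LSI, and the tokenizer and the sample method texts are assumptions made for illustration only.

```python
import math
import re
from collections import Counter
from itertools import combinations


def terms(text):
    """Split comments and identifiers (including camelCase) into lowercase terms."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return Counter(word.lower() for word in re.findall(r"[A-Za-z]+", spaced))


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def c3(method_texts):
    """Average pairwise similarity of the methods, clamped at 0."""
    vectors = [terms(text) for text in method_texts]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0  # assumption: arbitrary value for a class with fewer than two methods
    average = sum(cosine(u, v) for u, v in pairs) / len(pairs)
    return max(average, 0.0)


# Hypothetical method texts (identifiers plus comments), not taken from Mozilla.
cohesive = [
    "checkSecurity context pending exception",
    "reportSecurity failure context exception",
    "clearPending security error context",
]
scattered = ["openFile path", "renderPixel colour", "sendPacket socket"]

print(c3(cohesive))   # shared vocabulary gives a score well above 0
print(c3(scattered))  # no shared terms gives 0.0
```

With raw term counts the similarities are never negative, so the clamp has no effect here; it is kept because LSI-based similarities can fall below zero.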
Main Results
• Example: class MySecMan (my security manager) from Mozilla v.1.6
• The four methods share several terms such as context, pending, exception, error, failure, and security
• C3 = 0.913 (a very high conceptual cohesion for the MySecMan class)
Main Results
• Case Studies:
 - Three open source systems from different domains (developed mostly in C++)
 - TortoiseCVS v.1.8.21 – an extension of Microsoft Windows Explorer that uses CVS
 - WinMerge v.2.0.2 – a tool for visualizing and merging both files and directories
 - Mozilla v.1.6 – an open source Web browser
Main Results
• Research questions:
 - Question 1: Does C3 capture aspects of class cohesion that are not captured by the structural cohesion metrics?
 - Question 2: Does the combination of structural cohesion metrics with C3 provide better results in predicting faults in classes than combinations of structural metrics alone?
Main Results
• Principal components and metrics (Question 1)
 - PC1 is LCOM1, LCOM2, LCOM3, and ICH – metrics that count the number of shared instance variables
 - PC2 is TCC and LCC
 - PC3 is LCOM5
 - PC4 is C3
 - PC5 is LCOM3 and LCOM4
 - PC6 is Coh
Main Results • Predicting faults in classes (Question 2) – Individual metrics
Main Results • Predicting faults in classes (Question 2) – Pairs of cohesion metrics