17-791 Software Research Seminar (SSSG)
Investigating JAVA Classes with Formal Concept Analysis
Uri Dekel (udekel@cs.cmu.edu)
Based on M.Sc. work at the Technion, Israel Institute of Technology.
To appear: 10th Working Conference on Reverse Engineering (WCRE’03), and as a poster at OOPSLA’03.
Outline
• Research goals and hypotheses
• A crash-course in formal concept analysis
• Interface visualization
• Reasoning about class implementation
• Applications to code inspection
• Additional research
Investigating Classes with FCA, Uri Dekel, 17-791 Software Research Seminar
Goals
• Research question: “Can we exploit the data-member-based cohesion between methods in a class to reason about the class and discover errors?”
• Specifically:
  • Provide a faster learning curve for new users of a class by improving the presentation of its interface
  • Assist reverse engineering by visualizing class structure
  • Assist code inspection by suggesting a reading order
• Important principle: keep it simple to use and learn.
Hypothesis #1
• Data-member use is fundamental to understanding a class.
• All possible implementations of an operation will use the same fields.
• Representation changes are rare.
• Data-member use is the basis for cohesion-based metrics (e.g., LCOM, Lack of Cohesion in Methods).
• Analogous to global-variable-based modularization of procedural code.
Hypothesis #2
• Methods that use the same combination of fields are likely to be related.
  • e.g., get/set, add/remove, etc.
  • Even more so due to the “shopping list approach”
    • Promotes complete interfaces using composite methods
Means
• Formal Concept Analysis (FCA)
  • A mathematical classification technique
  • Uses a binary relation (context) between objects and attributes
    • not to be confused with the OO terms
  • Produces a concept lattice (next slide)
  • Much literature on applications in various fields
Example: Context of the Pnt3D class
Formal Concept Analysis
• Input: a context <O,A,R>
  • O is a set of objects
  • A is a set of attributes
  • R is a binary relation between O and A, $R \subseteq O \times A$
• Mapping: Galois connection
  • Common attributes of a set of objects $O' \subseteq O$: $\sigma(O') = \{ a \in A \mid \forall o \in O' : (o, a) \in R \}$
  • Common objects of a set of attributes $A' \subseteq A$: $\tau(A') = \{ o \in O \mid \forall a \in A' : (o, a) \in R \}$
• Output: concepts <O',A'> such that $\sigma(O') = A'$ and $\tau(A') = O'$
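The definitions above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the prototype's code: the context is assumed to be a map from each method (object) to the set of fields (attributes) it uses, and the class `Fca` and its method names are hypothetical. Concepts are enumerated naively: every intent is an intersection of some methods' field sets (the empty intersection being all of A), so closing those sets under intersection and applying τ to each result yields every concept.

```java
import java.util.*;

// Minimal FCA sketch: objects are methods, attributes are the fields they use.
// The context R is stored as method -> fields (a representation chosen for brevity).
final class Fca {
    static final class Concept {
        final Set<String> extent;  // O'
        final Set<String> intent;  // A'
        Concept(Set<String> extent, Set<String> intent) {
            this.extent = extent;
            this.intent = intent;
        }
        @Override public String toString() { return extent + " x " + intent; }
    }

    // All attributes A, taken here as the union of the fields used by some method.
    static Set<String> attributes(Map<String, Set<String>> ctx) {
        Set<String> all = new TreeSet<>();
        ctx.values().forEach(all::addAll);
        return all;
    }

    // sigma(objs): attributes shared by every object in objs (all of A if objs is empty).
    static Set<String> commonAttributes(Map<String, Set<String>> ctx, Set<String> objs) {
        Set<String> common = attributes(ctx);
        for (String o : objs) common.retainAll(ctx.get(o));
        return common;
    }

    // tau(attrs): objects related to every attribute in attrs.
    static Set<String> commonObjects(Map<String, Set<String>> ctx, Set<String> attrs) {
        Set<String> common = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : ctx.entrySet())
            if (e.getValue().containsAll(attrs)) common.add(e.getKey());
        return common;
    }

    // Every intent is an intersection of object rows (the empty intersection is A),
    // so closing the rows under intersection enumerates all intents.
    static List<Concept> concepts(Map<String, Set<String>> ctx) {
        Set<Set<String>> intents = new LinkedHashSet<>();
        intents.add(commonAttributes(ctx, Set.of()));
        for (Set<String> row : ctx.values()) {
            for (Set<String> intent : new ArrayList<>(intents)) {
                Set<String> meet = new TreeSet<>(intent);
                meet.retainAll(row);
                intents.add(meet);
            }
        }
        List<Concept> result = new ArrayList<>();
        for (Set<String> intent : intents)
            result.add(new Concept(commonObjects(ctx, intent), intent));
        return result;
    }
}
```

For a hypothetical point class with `getX`, `setX` → {x}, `getY` → {y} and `length` → {x, y}, `Fca.concepts` returns four concepts, among them ({getX, length, setX}, {x}). The cost grows with the number of concepts, which can be exponential in the worst case; that is tolerable for a single class but this is not a general-purpose FCA algorithm.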
Formal Concept Analysis
Example: Concepts of the Pnt3D class
• A concept lattice is based upon a partial order between concepts: $\langle O_1, A_1 \rangle \le \langle O_2, A_2 \rangle \iff O_1 \subseteq O_2$ (equivalently, $A_2 \subseteq A_1$)
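Because a concept is determined by its extent, the ordering can be checked on extents alone. The sketch below uses hypothetical names and assumes each concept is given by its extent; it tests ≤ and derives the covering pairs, i.e. the edges drawn in the lattice diagram, with a naive cubic-time loop.

```java
import java.util.*;

final class ConceptOrder {
    // <O1,A1> <= <O2,A2>  iff  O1 is a subset of O2 (equivalently A2 is a subset of A1).
    static boolean leq(Set<String> extent1, Set<String> extent2) {
        return extent2.containsAll(extent1);
    }

    // Strict order between the concepts at positions a and b.
    private static boolean lt(List<Set<String>> extents, int a, int b) {
        return leq(extents.get(a), extents.get(b)) && !extents.get(a).equals(extents.get(b));
    }

    // Covering pairs {i, j}: concept i lies directly below concept j, with no
    // concept strictly in between. These are the edges of the lattice diagram.
    static List<int[]> coverEdges(List<Set<String>> extents) {
        List<int[]> edges = new ArrayList<>();
        int n = extents.size();
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (!lt(extents, i, j)) continue;
                boolean direct = true;
                for (int k = 0; k < n && direct; k++)
                    if (lt(extents, i, k) && lt(extents, k, j)) direct = false;
                if (direct) edges.add(new int[] {i, j});
            }
        }
        return edges;
    }
}
```

For the four concepts of the hypothetical point class, this yields four edges forming a diamond: {length} below both {getX, setX, length} and {getY, length}, which are both below the concept containing all four methods.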
Concept Lattices
• A sparse concept lattice provides an alternate view of the tabular context and of the full concept lattice
• Each concept is a group of objects that have the same attributes
• The attributes are the union of the attributes in that concept and in all the concepts that it dominates
• In our case, methods that use the same fields are clustered together
• Reveals structure and asymmetries
Interface Visualization
• The lattice partitions the methods in the interface into equivalence classes
  • Similar methods are heuristically clustered together
  • An automatic “feature categorization”
• The lattice provides multidimensional connections
• Compare with simple lexical lists of methods
(Note: the class is “flattened” to remove inheritance details)
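Computing these equivalence classes needs nothing more than a map keyed by field set, because two methods are labeled at the same concept of the sparse lattice exactly when they use the same set of fields. A minimal sketch, assuming the same hypothetical method → fields representation (the class name is also hypothetical):

```java
import java.util.*;

final class MethodClusters {
    // Methods with identical field sets are introduced at the same concept of the
    // sparse lattice, so each map entry is one equivalence class of methods.
    static Map<Set<String>, SortedSet<String>> cluster(Map<String, Set<String>> uses) {
        Map<Set<String>, SortedSet<String>> groups = new HashMap<>();
        uses.forEach((method, fields) ->
            groups.computeIfAbsent(new TreeSet<>(fields), k -> new TreeSet<>()).add(method));
        return groups;
    }
}
```

For the hypothetical point class, `getX` and `setX` fall into one class ({x}), while `getY` and `length` each form their own.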
Interface Visualization
• To be effective, multiple methods should appear in each concept, on average
• A lattice over a class with methods M and fields F can have up to $n = 2^{\min(|M|, |F|)}$ concepts
• In a data set of circa 6000 classes:
  • In 99.5%, $n < |M| + |F|$
  • In 77.4%, $n < |M|$
Example: Concepts vs. Methods in Eclipse.
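The statistics above compare the number of concepts n with |M| and |F|. As a hedged sketch (hypothetical names, assuming the method → fields representation and a known field set F), n can be counted as the number of distinct intents, i.e. F itself together with every intersection of the methods' field sets:

```java
import java.util.*;

final class LatticeSize {
    // n = number of distinct intents: the full field set F plus every
    // intersection of methods' field sets (each intent determines one concept).
    static int countConcepts(Map<String, Set<String>> uses, Set<String> fields) {
        Set<Set<String>> intents = new HashSet<>();
        intents.add(new TreeSet<>(fields));
        for (Set<String> row : uses.values()) {
            for (Set<String> intent : new ArrayList<>(intents)) {
                Set<String> meet = new TreeSet<>(intent);
                meet.retainAll(row);
                intents.add(meet);
            }
        }
        return intents.size();
    }

    // Worst-case bound from the slide: 2^min(|M|, |F|).
    static double worstCase(int methods, int fields) {
        return Math.pow(2, Math.min(methods, fields));
    }

    // Average number of methods per concept; the slide argues for several on average.
    static double methodsPerConcept(int methods, int concepts) {
        return (double) methods / concepts;
    }
}
```

For the hypothetical point class (|M| = 4, |F| = 2) this gives n = 4, which satisfies n < |M| + |F| but not n < |M|, and averages one method per concept.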
Case Study
• The Molecule class from CDK
  • CDK: Chemistry Development Kit
    • Open-source library of chemistry-related classes
    • Developed at the Max Planck Institute in Germany
    • Used in chemistry visualization applications
• Why the Molecule class?
  • Has a large interface (nearly 75 public members)
  • The represented entity is familiar to most people
  • Our technique revealed previously unknown errors in this class
Case Study
• Lattice structure hints at class structure
  • Many independent operations on the left
    • Similar to a C struct
  • A cohesive component on the right
Interface Visualization
• Multiple methods with similar signatures indicate possible repetition
• Inconsistencies in naming
• Inconsistencies in return types
• Because related methods are grouped in concepts, we can notice inconsistencies or repetitions
Investigate Implementation
• We examine fields and dependencies between concepts to understand the cohesive component
  • Collections of atoms and bonds
  • Micro-management of arrays (a count field tracks available items)
  • Inconsistencies and broken invariants
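The following hypothetical class, written for illustration and not taken from CDK, shows the array-plus-count pattern and how a method that touches only one of the two fields can break the invariant between them. Here the count is assumed to track the slots currently in use. In a lattice, `setAtoms` would sit in a different concept ({atoms}) from `addAtom` ({atoms, atomCount}), which is what draws attention to it.

```java
import java.util.*;

// Hypothetical container in the array-plus-count style; not CDK code.
final class AtomArray {
    private String[] atoms = new String[2];
    private int atomCount = 0;  // number of leading slots of 'atoms' in use

    // Uses {atoms, atomCount}: grows the array when full, then appends.
    void addAtom(String atom) {
        if (atomCount == atoms.length) atoms = Arrays.copyOf(atoms, atoms.length * 2);
        atoms[atomCount++] = atom;
    }

    // Uses only {atoms}: replaces the array but forgets to update atomCount.
    void setAtoms(String[] newAtoms) {
        atoms = newAtoms.clone();
    }

    int getAtomCount() { return atomCount; }

    // Invariant: exactly the first atomCount slots are non-null.
    boolean invariantHolds() {
        if (atomCount > atoms.length) return false;
        for (int i = 0; i < atoms.length; i++)
            if ((atoms[i] != null) != (i < atomCount)) return false;
        return true;
    }
}
```

After three `addAtom` calls the invariant holds; a subsequent `setAtoms(new String[] {"N"})` leaves `atomCount` at 3 while the array has one slot, so the invariant fails.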
Investigate Implementation
• Asymmetries are revealed by examining pairs of related concepts
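One way to mechanize this check, offered only as a sketch: pair methods by complementary name prefixes and flag pairs whose field sets, and therefore concepts, differ. The prefix list and all names are hypothetical assumptions; name-based pairing is a heuristic and will miss counterparts that do not follow it.

```java
import java.util.*;

final class Asymmetries {
    // Hypothetical prefix pairs; a real tool would need a richer list.
    private static final String[][] PAIRS = {{"add", "remove"}, {"get", "set"}};

    // Reports counterpart methods whose field sets differ, e.g. addAtom
    // touching {atoms, atomCount} while removeAtom touches only {atoms}.
    static List<String> find(Map<String, Set<String>> uses) {
        List<String> report = new ArrayList<>();
        for (String m : new TreeSet<>(uses.keySet())) {
            for (String[] p : PAIRS) {
                if (!m.startsWith(p[0])) continue;
                String twin = p[1] + m.substring(p[0].length());
                Set<String> a = uses.get(m);
                Set<String> b = uses.get(twin);
                if (b != null && !a.equals(b))
                    report.add(m + " " + new TreeSet<>(a) + " vs " + twin + " " + new TreeSet<>(b));
            }
        }
        return report;
    }
}
```

With `addAtom` → {atoms, atomCount}, `removeAtom` → {atoms} and a consistent `getName`/`setName` pair, only the add/remove pair is reported.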
Embedded Call Graph
• A concept lattice clusters methods but does not portray interactions
• Call graphs show interactions between methods, but their layout does not depend on semantics
• An embedded call graph combines the two
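A minimal data-model sketch of the idea, under assumptions: each method is placed at the concept that introduces it (identified by its exact field set), and each call edge is tagged as internal to a concept or crossing between concepts, so that a viewer could draw call edges on top of the lattice layout. The call map is assumed to come from some static analyzer, every method in it is assumed to appear in the field map, and the names are hypothetical.

```java
import java.util.*;

final class EmbeddedCallGraph {
    // Tags every call edge as "internal" (caller and callee share a concept)
    // or "crossing" (the edge connects two different concepts of the lattice).
    // A method's concept is identified here by its exact set of used fields.
    static Map<String, List<String>> classify(Map<String, Set<String>> uses,
                                              Map<String, List<String>> calls) {
        Map<String, List<String>> tagged = new TreeMap<>();
        tagged.put("internal", new ArrayList<>());
        tagged.put("crossing", new ArrayList<>());
        calls.forEach((caller, callees) -> {
            for (String callee : callees) {
                boolean sameConcept = Objects.equals(uses.get(caller), uses.get(callee));
                tagged.get(sameConcept ? "internal" : "crossing").add(caller + "->" + callee);
            }
        });
        return tagged;
    }
}
```

Internal edges could be drawn inside a concept node and crossing edges between nodes, keeping the lattice layout while showing interactions.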
Code Inspection
• The lattice can help us select a reading order
  • Minimize focus shifts
  • Similar methods are read consecutively
• We define a global order between concepts
  • e.g., each component separately, topological ordering, reading by order of layers
• We define a local order between methods in each concept
  • e.g., topological ordering, reading by order of simplicity, etc.
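A reading order combining both levels might be sketched as follows. The choices are assumptions for illustration: concepts are read layer by layer from the top of the lattice (fewest fields first, which is a topological order because a proper subset is always smaller), and within a concept shorter methods come first, with lines of code standing in for “simplicity”. Names are hypothetical.

```java
import java.util.*;

final class ReadingOrder {
    static List<String> order(Map<String, Set<String>> uses, Map<String, Integer> loc) {
        // Group methods by the concept that introduces them (identical field sets).
        Map<Set<String>, List<String>> byConcept = new HashMap<>();
        uses.forEach((m, fields) ->
            byConcept.computeIfAbsent(new TreeSet<>(fields), k -> new ArrayList<>()).add(m));

        // Global order: fewer fields first, i.e. a topological order read by layers;
        // ties within a layer are broken by the printed field set for determinism.
        List<Set<String>> concepts = new ArrayList<>(byConcept.keySet());
        concepts.sort((a, b) -> {
            int bySize = Integer.compare(a.size(), b.size());
            return bySize != 0 ? bySize : a.toString().compareTo(b.toString());
        });

        // Local order: shorter methods first, then alphabetical.
        List<String> result = new ArrayList<>();
        for (Set<String> concept : concepts) {
            List<String> methods = byConcept.get(concept);
            methods.sort((a, b) -> {
                int byLength = Integer.compare(loc.get(a), loc.get(b));
                return byLength != 0 ? byLength : a.compareTo(b);
            });
            result.addAll(methods);
        }
        return result;
    }
}
```

Reading bottom-up instead (most fields first) only requires reversing the global comparator; which direction suits inspection better is a judgment call.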
Tooling Support
• Batch-mode prototype
  • Produces lattices and metrics
  • Database support for metrics and statistics research
• Interactive Eclipse plug-in prototype
  • Adds an additional view for .java files
  • Uses a simplistic external static analyzer
  • Limited by the current 2D capabilities of Eclipse
Research Directions
• Conduct user studies to validate the methodology
  • Preliminary user studies provided good feedback
• Lattice-based metrics suite
• Application to class design in CASE tools
  • Interactive class diagram editor based on the concept lattice
  • Semantics assigned by connecting methods to fields; compare with simply adding methods to a list, as in current tools
Research Directions
• Class-wide “diffing”
  • Provides a bird’s-eye view of changed areas
Example: Differences between the original version of the “Graph” class of VGJ (Visualizing Graphs with Java) and the Technion adaptation of that class. Original elements appear in bold font; modifications appear in plain font.
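At its simplest, class-wide diffing can compare the method → fields contexts of the two versions. The sketch below uses hypothetical names and tags each method as added, removed, unchanged, or fields-changed; a lattice view could then highlight the concepts containing changed methods. Edits that leave a method's field usage unchanged are invisible to this comparison.

```java
import java.util.*;

final class ClassDiff {
    // Compares two versions of a class by their method -> fields contexts.
    static SortedMap<String, String> diff(Map<String, Set<String>> before,
                                          Map<String, Set<String>> after) {
        SortedMap<String, String> status = new TreeMap<>();
        for (String m : before.keySet()) {
            if (!after.containsKey(m)) status.put(m, "removed");
            else if (before.get(m).equals(after.get(m))) status.put(m, "unchanged");
            else status.put(m, "fields-changed");
        }
        for (String m : after.keySet())
            status.putIfAbsent(m, "added");  // present only in the new version
        return status;
    }
}
```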
Graph Class