Ignominy: Tool for Analysing Software Dependencies and for Reducing Complexity in Large Software Systems
On behalf of the CMS Collaboration
Lassi A. Tuura, Northeastern University, Boston
Motivation
• IGUANA is largely an integrator for CMS: we need a good grasp of external software before including it in our system
  • By and large we are not seeking to select one product
  • but are trying to merge the strengths of several packages into a very good physics analysis environment
  • and are seeking to provide feedback to component authors
• We are interested in, among other things:
  • How much of the external package we would use
  • Its impact on our physical software structure
  • How well it fits with the philosophy of CMS software and other imports: in design and architecture, usage patterns, GUI, …
  • What other software it depends on
  • The commitment required, and the possibility of varying how much of it we use
Examples
See http://iguana.cern.ch/3_1_0/dependencies.html
Ignominy
ignominy: dishonour, disgrace, shame; infamy; the condition of being in disgrace, etc. (Oxford English Dictionary)
• Model
  • Examines and reports on direct and transitive source and binary dependencies
  • Creates reports of the collected results
    • As a set of web pages
    • Numerically
    • Graphically
    • As tables
[Diagram: inputs Source Code, Build Products, User-defined logical dependencies; Dependency Database; outputs Metrics, Graphs, Tables]
ignominy: a suite of perl and shell scripts plus a number of configuration files (IGUANA)
Dependency Analysis
• Ignominy scans…
  • Make dependency data produced by the compilers (*.d files)
  • Source code for #includes (resolved against the headers actually seen)
  • Shared library dependencies (“ldd” output)
  • Defined and required symbols (“nm” output)
• And maps…
  • Source code and binaries into packages
  • #include dependencies into package dependencies
  • Unresolved/defined symbols into package dependencies
• And warns… about problems and ambiguities (e.g. multiply defined symbols or dependent shared libraries that cannot be found)
• Produces a simple text file database for the different dependencies: source only, binaries only, combined, forward and reverse, by package, by domain, …
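As an illustration of the “#include dependencies into package dependencies” step, here is a minimal C++17 sketch. It is not Ignominy’s code (Ignominy is a set of perl and shell scripts); the function names and the packaging rule (package = first two path components, as in “Cmscan/IgCmscan”) are hypothetical, and it assumes include paths are written relative to the project root. As on the slide, an #include only counts if it resolves to a header actually seen.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <set>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical packaging rule: the package of a file is the first two path
// components ("Subsystem/Package/..."), e.g. "Cmscan/IgCmscan".
std::string packageOf(const std::string& path) {
    auto first = path.find('/');
    if (first == std::string::npos) return path;
    auto second = path.find('/', first + 1);
    return second == std::string::npos ? path : path.substr(0, second);
}

// Package-level include edges: (from, to) -> number of distinct source files involved.
using PackageEdges = std::map<std::pair<std::string, std::string>, int>;

// sources: file path -> file contents. Only #includes that resolve to a file
// present in `sources` count; system and external headers are ignored here.
PackageEdges includeDependencies(const std::map<std::string, std::string>& sources) {
    static const std::regex includeRe(R"(^\s*#\s*include\s*["<]([^">]+)[">])");
    std::map<std::pair<std::string, std::string>, std::set<std::string>> files;
    for (const auto& [path, text] : sources) {
        std::istringstream in(text);
        std::string line;
        while (std::getline(in, line)) {
            std::smatch m;
            if (!std::regex_search(line, m, includeRe)) continue;
            const std::string header = m[1];
            if (!sources.count(header)) continue;  // not a header we have seen
            const std::string from = packageOf(path), to = packageOf(header);
            if (from != to) files[{from, to}].insert(path);  // skip intra-package includes
        }
    }
    PackageEdges edges;
    for (const auto& [edge, fs] : files) edges[edge] = static_cast<int>(fs.size());
    return edges;
}
```

Counting distinct files per edge mirrors the “from includes: 6 (145 files)” style of the per-package report. A real scanner would prefer the compiler-generated *.d data where it exists, since that records the headers the compiler actually resolved.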
Single Package Dependencies
• Cmscan/IgCmscan
  • Testing Level: 5
  • Outgoing edges: 6
    • from includes: 6 (145 files)
    • from symbols: 4 (636 symbols)
  • Incoming edges: 1
    • from includes: 1 (1 file)
    • from symbols: 1 (1 symbol)
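The “Testing Level” above can be read as a levelization number. The C++17 sketch below assumes the common Lakos-style convention (a package with no dependencies is at level 1, otherwise one more than its highest-level dependency); Ignominy’s exact counting, and its treatment of cycles, may differ. The graph type and function names are hypothetical, and this sketch simply refuses to assign a level inside a cycle.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

using Graph = std::map<std::string, std::set<std::string>>;  // package -> direct deps

// Memoised depth-first levelization. `onStack` holds the packages on the
// current DFS path; meeting one of them again means a dependency cycle.
int testingLevel(const Graph& g, const std::string& pkg,
                 std::map<std::string, int>& memo,
                 std::set<std::string>& onStack) {
    if (auto it = memo.find(pkg); it != memo.end()) return it->second;
    if (!onStack.insert(pkg).second)
        throw std::runtime_error("dependency cycle through " + pkg);
    int level = 1;  // no dependencies: level 1
    if (auto it = g.find(pkg); it != g.end())
        for (const auto& dep : it->second)
            level = std::max(level, 1 + testingLevel(g, dep, memo, onStack));
    onStack.erase(pkg);
    memo[pkg] = level;
    return level;
}

int testingLevel(const Graph& g, const std::string& pkg) {
    std::map<std::string, int> memo;
    std::set<std::string> onStack;
    return testingLevel(g, pkg, memo, onStack);
}
```

With App depending on Gui and Core, and Gui on Core, the levels come out as Core = 1, Gui = 2, App = 3: a package can only be tested once everything below it has been.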
Package Impact Diagram
“Used-by” dependencies
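A “used-by” (impact) diagram is the reverse, transitive view of the dependency graph: every package that may need rebuilding or retesting when one package changes. A minimal C++17 sketch, assuming a hypothetical map-of-sets graph representation:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Graph = std::map<std::string, std::set<std::string>>;  // package -> direct deps

// Invert "depends on" edges into "used by" edges.
Graph reverse(const Graph& g) {
    Graph r;
    for (const auto& [pkg, deps] : g)
        for (const auto& d : deps) r[d].insert(pkg);
    return r;
}

// All packages that use `pkg` directly or transitively.
std::set<std::string> usedBy(const Graph& g, const std::string& pkg) {
    const Graph r = reverse(g);
    std::set<std::string> seen;
    std::vector<std::string> todo{pkg};
    while (!todo.empty()) {
        std::string cur = todo.back();
        todo.pop_back();
        auto it = r.find(cur);
        if (it == r.end()) continue;
        for (const auto& user : it->second)
            if (seen.insert(user).second) todo.push_back(user);
    }
    seen.erase(pkg);  // in a cycle pkg can reach itself; do not report it
    return seen;
}
```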
An Extra Dependency
Bad dependency in prototype code; it was traced to bad class placement
• 1 IgSoReaderAppDriver IgQtTwigBrowser via IgQtTwigModel.h
• 1 IgSoReaderAppDriver IgQtTwigBrowser via IgQtTwigRep.h
Static vs. Logical
Logical dependencies from packages used through “Interfaces”
Discovering Forms of Modularity
• A fairly good tool for discovering “philosophical structure”
• IGUANA and Geant4 mostly use direct abstract interfaces
  • The interfaces normally generate “correct” functional dependencies: interface definitions are in packages that obviously imply the function
  • “Plug in one implementation of this interface”
  • Some use in Lizard/AIDA and ROOT
• All interfaces bundled into “interface” (or framework) packages
  • Used by Lizard/AIDA and ROOT
• Explicit dynamic loading to solve modularity issues
  • Used extensively by ROOT
• Fall back on scripts or commands evaluated at run-time
  • Some use in Geant4
  • Used quite a bit in ROOT
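To make “direct abstract interfaces” concrete, here is a small C++ sketch in which namespaces stand in for packages; all names (PrinterInterface, PostScriptPrinter, printPage) are hypothetical. The client depends statically only on the interface package, and the implementation package depends on the interface rather than the other way round, which is what produces the “correct” functional dependencies described above.

```cpp
#include <cassert>
#include <string>

// Hypothetical "interface package": only the abstract interface lives here.
namespace PrinterInterface {
struct Printer {
    virtual ~Printer() = default;
    virtual std::string print(const std::string& what) = 0;
};
}  // namespace PrinterInterface

// Hypothetical "implementation package": depends on the interface package only.
namespace PostScriptPrinter {
struct Printer : PrinterInterface::Printer {
    std::string print(const std::string& what) override { return "%!PS " + what; }
};
}  // namespace PostScriptPrinter

// Client code sees only the interface, so its static dependency is on
// PrinterInterface; which implementation is plugged in is decided by the caller.
std::string printPage(PrinterInterface::Printer& p) { return p.print("page 1"); }
```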
Analysis of Anaphe
• Distribution of tools and utilities for LHC-era physics
  • Combination of commercial, free and HEP software
• Claims to be a toolkit
  • Appears to live up to its toolkit claims
• Good work on modularity
  • Clean design is evident in many places
  • Dependency diagrams often split naturally into functional units
Analysis of ATLAS
• Torture-test exercise for the tool
  • Large release size (~50% F77, ~50% mainly C++ but also C, Java)
  • Near the limit of Ignominy’s ability to discover software structure
  • Pictures below illustrate analysis difficulties
• Visible (and known) problems
  • Many cleanly designed packages shadowed by a cycle with very unpleasant effects on the overall structure
  • A number of places show poor packaging and/or lack of abstract interfaces
[Figures: structure known by the build system; misconfigured analysis (1.3.2); improved analysis (1.3.7)]
Analysis of CMS/ORCA
• Large C++ project
  • Deliberately fast development shows in places
  • Good design in key parts has helped
• Recognised problems
  • Especially with the length of the release sequence
  • Clean-up/restructuring necessary soon
    • To some extent already starting
• Large metric fluctuations from version to version
[Figure: ORCA Visualisation, which needs most of the rest]
Analysis of CMS/COBRA, IGUANA
• COBRA
  • CMS reconstruction, analysis and simulation framework
  • Recently split off successfully from ORCA
  • Many small packages
    • Has helped with modularity
    • Some issues with partitioning: some small cycles, certain package groups appear quite frequently
• IGUANA
  • Generic data analysis environment with CMS focus
  • Many fairly small packages with targeted purpose (similar to Anaphe)
  • Project focus as an integrator and glue provider is fairly evident
  • We too have some rat’s nests to clean up, but at least they are small…
  • Has had the advantage of considerable monitoring!
Analysis of Geant4
• Fairly large C++ project
  • Very fine-grained (and multi-level) package structuring
• Structure seems quite clean from the preliminary analysis
  • Fine package subdivision helps in many ways but makes analysis and code understanding more complicated
• One subsystem seems strongly coupled and needs attention
• Need to study the use of the internal command system
Analysis of ROOT
• ROOT developers have done a formidable job of breaking binary (shared library) dependencies, but…
• It makes dubious use of its internal scripting facility
  • For example: by static analysis, nothing seems to use the postscript package directly (no incoming dependencies), but there is this code:

    void TPad::Print(const char *filename, Option_t *option)
    {
      […]
      TVirtualPS *psave = gVirtualPS;
      if (gROOT->LoadClass("TPostScript", "Postscript")) return;
      gROOT->ProcessLineFast("new TPostScript()");
      gVirtualPS->Open(psname, pstype);
      gVirtualPS->SetBit(kPrintingPS);
      […]
    }

• Taking these and global objects into account makes the dependency diagrams very different, and casts doubt on the usefulness of binary-only dependency diagrams for ROOT
• Sign of fast growth? Need a “next evolutionary step”?
• So “coherent” that replacing parts could get painful…
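The ROOT excerpt creates a TPostScript through the interpreter (ProcessLineFast). The hypothetical C++17 sketch below is not ROOT’s mechanism; it reproduces the same effect with a plain string-keyed registry. The only link between the caller and the PostScript implementation is the string "TPostScript", so if the two parts lived in separate packages, neither an #include scan nor an nm symbol scan would record a dependency; only a user-defined logical dependency would capture it. All names here are invented for illustration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

struct Output {
    virtual ~Output() = default;
    virtual std::string name() const = 0;
};

using Creator = std::function<std::unique_ptr<Output>()>;

// Hypothetical run-time registry: creators are looked up by a string key.
std::map<std::string, Creator>& registry() {
    static std::map<std::string, Creator> r;  // function-local static: safe init order
    return r;
}

// Would live in a "postscript" package; it registers itself and nobody includes it.
struct PostScript : Output {
    std::string name() const override { return "PostScript"; }
};
[[maybe_unused]] const bool postScriptRegistered = [] {
    registry()["TPostScript"] = [] { return std::make_unique<PostScript>(); };
    return true;
}();

// Would live in a "printing" package: the class is named only inside a string.
std::unique_ptr<Output> makeOutput(const std::string& cls) {
    auto it = registry().find(cls);
    return it == registry().end() ? nullptr : it->second();
}
```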
Analysis of ROOT…
[Figures: dependencies from Binary + Source + Logical = Real; Binary only]
Package Metrics
• Size = total amount of source code (roughly; not normalised across projects!)
• ACD = average component dependency (~ libraries linked in)
• CCD = sum of single-package component dependencies over the whole release: test cost
• NCCD = CCD relative to the CCD of a balanced binary tree of the same size
  • < 1.0: structure is flatter than a binary tree (= independent packages)
  • > 1.0: structure is more strongly coupled (vertical or cyclic)
• Aim: minimise NCCD for given software/functionality (good toolkit: ~ 1.0)
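The following C++17 sketch computes these metrics under the definitions used by Lakos, which are assumed here: CD(p) counts p plus every package it reaches transitively, CCD is the sum of CD over all N packages, ACD = CCD / N, and NCCD divides CCD by the CCD of a balanced binary tree with N nodes, (N + 1) * log2(N + 1) - N. The graph representation and names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

using Graph = std::map<std::string, std::set<std::string>>;  // package -> direct deps

// CD(p): packages p depends on directly or transitively, counting p itself.
std::size_t componentDependency(const Graph& g, const std::string& p) {
    std::set<std::string> seen{p};
    std::vector<std::string> todo{p};
    while (!todo.empty()) {
        std::string cur = todo.back();
        todo.pop_back();
        auto it = g.find(cur);
        if (it == g.end()) continue;
        for (const auto& d : it->second)
            if (seen.insert(d).second) todo.push_back(d);
    }
    return seen.size();
}

struct Metrics { double ccd, acd, nccd; };

// Metrics over every package that appears as a key in g (list leaves with {}).
Metrics metrics(const Graph& g) {
    assert(!g.empty());
    double ccd = 0;
    for (const auto& entry : g) ccd += componentDependency(g, entry.first);
    const double n = static_cast<double>(g.size());
    // CCD of a balanced binary tree with n nodes.
    const double balanced = (n + 1) * std::log2(n + 1) - n;
    return {ccd, ccd / n, ccd / balanced};
}
```

For example, a three-package tree A → {B, C} gives CCD = 5, equal to the balanced-tree value, so NCCD = 1.0; turning it into a cycle A → B → C → A raises every CD to 3, giving CCD = 9 and NCCD = 1.8.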
Metrics: NCCD vs Cycles
[Plot: ATLAS, ORCA6, ROOT, ORCA4, G4, COBRA, IGUANA, Anaphe; region labelled “Toolkits & Frameworks”]
Metrics: NCCD vs Size
[Plot: ATLAS, ORCA6, ROOT, ORCA4, G4, COBRA, IGUANA, Anaphe; region labelled “Toolkits & Frameworks”]
Metrics: NCCD vs ACD
[Plot: ATLAS, ROOT, ORCA, G4, COBRA, IGUANA, Anaphe; region labelled “Toolkits & Frameworks”]
Metrics: NCCD vs AID
[Plot: ATLAS, ROOT, ORCA, COBRA, G4, Anaphe, IGUANA; region labelled “Toolkits & Frameworks”]
Metrics: Packages vs Size
[Plot: ORCA6, ATLAS, ORCA4, G4, COBRA, IGUANA, Anaphe, ROOT; region labelled “Toolkits & Frameworks”]
Metrics: Packages vs Size
[Plot: ORCA6, ATLAS, ORCA4, G4, COBRA, ROOT, IGUANA, Anaphe; region labelled “Toolkits & Frameworks”]
Caveats
• Ignominy analyses only static dependencies, not dynamic ones
  • Indirect calls through pointers, virtual function calls
  • State dependencies: data reads and writes, thread synchronisation, …
• The analysis of external software is heuristic; exact information from the build system helps considerably
  • Difficulties are posed by copied code (copy and paste or merged libraries) and by defaults that depend on link order (“dummies” that are supposed to be overridden by the client)
  • Most headaches so far have been with FORTRAN code
• Ignominy must guess the software structure when in doubt
  • Based on project-defined heuristic search rules; usually works fine
  • In the face of an ambiguity Ignominy warns and assumes the worst
    • Multiply defined symbol: dependency on all definitions
    • Multiple header matches: dependency on all (but correct with compiler-generated dependency data!)
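The “assume the worst” rule for symbols can be sketched as follows in C++17. The input structure is hypothetical (per-package sets of defined and required symbol names, as might be distilled from nm output), and so are the names. A required symbol defined in several packages produces a warning and a dependency on every definer, as with a link-order “dummy”. Treating unresolved symbols as warnings only is an assumption of this sketch, since many of them will come from system libraries outside the analysed release.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

struct SymbolTables {
    std::map<std::string, std::set<std::string>> defined;    // package -> defined symbols
    std::map<std::string, std::set<std::string>> undefined;  // package -> required symbols
};

struct SymbolDeps {
    std::map<std::string, std::set<std::string>> edges;  // package -> packages it needs
    std::vector<std::string> warnings;
};

// Resolve each required symbol to the package(s) defining it.
SymbolDeps symbolDependencies(const SymbolTables& t) {
    std::map<std::string, std::set<std::string>> definers;  // symbol -> packages
    for (const auto& [pkg, syms] : t.defined)
        for (const auto& s : syms) definers[s].insert(pkg);

    SymbolDeps out;
    for (const auto& [pkg, syms] : t.undefined) {
        for (const auto& s : syms) {
            auto it = definers.find(s);
            if (it == definers.end()) {
                out.warnings.push_back(pkg + ": unresolved symbol " + s);
                continue;
            }
            if (it->second.size() > 1)  // ambiguous: warn, then depend on all definers
                out.warnings.push_back(pkg + ": symbol " + s + " is multiply defined");
            for (const auto& d : it->second)
                if (d != pkg) out.edges[pkg].insert(d);
        }
    }
    return out;
}
```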
Status
• Run for every IGUANA release as part of the release build
• Canned configuration for any SCRAM-based project
  • Needs project-specific colouring and similar configuration
• Works with many other project structures
  • Tried on G4, ROOT and ATLAS
• Plans
  • Consolidate scripts and fold in all the documentation
  • Make it somewhat easier to use and configure
  • Java support with Mark Donszelmann’s jneeds
• Available for free at http://iguana.cern.ch/
  • See the IGUANA distributions (latest = 3.1.0, recommended)
• Questions? Please mail lassi.tuura@cern.ch or iguana-developers@cern.ch