Improving Dependency Structure of Large Software Projects Brown Bag Seminar Murat Gungor Friday, October 15, 2004
Goals • Monitor progress in large software projects. • Provide tools for continuous extraction of structural quality from source code. • Provide means to improve software system’s dependency structure.
Introduction • Software is an expensive product; it involves intensive labor. • Software projects typically consist of many parts. • Interdependency between parts of a project is desirable: it is needed for one component to use another. However, excessive dependency reduces: • Testability • Maintainability • Reusability • Understandability • Observing the current state of a project is critically important, since early detection of quality defects avoids the delays, difficulties, and costs associated with development evolution later in the project lifecycle.
Problem Definition • Dependencies between software files are essential so that one component may provide services to another. • However, dependencies complicate the process of making changes, perhaps to fix latent errors or performance problems, because of the effects a change may have on other files. • When files each bind to many other files and mutual dependencies exist between them, maintenance and testing may become quite difficult to carry out effectively. • It is not uncommon for a change in one file to precipitate a cascade of changes in other files, especially in the presence of mutual dependencies.
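The cascade described above can be bounded with a reachability search over the dependency graph. The following is a minimal sketch, not the authors' tool: the function name `affected_by_change`, the dictionary layout, and the file names are all hypothetical. It returns every file that directly or transitively depends on a changed file, i.e. the candidates for a change cascade (an upper bound, since not every dependent file will actually need a change).

```python
from collections import deque


def affected_by_change(depends_on, changed):
    """Return every file that directly or transitively depends on `changed`.

    `depends_on` maps each file to the set of files it depends on.
    """
    # Invert the edges: for each file, record which files depend on it.
    dependents = {}
    for src, targets in depends_on.items():
        for dst in targets:
            dependents.setdefault(dst, set()).add(src)

    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for user in dependents.get(current, ()):
            if user != changed and user not in seen:
                seen.add(user)
                queue.append(user)
    return seen


# Hypothetical project: a.cpp and b.cpp depend on each other.
example = {
    "a.cpp": {"b.cpp"},
    "b.cpp": {"a.cpp", "util.cpp"},
    "main.cpp": {"a.cpp"},
    "util.cpp": set(),
}
```

In this example a change to `util.cpp` reaches `b.cpp` directly and then, through the mutual dependency, `a.cpp` and `main.cpp` as well, while a change to `main.cpp` reaches nothing.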
Motivation • Provide managers of large software projects immediate views of current state of their project’s products. • We study existing projects to try to understand ways to do that. • Our current work has shown that static dependency structure is an important element of that analysis.
Problem: Large Fan-out • After topological sort, the numbered files in the chart depend only on files above them, but do not necessarily depend on every file above. • Depending on scores of other files (large fan-out) may indicate a lack of cohesion: the file is taking responsibility for too many, perhaps only loosely related, tasks and needs the services of many other files to manage that. [Figure: structure chart with large fan-out; dependency level plotted against topologically sorted files 1 to 15]
Problem: Large Strong Components • A strong component is a set of mutually dependent files. • After topological sorting, strong components are expanded. Files 2, 3, 4, and 5 cannot be ordered; the order given is the best we can achieve. • Ideal testing process: test those files with no dependencies, then test all files depending only on files already tested. • For testing, a strong component must be treated as a unit. The larger a strong component becomes, the more difficult it is to adequately test. • Change management becomes tougher, due to consequential changes made to fix latent errors or performance problems. [Figure: dependency chart; dependency level plotted against topologically sorted files 1 to 8, with a histogram of strong component size against number of strong components of that size]
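To make the strong component idea concrete, here is a small sketch using Tarjan's algorithm, a standard way to find strong components; the slides do not say which algorithm DepAnal uses, so this is an assumption, as are the function name and the example graph. Tarjan's algorithm emits each component only after every component it depends on, so the output order doubles as the ideal testing sequence.

```python
def strong_components(depends_on):
    """Find strong components with Tarjan's algorithm.

    `depends_on` maps each file to the set of files it depends on.
    Components are returned dependencies-first: every component appears
    after all the components it depends on.
    """
    index = {}        # discovery order of each visited file
    lowlink = {}      # smallest index reachable from the file
    stack = []
    on_stack = set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for nxt in sorted(depends_on.get(node, ())):
            if nxt not in index:
                visit(nxt)
                lowlink[node] = min(lowlink[node], lowlink[nxt])
            elif nxt in on_stack:
                lowlink[node] = min(lowlink[node], index[nxt])
        if lowlink[node] == index[node]:
            # `node` is the root of a component: pop its members.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(sorted(component))

    for node in sorted(depends_on):
        if node not in index:
            visit(node)
    return components


# Hypothetical graph in which files 2, 3, 4, and 5 are mutually dependent.
example = {1: set(), 2: {1, 3}, 3: {4}, 4: {5}, 5: {2}, 6: {5}}
```

For this graph the result is `[[1], [2, 3, 4, 5], [6]]`: file 1 can be tested first, files 2 to 5 must be treated as one unit, and file 6 comes last. The recursive form is fine for a sketch, but a build with thousands of files would likely need an iterative version to stay within Python's recursion limit.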
Problem: Large Fan-in • High fan-in is not inherently bad. It implies significant reuse, which is good. However, poor quality of the widely used file will be a problem. • High fan-in coupled with low quality creates a high probability of consequential change. By consequential change we mean a change induced in a depending file due to a change in the depended-upon file. [Figure: structure chart with large fan-in, after topological sort; dependency level plotted against topologically sorted files 1 to 15]
Good Dependency Structure • Ideal structure has cohesive components with no mutual coupling. • Each component (file) depends only on its close neighbors. • All files have low fan-in and fan-out. • There is no call back to upper-level components, or deep call forward. [Figure: dependency chart after topological sort with strong components expanded; dependency level plotted against topologically sorted files 1 to 8]
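One way to check for "deep call forward" is to assign each file a level and measure how many levels each dependency spans. The slides do not define how levels are computed, so the definition below (level 0 for files with no dependencies, otherwise one more than the highest level among a file's dependencies) is an assumption, and the names `levels` and `edge_spans` are hypothetical. The sketch assumes every file appears as a key and that the graph is acyclic, for example after strong components have been collapsed.

```python
def levels(depends_on):
    """Assign level 0 to files with no dependencies and, otherwise,
    one more than the highest level among a file's dependencies."""
    memo = {}
    in_progress = set()

    def level(node):
        if node in memo:
            return memo[node]
        if node in in_progress:
            raise ValueError(f"cycle through {node!r}; collapse strong components first")
        in_progress.add(node)
        deps = depends_on.get(node, ())
        memo[node] = 1 + max((level(d) for d in deps), default=-1)
        in_progress.discard(node)
        return memo[node]

    return {node: level(node) for node in depends_on}


def edge_spans(depends_on):
    """Number of levels each dependency crosses; 1 means a close neighbor."""
    lv = levels(depends_on)
    return {(a, b): lv[a] - lv[b]
            for a, targets in depends_on.items() for b in targets}


# Hypothetical layered project; d reaches three levels down to c.
example = {"a": {"b"}, "b": {"c"}, "c": set(), "d": {"a", "c"}}
```

In this example every dependency spans one level except `d` on `c`, which spans three. Under the assumed definition, a structure in which all spans are 1 matches the "close neighbors" description above.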
This view is generated by our tools, DepAnal and DAView, for Mozilla, Version 1.4.1, Windows Build. • The plot shows some very large mutual dependencies. • It shows all files that depend on one specific file in the largest strong component (Fan-In). • Green lines show the Fan-Out of one file in a large strong component. Note dependencies both inside and outside the component. • Size of each bubble is proportional to the number of files in the strong component.
Is Complex Dependency Really a Problem? • Mozilla was targeted for Apple Mac OS X 10.3 (Panther), but Apple switched to KHTML. • "Apple snub stings Mozilla": “Bourdon said Safari engineers looked at size, speed and compatibility in choosing KHTML. In addition to Mozilla, Apple also considered building its own browser from scratch.” "Translated through a de-weaselizer, (Melton's e-mail) says: 'Even though some of us used to work on Mozilla, we have to admit that the Mozilla code is a gigantic, bloated mess, not to mention slow, and with an internal API so flamboyantly baroque that frankly we can't even comprehend where to begin,'" Zawinski wrote.
Visibility • The dependencies shown on the previous slide are, without our tools, invisible. • Developers know only a small part of the dependency structure, based on their own reading of the code. The rest they find by observing breakage when they change something. • Note that Mozilla 1.4.1 is composed of 6701 files! It is impossible to understand that dependency structure without effective tools.
Project Monitoring • Monitoring software quality in a development project is an important task required of project management, especially for large-scale projects. • Constant feedback is an essential part of project management. • Watching progress manually is not effective in terms of time or correctness of results. • Up-to-date project documentation is not always available, but source code is. • Obtaining information from source code provides instant feedback. How do you do that for 6701 files?
Static Source Code Analysis • Provides an instant snapshot of the project’s state • Helps to diagnose the state of health of a software project effectively • Provides (almost) accurate results • Provides constant progress monitoring • Helps to determine the effect of potential decisions • Helps to improve control, because we change based on measurements, not guesses
Focus is dependencies among files • Many engineering organizations use source code files as the unit for analysis, management, testing. • Because we seek to provide support we: • Investigate dependency structure between files. • Identify causes of dependency. • Research possible ways to improve dependency structure of existing software. • Automate static source code analysis to extract dependency structure and other software metrics. • This isn’t as easy as it sounds for large file sets
Importance of this Study • Software’s quality depends on quality of its parts. • Future enhancements depend on existing system. • Maintainability depends on quality of current foundation. • Reuse is directly affected by dependencies: • To reuse in a different context implies that we can extract the reused part from its original context. • That can’t be done when dependencies are out of control.
Scope of the Study • We are not analyzing syntactic correctness of code. • We are not analyzing logical correctness of code. • The approach applies to C-based procedural and object-oriented languages: C, C++, C#, Java. • Our tools currently support only C and C++. • Much of the remaining work deals with repackaging the content of existing code files to enhance dependency quality. • Research on repackaging techniques with heuristics and optimization. • Intent is to modify structure, not introduce new code. • Creating applications to automate obtaining information from source files.
Progress so far • Developed DepAnal, a static source code dependency analyzer for C/C++. • Developed DAView, which visualizes dependencies among files and components in graphical representations. • Preparing a paper for submission to the ISCA 20th International Conference on Computers and Their Applications (CATA-2005), March 16-18, 2005, New Orleans, Louisiana, USA, http://isca-hq.org/confr.htm • Full paper submission deadline: November 5, 2004
Dependency Model • Focus is dependencies between files. • Files are the unit of testing and configuration management. • Based on types, global functions, and global variables. • Dependency Model: file A depends on file B if: • A creates and/or uses an instance of a type declared or defined in B • A is derived from a type declared or defined in B • A is using the value of a global variable declared and/or defined in B • A defines a non-constant global variable modified by B • A uses a global function declared or defined in B • A declares a type or global function defined in B • A defines a type or global function declared in B • A uses a template parameter declared in B • Outputs are presented as direct dependencies only; transitive closures are not shown because they are too dense to interpret.
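A simplified version of this model can be expressed over per-file symbol tables. The sketch below is an illustration only and does not reflect how DepAnal is implemented: the record layout (`declares`, `defines`, `uses`), the function name, and the file names are hypothetical, and it folds the "uses" rules into one check while leaving out the rule about non-constant globals modified by another file.

```python
def direct_dependencies(files):
    """Derive direct file-to-file dependencies from per-file symbol sets.

    `files` maps a file name to a dict with three sets of symbol names:
    "declares", "defines", and "uses".
    """
    deps = {name: set() for name in files}
    for a, info_a in files.items():
        for b, info_b in files.items():
            if a == b:
                continue
            provided_by_b = info_b["declares"] | info_b["defines"]
            # A uses a type, global function, or variable declared or defined in B.
            if info_a["uses"] & provided_by_b:
                deps[a].add(b)
            # A declares something defined in B, or defines something declared in B.
            if (info_a["declares"] & info_b["defines"]
                    or info_a["defines"] & info_b["declares"]):
                deps[a].add(b)
    return deps


# Hypothetical four-file project.
project = {
    "widget.h": {"declares": {"Widget"}, "defines": set(), "uses": set()},
    "widget.cpp": {"declares": set(), "defines": {"Widget"}, "uses": {"Logger"}},
    "logger.h": {"declares": {"Logger"}, "defines": {"Logger"}, "uses": set()},
    "main.cpp": {"declares": set(), "defines": set(), "uses": {"Widget"}},
}
```

Note that under the declare/define rules a header and its implementation file depend on each other, so `widget.h` and `widget.cpp` form a small mutual dependency in this example.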
Architectural view of DepAnal • The goal is to build a tool that can be used to constantly monitor evolution of the state of large software systems • Makes two passes over each file in the project. Finds dependencies based on static type analysis • DepAnal collects data from source code with the help of a C/C++ tokenizer and semi-expression composer.
Mozilla Project Version 1.4.1 • The Mozilla project is a very large project developing browser tools for many different platforms. • Win32 configuration • Number of executables: 94 • Number of dynamic link libraries: 111 • Number of static libraries: 303 • Number of source files for Win32, v 1.4.1: 6701 • Analysis took approximately 24 hours on a Dell Dimension 8300 with 1 GB of memory. Wow!
Dependency Analysis Results • Show different views of dependency data for a project and draw conclusions about what such data can disclose concerning the project’s implementation. • The analysis results are presented for several data sets, in six views: • Fan-in: the number of files that depend on a file, for each file in the analysis set, and the related fan-in density histogram. • Fan-out: the number of files that a file depends on, for each file in the analysis set, and the related fan-out density histogram. • Strong Components: groups of files that are all mutually dependent, and the related strong component density histogram. • Topological sort of the strong components. • Expansion of all strong components within the sorted data. • Cyclomatic complexity versus file size. • We examine each of these views and interpret their data with respect to the measures of project implementation strengths and weaknesses they reveal.
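The fan-in and fan-out views and their density histograms follow directly from the definitions above. This is a minimal sketch with hypothetical function names and sample data; here "density" is taken to mean the number of files that share each fan value, which is an assumption about how the histograms are built.

```python
from collections import Counter


def fan_out(depends_on):
    """Number of files each file depends on."""
    return {f: len(targets) for f, targets in depends_on.items()}


def fan_in(depends_on):
    """Number of files that depend on each file."""
    counts = {f: 0 for f in depends_on}
    for targets in depends_on.values():
        for dst in targets:
            counts[dst] = counts.get(dst, 0) + 1
    return counts


def density(fan_values):
    """Histogram: how many files have each fan-in (or fan-out) value."""
    return dict(sorted(Counter(fan_values.values()).items()))


# Hypothetical four-file project with one widely used utility file.
sample = {
    "main.cpp": {"parser.cpp", "util.cpp"},
    "parser.cpp": {"util.cpp"},
    "report.cpp": {"util.cpp"},
    "util.cpp": set(),
}
```

For this sample, `util.cpp` has a fan-in of 3 and `main.cpp` a fan-out of 2; the fan-in density is `{0: 2, 1: 1, 3: 1}`.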
Fan-in Data Mozilla GKGFX library • Number of source files 655. • Dependencies from within the library. • When we analyze the entire build many of these fan-in numbers will increase. High Fan-in coupled with low quality creates a high probability for consequential change.
Fan-in Density Mozilla GKGFX library • Plot shows that a significant number of library source code files have high fan-in, characteristic of a widely used library. A library with this profile should be given high priority for analysis by the test team and quality analysts.
Fan-out Data Mozilla GKGFX library • A file with large fan-out may be symptomatic of a weak abstraction. • We expect that a well-designed source file should carry out its assigned tasks with the aid of a few trusted delegates and perhaps a few references to commonly used utilities. • The plot shows a file with a fan-out of 60!
Fan-out Density Mozilla GKGFX library • Large fan-out may be symptomatic of weak abstraction. We’ve shown elsewhere that high fan-out is correlated with a large number of changes. • There are a significant number of files with large fan-out.
Expanded Topological Sort GKGFX Library • If a file belongs to a strong component and any other file in that component is changed, rigorous testing dictates that it be retested. This makes a compelling argument in favor of continuous regression testing using test harnesses. • Approximately half the files in this library cannot be put into a classic testing sequence. This indicates a high probability of repeatedly testing a given file. • Components below the diagonal are due to cycles in the dependency graph, i.e. mutual dependencies.
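The expansion step and the "cannot be put into a classic testing sequence" figure can be sketched as follows. The input is a list of strong components already sorted dependencies-first (from any strong component routine); the function names and the example are hypothetical. Dependencies that point to a file later in the expanded order can only arise inside a cycle, which we take to correspond to the points the slide describes as lying below the diagonal.

```python
def expand_components(components):
    """Flatten topologically sorted strong components into one file order."""
    return [f for component in components for f in component]


def out_of_order_dependencies(order, depends_on):
    """Dependencies on a file that comes later in the order.

    These exist only when files are mutually dependent. Every file named
    in `depends_on` is assumed to appear in `order`.
    """
    position = {f: i for i, f in enumerate(order)}
    return sorted(
        (a, b)
        for a in order
        for b in depends_on.get(a, ())
        if position[b] > position[a]
    )


def fraction_in_cycles(components):
    """Share of files that belong to a strong component of size > 1."""
    total = sum(len(c) for c in components)
    cyclic = sum(len(c) for c in components if len(c) > 1)
    return cyclic / total if total else 0.0


# Hypothetical data: files 2 to 5 form one strong component.
components = [[1], [2, 3, 4, 5], [6]]
depends_on = {1: set(), 2: {1, 3}, 3: {4}, 4: {5}, 5: {2}, 6: {5}}
order = expand_components(components)
```

Here four of the six files sit in a cycle, and the dependencies 2 on 3, 3 on 4, and 4 on 5 point forward in the order, so none of those files can be tested before something it depends on.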
Dependency Data for the Entire Windows-Based Mozilla Build • The plot below shows the dependency graph of the entire Mozilla build for Windows after topological sorting and expansion of strong components. • This plot is so dense that it is difficult to draw conclusions, but it is consistent with the previous figure for the GKGFX library.
DAView shows that the GKGFX Library does indeed have significant structural problems, as predicted by the preceding views. Note that these problems, made visible by our tools, are normally invisible! This is Mozilla, Version 1.4.1, Windows Build. Plot for GKGFX Library shows some very large mutual dependencies.
Towards Improvement • If we can identify (low-level) causes of dependencies, we can reorganize file contents to improve inter-file dependency structure. • Intent is not to introduce new code, but to redistribute existing code between files to get better structure.
Future Research Future research’s primary goal is to: Provide means to improve software system’s dependency structure. End of presentation – Thanks for listening
Backup Slides • The following slides provide a little more detail in a few areas.
Files - Unit for analysis • In most development organizations, files are the “unit” of testing and configuration management. • Dependencies between software files are essential so that one component may provide services to another. • If a file is using services of other files, it cannot be tested alone. • The larger the number of dependencies between files, the harder it is to test, manage, understand, and reuse them. The situation gets worse if there are mutual dependencies. • Therefore, it is better to reduce dependencies between files, especially mutual dependencies.
File Dependency Structural Problems • Large Fan-out • Large Fan-in • Large Strong Components • Many and inconsistent levels in structure chart
Fine-grain level dependency • One file depends on another file if it uses the other file’s services: • Types • Global Functions • Global Variables • To solve the file dependency problems we need to find more than file-to-file dependency. We check type-to-type, type-to-global function or variable, global function-to-type, and global function-to-global function or variable. • If we obtain this information, we have fine-grain level dependencies. Now we can relocate some existing code to reduce dependency density among files.
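The relocation idea can be illustrated by deriving file dependencies from symbol-level dependencies plus a symbol-to-file assignment, then changing only the assignment. This is a sketch under stated assumptions: the names `symbol_uses`, `home`, and `file_dependencies`, and the symbols and files in the example, are all hypothetical.

```python
def file_dependencies(symbol_uses, home):
    """Derive file-level dependencies from fine-grain ones.

    `symbol_uses` maps each symbol (type, global function, or global
    variable) to the symbols it uses; `home` maps each symbol to the
    file that contains it.
    """
    deps = {f: set() for f in set(home.values())}
    for user, used in symbol_uses.items():
        for target in used:
            if home[user] != home[target]:
                deps[home[user]].add(home[target])
    return deps


# Hypothetical symbols: Parser uses format_error, Report uses Parser.
symbol_uses = {
    "Parser": {"format_error"},
    "format_error": set(),
    "Report": {"Parser"},
}

# Original packaging: a.cpp and b.cpp depend on each other.
before = {"Parser": "a.cpp", "format_error": "b.cpp", "Report": "b.cpp"}

# Relocating format_error to a new file removes the mutual dependency
# without adding or changing any code.
after = {"Parser": "a.cpp", "format_error": "errors.cpp", "Report": "b.cpp"}
```

With the original packaging the two files form a strong component; after moving `format_error`, `b.cpp` depends on `a.cpp`, which depends on `errors.cpp`, and the files can be tested in sequence.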
Method Used for Improving Dependency • One simple solution is to put the content of all files into one file, but this is not what we would like to do: uugggh, spaghetti code! • Our target is to simplify dependency structure by moving types, global functions and variables among existing files, and/or introducing new files, keeping file complexity essentially constant.
Comparing with previous dependency structure • In order to see the improvement in dependency structure, we will compare the original dependency structure with the enhanced one on: • Dependency structure • Fan-in, fan-out, strong component size • Level analysis • The other graphics presented in the previous slides