
Dependency Analysis: - CSE681 Pr#2 - an Interesting Design Example

Dependency Analysis: - CSE681 Pr#2 - an Interesting Design Example. Jim Fawcett CSE784 – Software Engineering Studio Fall 2002. Summary of Project Requirements. This year in Software Modeling and Analysis we are developing a source code dependency analyzer.





Presentation Transcript


  1. Dependency Analysis: - CSE681 Pr#2- an Interesting Design Example Jim Fawcett CSE784 – Software Engineering Studio Fall 2002

  2. Summary of Project Requirements • This year in Software Modeling and Analysis we are developing a source code dependency analyzer. • Program must analyze C++ and C# source code files and determine code dependency relationships between them. • Easy for C++, just extract include statements. • Not so easy for C#. Have to catalog all classes and structs defined by each source file and see which files use those types. • Program must display all the dependency results to the user and also save in an XML file.

  3. Evolving View • First reaction: • This is simple, easy to implement, no issues. • Second reaction: • Oooohhhhh! Not so simple, many issues. • Final reaction – if we do the design well: • O.K. • All the parts are simple. • They play together well. • Performance issues are manageable.

  4. Moving Too Soon to Construction • We start with the first reaction. • Only get the second reaction when we have committed to a lot of code. • We now have a mess. • We’re very likely to hold onto the mess because of our investment of time and effort. • That’s a mistake, as we just get mired deeper and deeper into the mess.

  5. Too Soon to Code! Rising tide of chaos and drudge. (Drudge is very ugly and fragile code)

  6. Design Issue #1 – Client Focus • Program should operate quickly • We need to think about algorithms and data structures. • It should address user goals • Use case studies are needed. • User interface should be designed to support uses. • What information does the user supply? Options: • search current directory • tree rooted at specified directory (current default) • specified named path/pattern • What information does the program supply and how is it displayed? • There are likely to be very large file sets to analyze. • User needs options to select important information

  7. Use Cases • Dependency analysis generates information for: • Building test plans: • Don’t test a module until all the modules on which it depends have been tested. • Software maintenance: • What modules depend on the module we plan to change? We need to test them after the change to see if they have been adversely affected. • Documentation: • Documenting dependency information is an integral part of the design exposition.

  8. Scope of Analysis • This project is concerned with dependencies between a program’s modules. • A module is a relatively small, cohesive partition of a program’s source code. • A typical module should consist of about 400 source lines of code (SLOC). • Obviously some will be smaller, some larger, but this is a good target size. • Typical project sizes are: • Modest size research project – 10,000 SLOC, 25 modules • Modest size commercial product – 600 KSLOC, 1,500 modules

  9. Conclusions from Use Case Analysis • Even for relatively modest sized research projects, there is too much information to do an adequate analysis by hand. • We need automated tools. • The tools need to show dependencies in both ways, e.g.: • What files does this file depend on? • What files depend on this file? • The tools need to disclose dependencies between all files in the project.

  10. Critical Issues • Scanning for Dependencies in C# modules • Data structure used to hold dependencies • Displaying large amounts of information to user • False dependencies due to unneeded includes in C++ modules • Dependence on System Libraries

  11. Displaying Large Sets of Dependencies • User will probably want to: • Show dependencies sorted: • Display all files with no dependencies • Then all files that depend only on those already displayed • Useful for building test plans • Enter a name and get list of dependencies. • Useful for maintenance work. • In popup window show list of files entered so far and list of files not entered yet. • Show a scrolling list of files with their dependencies. • Select subset of files for display. • Show a compressed (bitmap?) matrix. • Show list of names, not matrix row. Matrix row may be far too long to view (e.g., 1500 elements).

  12. Design Issue #1 – Client Focus Revisited • Program should operate quickly • We need to think about algorithms and data structures. • Can we process 1500 files in a reasonable amount of time? • What is reasonable? Results must be worth the wait. • It should address user goals • Use cases have shown that: • Program will be useful in a number of contexts. • Program may have to process a very large number of files. • How do we find them? • How do we display large amounts of resulting information? • User interface should be designed to support uses. • What information does the user supply? Options: • search current directory • tree rooted at specified directory (current default) • specified named path/pattern • What information does the program supply and how is it displayed? • There are likely to be very large file sets to analyze. • User needs options to select important information • Design must ensure that the program displays information, not just data!

  13. Design Issue #2 – Organizing Principles • Those design ideas, data structures, operation sequences, and partitions that make the design appear to be simple. • You know you have a good set of Organizing Principles when you grow confident that you can successfully implement the program.

  14. Dependency Scanning for C# source • Will naïve scanning work for 1500 files? • If opening and scanning a single file takes 25 msec, then: • Finding dependencies for 1 file takes: 0.025 × 1500 / 60 = 0.625 minutes • Finding dependencies for all files takes: 0.625 × 1500 / 60 = 15.6 hours! • So let’s scan each file once and store all its identifiers in a hash table in RAM. • If that takes 30 msec per file: • Then making hash tables for all files takes: 0.03 × 1500 / 60 = 0.75 minutes • If hash table lookup takes 10 µsec per file, then finding dependencies between all files takes: 0.00001 × 1500 × 1500 / 60 + 0.75 = 1.125 minutes!
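
The arithmetic on this slide can be reproduced with a few lines of C++. This is only a sketch: the struct and function names (`ScanEstimate`, `estimateScanTimes`) are made up for illustration, and the per-file costs are the slide's assumed figures, not measurements.

```cpp
#include <cassert>

// Back-of-envelope scan-time estimates for a file set.
struct ScanEstimate {
    double naiveHours;     // every file is rescanned for every other file
    double hashedMinutes;  // each file is scanned once, then only hash lookups
};

// scanSec   : time to open and scan one file
// buildSec  : time to scan one file and build its hash table
// lookupSec : time for one file-against-file hash lookup
ScanEstimate estimateScanTimes(int files, double scanSec,
                               double buildSec, double lookupSec) {
    assert(files >= 0);
    const double n = static_cast<double>(files);
    ScanEstimate e{};
    e.naiveHours = scanSec * n * n / 3600.0;
    e.hashedMinutes = (buildSec * n + lookupSec * n * n) / 60.0;
    return e;
}
```

Calling `estimateScanTimes(1500, 0.025, 0.03, 0.00001)` gives the slide's figures: about 15.6 hours for naïve scanning and 1.125 minutes with hash tables.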

  15. Prototype Code • Scanning – critically important • How much time to open file and scan for class, struct identifiers? • How much time to build HashTables and HashedFile objects? • How much time to evaluate dependencies between two files by HashTable lookup? • Sizes - important • How big is HashedFile object for typical files? • User Display – could leave to design team with requirement for early evaluation. • Mockup display alternatives.

  16. Timing Results Parsing Prototype Source

  17. Comparison of Estimated with Measured • Naïve scanning – scan each file 1500 times: • Estimated time to complete scanning of 1500 files: 15.6 hours • Measured time to complete scanning of 1500 files: 4.4 hours • Processing each file once and storing in Hashtable, then doing lookups for each file: • Estimated time to complete processing: 1.1 minutes • Measured time to complete processing: 0.2 minutes

  18. Hash Table Layout

  19. Memory to Store Hash Tables • Assume each file is about 500 lines of source code ⇒ about 30 chars × 500 = 15 KB • Assume that 1/3 of that is identifiers • The rest is comments, whitespace, keywords, and punctuators ⇒ 5 KB of identifier storage • Assume HashTable takes 10 KB per file, so the total RAM required for this data is: 0.01 MB × 1500 = 15 MB. • That’s large, but acceptable on today’s desktop machines.

  20. File Scanning • For each file in C# file set: • For each class and struct identifier in file • Look in every other file’s HashTable for those identifiers • If found, other file depends on current file • Record dependency • Complexity is O(n²) • For each file in C++ file set: • #include statements completely capture dependency. • Record dependency • Complexity is O(n)
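
The C++ half of this slide, a single pass over each file collecting its #include statements, might look like the sketch below. The names `Include` and `extractIncludes` are hypothetical. The matching is deliberately simple: it works line by line, so an include inside a block comment or a disabled preprocessor branch would still be reported.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

struct Include {
    std::string header;
    bool isSystem;  // true for <header>, false for "header"
};

// Returns the headers named by #include lines in the given source text.
std::vector<Include> extractIncludes(const std::string& source) {
    // Optional whitespace, '#', "include", then a header in <> or "".
    static const std::regex pattern(
        R"re(^\s*#\s*include\s*([<"])([^>"]+)[>"])re");
    std::vector<Include> found;
    std::istringstream lines(source);
    std::string line;
    while (std::getline(lines, line)) {
        std::smatch m;
        if (std::regex_search(line, m, pattern)) {
            assert(m.size() == 3);  // whole match plus two capture groups
            found.push_back({m[2].str(), m[1].str() == "<"});
        }
    }
    return found;
}
```

The `isSystem` flag separates project headers from system headers, which is the distinction the later slide on system libraries relies on.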

  21. C# Scanning Process

  22. C# Scanning Activities • Define file set • User supplies by browsing, selection, patterns • User may wish to scan subdirectory • Extract token information from each file: • Extract tokens from each file and store in HashTable. • Save list of Class and Struct identifiers from scan • Create HashedFile type with filename, class and struct list, and HashTable as data. • Store HashedFiles in ArrayList • For each HashedFile in list: • Walk through ArrayList searching HashTables for the identifiers in class and struct list (note that this is very fast). • First time one is found, stop processing file – dependency found.
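
A minimal sketch of these activities, written in C++ rather than C# and using `std::unordered_set` in place of the HashTable. `HashedFile` is the slide's name; `hashFile` and `findDependencies` are illustrative. The tokenizer is an assumption for brevity: it treats every run of letters, digits, and underscores as an identifier and does not skip comments or string literals, which a real scanner would have to do.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

struct HashedFile {
    std::string name;
    std::vector<std::string> definedTypes;        // tokens that follow class/struct
    std::unordered_set<std::string> identifiers;  // every token found in the file
};

// Scans the source text once and fills a HashedFile.
HashedFile hashFile(const std::string& name, const std::string& source) {
    assert(!name.empty());
    HashedFile hf;
    hf.name = name;
    std::string token, previous;
    auto flush = [&] {
        if (token.empty()) return;
        if (previous == "class" || previous == "struct")
            hf.definedTypes.push_back(token);
        hf.identifiers.insert(token);
        previous = token;
        token.clear();
    };
    for (unsigned char c : source) {
        if (std::isalnum(c) || c == '_') token += static_cast<char>(c);
        else flush();
    }
    flush();
    return hf;
}

// Returns (user, used) pairs: `user` mentions a type that `used` defines.
std::vector<std::pair<std::string, std::string>>
findDependencies(const std::vector<HashedFile>& files) {
    std::vector<std::pair<std::string, std::string>> deps;
    for (const HashedFile& used : files) {
        for (const HashedFile& user : files) {
            if (&user == &used) continue;
            for (const std::string& type : used.definedTypes) {
                if (user.identifiers.count(type) != 0) {
                    deps.emplace_back(user.name, used.name);
                    break;  // the first hit settles this pair of files
                }
            }
        }
    }
    return deps;
}
```

Each file's text is read only once, in `hashFile`. The pairwise loop in `findDependencies` touches only in-memory hash sets, which is why the O(n²) step stays cheap.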

  23. C# Scan Activity Diagram

  24. C++ Scan Activity Diagram

  25. Memory to Hold Dependencies • Naïve storage uses a dense matrix. With 1500 files, that’s 2,250,000 elements. • Assume each path name is stored only once and takes 75 bytes, so 1500 files ⇒ 112.5 KB • Dependency is a boolean and takes 1 byte to store ⇒ 2.25 MB. • So, the total dependency matrix takes 2.36 MB. • Therefore, naïve storage is acceptable.

  26. Storing Dependencies

  27. Sorting Dependencies • Dependencies can be sorted easily. • Need to define a partial order on the set of dependencies. • Not every pair of files has an order. • Topological sort will bring set of files into an order in which any files that have dependency are ordered. • Have to be careful with mutual dependencies. • Must detect so that sorter does not cycle endlessly. • Reordering is fast and trivial: • Just swap index values in the FileIndexer Hashtable.
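
One way to get this ordering while guarding against mutual dependencies is a level-by-level topological sort (Kahn's algorithm). The sketch below is an illustration, not the deck's own sorter: files are plain indices, and `SortResult` and `sortByDependency` are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct SortResult {
    // levels[0]: files with no dependencies; levels[k]: files that depend
    // only on files in earlier levels.
    std::vector<std::vector<std::size_t>> levels;
    // Files that can never be released because they are part of, or depend
    // on, a dependency cycle.
    std::vector<std::size_t> cyclic;
};

// deps[i] lists the indices of the files that file i depends on.
SortResult sortByDependency(const std::vector<std::vector<std::size_t>>& deps) {
    const std::size_t n = deps.size();
    std::vector<std::size_t> pending(n);                 // unmet dependencies
    std::vector<std::vector<std::size_t>> dependents(n); // reverse edges
    for (std::size_t i = 0; i < n; ++i) {
        pending[i] = deps[i].size();
        for (std::size_t d : deps[i]) {
            assert(d < n);
            dependents[d].push_back(i);
        }
    }
    SortResult result;
    std::vector<bool> placed(n, false);
    std::vector<std::size_t> current;
    for (std::size_t i = 0; i < n; ++i)
        if (pending[i] == 0) current.push_back(i);
    while (!current.empty()) {
        std::vector<std::size_t> next;
        for (std::size_t f : current) {
            placed[f] = true;
            for (std::size_t user : dependents[f])
                if (--pending[user] == 0) next.push_back(user);
        }
        result.levels.push_back(current);
        current = next;
    }
    for (std::size_t i = 0; i < n; ++i)
        if (!placed[i]) result.cyclic.push_back(i);
    return result;
}
```

The loop ends when no file can be released, so a cycle cannot make it run forever. The files left over are reported in `cyclic` instead. The levels also match the test-plan use case: test level 0 first, then level 1, and so on.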

  28. Dependency Inversion • Use cases have pointed out that a user may need two kinds of information: • What files depend on this file? • What files does this file depend on? • If we provide a simple interface for the dependency object like this: • Dependency(ulong size) • bool Depends(string file1, string file2) • void SetDep(string file1, string file2) • void Swap(string file1, string file2) • string[] GetDep(string file) • Then inversion is just a matter of changing the order of two file names. • Swap allows dependency relations to be sorted without exposing internal data members.
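
The interface above could be realized roughly as follows. This is a C++ sketch, not the deck's implementation. The storage layout is an assumption: a name-to-slot map standing in for the FileIndexer, a separate display-order list that `swap` rearranges, and a dense matrix of `std::vector<bool>` (which packs bits, so it uses less than the one byte per element assumed earlier).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

class Dependency {
public:
    explicit Dependency(std::size_t size)
        : matrix_(size, std::vector<bool>(size, false)) {}

    // Records that `user` depends on `used`.
    void setDep(const std::string& user, const std::string& used) {
        const std::size_t row = slotFor(user);
        const std::size_t col = slotFor(used);
        assert(row < matrix_.size() && col < matrix_.size());
        matrix_[row][col] = true;
    }

    // True if `user` depends on `used`; reverse the arguments to invert.
    bool depends(const std::string& user, const std::string& used) const {
        const auto row = slots_.find(user);
        const auto col = slots_.find(used);
        return row != slots_.end() && col != slots_.end()
            && matrix_[row->second][col->second];
    }

    // Files that `file` depends on, in the current display order.
    std::vector<std::string> getDep(const std::string& file) const {
        std::vector<std::string> result;
        for (const std::string& other : order_)
            if (depends(file, other)) result.push_back(other);
        return result;
    }

    // Exchanges the display positions of two files; relations are untouched.
    void swap(const std::string& a, const std::string& b) {
        const auto pa = std::find(order_.begin(), order_.end(), a);
        const auto pb = std::find(order_.begin(), order_.end(), b);
        if (pa != order_.end() && pb != order_.end()) std::iter_swap(pa, pb);
    }

private:
    // Assigns the next free matrix slot to a file name seen for the first time.
    std::size_t slotFor(const std::string& file) {
        const auto known = slots_.find(file);
        if (known != slots_.end()) return known->second;
        if (order_.size() == matrix_.size())
            throw std::length_error("Dependency: more files than declared size");
        slots_.emplace(file, order_.size());
        order_.push_back(file);
        return order_.size() - 1;
    }

    std::unordered_map<std::string, std::size_t> slots_;  // file name -> slot
    std::vector<std::string> order_;                      // display order
    std::vector<std::vector<bool>> matrix_;               // matrix_[user][used]
};
```

With this layout, `depends("a", "b")` asks "does a depend on b?" and `depends("b", "a")` asks the inverse. Clients never see the matrix or the index map.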

  29. Design Issue #2 – Organizing Principles Revisited • Organizing Principles for Dependency Analysis are mainly structural: • Encapsulate C# scanning in a set of HashedFile objects held in memory in an ArrayList. • Puts data (only) where it is needed. • Easily accessible to the scanner, hidden from other parts. • Encapsulate dependency storage in a Dependency object. • Inversion is trivial. • Accessible to scanner and GUI view manager, but clients don’t need to know any of the details. • These components make the design appear to be simple.

  30. C# Scanner Class Diagram

  31. Design Issue #3 – Program Structure • Should have an Executive module. • Makes program organization simpler. • Handles program requirements using specialized server modules. • May allow the servers to be reusable, e.g., application agnostic. • Each server is focused on a single cohesive activity. • When this happens, server module is much more likely to be reusable. • Much easier to test small, cohesive modules.

  32. Partitions

  33. Design Issue #3a - Communication • Partition to make data transactions as small and simple as possible. • FileHandler module creates a file list and sends a list reference to executive. • Executive passes reference to scanner. • Scanner passes dependency relation between two files to Depends module. • Many simple transactions. • View module sorts the Depends relationships. • Uses interface provided for that purpose. • Avoids polluting Depends module with user interface processing.

  34. Design Issue #3b - Ownership • Ownership determines: • What objects are visible to what other objects. • Who creates each object. • Who determines an object’s lifetime. • Usually we want the creator to also control lifetime. • The nature of associations between objects in C++ affects ownership: • One object contains an instance of another. • Containing object owns the contained object, and creates and destroys it. • One object contains a reference to another. • Containing object is not the owner, neither creating nor destroying contained object. • An object has a pointer to another. • May be the owner – it created the object on the heap and will destroy it. • May not be the owner – another object passed it the pointer allowing access, but not transferring ownership.
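
The three kinds of association can be shown side by side in a small C++ sketch. The class names are borrowed from later slides, but the members and method bodies are placeholders, and the use of `std::unique_ptr` (C++14 `std::make_unique`) for the owning pointer is a choice made here, not something the deck prescribes.

```cpp
#include <cassert>
#include <memory>

struct Dependency {
    int recorded = 0;  // placeholder for the stored relations
};

class DataViewManager {
public:
    Dependency& dependency() { return deps_; }
    int recorded() const { return deps_.recorded; }
private:
    Dependency deps_;  // containment: created and destroyed with the manager
};

class Scanner {
public:
    explicit Scanner(Dependency& deps) : deps_(deps) {}
    void record() { ++deps_.recorded; }
private:
    Dependency& deps_;  // reference: used here, never created or destroyed here
};

class Executive {
public:
    Executive() : scanner_(std::make_unique<Scanner>(view_.dependency())) {}
    // The scanner refers into view_, so copying or moving would dangle.
    Executive(const Executive&) = delete;
    Executive& operator=(const Executive&) = delete;

    void run() {
        assert(scanner_ != nullptr);
        scanner_->record();
    }
    // Non-owning pointer: callers get access but must not delete it.
    const DataViewManager* view() const { return &view_; }
private:
    DataViewManager view_;              // declared first, so it outlives scanner_
    std::unique_ptr<Scanner> scanner_;  // owning pointer: freed automatically
};
```

Member order matters in `Executive`: members are destroyed in reverse declaration order, so the scanner is released before the `Dependency` it refers to.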

  35. Partitions

  36. Dependency Program Ownership • Executive owns fileHandler, Scanner, DataViewManager objects. • fileHandler owns DirectoryNavigator. • The Dependency object could be owned by either the specialized scanners or by the DataViewManager. • The latter seems simpler. • Could make FileIndexer and Depends matrix static. Then any user can simply create an instance of Dependency. • All get the same data since data is static. • Note that communication and ownership are determined by a “need-to-know” principle.

  37. Design Issue #3c - Visibility • Visibility affects reusability: • If we let the FileHandler module use the Scanner module we compromise its reusability. • FileHandler will be useful for a lot of applications that have nothing to do with scanning. • Visibility affects simplicity: • If we provide a base Scanner class to provide a protocol for Executive to use, the Executive does not have to know anything about the differences between specialized scanners. • Should we decide later to add dependency analysis between script files, nothing changes except that we add a new specialized scanner.
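
A sketch of the base-class protocol, assuming two specialized scanners named CsScanner and CppScanner. The virtual functions `accepts` and `kind`, the helpers `makeScanners` and `chooseScanner`, and the extension-based dispatch are all illustrative stand-ins for whatever the real scanners would expose.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

bool endsWith(const std::string& text, const std::string& suffix) {
    return text.size() >= suffix.size()
        && text.compare(text.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Protocol the Executive programs against.
class Scanner {
public:
    virtual ~Scanner() = default;
    virtual bool accepts(const std::string& fileName) const = 0;
    virtual std::string kind() const = 0;
};

class CsScanner : public Scanner {
public:
    bool accepts(const std::string& f) const override { return endsWith(f, ".cs"); }
    std::string kind() const override { return "C#"; }
};

class CppScanner : public Scanner {
public:
    bool accepts(const std::string& f) const override {
        return endsWith(f, ".h") || endsWith(f, ".cpp");
    }
    std::string kind() const override { return "C++"; }
};

// Adding a scanner for script files would mean one more push_back here.
std::vector<std::unique_ptr<Scanner>> makeScanners() {
    std::vector<std::unique_ptr<Scanner>> scanners;
    scanners.push_back(std::make_unique<CsScanner>());
    scanners.push_back(std::make_unique<CppScanner>());
    return scanners;
}

// Executive-side code: sees only the base class, never the concrete types.
const Scanner* chooseScanner(const std::vector<std::unique_ptr<Scanner>>& scanners,
                             const std::string& fileName) {
    for (const auto& s : scanners) {
        assert(s != nullptr);
        if (s->accepts(fileName)) return s.get();
    }
    return nullptr;  // no scanner handles this file type
}
```

`chooseScanner` never names a concrete scanner, so it would not change if a new specialization were added to the list.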

  38. Design Issue #4 - Performance • Usually, by performance, we mean efficient use of both CPU cycles and memory. • CPU cycles are often very sensitive to the algorithms we use. • Changing from naïve scanning to use of hash tables decreased running time by almost 3 orders of magnitude! • Encapsulating dependency relations behind a simple interface allows us to easily change to a more memory efficient data structure: • It would be relatively simple to replace the dense matrix (array of arrays) with a sparse matrix (list of lists). • That definitely would improve memory use, but degrade running time.

  39. False Dependencies in C++ files • Need to scan both .h and .cpp files. • Could programmatically comment out each include – one at a time – and attempt to compile, thus finding the ones actually needed. • We would probably do this with a separate tool. • We could also just scan, as we do for C#, but that is harder for C++ since we need to check dependencies on global functions and data as well as classes and structs.

  40. Dependence on System Libraries • Not practical to scan for system dependencies in C#. • Can’t find source modules. • System dependencies can be found using reflection, but are not particularly useful. • System dependencies in C++ are easy to find from #include<someSystemHeader> • This information is often useful, so why not provide it?

  41. Summary of Critical Issues • Scanning for Dependencies in C# modules √ • Data structure used to hold dependencies √ • Displaying large amounts of information ~√ • False C++ dependencies ~√ • Dependence on System Libraries • C# X • C++ √

  42. Design Attributes • Abstraction - model of responsibilities • Modularity - physical packaging of program • Encapsulation - public interface, private implem. • Hierarchy - delegation of responsibility • Cohesion - focus on single activity • Coupling - minimize data transfer • Locality of reference - use data where defined • Size and Complexity - small testable components • Use of objects - self contained management

  43. Partitions

  44. Attribute #1 - Abstraction • Program modules have clearly defined responsibilities: • FileHandler - generate file set • Scanner - evaluate dependencies • Depends - store dependency relations • View - present dataviews as options • Executive - orchestrate program operations

  45. Attribute #2 - Modularity • Module boundaries promote reusability • FileHandler and Depends can be reused without changing a single character in their source code. • Responsibilities are well factored: • Two modules manage data - FileHandler and Depends. • Two modules manage user interface - Executive and View • One module conducts analysis – Scanner • Construction can be incremental: • Build in order: Executive, FileHandler, Dependency, Scanner, View • Testing can be incremental: • FileHandler and Dependency can be tested in isolation. • Scanner and View can be tested using FileHandler and Dependency as black boxes.

  46. Attribute #3 - Encapsulation • Each server module has well defined interface and private implementation • FileHandler accepts path/pattern and returns file set • Needs no other interactions • Dependency stores and retrieves dependency relation from two file names. • Needs no other interactions. • Scanner accepts file list and Dependency reference, returns filled Dependency. • Only scanner needs the details of analysis context. • DataView • Presents views using only external interface of Depends. • Don’t need to pass around pointers or references to anything except the file set.

  47. Attribute #4 - Hierarchy • Delegation is effectively handled: • Modules: • Executive delegates to FileHandler, Scanner, and View • Scanner delegates to Depends • Dataview delegates to Depends • Classes: • fileHandler delegates to DirectoryNavigator • Polymorphism is used effectively: • Scanner class provides protocol for Executive. • CsScanner and CppScanner provide specialized behavior • Could easily add new scanner for a new analysis context.

  48. Design Attribute #5 - Cohesion • Each of the modules is focused on a narrow set of activities: • Executive - manage servers • FileHandler - generate file set • Scanner - analyze file dependencies • Depends - store dependency • View - present data views • Their activities can all be summarized in a sentence. • That does not mean that their operations are simple, e.g., Scanner.

  49. Design Attribute #6 - Coupling • Data transfer between modules is simple in almost all cases: • Executive to FileHandler: path/pattern string, file set reference • Executive to Scanner: file set reference • Scanner to Depends: two file names • View to Depends: sends file name, gets back array of dependent file names • Executive to View: see next slide!

  50. Executive to View Coupling • Issue here is which module owns the display device - console or form. • It seems most natural to me to: • Have View return a, possibly sorted, set of selected rows. • Have Executive own the display, accept and handle all user requests, and simply ask View for the appropriate data by sending it a set of file names • allow patterns so “*” gets everything • View will return a reference to Dependency after sorting or will create a new instance populated with a subset of rows.
