Software Networks
Christian Bird
Computer Science Dept., UC Davis
A network like any other
• A software network is made up of:
  • Nodes: software artifacts
  • Edges: relationships between those artifacts (may be directed or undirected)
[Figure: example network of modules, functions, files, and classes, with edges labeled imports, requires, includes, and co-committed]
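To make the nodes-and-edges definition concrete, here is a minimal C sketch of a software network stored as a list of typed edges. The artifact names (`main.c`, `util.c`, `Logger`, `Writer`, and so on) and the `EdgeKind` values are hypothetical illustrations chosen to echo the figure labels, not the output of any particular tool.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical relationship types, echoing the labels in the figure. */
typedef enum { CALLS, IMPORTS, INCLUDES, INHERITS, CO_COMMITTED } EdgeKind;

typedef struct {
    const char *from;   /* source artifact (function, file, class, ...) */
    const char *to;     /* target artifact */
    EdgeKind kind;      /* which relationship the edge represents */
    bool directed;      /* co-commit edges are undirected; the rest are directed */
} Edge;

/* A tiny example network; every artifact name here is made up. */
static const Edge network[] = {
    {"main.c", "util.h", INCLUDES,     true},
    {"main",   "add",    CALLS,        true},
    {"Logger", "Writer", INHERITS,     true},
    {"main.c", "util.c", CO_COMMITTED, false},
};
static const size_t network_size = sizeof network / sizeof network[0];

/* True if a is linked to b. A directed edge only links from -> to;
   an undirected edge links the two artifacts in both directions. */
static bool adjacent(const char *a, const char *b) {
    for (size_t i = 0; i < network_size; i++) {
        const Edge *e = &network[i];
        if (strcmp(e->from, a) == 0 && strcmp(e->to, b) == 0) return true;
        if (!e->directed && strcmp(e->from, b) == 0 && strcmp(e->to, a) == 0) return true;
    }
    return false;
}
```

Keeping the edge kind and direction on each edge lets one structure hold several of the relationship types discussed on the following slides.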
Nodes
• The nodes in a software network usually represent software artifacts at various levels of granularity:
  • Functions
  • Classes
  • Files
  • Modules/Packages
  • Directories
  • Libraries
Nodes: Functions (3000 in Apache)
    int add(int a, int b) {
        printf("%i + %i = ", a, b);
        int c = a + b;
        printf("%i\n", c);
        return c;
    }
Nodes: Classes
    class Logger {
        int logItem(Object item, int level) { stuff… }
        int logError(String msg) { more stuff… }
        more functions…
    }
Nodes: Files (300 in Apache)
    math.c:
    float absoluteValue(float a) { return a > 0 ? a : -a; }
    void printName(char *name) { printf("Hello %s\n", name); }
    more functions…
Nodes: Modules/Packages
    class Logger { stuff… }
    class LogMessage { stuff… }
    class LogError { stuff… }
    more classes…
Nodes: Directories (65 in Apache)
    /apache/http-2.0/server/core/handle.c
    /apache/http-2.0/server/core/serve.c
    /apache/http-2.0/server/core/cgi.c
    /apache/http-2.0/server/core/locking.c
Nodes: Libraries (25 in Apache)
    libkdeinit_konqueror.so, libkonq.so.4, libkutils.so.1, libkio.so.4, libkdeui.so.4, libkdesu.so.4, libkdecore.so.4, libDCOP.so.4, libdl.so.2, libresolv.so.2, libutil.so.1, libart_lgpl_2.so.2, libidn.so.11, libqt-mt.so.3, libpng12.so.0, libXext.so.6, libX11.so.6, libSM.so.6, libICE.so.6, libXrender.so.1
Edges
• Edges in a software network represent a relationship such as a function call, instance member, library dependence, etc., at each level of granularity:
  • Functions
  • Classes
  • Files
  • Modules/Packages
  • Directories
  • Libraries
Edges: Functions
    int add(int a, int b) {
        printf("%i + %i = ", a, b);
        int c = a + b;
        printf("%i\n", c);
        return c;
    }
add has a call edge to printf.
Edges: Classes
    class Logger extends Writer {
        int logItem(LogMessage item, int level) { stuff… }
        int logError(String msg) { more stuff… }
        more functions…
        FileWriter w;
    }
Logger has edges to Writer (inheritance), LogMessage (parameter type), and FileWriter (instance member).
Edges: Files
    math.c:
    float absoluteValue(float a) { return max(a, -a); }
    void printName(char *name) { printf("Hello %s\n", name); }
    more functions…
absoluteValue calls max, so math.c has an edge to the file that defines max.
Edges: Modules/Packages
    import java.util.*;
    import edu.ucdavis.senses.*;
    class WirelessSensor { … }
Edges: Directories
A function in /apache/http-2.0/server/core/handle.c may call a function in /apache/http-2.0/apr-util/hash.c, which links the two directories.
Edges: Libraries
Library libkdecore.so may need to load libqt3-mt.so, which in turn may need to load libX11.so and libm.so, which all need libc.so.
[Figure: dependency chain libkdecore.so → libqt3-mt.so → libX11.so and libm.so → libc.so]
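The library chain above is a directed acyclic graph, and a depth-first topological sort is one way to list every library after the libraries it needs. The sketch below hard-codes the dependencies from this slide as assumptions (we read "all need libc.so" as libX11.so and libm.so needing libc.so); the enum names are ours, and a real dynamic loader follows its own, more involved search rules.

```c
#include <stdbool.h>

enum { KDECORE, QT3_MT, X11, LIBM, LIBC, NLIBS };

/* Names for printing results; index with the enum values above. */
static const char *lib_names[NLIBS] = {
    "libkdecore.so", "libqt3-mt.so", "libX11.so", "libm.so", "libc.so"
};

/* needs[a][b] is true when library a needs library b (assumed edges). */
static const bool needs[NLIBS][NLIBS] = {
    [KDECORE] = {[QT3_MT] = true},
    [QT3_MT]  = {[X11] = true, [LIBM] = true},
    [X11]     = {[LIBC] = true},
    [LIBM]    = {[LIBC] = true},
};

/* Depth-first post-order: a library is appended only after everything it
   needs has been appended. Assumes the graph is acyclic, as it is here. */
static void visit(int lib, bool seen[NLIBS], int order[NLIBS], int *count) {
    if (seen[lib]) return;
    seen[lib] = true;
    for (int dep = 0; dep < NLIBS; dep++)
        if (needs[lib][dep]) visit(dep, seen, order, count);
    order[(*count)++] = lib;
}

/* Fills order[] with a dependencies-first ordering of everything reachable
   from root and returns how many libraries were placed. */
static int load_order(int root, int order[NLIBS]) {
    bool seen[NLIBS] = {false};
    int count = 0;
    visit(root, seen, order, &count);
    return count;
}
```

For root `KDECORE` this yields libc.so, libX11.so, libm.so, libqt3-mt.so, libkdecore.so, so each library comes after the libraries it depends on.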
Example Callgraph
    #include <stdio.h>

    void printInt(int a) { printf("the number is %i\n", a); }
    int add(int a, int b) { return a + b; }
    int multiply(int a, int b) { return a * b; }
    int factorial(int a) {
        if (a <= 1) return 1;
        return multiply(a, factorial(a - 1));
    }
    int main(void) {
        printf("calculating 6!\n");
        printInt(factorial(6));
        return 0;
    }
[Figure: the resulting callgraph over main, printInt, factorial, printf, multiply, and add; add is marked "Never called"]
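The callgraph for this program can be written down as an adjacency matrix. The minimal C sketch below encodes the edges read off the example code (treating `printf` as an ordinary node) and flags functions that have no callers, which is how `add` ends up as "never called". The enum names are our own.

```c
#include <stdbool.h>

/* Functions from the example program; printf is included as an external node. */
enum { F_MAIN, F_PRINTINT, F_FACTORIAL, F_PRINTF, F_MULTIPLY, F_ADD, NFUNCS };

/* calls[a][b] is true when the body of function a contains a call to b. */
static const bool calls[NFUNCS][NFUNCS] = {
    [F_MAIN]      = {[F_PRINTF] = true, [F_PRINTINT] = true, [F_FACTORIAL] = true},
    [F_PRINTINT]  = {[F_PRINTF] = true},
    [F_FACTORIAL] = {[F_MULTIPLY] = true, [F_FACTORIAL] = true},
};

/* Number of distinct callers of f, i.e. its in-degree in the callgraph. */
static int in_degree(int f) {
    int n = 0;
    for (int caller = 0; caller < NFUNCS; caller++)
        if (calls[caller][f]) n++;
    return n;
}

/* Any function other than the root (main) with no callers is never called. */
static bool never_called(int f) {
    return f != F_MAIN && in_degree(f) == 0;
}
```

Note that `factorial` counts as one of its own callers because of the recursive call.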
Static versus Runtime Callgraphs
• Static callgraphs are constructed by a syntactic analysis of the source code
• Pros
  • Don't have to build or run the program
  • Can work even if the code contains syntactic or semantic errors
  • Catches calls that happen only in exceptional situations
  • Fairly fast
• Cons
  • Doesn't capture weighted information (how many times each function is called)
  • Includes calls in dead code, for example: if (0 == 3) logError(…)
  • Doesn't include calls through function pointers
  • Doesn't include calls to functions in dynamically loaded libraries
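To illustrate why a purely syntactic pass reports calls in dead code, here is a deliberately naive sketch: it treats any identifier followed by `(` as a call and skips a handful of C keywords. This is a toy under heavy assumptions (no preprocessing, no handling of comments, strings, or declarations), and the function names are hypothetical; it is not how a real static analyzer works.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Keywords that may be followed by '(' but are not calls (partial list). */
static const char *keywords[] = {"if", "while", "for", "switch", "return", "sizeof"};

static int is_keyword(const char *word, size_t len) {
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strlen(keywords[i]) == len && strncmp(keywords[i], word, len) == 0)
            return 1;
    return 0;
}

/* Copies up to max callee names found in src into out (names longer than
   31 characters are skipped) and returns how many were found. Every
   identifier followed by '(' counts, whether or not that code can ever run. */
static int find_calls(const char *src, char out[][32], int max) {
    int found = 0;
    const char *p = src;
    while (*p && found < max) {
        if (isalpha((unsigned char)*p) || *p == '_') {
            const char *start = p;
            while (isalnum((unsigned char)*p) || *p == '_') p++;
            size_t len = (size_t)(p - start);
            const char *q = p;
            while (isspace((unsigned char)*q)) q++;
            if (*q == '(' && !is_keyword(start, len) && len < 32) {
                memcpy(out[found], start, len);
                out[found][len] = '\0';
                found++;
            }
        } else {
            p++;
        }
    }
    return found;
}
```

Run on `if (0 == 3) logError(msg); return add(a, b);`, it reports both `logError` and `add`, even though the `logError` call can never execute.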
Static versus Runtime Callgraphs
• Runtime callgraphs are constructed by running a piece of software one or more times and logging each function call and how many times it occurs
• Pros
  • Includes the number of times each function call occurs
  • Includes calls through function pointers and dynamically loaded libraries
  • Will not include calls in dead code
• Cons
  • Requires building the software
  • Hard to get complete code coverage
  • Can take a long time
  • May require a test harness of some kind (especially for interactive applications) along with test data
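A runtime callgraph records how often each caller/callee pair actually fires. As a hypothetical illustration, the sketch below instruments the factorial example by hand: the `CALL` macro and the `call_count` table are our own inventions, standing in for what compiler instrumentation or a profiler would collect automatically. The edge weights come out as the number of times each call executes.

```c
/* Hypothetical hand instrumentation of part of the example callgraph. */
enum { F_MAIN, F_FACTORIAL, F_MULTIPLY, NFUNCS };

/* call_count[a][b] = number of times function a called function b at runtime. */
static int call_count[NFUNCS][NFUNCS];

/* Record the edge caller -> callee, then evaluate the call expression. */
#define CALL(caller, callee, expr) (call_count[caller][callee]++, (expr))

static int multiply(int a, int b) { return a * b; }

static int factorial(int a) {
    if (a <= 1) return 1;
    return CALL(F_FACTORIAL, F_MULTIPLY,
                multiply(a, CALL(F_FACTORIAL, F_FACTORIAL, factorial(a - 1))));
}

/* Plays the role of main in the example program. */
static int run_example(void) {
    return CALL(F_MAIN, F_FACTORIAL, factorial(6));
}
```

After one call to `run_example()`, the main→factorial edge has weight 1, and both factorial→multiply and factorial→factorial have weight 5, information a static callgraph cannot provide. In practice this data would come from tooling, for example GCC's `-finstrument-functions` hooks or the call counts reported by gprof, rather than from hand-written macros.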
Differences between callgraphs and other graphs we've seen
• Has a root (such as main) and commonly forms a tree-like structure
• Few if any cycles in callgraphs (direct or indirect recursion is rare)
• Reciprocity is uncommon because calls tend to go from higher to lower levels of abstraction
• Preferential attachment?
  • If a function is called by many functions, is it more likely to be called by other functions in the future? Maybe.
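Two of these properties are easy to measure on an adjacency matrix. The sketch below, which assumes the example callgraph from earlier as input, counts self-loops (direct recursion) and computes reciprocity as the fraction of non-self edges i→j whose reverse edge j→i also exists. The enum names and this particular reciprocity definition are our choices for illustration.

```c
#include <stdbool.h>

enum { F_MAIN, F_PRINTINT, F_FACTORIAL, F_PRINTF, F_MULTIPLY, F_ADD, NFUNCS };

/* The example callgraph from the earlier slide (assumed input). */
static bool example[NFUNCS][NFUNCS] = {
    [F_MAIN]      = {[F_PRINTF] = true, [F_PRINTINT] = true, [F_FACTORIAL] = true},
    [F_PRINTINT]  = {[F_PRINTF] = true},
    [F_FACTORIAL] = {[F_MULTIPLY] = true, [F_FACTORIAL] = true},
};

/* Self-loops correspond to direct recursion. */
static int self_loops(bool g[][NFUNCS]) {
    int n = 0;
    for (int i = 0; i < NFUNCS; i++)
        if (g[i][i]) n++;
    return n;
}

/* Fraction of non-self edges i->j for which j->i also exists (0 if there are no edges). */
static double reciprocity(bool g[][NFUNCS]) {
    int edges = 0, mutual = 0;
    for (int i = 0; i < NFUNCS; i++)
        for (int j = 0; j < NFUNCS; j++)
            if (i != j && g[i][j]) {
                edges++;
                if (g[j][i]) mutual++;
            }
    return edges ? (double)mutual / edges : 0.0;
}
```

For the example graph this gives one self-loop (factorial) and a reciprocity of 0. Self-loops only catch direct recursion; detecting indirect recursion requires a cycle search over the whole graph.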
Software Repositories
• Used in the development of virtually any software project (commercial, personal, OSS, etc.)
• Examples include RCS, CVS, Subversion, Perforce, BitKeeper, and SourceSafe
• Keeps track of every change to the software: who made the change, the time of the change, comments associated with the change, etc.
• Allows us to view the evolution of a piece of software
• A developer makes changes to the code and then commits them to the repository with a description of the changes
Software Networks from Repositories
• The software history allows us to relate different artifacts in the software
• Create an edge between two artifacts (functions, files, classes) if they were modified in the same commit
• Create an edge between artifacts if they were modified by the same developer
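The co-commit rule can be sketched as a weighted, undirected network in which each edge counts the commits that touched both artifacts. In the C sketch below the files (`FILE_A`, `FILE_B`, `FILE_C`) and the representation of a commit as a set of changed files are hypothetical simplifications; a real tool would first parse the repository's history.

```c
#include <stdbool.h>

/* Hypothetical files tracked in the repository. */
enum { FILE_A, FILE_B, FILE_C, NFILES };

/* One commit, reduced to the set of files it touched. */
typedef struct {
    bool changed[NFILES];
} Commit;

/* weight[i][j] = number of commits that modified both i and j.
   Co-commit edges are undirected, so the matrix is kept symmetric. */
static void build_cocommit(const Commit *commits, int ncommits,
                           int weight[NFILES][NFILES]) {
    for (int i = 0; i < NFILES; i++)
        for (int j = 0; j < NFILES; j++)
            weight[i][j] = 0;
    for (int c = 0; c < ncommits; c++)
        for (int i = 0; i < NFILES; i++)
            for (int j = i + 1; j < NFILES; j++)
                if (commits[c].changed[i] && commits[c].changed[j]) {
                    weight[i][j]++;
                    weight[j][i]++;
                }
}
```

With three commits touching {A, B, C}, {A, B}, and {C}, the A–B edge gets weight 2 and the A–C and B–C edges get weight 1. The same-developer network follows the same pattern if each "commit" is replaced by the set of artifacts a single developer has modified.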
Modularity: one use of a callgraph
• Modularity is the characteristic of a system that has been divided into smaller subsystems which interact with each other
• Modular software has distinct subsystems (modules) with high levels of interaction within each subsystem and low levels of interaction between subsystems
• Modular software is easier to understand and maintain
[Figure: a modular OS divided into Kernel, Scheduler, Memory Management, Filesystem, Networking, and I/O devices subsystems]
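One simple way to quantify "high interaction within, low interaction between" is to label each node with its module and measure the share of call edges that stay inside a module. The sketch below does this for a made-up callgraph over a few of the OS subsystems named in the figure; the function names, edges, and the metric itself are illustrative assumptions rather than a measure taken from the source.

```c
#include <stdbool.h>

/* Hypothetical functions, each assigned to a subsystem (module). */
enum { SCHED_PICK, SCHED_TICK, NET_SEND, NET_RECV, FS_READ, NNODES };
enum { SCHEDULER, NETWORKING, FILESYSTEM };

static const int module_of[NNODES] = {
    [SCHED_PICK] = SCHEDULER, [SCHED_TICK] = SCHEDULER,
    [NET_SEND] = NETWORKING,  [NET_RECV] = NETWORKING,
    [FS_READ] = FILESYSTEM,
};

/* Made-up calls: two stay inside a module, one crosses a module boundary. */
static const bool calls[NNODES][NNODES] = {
    [SCHED_TICK] = {[SCHED_PICK] = true},
    [NET_RECV]   = {[NET_SEND] = true},
    [NET_SEND]   = {[SCHED_PICK] = true},
};

/* Share of calls whose caller and callee are in the same module
   (1.0 means every call stays within its module). */
static double intra_module_fraction(void) {
    int total = 0, internal = 0;
    for (int i = 0; i < NNODES; i++)
        for (int j = 0; j < NNODES; j++)
            if (calls[i][j]) {
                total++;
                if (module_of[i] == module_of[j]) internal++;
            }
    return total ? (double)internal / total : 1.0;
}
```

Here two of the three calls are internal, giving about 0.67; a more modular design would push the value toward 1.0.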
Modularity Case Study using Callgraphs
• "Exploring the Structure of Complex Software Designs: An Empirical Study of Open Source" by Alan MacCormack, John Rusnak, and Carliss Baldwin
• Created a "Design Structure Matrix" (DSM) at the file level using function calls as ties (i.e., if a function in foo.c calls a function in bar.c, there is a tie from foo.c to bar.c; ties are not symmetric)
• Used static analysis to extract the file-level callgraph
• Clustered the DSM using standard clustering techniques
• Metrics used:
  • Clustering cost: a measure of how many function calls cross cluster boundaries
  • Propagation cost: a measure of how many other elements (files) may be affected, directly or indirectly, when a particular one is modified
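The tie rule (a function in foo.c calling a function in bar.c gives a tie foo.c → bar.c) is straightforward to express once each function knows which file defines it. In the sketch below the function names, the `file_of` mapping, and the choice of a 0/1 matrix with an empty diagonal are assumptions made for illustration; the study's actual extraction tooling is not described here.

```c
#include <stdbool.h>

enum { FOO_C, BAR_C, BAZ_C, NFILES };
enum { FOO_INIT, FOO_RUN, BAR_HASH, BAZ_LOG, NFUNCS };

/* Which file defines each (hypothetical) function. */
static const int file_of[NFUNCS] = {
    [FOO_INIT] = FOO_C, [FOO_RUN] = FOO_C, [BAR_HASH] = BAR_C, [BAZ_LOG] = BAZ_C,
};

/* Function-level calls: fcalls[a][b] is true when function a calls function b. */
static const bool fcalls[NFUNCS][NFUNCS] = {
    [FOO_RUN]  = {[FOO_INIT] = true, [BAR_HASH] = true},
    [BAR_HASH] = {[BAZ_LOG] = true},
};

/* dsm[x][y] is true when some function in file x calls some function in file y.
   Ties are not symmetric: foo.c -> bar.c does not imply bar.c -> foo.c.
   Calls within a single file are left off the diagonal (an assumption). */
static void build_dsm(bool dsm[NFILES][NFILES]) {
    for (int x = 0; x < NFILES; x++)
        for (int y = 0; y < NFILES; y++)
            dsm[x][y] = false;
    for (int a = 0; a < NFUNCS; a++)
        for (int b = 0; b < NFUNCS; b++)
            if (fcalls[a][b] && file_of[a] != file_of[b])
                dsm[file_of[a]][file_of[b]] = true;
}
```

The resulting matrix has ties foo.c → bar.c and bar.c → baz.c only; the indirect path from foo.c to baz.c is not a direct tie.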
DSM examples
• [Figure: Example system in graphical and dependency matrix form. A change to F propagates to E, C, and A, while a change to B only propagates to A]
• [Figure: A DSM with dependencies in an "idealized modular form". All calls are within clusters, so the clustering cost is 0]
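Propagation can be computed as a transitive closure of the direct dependencies. The sketch below uses Warshall's algorithm on a hypothetical system chosen to match the description above (F affects E, E affects C, C affects A, B affects A, and D is isolated); the figure's actual matrix is not reproduced here. It then reports propagation cost as the density of the reachability matrix. Counting each element as affecting itself (the diagonal) is an assumption of this sketch and changes the absolute value.

```c
#include <stdbool.h>

enum { NODE_A, NODE_B, NODE_C, NODE_D, NODE_E, NODE_F, NNODES };

/* affects[x][y] is true when a change to x directly forces a change to y.
   Hypothetical edges consistent with the slide's description. */
static const bool affects[NNODES][NNODES] = {
    [NODE_F] = {[NODE_E] = true},
    [NODE_E] = {[NODE_C] = true},
    [NODE_C] = {[NODE_A] = true},
    [NODE_B] = {[NODE_A] = true},
};

/* Warshall's algorithm: reach[x][y] ends up true when a change to x
   propagates, directly or indirectly, to y. */
static void propagate(bool reach[NNODES][NNODES]) {
    for (int x = 0; x < NNODES; x++)
        for (int y = 0; y < NNODES; y++)
            reach[x][y] = affects[x][y] || x == y;  /* each element affects itself */
    for (int k = 0; k < NNODES; k++)
        for (int x = 0; x < NNODES; x++)
            for (int y = 0; y < NNODES; y++)
                if (reach[x][k] && reach[k][y]) reach[x][y] = true;
}

/* Propagation cost as the fraction of nonzero reachability entries (diagonal included). */
static double propagation_cost(void) {
    bool reach[NNODES][NNODES];
    propagate(reach);
    int hits = 0;
    for (int x = 0; x < NNODES; x++)
        for (int y = 0; y < NNODES; y++)
            if (reach[x][y]) hits++;
    return (double)hits / (NNODES * NNODES);
}
```

Here a change to F reaches E, C, and A, and a change to B reaches only A, as in the first example. The reachability matrix has 13 nonzero entries out of 36, a propagation cost of about 0.36.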
Mozilla Project
• Netscape open-sourced Navigator in March 1998
• The project was named Mozilla and eventually led to what Firefox is today
• Initially the code was complex and tightly coupled, a common phenomenon in industry code
• This formed a high barrier to entry for volunteers who wanted to contribute code
• The architecture was redesigned in late 1998 due to increasing complexity
More Results
• After the redesign, volunteerism went up dramatically (critical for an OSS project to succeed)
• Both functionality and performance increased
• Both code size and number of files decreased (initially)
What are we doing with software nets?
• Thanks to CVS history, we can create a callgraph for a piece of software at any point during its evolution
• Do certain parts of the callgraph stabilize before others? Why?
• Are certain portions of the callgraph more bug-prone than others?
• What does code ownership in the callgraph look like?
• What is the relationship between the callgraph network, the co-commit network, and the ownership network?
More Questions
• Does the software network bear any resemblance to the social network of the developers who work on it? (Conway's Law)
• Are callgraphs small-world networks? What is the distribution of in- and out-degrees? What would the answers mean (if anything)?
• What partitioning techniques allow us to extract module structure from source code?
• Is there a relationship between the co-committer social network and the email social network of developers?