An Exploration of Power-law in Use-relation of Java Software Systems

Makoto Ichii, Makoto Matsushita, Katsuro Inoue (Osaka University)

Presentation Transcript


  1. An Exploration of Power-law in Use-relation of Java Software Systems • Makoto Ichii, Makoto Matsushita, Katsuro Inoue • Osaka University • ASWEC 2008

  2. Software Component Graph • A software system is composed of software components • Software component (component): a building unit of a software system • Components form complex use-relations with one another • A software component graph (component graph) represents the use-relations between components • node: component / edge: use-relation • Many studies use component graphs to analyze software systems • It is important to understand the nature of component graphs

  3. Power-law distribution • A graph is characterized by its degree distribution • Graphs whose degree distribution follows the power-law distribution $p(x) = Cx^{-\alpha}$ attract attention in various research domains • Link structure of WWW pages • Hosts on the Internet • Such graphs tend to have interesting characteristics • Self-similarity • Fault tolerance • Explore component graphs to see whether their degree distributions follow the power law

  4. Questions [1-2/4] • Q. 1 Do the in- and out-degree distributions of a component graph of a software system follow the power law? • Q. 2 Do the in- and out-degree distributions of a component graph of multiple software systems follow the power law?

  5. Questions [3-4/4] • Q. 3 Do the in- and out-degree distributions of a subgraph of a component graph follow the power law? • Q. 4 What aspects of components affect the in- and out-degree distributions of component graphs?

  6. Definitions [1/2] • Component: Java class (including interface) • Use-relation: any of the following six relation types, acquired by static analysis of the component source files • A class or an interface extends another class or interface, respectively • A class implements an interface • A class or an interface declares a variable of a class or an interface type • A class instantiates an object of a class • A class calls a method of a class or an interface • A class or an interface references a field of a class or an interface
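
The six relation types can be illustrated with a small Java fragment. This is only a sketch: the component names (`Shape`, `Solid`, `Base`, `Square`, `Client`) are hypothetical and not taken from the slides, and the comments state which use-relation each line would plausibly produce under the definitions above.

```java
// Hypothetical components; each comment names the use-relation the line creates.
final class UseRelationDemo {
    public static void main(String[] args) {
        System.out.println(new Client().run());
    }
}

interface Shape { double area(); }

interface Solid extends Shape { }              // extends: Solid -> Shape

class Base { }

class Square extends Base implements Shape {   // extends: Square -> Base; implements: Square -> Shape
    int uses = 0;
    final double side;
    Square(double side) { this.side = side; }
    @Override public double area() { return side * side; }
}

class Client {
    Shape shape;                               // variable declaration: Client -> Shape
    double run() {
        Square square = new Square(2.0);       // declaration and instantiation: Client -> Square
        square.uses++;                         // field reference: Client -> Square
        shape = square;
        return shape.area();                   // method call: Client -> Shape
    }
}
```

Because the component graph is a simple graph, several relations between the same pair of components (for example, the three `Client -> Square` relations) would collapse into a single edge.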

  7. Definitions [2/2] • Component graph: directed simple graph • node: component • edge: use-relation between components • In-(out-)degree: the number of incoming (outgoing) edges of a node • Example: class A { void exec() { … } } class B { … A.exec(); … } class C { … A a = new A(); … } • A: in-degree 2, out-degree 0 • B: in-degree 0, out-degree 1 • C: in-degree 0, out-degree 1
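
The degree definitions can be sketched as a small adjacency-set structure. The class and method names (`ComponentGraph`, `addUse`, `inDegree`, `outDegree`) are assumptions made for illustration, not the authors' tooling. Duplicate edges and self-loops are dropped so that the graph stays simple.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Directed simple graph of components (hypothetical helper, not from the slides). */
final class ComponentGraph {
    // component -> set of components it uses; a set prevents duplicate edges
    private final Map<String, Set<String>> uses = new LinkedHashMap<>();

    void addComponent(String name) {
        uses.computeIfAbsent(name, k -> new LinkedHashSet<>());
    }

    /** Records the use-relation "from uses to"; self-loops are ignored. */
    void addUse(String from, String to) {
        addComponent(from);
        addComponent(to);
        if (!from.equals(to)) {
            uses.get(from).add(to);
        }
    }

    /** Number of outgoing edges: how many components this one uses. */
    int outDegree(String name) {
        return uses.getOrDefault(name, Set.of()).size();
    }

    /** Number of incoming edges: how many components use this one. */
    int inDegree(String name) {
        int count = 0;
        for (Set<String> targets : uses.values()) {
            if (targets.contains(name)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        ComponentGraph g = new ComponentGraph();
        g.addUse("B", "A"); // B calls a method of A
        g.addUse("C", "A"); // C declares a variable of type A
        g.addUse("C", "A"); // C instantiates A: same edge, counted once
        for (String c : new String[] {"A", "B", "C"}) {
            System.out.println(c + ": in=" + g.inDegree(c) + " out=" + g.outDegree(c));
        }
    }
}
```

For the slide's three classes this yields in-degree 2 and out-degree 0 for A, and in-degree 0 and out-degree 1 for both B and C.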

  8. Observing the power-law • Plot the cumulative frequency on log-log axes • The data form a straight line if the distribution follows the power law • For $p(x) = Cx^{-\alpha}$, the frequency plot has gradient $-\alpha$ and the cumulative frequency plot has gradient $-(\alpha-1)$ • [Figure: log-log plots; horizontal axis: in-(or out-)degree] • M. E. J. Newman, "Power laws, Pareto distributions and Zipf's law", Contemporary Physics 46, 323-351 (2005)
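
The gradient of the cumulative plot follows from integrating the density. A short derivation, treating $x$ as continuous and assuming $\alpha > 1$ (the symbol $P(x)$ for the cumulative frequency is introduced here for illustration):

```latex
P(x) = \int_x^{\infty} C\,t^{-\alpha}\,dt = \frac{C}{\alpha-1}\,x^{-(\alpha-1)}
\quad\Longrightarrow\quad
\log P(x) = -(\alpha-1)\,\log x + \log\frac{C}{\alpha-1}
```

On log-log axes the cumulative frequency is therefore a straight line with gradient $-(\alpha-1)$, one less steep than the gradient $-\alpha$ of $p(x)$ itself.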

  9. Values shown in the experiments • α: exponent • Derived from the gradient of the regression line, which is $-(\alpha-1)$ for $p(x) = Cx^{-\alpha}$ • $R^{*2}$: the coefficient of determination adjusted for the degrees of freedom • Goodness of fit of a regression model to the data • Range: [0..1] • A large value means a good fit • [Figure: log-log plot; horizontal axis: in-(or out-)degree]
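
One way to obtain these two values is an ordinary least-squares fit on the log-log cumulative frequencies. The sketch below is an assumption about the procedure, not the authors' implementation: it skips zero-degree nodes (they cannot be drawn on a log axis), fits one point per distinct degree, and uses the standard single-predictor adjustment $1 - (1 - R^2)(n-1)/(n-2)$. The name `PowerLawFit` is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Least-squares fit of log(cumulative frequency) against log(degree). */
final class PowerLawFit {

    /** Returns {alpha, adjustedR2} for the given node degrees. */
    static double[] fit(int[] degrees) {
        // Count nodes per positive degree, sorted by degree.
        TreeMap<Integer, Integer> freq = new TreeMap<>();
        int total = 0;
        for (int d : degrees) {
            if (d > 0) {
                freq.merge(d, 1, Integer::sum);
                total++;
            }
        }
        int n = freq.size();
        if (n < 3) {
            throw new IllegalArgumentException("need at least three distinct positive degrees");
        }
        double[] xs = new double[n];
        double[] ys = new double[n];
        int remaining = total; // nodes whose degree is >= the current key
        int i = 0;
        for (Map.Entry<Integer, Integer> e : freq.entrySet()) {
            xs[i] = Math.log(e.getKey());
            ys[i] = Math.log((double) remaining / total);
            remaining -= e.getValue();
            i++;
        }
        double meanX = 0, meanY = 0;
        for (int k = 0; k < n; k++) {
            meanX += xs[k] / n;
            meanY += ys[k] / n;
        }
        double sxx = 0, sxy = 0, syy = 0;
        for (int k = 0; k < n; k++) {
            double dx = xs[k] - meanX;
            double dy = ys[k] - meanY;
            sxx += dx * dx;
            sxy += dx * dy;
            syy += dy * dy;
        }
        double slope = sxy / sxx;                 // gradient of the regression line
        double r2 = (sxy * sxy) / (sxx * syy);    // coefficient of determination
        double adjustedR2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2);
        return new double[] {1.0 - slope, adjustedR2}; // gradient = -(alpha - 1)
    }

    public static void main(String[] args) {
        // Synthetic degrees whose cumulative counts halve as the degree doubles.
        double[] result = fit(new int[] {1, 1, 1, 1, 2, 2, 4, 8});
        System.out.println("alpha=" + result[0] + " adjustedR2=" + result[1]);
    }
}
```

The synthetic input in `main` is constructed so that the cumulative frequency is exactly proportional to $x^{-1}$, which corresponds to $\alpha = 2$ with a near-perfect fit.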

  10. Experiment 1 • Set up component sets • Each set contains a single software system • Analyze the component sets to create component graphs • Plot the cumulative frequency of the degrees on log-log axes • Q. 1 Do the in- and out-degree distributions of a component graph of a software system follow the power law?

  11. Result of experiment 1 / JDK • The in-degree follows the power law • The out-degree does not follow the power law

  12. Result of experiment 1 / ECLIPSE • Similar characteristics to JDK • The in-degree follows the power law • The out-degree does not follow the power law

  13. Experiment 2 • Set up component sets • Each set contains multiple software systems • Use-relations exist across the systems • Analyze the component sets to create component graphs • Plot the cumulative frequency of the degrees on log-log axes • Q. 2 Do the in- and out-degree distributions of a component graph for multiple software systems follow the power law?

  14. Result of experiment 2 / ASF • Similar characteristics to Exp. 1 • The in-degree follows the power law • The out-degree does not follow the power law

  15. Result of experiment 2 / SPARS_DB • Similar characteristics to Exp. 1 • The in-degree follows the power law • The out-degree does not completely follow the power law • The in-degree distribution fits the power-law straight line almost ideally

  16. Experiment 3 • Construct subsets of SPARS_DB • Keyword: the components that contain a specified keyword in their source code • The keywords are randomly selected so that the number of resulting components is about 1,000 or 10,000 • Random: 1,000 or 10,000 randomly selected components • Analyze the component sets to create component graphs • Plot the cumulative frequency of the degrees on log-log axes • Q. 3 Do the in- and out-degree distributions of a subgraph of a component graph for software systems follow the power law?
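
The random subsets can be sketched as follows, assuming that a subset's component graph keeps only the use-relations whose two endpoints are both in the subset (an induced subgraph). The slides do not spell this out, so treat it as an assumption. The names `SubgraphSampler`, `induced`, `randomNodes`, and `edgeCount` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

/** Builds subgraphs of a component graph given as component -> used components. */
final class SubgraphSampler {

    /** Keeps only the chosen nodes and the edges that run between them. */
    static Map<String, Set<String>> induced(Map<String, Set<String>> graph, Set<String> keep) {
        Map<String, Set<String>> sub = new LinkedHashMap<>();
        for (String node : keep) {
            Set<String> targets = new LinkedHashSet<>(graph.getOrDefault(node, Set.of()));
            targets.retainAll(keep); // drop edges that leave the subset
            sub.put(node, targets);
        }
        return sub;
    }

    /** Picks up to `size` components uniformly at random (seeded for repeatability). */
    static Set<String> randomNodes(Map<String, Set<String>> graph, int size, long seed) {
        List<String> nodes = new ArrayList<>(graph.keySet());
        Collections.shuffle(nodes, new Random(seed));
        return new LinkedHashSet<>(nodes.subList(0, Math.min(size, nodes.size())));
    }

    static int edgeCount(Map<String, Set<String>> graph) {
        int edges = 0;
        for (Set<String> targets : graph.values()) {
            edges += targets.size();
        }
        return edges;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> graph = new LinkedHashMap<>();
        graph.put("A", Set.of());
        graph.put("B", Set.of("A"));
        graph.put("C", Set.of("A"));
        Set<String> picked = randomNodes(graph, 2, 42L);
        System.out.println(picked + " -> " + edgeCount(induced(graph, picked)) + " edge(s)");
    }
}
```

Under this assumption a random subset loses every edge whose other endpoint was not picked, whereas a keyword-based subset would pass a set of keyword-matching components to `induced` instead.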

  17. Result of experiment 3 / KWD1K • Similar characteristics to SPARS_DB • The in-degree follows the power law • The out-degree does not follow the power law

  18. Result of experiment 3 / KWD10K • Similar characteristics to SPARS_DB • The in-degree follows the power law • The out-degree does not follow the power law

  19. Result of experiment 3 / RND1K • The original characteristics are almost lost

  20. Result of experiment 3 / RND10K • Similar characteristics to SPARS_DB; however, the number of edges is small

  21. Experiment 4 • List the top-ten components by in- and out-degree • Calculate the correlation between degrees and metric values • Spearman's rank correlation coefficient • Target: SPARS_DB • Q. 4 What aspects of components affect the in- and out-degree distributions of component graphs?
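
Spearman's rank correlation coefficient is the Pearson correlation of the ranks of the two samples. A minimal sketch, using average ranks for ties; the class name `Spearman` and the sample values in `main` are made up for illustration and are not data from the experiments.

```java
import java.util.Arrays;

/** Spearman's rank correlation: Pearson correlation computed on ranks. */
final class Spearman {

    /** 1-based ranks; tied values share the average of their positions. */
    static double[] ranks(double[] values) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(values[a], values[b]));
        double[] rank = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            while (j + 1 < n && values[order[j + 1]] == values[order[i]]) {
                j++; // extend the run of tied values
            }
            double average = (i + j) / 2.0 + 1.0;
            for (int k = i; k <= j; k++) {
                rank[order[k]] = average;
            }
            i = j + 1;
        }
        return rank;
    }

    /** Returns a value in [-1, 1]; NaN if either sample is constant. */
    static double correlation(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("samples must have equal length >= 2");
        }
        double[] rx = ranks(x);
        double[] ry = ranks(y);
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int k = 0; k < n; k++) {
            meanX += rx[k] / n;
            meanY += ry[k] / n;
        }
        double sxx = 0, sxy = 0, syy = 0;
        for (int k = 0; k < n; k++) {
            double dx = rx[k] - meanX;
            double dy = ry[k] - meanY;
            sxx += dx * dx;
            sxy += dx * dy;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] outDegree = {3, 10, 1, 7};    // hypothetical out-degrees
        double[] loc = {120, 900, 40, 300};    // hypothetical lines of code
        System.out.println(correlation(outDegree, loc));
    }
}
```

Because only ranks matter, the coefficient captures any monotonic relationship between a degree and a metric value, not just a linear one.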

  22. Result of experiment 4 / In-degree • Top-ten components • Components that have a fundamental/general role • Correlation with metrics • The in-degree has low correlation with the metrics • The in-degree relates to the role of a component

  23. Result of experiment 4 / Out-degree • Top-ten components • Simply large/complex classes • Correlation with metrics • High correlation with LOC and WMC • The out-degree relates to the size/complexity of a component

  24. Answers: summary of experiments [1/4] • Q. 1 Do the in- and out-degree distributions of a component graph of a software system follow the power law? • The in-degree follows the power law • The out-degree does not follow the power law • Mixture of the power-law distribution and the lognormal distribution

  25. Answers: summary of experiments [2/4] • Q. 2 Do the in- and out-degree distributions of a component graph for multiple software systems follow the power law? • The in-degree follows the power law • The out-degree does not follow the power law • Similar results to those for single software systems

  26. Answers: summary of experiments [3/4] • Q. 3 Do the in- and out-degree distributions of a subgraph of a component graph for software systems follow the power law? • Depends on how the subgraph is created • A keyword-based subgraph has similar characteristics to its superset • Related components are likely to share words • A random-selection-based subgraph with a small number of nodes has different characteristics • Few edges exist

  27. Answers: summary of experiments [4/4] • Q. 4 What aspects of components affect the in- and out-degree distributions of component graphs? • In-degree relates to the roles of components • Most components are used only in a specific part • Components with a fundamental/general role are used from everywhere • The larger the component set grows, the larger the in-degree values become • Out-degree relates to the size/complexity of components • Many components have reasonable size/complexity • Some components may have relatively large size/complexity • Extremely large components are unreasonable

  28. Summary • Component graphs were investigated to see whether the in- and out-degree distributions follow the power law • As a result, the following characteristics were revealed • The in-degree distribution follows the power law • The in-degree of a component relates to the role of the component • The out-degree distribution does not follow the power law • The out-degree of a component relates to the size/complexity of the component • Some kinds of subgraphs of a component graph have the same degree-distribution characteristics as the whole graph • Future work • Explore other types of component graphs

  31. Discussion • Generative models of a power-law graph • When a node is added to a graph, nodes with a large degree tend to be the ones connected to the new node • “rich get richer” • Meaning for component graphs • When a new component is added to (developed for) a software system, the new component uses components that are already used by many components • The set of frequently used components hardly changes as software development proceeds • If its members change, the fundamental structure (design, architecture) of the software has changed
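
A “rich get richer” process of this kind can be simulated in a few lines. The sketch below is a simplification chosen for illustration, not the authors' model: each new component uses exactly one existing component, picked with probability proportional to its in-degree plus one. The class name `PreferentialAttachment` and the parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Simulates components being added one at a time with preferential attachment. */
final class PreferentialAttachment {

    /** Returns the in-degree of each of the n components after the simulation. */
    static int[] simulate(int n, long seed) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive");
        }
        Random random = new Random(seed);
        int[] inDegree = new int[n];
        // Each component appears in the urn once, plus once per incoming edge,
        // so a uniform draw picks it with probability proportional to inDegree + 1.
        List<Integer> urn = new ArrayList<>();
        urn.add(0);
        for (int newNode = 1; newNode < n; newNode++) {
            int target = urn.get(random.nextInt(urn.size()));
            inDegree[target]++;   // the new component uses an existing one
            urn.add(target);      // the used component becomes more likely next time
            urn.add(newNode);     // the new component can now be used as well
        }
        return inDegree;
    }

    public static void main(String[] args) {
        int[] inDegree = simulate(10_000, 1L);
        int max = 0;
        int unused = 0;
        for (int d : inDegree) {
            max = Math.max(max, d);
            if (d == 0) {
                unused++;
            }
        }
        System.out.println("max in-degree=" + max + ", components never used=" + unused);
    }
}
```

In runs of such a process a few early components typically accumulate most of the incoming edges while many components are never used, which matches the intuition that the set of frequently used components stays stable.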
