
Connected Components in Software Networks

Connected Components in Software Networks. Miloš Savić, Mirjana Ivanović, Miloš Radovanović Department of Mathematics and Informatics Faculty of Science University of Novi Sad. Content. Introduction Data collection Experiments and results Conclusions. Introduction - software networks -.


Presentation Transcript


  1. Connected Components in Software Networks Miloš Savić, Mirjana Ivanović, Miloš Radovanović Department of Mathematics and Informatics, Faculty of Science, University of Novi Sad

  2. Content • Introduction • Data collection • Experiments and results • Conclusions

  3. Introduction - software networks - • Two levels of software complexity: - internal complexity of software entities (classes, functions...) - structural complexity of dependencies between entities • Class collaboration networks: nodes: classes/interfaces; links: OO relationships • Static call graphs: nodes: functions/procedures; links: call-return relationships

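  4. Introduction - connected components - • Connected component: a set of mutually reachable nodes • Giant connected component: a connected component that contains the vast majority of nodes • Directed networks: strongly connected components (nodes mutually reachable along link directions) and weakly connected components (nodes mutually reachable when link directions are ignored)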
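
The two notions of connectivity for directed networks can be sketched in a few lines. This is a minimal illustration, not the tooling used in the study: the function names are hypothetical, weak components are found by a traversal that ignores link direction, and strong components by Kosaraju's two-pass algorithm.

```python
from collections import defaultdict


def weakly_connected_components(nodes, edges):
    """Components of the network when link direction is ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        comp, stack = set(), [start]
        while stack:
            u = stack.pop()
            comp.add(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm, written iteratively to avoid deep recursion."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)
    # Pass 1: record nodes in order of DFS completion on the forward graph.
    seen, order = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(fwd[start]))]
        while stack:
            u, it = stack[-1]
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(fwd[w])))
                    break
            else:  # all successors explored: u is finished
                order.append(u)
                stack.pop()
    # Pass 2: traverse the reversed graph in decreasing completion order.
    seen, comps = set(), []
    for start in reversed(order):
        if start in seen:
            continue
        seen.add(start)
        comp, stack = set(), [start]
        while stack:
            u = stack.pop()
            comp.add(u)
            for w in rev[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps
```

For a call graph with the cycle A → B → C → A, an extra call C → D and an isolated function E, the weak components are {A, B, C, D} and {E}, while the strong components are {A, B, C}, {D} and {E}.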

  5. Introduction - theory of complex networks - • Random graphs: - Poisson degree distribution - ER model (static + uniform attachment) • Scale-free networks: - power-law degree distribution - BA model (growth + preferential attachment) • Exponential networks: - exponential degree distribution - Model A (growth + uniform attachment)
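
The three models differ only in whether the network grows and in how link targets are chosen, which a short sketch can make concrete. The code below is an illustration under stated assumptions, not the generators used in the study: the function names, the seed clique of m + 1 nodes and the undirected edge lists are choices made here.

```python
import random


def er_graph(n, p, rng):
    """ER model: a static set of n nodes, each pair linked with probability p."""
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]


def growing_graph(n, m, preferential, rng):
    """Growth model: every new node adds m links to distinct existing nodes.

    preferential=True  -> BA model (target chosen proportionally to degree)
    preferential=False -> Model A (target chosen uniformly at random)
    Assumption: the network starts from a clique of m + 1 nodes.
    """
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Each node appears in `ends` once per incident link, so a uniform draw
    # from `ends` is a draw proportional to degree.
    ends = [x for e in edges for x in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends) if preferential else rng.randrange(new))
        for t in targets:
            edges.append((new, t))
            ends.extend((new, t))
    return edges


def max_degree(edges):
    """Largest total degree in an undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return max(deg.values())
```

With the same size and the same m, the BA network typically develops hubs whose degree is far above anything seen in the Model A network, which is the signature of the power-law versus the exponential tail.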

  6. Introduction - motivations - • Model A: test the complementary cumulative in/out/total degree distributions of giant weakly connected components against a power-law and an exponential distribution • “robust yet fragile”: investigate the topological stability of giant weakly connected components • “hierarchical small-worlds, scale-free networks from optimal design”: determine the size of strongly connected components
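
A complementary cumulative distribution gives P(K ≥ k) for each observed degree k. A power law is a straight line on log-log axes and an exponential is a straight line on semi-log axes, so a crude first comparison is to check which of the two plots is more linear. The sketch below does exactly that with a least-squares R²; it is only an illustration with hypothetical function names, not the statistical test applied in the study, and it needs at least two distinct non-zero degrees.

```python
import math
from collections import Counter


def ccdf(degrees):
    """Return sorted (k, P(K >= k)) pairs for an observed degree sequence."""
    n = len(degrees)
    counts = Counter(degrees)
    points, remaining = [], n
    for k in sorted(counts):
        points.append((k, remaining / n))
        remaining -= counts[k]
    return points


def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line through (x, y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)


def compare_fits(degrees):
    """Linearity of the CCDF on log-log (power law) and semi-log (exponential) axes."""
    points = [(k, p) for k, p in ccdf(degrees) if k > 0]
    log_p = [math.log(p) for _, p in points]
    return {
        "power_law": r_squared([math.log(k) for k, _ in points], log_p),
        "exponential": r_squared([float(k) for k, _ in points], log_p),
    }
```

A higher value for one key only says that the corresponding plot is straighter; a proper analysis would fit the distributions by maximum likelihood and compare goodness of fit.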

  7. Data collection • Class collaboration networks: - Ant, Tomcat, Lucene, JavaCC, JDK - extractor – Yaccne • Static call graphs: - gcc and the kernel component of the Linux kernel - extractor – Doxygen + our .dot aggregator

  8. Experiments and results - giant weakly connected components - Comparable networks sampled from the ER, BA and Model A models contain a giant weakly connected component (GWCC).

  9. Experiments and results- degree distribution of GWCCs -

  10. Experiments and results - implications - • Theoretical implications: a model that can reproduce the connectivity pattern characteristic of software systems • Related to software engineering: in-degree = degree of class/function reuse; out-degree = degree of class/function aggregation

  11. Experiments and results- theoretical implications - • Superposition model (growth + preferential attachment for out-going links + uniform attachment for in-coming links)
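
One possible reading of the superposition model can be simulated as follows. Everything beyond the slide text is an assumption made for this sketch: each new node introduces `d_out` out-going links whose targets are picked preferentially (here, proportionally to total degree) and `d_in` in-coming links whose sources are picked uniformly among existing nodes; the seed is a small, fully connected directed graph; all names are hypothetical.

```python
import random


def superposition_model(n, d_out, d_in, rng):
    """Grow a directed network; returns a list of (source, target) links."""
    seed = max(d_out, d_in) + 1
    # Assumption: start from a complete directed graph on `seed` nodes.
    edges = [(u, v) for u in range(seed) for v in range(seed) if u != v]
    # One entry per link end, so a uniform draw is proportional to total degree.
    ends = [x for e in edges for x in e]
    for new in range(seed, n):
        targets = set()
        while len(targets) < d_out:
            targets.add(rng.choice(ends))            # preferential attachment
        sources = rng.sample(range(new), d_in)       # uniform attachment
        new_edges = [(new, t) for t in targets] + [(s, new) for s in sources]
        for e in new_edges:
            edges.append(e)
            ends.extend(e)
    return edges


def degree_sequences(n, edges):
    """Return (in_degrees, out_degrees) as lists indexed by node id."""
    in_deg, out_deg = [0] * n, [0] * n
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    return in_deg, out_deg
```

Under this reading, in-degree grows by preferential attachment and out-degree by uniform attachment, so the two sequences of one network can be compared directly against a power law and an exponential.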

  12. Experiments and results - analytical solution of the superposition model - • Continuum approach: “Mean field theory for scale-free random networks” (Barabási et al., ’99) • Din/Dout – number of in-coming/out-going links introduced by each node
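
For orientation, the continuum approach of the cited paper treats the degree k_i of node i as a continuous function of time. The block below restates the known results for the two building blocks (the BA model and Model A) in undirected form; it is a recap under that paper's notation, where m is the number of links added by each new node and m_0 the number of initial nodes, and it is not the authors' solution of the superposition model.

```latex
% BA model: growth + preferential attachment
\[
\frac{\partial k_i}{\partial t} = m\,\frac{k_i}{\sum_j k_j} = \frac{k_i}{2t},
\quad k_i(t_i) = m
\;\Longrightarrow\;
k_i(t) = m\left(\frac{t}{t_i}\right)^{1/2},
\quad P(k) \simeq \frac{2m^2}{k^3}.
\]

% Model A: growth + uniform attachment
\[
\frac{\partial k_i}{\partial t} = \frac{m}{m_0 + t - 1},
\quad k_i(t_i) = m
\;\Longrightarrow\;
k_i(t) = m\left[\ln\frac{m_0 + t - 1}{m_0 + t_i - 1} + 1\right],
\quad P(k) \simeq \frac{e}{m}\exp\!\left(-\frac{k}{m}\right).
\]
```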

  13. Experiments and results - implications related to SE - • First combinatorial principle of graph theory: Avg(reuse) = Avg(aggregation) • But: Dispersion(reuse) → ∞ as N → ∞; Dispersion(aggregation) ~ Avg(aggregation)² • Conclusions: 1. Software systems exhibit a characteristic scale of code aggregation, but there is no characteristic scale of code reuse. 2. Highly reused entities tend to be more reused. 3. Predictability of code reuse and unpredictability of code aggregation as software systems evolve.
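
The equality of averages and the contrast in dispersion follow from standard facts, written out below. The symbols L (number of links), N (number of nodes), γ (power-law exponent) and κ (exponential scale) are introduced here for illustration, and the exponent range 2 < γ ≤ 3 is an assumption about the in-degree distribution, not a value taken from the slides.

```latex
% Every link adds one to some in-degree and one to some out-degree:
\[
\sum_{i=1}^{N} k_i^{\mathrm{in}} = \sum_{i=1}^{N} k_i^{\mathrm{out}} = L
\;\Longrightarrow\;
\langle k^{\mathrm{in}} \rangle = \langle k^{\mathrm{out}} \rangle = \frac{L}{N}.
\]

% Power law with 2 < \gamma \le 3: finite mean, divergent second moment.
\[
P(k) \propto k^{-\gamma}
\;\Longrightarrow\;
\langle k \rangle < \infty, \quad \langle k^2 \rangle \to \infty \ \text{as } N \to \infty.
\]

% Exponential law: the variance equals the squared mean.
\[
P(k) = \frac{1}{\kappa}\, e^{-k/\kappa}, \ k \ge 0
\;\Longrightarrow\;
\langle k \rangle = \kappa, \quad \sigma^2 = \kappa^2 .
\]
```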

  14. Experiments and results - topological stability of GWCCs - • Experiments: - removal of one node: to check for the existence of articulation points - successive removal of preferential nodes: to check fragility - successive removal of nodes at random: to check robustness • After each removal, the size of the largest weakly connected component is measured • fc-pref/fc-rnd: critical fraction of nodes that need to be removed in order to destroy the giant weakly connected component when the preferential/random node removal scheme is applied
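
The removal experiments can be sketched as a loop that deletes one node at a time and re-measures the largest weakly connected component. Several details below are assumptions made for this illustration: "preferential" is taken to mean removing the node with the highest current degree, the giant component counts as destroyed once the largest component falls below a hypothetical `threshold` fraction of the original network size, and the function names are invented here.

```python
import random
from collections import defaultdict


def largest_wcc_size(adj, alive):
    """Size of the largest weakly connected component among surviving nodes."""
    best, seen = 0, set()
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        size, stack = 0, [start]
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w in alive and w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best


def critical_fraction(nodes, edges, scheme, rng=None, threshold=0.05):
    """Fraction of nodes removed before the largest WCC drops below threshold * N.

    scheme: "preferential" (highest current degree first) or "random".
    """
    rng = rng or random.Random(0)
    adj = defaultdict(set)          # undirected view of the directed network
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    alive = set(nodes)
    n = len(alive)
    removed = 0
    while alive and largest_wcc_size(adj, alive) >= threshold * n:
        if scheme == "preferential":
            victim = max(alive, key=lambda u: sum(1 for w in adj[u] if w in alive))
        else:
            victim = rng.choice(sorted(alive))
        alive.remove(victim)
        removed += 1
    return removed / n
```

The loop recomputes components from scratch after every removal, so it is quadratic in the network size; it is meant to show the measurement, not to scale to large call graphs.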

  15. Experiments and results - articulation points - • Software networks contain APs: [2.91% - 15.50%] of network size • BA model: Dtotal – number of links introduced by each node; Dtotal = 1 ⇒ num(AP) in the range [31% - 35.4%]; Dtotal > 1 ⇒ num(AP) = 0 • BAU model: - Dtotal is not a constant value but a random variable such that P{Dtotal = 1} > 0 - The modification does not affect the scale-free properties of degree distributions and produces APs
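
An articulation point is a node whose removal disconnects its component. A minimal sketch of the standard DFS low-link method, applied to the undirected view of the network, is shown below; the function name is hypothetical and this is not the code used in the experiments.

```python
from collections import defaultdict


def articulation_points(nodes, edges):
    """Articulation points of the undirected view of a network (iterative DFS)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    disc, low, aps = {}, {}, set()
    counter = 0
    for root in nodes:
        if root in disc:
            continue
        disc[root] = low[root] = counter
        counter += 1
        root_children = 0
        stack = [(root, None, iter(adj[root]))]
        while stack:
            u, parent, it = stack[-1]
            advanced = False
            for w in it:
                if w not in disc:
                    # Tree edge: descend into w.
                    disc[w] = low[w] = counter
                    counter += 1
                    if u == root:
                        root_children += 1
                    stack.append((w, u, iter(adj[w])))
                    advanced = True
                    break
                if w != parent:
                    # Back edge: u can reach an earlier-discovered node.
                    low[u] = min(low[u], disc[w])
            if not advanced:
                stack.pop()
                if parent is not None:
                    low[parent] = min(low[parent], low[u])
                    # No back edge from u's subtree climbs above parent.
                    if parent != root and low[u] >= disc[parent]:
                        aps.add(parent)
        # The DFS root is an articulation point iff it has several DFS children.
        if root_children > 1:
            aps.add(root)
    return aps
```

When every new node adds exactly one link, a growing network is a tree, and every non-leaf node of a tree is an articulation point; once each node adds two or more links to distinct targets, every node lies on a cycle-rich structure and the count drops, which is consistent with the BA figures on the slide.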

  16. Experiments and results - preferential node removal - • Software networks are extremely vulnerable: fc(software network) < fc(BAU) < fc(EXP) < fc(RND)

  17. Experiments and results - random node removal - • Software networks (except Linux) never lose their GWCCs • The same holds for comparable networks generated by theoretical models • The Linux static call graph is a scale-free network that is sensitive to random errors: fc(Linux) < fc(RND) < fc(EXP) < fc(BAU) • Large real-world networks: fc(RND) < fc(rw-net)

  18. Experiments and results - strongly connected components - • Linux: SCCs as a minor effect • Other networks: no GSCC, but relatively large SCCs • A topological sort cannot be made ⇒ there is no elegant systematic testing strategy
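
The link between cycles and testing order can be shown with Kahn's algorithm, which produces a topological order only when the dependency graph is acyclic. The sketch below uses a hypothetical function name and returns `None` as soon as a cycle, that is, a strongly connected component with more than one node, blocks the ordering.

```python
from collections import defaultdict, deque


def topological_order(nodes, edges):
    """Kahn's algorithm: return a topological order, or None if a cycle exists."""
    indegree = {u: 0 for u in nodes}
    successors = defaultdict(list)
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    queue = deque(u for u in nodes if indegree[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    # Nodes on a cycle never reach in-degree 0, so they are missing from order.
    return order if len(order) == len(indegree) else None
```

If a link u → v means that u depends on v, reversing a valid order gives a bottom-up testing sequence in which every entity is tested after the entities it depends on; with cyclic dependencies no such sequence exists.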

  19. The largest strongly connected component in GCC’s giant weakly connected component, containing 116 mutually reachable nodes

  20. Conclusions • Out-degree sequences of software networks can be better modeled with an exponential distribution than with a power law • Scale-free software networks contain articulation points • Software networks are extremely vulnerable to the removal of highest-degree nodes and (except Linux) share the same level of robustness as comparable networks generated by theoretical models

  21. Conclusions • The Linux static call graph is an intriguing example of a scale-free network that does not display tolerance against random errors • Software networks contain relatively large cyclic dependencies - substructures that do not reflect optimal design and hierarchical small-worldliness

  22. Connected Components in Software Networks Miloš Savić, Mirjana Ivanović, Miloš Radovanović Department of Mathematics and InformaticsFaculty of ScienceUniversity of Novi Sad
