290 likes | 446 Views
Conceptual Framework for Agent-Based Modeling and Simulation: The Computer Experiment. Yongqin Gao Vincent Freeh Greg Madey CSE Department CS Department CSE Department University of Notre Dame NCSU University of Notre Dame NAACSOS Conference Pittsburgh, PA
E N D
Conceptual Framework for Agent-Based Modeling and Simulation: The Computer Experiment Yongqin Gao Vincent Freeh Greg Madey CSE Department CS Department CSE Department University of Notre Dame NCSU University of Notre Dame NAACSOS Conference Pittsburgh, PA June 25, 2003 Supported in part by the National Science Foundation - Digital Society & Technology Program
Agent-Based Simulation as a Component of the Scientific Method Modeling (Hypothesis) Social Network Model of F/OSS Agent -Based Simulation (Experiment) Observation Analysis of Grow Artificial SourceForge SourceForge Data
Outline • Investigation: Free/Open Source Software (F/OSS) • Conceptual framework(s) • Model description • ER model • BA model • BA model with constant fitness • BA model with dynamic fitness • Summary
Open Source Software (OSS) GNU Linux • Free … • to view source • to modify • to share • of cost • Examples • Apache • Perl • GNU • Linux • Sendmail • Python • KDE • GNOME • Mozilla • Thousands more Savannah
Free Open Source Software (F/OSS) • Development • Mostly volunteer • Global teams • Virtual teams • Self-organized - often peer-based meritocracy • Self-managed - but often a “charismatic” leader • Often large numbers of developers, testers, support help, end user participation • Rapid, frequent releases • Mostly unpaid
TypicalCharismatic Leaders? Larry Wall Perl Linus Tolvalds Linux Richard Stallman GNU Manifesto Eric Raymond Cathedral and Bazaar
Contradicts traditional wisdom: Software engineering Coordination, large numbers Motivation of developers Quality Security Business strategy Almost everything is done electronically and available in digital form Opportunity for Social Science Research -- large amounts of online data available Research issues: Understanding motives Understanding processes Intellectual property Digital divide Self-organization Government policy Impact on innovation Ethics Economic models Cultural issues International factors F/OSS: Significance
SourceForge • VA Software • Part of OSDN • Started 12/1999 • Collaboration tools • 58,685 Projects • 80,000 Developers • 590,00 Registered Users
Savannah • Uses SourceForge Software • Free Software Foundation • 1,508 Projects • 15,265 Registered Users
F/OSS: Importance • Major Component of e-Technology Infrastructure with major presence in • e-Commerce • e-Science • e-Government • e-Learning • Apache has over 65% market share of Internet Web servers • Linux on over 7 million computers • Most Internet e-mail runs on Sendmail • Tens of thousands of quality products • Part of product offerings of companies like IBM, Apple • Apache in WebSphere, Linux on mainframe, FreeBSD in OSX • Corporate employees participating on OSS projects
Free/Open Source Software • Seems to challenge traditional economic assumptions • Model for software engineering • New business strategies • Cooperation with competitors • Beyond trade associations, shared industry research, and standards processes — shared product development! • Virtual, self-organizing and self-managing teams • Social issues, e.g., digital divide, international participation • Government policy issues, e.g., US software industry, impact on innovation, security, intellectual property
Data Collection — Monthly • Web crawler (scripts) • Python • Perl • AWK • Sed • Monthly • Since Jan 2001 • ProjectID • DeveloperID • Almost 2 million records • Relational database PROJ|DEVELOPER 8001|dev378 8001|dev8975 8001|dev9972 8002|dev27650 8005|dev31351 8006|dev12509 8007|dev19395 8007|dev4622 8007|dev35611 8008|dev8975
dev[72] dev[67] dev[52] dev[65] dev[70] dev[57] 7597 dev[46] 6882 dev[47] dev[64] dev[45] dev[99] 7597 dev[46] 7597 dev[46] dev[52] dev[72] dev[67] 7597 dev[46] 6882 dev[47] dev[47] dev[55] dev[55] dev[55] 7597 dev[46] 7028 dev[46] dev[70] 7597 dev[46] 7028 dev[46] dev[57] dev[45] dev[99] dev[51] 7597 dev[46] 7028 dev[46] 6882 dev[47] 6882 dev[58] dev[61] dev[51] dev[79] dev[47] 7597 dev[46] dev[58] dev[58] dev[46] 9859 dev[46] dev[54] 15850 dev[46] dev[58] 9859 dev[46] dev[79] dev[58] 9859 dev[46] dev[49] dev[53] 9859 dev[46] 15850 dev[46] dev[59] dev[56] 15850 dev[46] dev[83] 15850 dev[46] dev[48] dev[53] dev[56] dev[83] dev[48] F/OSS Developers - Social Network Component Developers are nodes / Projects are links 24 Developers 5 Projects Project 7597 2 Linchpin Developers 1 Cluster dev[64] Project 6882 Project 7028 dev[61] dev[54] dev[49] dev[59] Project 9859 Project 15850
Models of the F/OSS Social Network(Alternative Hypotheses) • General model features • Agents are nodes on a graph (developers or projects) • Behaviors: Create, join, abandon and idle • Edges are relationships (joint project participation) • Growth of network: random or types of preferential attachment, formation of clusters • Fitness • Network attributes: diameter, average degree, power law, clustering coefficient • Four specific models • ER (random graph) • BA (scale free) • BA ( + constant fitness) • BA ( + dynamic fitness)
ER model – degree distribution • Degree distribution is binomial distribution while it is power law in empirical data • R2 = 0.9712 for developer network • R2 = 0.9815 for project network
ER model - diameter • Average degree is decreasing while it is increasing in empirical data • Diameter is increasing while it is decreasing in empirical data
ER model – clustering coefficient • Clustering coefficient is relatively low around 0.4 while it is around 0.7 in empirical data. • Clustering coefficient is decreasing while it is increasing in empirical data
ER model – cluster distribution • Cluster distribution in ER model also have power law distribution with R2 as 0.6667 (0.9953 without the major cluster) while R2 in empirical data is 0.7457 (0.9797 without the major cluster) • The actual distribution is different from empirical data • The later models (BA and further models) have similar behaviors
BA model – degree distribution • Power laws in degree distribution, similar to empirical data (+ for simulated data and x for empirical data). • For developer distribution: simulated data has R2 as 0.9798 and empirical data has R2 as 0.9712. • For project distribution: simulated data has R2 as 0.6650 and empirical data has R2 as 0.9815.
BA model – diameter and CC • Small diameter and high clustering coefficient like empirical data • Diameter and clustering coefficient are both decreasing like empirical data
BA model with constant fitness • Power laws in degree distribution, similar to empirical data (+ for simulated data and x for empirical data). • For developer distribution: simulated data has R2 as 0.9742 and empirical data has R2 as 0.9712. • For project distribution: simulated data has R2 as 0.7253 and empirical data has R2 as 0.9815. • Diameter and CC are similar to simple BA model.
BA model with dynamic fitness • Power laws in degree distribution, similar to empirical data (+ for simulated data and x for empirical data). • For developer distribution: simulated data has R2 as 0.9695 and empirical data has R2 as 0.9712. • For project distribution: simulated data has R2 as 0.8051 and empirical data has R2 as 0.9815.
Advantage of BA with dynamic fitness • Intuition: Fitness should decreasing with time. • Statistics: project has life cycle behavior which can not be replicated by BA model with constant fitness but can be replicated by BA model with dynamic fitness
Conceptual FrameworkAgent-Based Modeling and Simulation as Components of the Scientific Method Hypothesis Observation Experiment
Summary • We use ABM to model and simulate the SourceForge collaboration network. • Conceptual framework is proposed for agent-based modeling and simulation. • Case study of this framework: SourceForge study through ER, BA, BA with constant fitness and BA with dynamic fitness.