R&D Spillovers: The 'Social Network' of Open Source Software. Chaim Fershtman (Tel Aviv University, CEPR) Neil Gandal (Tel Aviv University, CEPR) November 2008. Introduction. An R&D project involves teams of researchers exchanging ideas and sharing information.
R&D Spillovers: The 'Social Network' of Open Source Software Chaim Fershtman (Tel Aviv University, CEPR) Neil Gandal (Tel Aviv University, CEPR) November 2008
Introduction • An R&D project involves teams of researchers exchanging ideas and sharing information. • Whenever co-workers collaborate on a joint R&D project, they create knowledge spillovers, which have two aspects: • An innovation provides information and ideas for the development of another innovation. • There is a direct spillover among people who work together.
Introduction Continued • The flow of knowledge among individuals depends on the details of their collaboration in R&D teams as well as on their social interaction with other researchers. • For any structure of R&D teams, one can construct a two-mode weighted network that provides the details of the teams’ structure. • "Project network": two R&D projects are connected if there are developers who participate in both projects (the weight of the link depends on the number of developers who participate in both projects). • "Developers network": two developers are connected if they work on the same R&D project (the weight of the link depends on the number of projects on which the two work together).
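The two projections described above can be sketched in a few lines of Python. This is only an illustration: the membership data, the project and developer names, and the function names `project_network` and `developer_network` are all hypothetical, and the link weights are taken to be simple counts, as the slide describes.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical two-mode data: project -> set of developers on that project.
membership = {
    "alpha": {"ann", "bob", "cat"},
    "beta": {"bob", "cat", "dan"},
    "gamma": {"dan"},
    "delta": {"eve"},
}

def project_network(membership):
    """Link weight (p, q) = number of developers who work on both p and q."""
    weights = {}
    for p, q in combinations(sorted(membership), 2):
        shared = len(membership[p] & membership[q])
        if shared:  # projects with no common developer are not linked
            weights[(p, q)] = shared
    return weights

def developer_network(membership):
    """Link weight (a, b) = number of projects on which a and b both work."""
    weights = defaultdict(int)
    for devs in membership.values():
        for a, b in combinations(sorted(devs), 2):
            weights[(a, b)] += 1
    return dict(weights)

projects = project_network(membership)
developers = developer_network(membership)
```

In this toy data, "alpha" and "beta" share two developers, so their link has weight 2, while "delta" has no links and forms its own component.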
Introduction Continued • Our question: How does the structure of the ‘social network’ affect the R&D outcome? • Detailed information about participants in R&D projects is typically hard to obtain. (An exception is academic publications; see Goyal, van der Leij & Moraga-Gonzalez (2006).) • The recent "open source revolution" provides a unique opportunity to study the relationship between network structure and success in a two-mode R&D network of projects and researchers.
Introduction Continued • Open source software development is done by a network of software developers. • Since there are many such projects, these developers may be involved in more than one project and may work with different groups of co-developers in various projects. • Development of these projects is done in the public domain (developers can be identified by their e-mail addresses). • We can use this information to construct the two-mode network of projects and developers.
Literature • The role of social networks in the functioning of the economy has been extensively discussed in the literature. Surveys: Jackson (2006, 2008) and Goyal (2007); methods & applications: Faust & Wasserman (1994). • Our paper is generally related to the literature on 'the effect of network structure on behavior': Ballester, Calvo-Armengol & Zenou (2006), Calvo-Armengol & Jackson (2004), Ioannides & Datcher-Loury (2005), Goeree, McConnell, Mitchell, Tromp & Yariv (2007), Jackson & Yariv (2007), and Mobius & Szeidl (2007). • Our focus is on the relationship between network properties and the performance or success of different nodes in the network. • Closely related papers: Calvo-Armengol, Patacchini & Zenou (2008) and Ahuja (2000). Calvo-Armengol et al. use data on an adolescent friendship network and consider the effect of network structure and centrality measures on the pupils' school performance. Ahuja examines the relationship between the network of technical collaboration among firms in the chemical industry (from 1981 to 1991) and innovation (patents).
Our Approach • The paper uses data from Sourceforge.net to construct a two-mode network of OSS projects and developers. • Sourceforge.net is the largest repository of OSS code & applications available on the Internet: 114,751 projects and 160,104 contributors (June 2006). • Each project page links to a “Developers page” that contains a list of registered team members. We construct the project network and the contributor network. • Both networks consist of one “giant” connected component and many smaller unconnected networks. • 77% of the contributors worked on only a single project; 344 “star” contributors worked on ten or more projects.
Success • It is not easy to measure the success of open source software. • Like other products based on intellectual property, the intellectual property in software is “licensed” for use. • In the case of commercial software, however, there are license fees, data on licenses issued, as well as revenues earned from these licenses. • Open source software does not have license fees, and information on the number of licenses is not available. • We measure project success by the number of times a project has been downloaded. This is clearly not ideal. • Downloads are often used to measure the impact of academic papers and articles on the web. We assume that the number of downloads of an open source project is likely to be well correlated with its use and value.
Variables Available for Study • Three groups of variables: • Control variables: amount of time that the project has been in existence, stage of development, number of operating systems, number of languages, etc. • Network variables: • Variables such as degree (the number of links) that are comparable across all projects. • Network centrality measures; these variables are only comparable for projects in linked components. • Betweenness: the proportion of geodesics (shortest paths) between pairs of other nodes that include this node. A node is central if it serves as a valuable juncture. • Closeness: the inverse of the sum of distances between the node and the other nodes, multiplied by the number of other nodes. It measures how far each project is from other projects.
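The three network variables above can be computed as in the following minimal sketch, which assumes an unweighted, undirected, connected graph stored as an adjacency dictionary. The toy graph and function names are hypothetical. Betweenness is computed with Brandes' algorithm, which is a standard technique and not necessarily the method the authors used, and it is normalized by the number of pairs of other nodes so that it matches the "proportion of geodesics" wording.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths (in links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closeness(adj, node):
    """Number of other reachable nodes divided by the sum of distances to them."""
    dist = bfs_distances(adj, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def betweenness(adj):
    """Share of geodesics between pairs of other nodes that pass through each node."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order, dist = [], {s: 0}
        preds = {v: [] for v in adj}   # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}    # number of shortest paths from s
        sigma[s] = 1
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):      # accumulate dependencies back toward s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(adj)
    pairs = (n - 1) * (n - 2) / 2      # pairs of *other* nodes
    # Each unordered pair is visited from both endpoints, hence the division by 2.
    return {v: score[v] / 2 / pairs if pairs else 0.0 for v in adj}

# Hypothetical path graph a - b - c - d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
degree = {v: len(neighbours) for v, neighbours in adj.items()}
between = betweenness(adj)
close_b = closeness(adj, "b")
```

On this path graph, node "b" lies on two of the three geodesics between pairs of other nodes, so its betweenness is 2/3, and its closeness is 3/4. Because both measures are scaled by component size here, they are only meaningful when comparing nodes within the same connected component.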
Summary of Results • Additional contributors are associated with higher output (downloads). The increase in downloads associated with an increase in contributors is much larger for projects in the giant component. • Betweenness centrality is strongly associated with a higher number of downloads. This suggests that projects “well-positioned” in information flows are more successful and that there are positive spillovers of knowledge for projects occupying critical junctures in the information flow. • Controlling for the correlation between downloads and the two measures of centrality (betweenness and closeness), degree is not positively associated with the number of downloads.
Robustness Analysis • A "star" is a contributor who worked on five or more projects. The presence of a "star" contributor is positively correlated with the success of the project, but the effect is not statistically significant (coefficient = 0.10, t = 1.41). • When including projects that had (i) been in existence for at least two years and (ii) more than one contributor, the betweenness effect is unchanged and the closeness effect is not statistically significant.
The Importance of Strong Ties • The potential information flow may also depend on the number of contributors who participated in both projects. • Two projects are ‘strongly’ linked if and only if they have at least two contributors in common. • In this new network, the largest component of strongly connected projects consists of only 259 projects.
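The strong-tie network can be sketched as a filter on shared contributors followed by a search for the largest connected component. The data and the names `strong_links`, `largest_component`, and `min_shared` below are hypothetical. Only the threshold of two common contributors comes from the slide.

```python
from itertools import combinations

# Hypothetical two-mode data: project -> set of contributors.
membership = {
    "alpha": {"ann", "bob", "cat"},
    "beta": {"bob", "cat", "dan"},
    "gamma": {"cat", "dan", "eve"},
    "delta": {"eve"},
}

def strong_links(membership, min_shared=2):
    """Pairs of projects with at least min_shared contributors in common."""
    return {
        (p, q)
        for p, q in combinations(sorted(membership), 2)
        if len(membership[p] & membership[q]) >= min_shared
    }

def largest_component(nodes, links):
    """Largest set of nodes that are mutually reachable through the given links."""
    adj = {v: set() for v in nodes}
    for p, q in links:
        adj[p].add(q)
        adj[q].add(p)
    seen, best = set(), set()
    for start in nodes:
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:  # depth-first search from start
            v = stack.pop()
            for w in adj[v] - component:
                component.add(w)
                stack.append(w)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

links = strong_links(membership)
giant = largest_component(membership, links)
```

Here "gamma" and "delta" share only one contributor, so "delta" drops out of the strongly connected component even though it is linked to "gamma" in the ordinary project network.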
Contributor Characteristics Associated with Project Success • So far, we focused on how project characteristics were associated with the success of the projects. • We now add information regarding the contributors' network. We know which contributors participated in each project and the network characteristics of these contributors. • After controlling for the correlation of project characteristics with project success, are centrality measures of the contributor network correlated with project success? We created three new variables: • (i) Average degree of the contributors on a project. • (ii) Average betweenness centrality of the contributors to a project. • (iii) Average closeness centrality of the contributors to a project.
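As a rough illustration of how such project-level averages could be built, the sketch below computes each contributor's degree in the contributor network and then averages it over the members of each project. The data and function names are hypothetical, and degree is taken here to mean the number of distinct co-contributors, which is an assumption. The same `project_average` helper would apply to any per-contributor measure, such as betweenness or closeness.

```python
from statistics import mean

# Hypothetical two-mode data: project -> set of contributors.
membership = {
    "alpha": {"ann", "bob", "cat"},
    "beta": {"bob", "dan"},
    "gamma": {"eve"},
}

def contributor_degree(membership):
    """Degree of each contributor = number of distinct co-contributors (assumed definition)."""
    neighbours = {}
    for devs in membership.values():
        for d in devs:
            neighbours.setdefault(d, set()).update(devs - {d})
    return {d: len(others) for d, others in neighbours.items()}

def project_average(membership, measure):
    """Average a per-contributor measure over the contributors of each project."""
    return {p: mean(measure[d] for d in devs) for p, devs in membership.items()}

degree = contributor_degree(membership)
avg_degree = project_average(membership, degree)
```

In this toy data, "bob" links the two multi-contributor projects and has degree 3, which raises the average degree of both "alpha" and "beta".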