Software Collaboration Networks By Chris Zachor
Overview • Introduction • Background • Changes • Methodology • Data Collection • Network Topologies • Measures • Tools • Conclusion • Questions
Introduction • Use network analysis to better understand the SourceForge and GitHub developer communities • Identify key differences (if any) between the two communities • Examine the diversity of collaborations within each community
Changes • GitHub has been added to the study • GitHub shares several attributes with SourceForge, which allows a direct comparison • Other communities were considered, but they were either too small or did not provide enough public data
Data Collection • Crawl the websites using a simple Perl script and regular expressions • Collect a project list from SourceForge • Project pages follow the pattern www.sourceforge.net/projects/projectTitle • No specified request limit • Check for duplicates
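The crawl-and-deduplicate step can be sketched in a few lines. This is a minimal illustration in Python rather than the Perl script the slides mention, and it assumes (hypothetically) that a listing page links to projects as `href="/projects/<projectTitle>/"`; the real SourceForge markup may differ, and the names `PROJECT_LINK`, `extract_projects` and `project_url` are illustrative only.

```python
import re

# Hypothetical pattern: assumes project links look like href="/projects/<projectTitle>/".
# The actual SourceForge page markup may differ and would need its own regex.
PROJECT_LINK = re.compile(r'href="/projects/([A-Za-z0-9_.-]+)/?"')


def extract_projects(html, seen):
    """Return project titles in `html` that are not already in `seen`.

    `seen` is updated in place, so duplicates across pages are skipped as well.
    """
    new_titles = []
    for title in PROJECT_LINK.findall(html):
        if title not in seen:  # duplicate check
            seen.add(title)
            new_titles.append(title)
    return new_titles


def project_url(title):
    # Mirrors the www.sourceforge.net/projects/projectTitle pattern from the slide;
    # a scheme such as https:// would have to be prefixed before requesting it.
    return f"www.sourceforge.net/projects/{title}"


if __name__ == "__main__":
    seen = set()
    page1 = '<a href="/projects/foo/">Foo</a> <a href="/projects/bar/">Bar</a>'
    page2 = '<a href="/projects/bar/">Bar</a> <a href="/projects/baz">Baz</a>'
    print(extract_projects(page1, seen))  # ['foo', 'bar']
    print(extract_projects(page2, seen))  # ['baz']
```

Keeping `seen` outside the function lets one set track duplicates across every page of the listing.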
GitHub Crawling • The GitHub API provides our data • Limited to 60 API calls per minute • Use multiple computers to collect all 1.5 million projects
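The arithmetic behind using several computers: at 60 calls per minute and (assuming) one call per project, 1.5 million projects take 25,000 minutes, roughly 17.4 days on a single machine. The sketch below assumes each machine has its own independent limit and uses a hypothetical `fetch` callable in place of a real GitHub API request; it only shows throttling and splitting the project list, not actual API endpoints.

```python
import time


class MinuteRateLimiter:
    """Spaces calls at least 60 / calls_per_minute seconds apart, keeping one
    machine at or below its per-minute limit."""

    def __init__(self, calls_per_minute=60):
        self.interval = 60.0 / calls_per_minute
        self.next_allowed = 0.0

    def wait(self):
        now = time.monotonic()
        if now < self.next_allowed:
            time.sleep(self.next_allowed - now)
            now = time.monotonic()
        self.next_allowed = now + self.interval


def partition(items, machines):
    """Split the project list round-robin, one slice per machine."""
    return [items[i::machines] for i in range(machines)]


def estimate_days(n_projects, calls_per_minute=60, machines=1, calls_per_project=1):
    """Wall-clock days to crawl n_projects, assuming each machine has its own limit."""
    minutes = n_projects * calls_per_project / (calls_per_minute * machines)
    return minutes / (60 * 24)


def crawl(slice_of_projects, fetch, limiter):
    """Call the hypothetical `fetch` once per project while respecting the limit."""
    results = {}
    for name in slice_of_projects:
        limiter.wait()
        results[name] = fetch(name)
    return results


if __name__ == "__main__":
    print(round(estimate_days(1_500_000), 1))              # 17.4 (one machine)
    print(round(estimate_days(1_500_000, machines=4), 1))  # 4.3 (four machines)
```

Round-robin slicing gives every machine a near-equal share, so total time shrinks roughly in proportion to the number of machines.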
Measures and Metrics • Degree • Clustering Coefficient • Modularity • Power Law • Small World Phenomenon
Degree • Average number of projects worked on by a developer • Average number of collaborations • Average number of developers on a project
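The three degree averages can be computed from a list of (developer, project) memberships. This sketch assumes that input format and interprets "number of collaborations" as the number of distinct co-developers each developer has in the one-mode developer network; both are assumptions, not details from the slides.

```python
from collections import defaultdict
from itertools import combinations


def degree_stats(memberships):
    """memberships: iterable of (developer, project) pairs (assumed input format)."""
    dev_projects = defaultdict(set)
    proj_devs = defaultdict(set)
    for dev, proj in memberships:
        dev_projects[dev].add(proj)
        proj_devs[proj].add(dev)

    # One-mode projection: two developers are linked if they share a project.
    collaborators = {dev: set() for dev in dev_projects}
    for devs in proj_devs.values():
        for a, b in combinations(devs, 2):
            collaborators[a].add(b)
            collaborators[b].add(a)

    n_dev, n_proj = len(dev_projects), len(proj_devs)
    return {
        "projects_per_developer": sum(len(p) for p in dev_projects.values()) / n_dev,
        "collaborators_per_developer": sum(len(c) for c in collaborators.values()) / n_dev,
        "developers_per_project": sum(len(d) for d in proj_devs.values()) / n_proj,
    }


if __name__ == "__main__":
    data = [("alice", "p1"), ("bob", "p1"), ("bob", "p2"), ("carol", "p2"), ("dave", "p3")]
    print(degree_stats(data))
```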
Clustering Coefficient • Measure how likely a developer's collaborators are to also collaborate with one another • Examine both the average clustering coefficient for the entire network and the local clustering coefficient for nodes of interest
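For reference, the local clustering coefficient of a node with k neighbours is the number of links among those neighbours divided by k(k-1)/2, and the network average is the mean over all nodes. The sketch assumes an undirected graph stored as a dict of neighbour sets and scores nodes with fewer than two neighbours as 0 (the convention NetworkX uses by default); Pajek's own handling of such nodes may differ.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are themselves linked."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention: nodes with degree < 2 count as 0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean local clustering coefficient over every node in the network."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


if __name__ == "__main__":
    # A triangle (a, b, c) with one extra developer d attached to a.
    adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
    print(local_clustering(adj, "a"))  # one linked pair out of three
    print(average_clustering(adj))
```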
Modularity • Provides a measure of how diverse developer collaborations are • Range: -1 < Q < 1 (for undirected networks Q cannot fall below -1/2) • Values closer to 1 show less diversity in collaboration choices • Lower, negative values show more diversity in collaboration choices
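Given a partition of developers into groups, the standard Newman-Girvan modularity can be written as Q = Σ_c [L_c/m − (d_c/2m)²], where m is the number of edges, L_c the number of edges inside group c, and d_c the sum of degrees of the nodes in c. The sketch assumes an unweighted, undirected edge list without duplicate edges, and takes the partition as given; the slides do not say which community-detection algorithm would produce it.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q for an undirected, unweighted edge list.

    edges: list of (u, v) pairs; community: dict mapping node -> group label.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in group c
    degree_sum = defaultdict(int)  # d_c: total degree of nodes in group c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


if __name__ == "__main__":
    # Two tightly knit groups joined by a single collaboration (c, d).
    edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
    groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
    print(modularity(edges, groups))
```

Putting every node in one group always gives Q = 0, which is a handy sanity check.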
Power Law • Previous studies have found that the SourceForge community follows a power law • No such study has been done on the GitHub community • A few developers should be part of many projects, while many developers should be involved in only one project
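One way to check the expected pattern is to tabulate how many developers work on k projects and estimate a power-law exponent. The sketch uses the discrete maximum-likelihood approximation of Clauset, Shalizi and Newman, α ≈ 1 + n / Σ ln(x_i / (x_min − 1/2)), with x_min assumed to be 1; an exponent estimate alone does not prove a power law, so a goodness-of-fit test would still be needed.

```python
import math
from collections import Counter


def degree_distribution(projects_per_developer):
    """Map k -> fraction of developers who work on exactly k projects."""
    counts = Counter(projects_per_developer)
    n = len(projects_per_developer)
    return {k: counts[k] / n for k in sorted(counts)}


def estimate_alpha(degrees, x_min=1):
    """Approximate discrete MLE of the power-law exponent for degrees >= x_min.

    x_min must be at least 1 so that x_min - 0.5 stays positive.
    """
    tail = [x for x in degrees if x >= x_min]
    if not tail:
        raise ValueError("no degrees >= x_min")
    return 1 + len(tail) / sum(math.log(x / (x_min - 0.5)) for x in tail)


if __name__ == "__main__":
    degrees = [1, 1, 1, 1, 1, 1, 2, 2, 3, 7]  # made-up illustrative sample
    print(degree_distribution(degrees))
    print(estimate_alpha(degrees))
```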
Small World Phenomenon • Previous studies have shown that the SourceForge community exhibits small-world properties • Once again, no such study has been done on the GitHub community • Using Pajek, I will create a random network with the same number of nodes and edges • Then, compare the clustering coefficients and average shortest path lengths of the two networks
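The comparison can be illustrated with a pure-Python stand-in for Pajek's random network generator: build an Erdős–Rényi G(n, m) graph with the same number of nodes and edges, then compare average clustering C and average shortest path L. A small-world network should show C much larger than C_random while L stays close to L_random. This sketch assumes an undirected dict-of-sets graph, samples edges by rejection (fine while m is far below n(n-1)/2), and averages path lengths only over mutually reachable pairs.

```python
import random
from collections import deque
from itertools import combinations


def random_graph(n, m, seed=None):
    """Erdős–Rényi G(n, m): n nodes and exactly m distinct undirected edges."""
    if m > n * (n - 1) // 2:
        raise ValueError("too many edges for n nodes")
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    edges = 0
    while edges < m:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v and v not in adj[u]:  # reject self-loops and repeated edges
            adj[u].add(v)
            adj[v].add(u)
            edges += 1
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_shortest_path(adj):
    """Mean BFS distance over ordered pairs of distinct, mutually reachable nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")


def small_world_summary(adj, seed=0):
    """C and L for the real network and for a same-size random network."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    rand = random_graph(n, m, seed)
    return {
        "C": average_clustering(adj), "C_random": average_clustering(rand),
        "L": average_shortest_path(adj), "L_random": average_shortest_path(rand),
    }


if __name__ == "__main__":
    # Two triangles joined by one edge, standing in for a developer network.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
    print(small_world_summary(adj))
```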
Tools • Perl • Pajek • cURL • wget • GUESS
Conclusion • Through network analysis, we hope to gain a better understanding of the SourceForge and GitHub developer communities.
Questions? Suggestions? Comments?