Open Source Software Development
Jim Herbsleb, ISRI, Wean 1321, +1 412 268 8933, jdh@cs.cmu.edu
Geographically Distributed Development
• Extremely slow
• Hampered by communication and coordination problems
• Needs to make extensive use of collaboration technology, e.g., application sharing, shared calendars, teleconferences, chat and IM
• Requires extensive use of “coordination mechanisms” such as interface specifications, plans, processes
• Must carefully design division of labor across sites
• Very difficult to respond to unanticipated events
Open Source Challenge
• Fundamentally different model of software development
• How does it really work?
• What sort of process results from open source principles?
• What are the properties of the software developed this way?
• Case study of Apache and Mozilla (with Audris Mockus and Roy Fielding)
• Research issues in open source
Empirical Research Questions
• How many people wrote code for new functionality? How many people reported problems? How many people repaired defects?
• Did large numbers of people participate somewhat equally in these activities, or did a small number of people do most of the work?
• Where did the code contributors work in the code? Was strict code ownership enforced on a file or module level?
• What is the productivity of OSS developers? What is the defect density of OSS code?
• How long did it take to resolve problems?
Empirical Methods - 1
• Sources:
  • Mail archives for ~3 years (CVS/BUGDB/developer discussions)
  • Core group (about 12 people at any time) has CVS commit privileges
  • Output: CVS updates, BUGDB numbers
• CVS update record (MR)
  • date, files touched, lines changed
  • author of the change
  • BUGDB tracking number (if it’s a problem fix)
• BUGDB tracking number record
  • raiser, dates opened, closed
  • resolution module
  • related CVS updates
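The two record types listed on this slide could be modeled roughly as follows. This is a minimal Python sketch, not the authors' tooling: the class names, field names, and sample values are all hypothetical, and treating "resolution" and "module" as two separate fields is an assumption.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ModificationRequest:
    """One CVS update (MR), with the fields named on the slide."""
    when: date
    files_touched: List[str]
    lines_changed: int
    author: str
    # BUGDB tracking number; present only when the update is a problem fix.
    bug_id: Optional[str] = None

    def is_fix(self) -> bool:
        # An update counts as a fix when it references a problem report.
        return self.bug_id is not None


@dataclass
class ProblemReport:
    """One BUGDB record (hypothetical layout)."""
    bug_id: str
    raiser: str
    opened: date
    closed: Optional[date]
    resolution: str   # assumption: stored separately from the module
    module: str
    related_updates: List[ModificationRequest] = field(default_factory=list)

    def days_to_resolve(self) -> Optional[int]:
        # Resolution interval in days, or None while the report is still open.
        if self.closed is None:
            return None
        return (self.closed - self.opened).days


# Invented sample data, for illustration only.
fix = ModificationRequest(date(1998, 1, 5), ["example.c"], 12, "dev1", bug_id="PR-1")
report = ProblemReport("PR-1", "user1", date(1998, 1, 1), date(1998, 1, 5),
                       "fixed", "core", [fix])
print(fix.is_fix(), report.days_to_resolve())
```

Linking each fix to its problem report through the tracking number is what would let questions such as "How long did it take to resolve problems?" be answered from the archives.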
Empirical Methods - 2
• Research questions required change measures
• Identified several “comparable” commercial projects
  • number of deltas within order of magnitude
  • developed over comparable period
  • all had high reliability requirements
• Differences must be interpreted cautiously
Roles in Apache Development
• Size of the development community
  • How many people wrote code for new Apache functionality? (no reference to problem report)
    • 249 people, 6092 submissions
  • How many people reported problems?
    • 458 people, 591 reports that resulted in code change
  • How many people repaired defects?
    • 182 people, 695 fixes
• How was work distributed within the development community?
[Figure: The cumulative distribution of contributions to the code base. Panels: all code contributions; fixes only; two commercial projects (telecommunications).]
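A cumulative distribution of this kind can be computed from per-developer contribution counts. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical and the counts are made up, not the Apache data.

```python
from typing import Dict, List


def cumulative_share(contributions: Dict[str, int]) -> List[float]:
    """Cumulative fraction of all contributions, most active developer first."""
    counts = sorted(contributions.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return []
    shares = []
    running = 0
    for count in counts:
        running += count
        shares.append(running / total)
    return shares


def developers_needed(contributions: Dict[str, int], target: float) -> int:
    """Smallest number of top developers whose combined share reaches target."""
    for rank, share in enumerate(cumulative_share(contributions), start=1):
        if share >= target:
            return rank
    return 0  # no contributions at all


# Invented counts, for illustration only.
sample = {"dev1": 50, "dev2": 35, "dev3": 10, "dev4": 5}
print(cumulative_share(sample))
print(developers_needed(sample, 0.8))
```

Plotting the list returned by `cumulative_share` against developer rank gives the shape of curve the figure describes; a steep early rise means a small group does most of the work.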
Code Ownership
• Was strict code ownership enforced on a file or module level?
• No. Out of 42 “.c” files with more than 30 changes
  • 40 had at least two developers making more than 10% of the changes
  • 20 had at least four developers making more than 10% of the changes
• Use other means of coordinating changes
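The ownership check described on this slide (files with more than 30 changes, developers each responsible for more than 10% of them) can be sketched in Python as below. The function name and the input layout, a list of (file, author) pairs with one pair per change, are assumptions, and the data is invented.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def significant_developers(
    changes: Iterable[Tuple[str, str]],
    min_changes: int = 30,
    min_share: float = 0.10,
) -> Dict[str, int]:
    """For each file with more than min_changes changes, count the developers
    who each made more than min_share of that file's changes."""
    per_file = defaultdict(Counter)
    for filename, author in changes:
        per_file[filename][author] += 1

    result = {}
    for filename, authors in per_file.items():
        total = sum(authors.values())
        if total <= min_changes:
            continue  # too few changes to say anything about ownership
        result[filename] = sum(
            1 for count in authors.values() if count > min_share * total
        )
    return result


# Invented change log: busy.c has 40 changes, quiet.c only 10.
log = (
    [("busy.c", "dev1")] * 20
    + [("busy.c", "dev2")] * 15
    + [("busy.c", "dev3")] * 5
    + [("quiet.c", "dev1")] * 10
)
print(significant_developers(log))
```

Under strict ownership most files would show a single significant developer; counts of two or more per file are what the slide reports for Apache.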
Productivity
• Compare sets of developers that produced 80% of the code in each application
• A-E: similar-sized commercial projects

                      Apache   A      C      D      B      E
KMR/developer/year    .11      .03    .03    .09    .02    .06
KLOC/developer/year   4.3      38.6   11.7   6.1    5.4    10
Defect Density
• Measures
  • post release and post-feature test
  • per KLOC added and per thousand Delta

Measure                            Apache   A      C      D      E
Post-release Defects/KLOCA         2.64     .11    0.1    0.7    0.1
Post-release Defects/KDelta        40.8     4.3    14     28     10
Post-feature test Defects/KLOCA    2.64     *      5.7    6.0    6.9
Post-feature test Defects/KDelta   40.8     *      164    196    256

Overlay values on the slide, one group per table row (column assignment unclear): 1 26 24 2.6 3.8 | 1 9.5 1.5 5 2.9 | 1 .4 .4 .5 | 1 .2 .16 .25
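One way to read these figures is as ratios of Apache's defect density to that of each commercial project. The sketch below, in Python, uses the Defects/KLOCA values from the slide; the function and variable names are hypothetical, and representing the "*" entries as None is an assumption about what they mean (no value available).

```python
from typing import Dict, Optional

# Defects per KLOC added, copied from the slide; None stands for "*".
POST_RELEASE: Dict[str, Optional[float]] = {
    "Apache": 2.64, "A": 0.11, "C": 0.1, "D": 0.7, "E": 0.1,
}
POST_FEATURE_TEST: Dict[str, Optional[float]] = {
    "Apache": 2.64, "A": None, "C": 5.7, "D": 6.0, "E": 6.9,
}


def ratio_to_projects(
    densities: Dict[str, Optional[float]], baseline: str = "Apache"
) -> Dict[str, float]:
    """Baseline density divided by each other project's density.

    A ratio above 1 means the baseline has more defects per KLOC added
    than that project; projects without a value are skipped.
    """
    base = densities[baseline]
    if base is None:
        raise ValueError("baseline has no density value")
    return {
        name: base / value
        for name, value in densities.items()
        if name != baseline and value is not None
    }


print(ratio_to_projects(POST_RELEASE))
print(ratio_to_projects(POST_FEATURE_TEST))
```

On these numbers the post-release ratios are all above 1 and the post-feature-test ratios are all below 1, which is the comparison Hypothesis 5 later in the deck turns on.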
Hypotheses
Hypothesis 1: Open source developments will have a core of developers who control the code base. This core will be no larger than 10-15 people, and will create approximately 80% or more of the new functionality.
Hypothesis 2: For projects that are so large that 10-15 developers cannot write 80% of the code in a reasonable time frame, informal coordination will not suffice.
Hypothesis 3: In successful open source developments, a group larger by an order of magnitude than the core will repair defects, and a yet larger group (by another order of magnitude) will report problems.
Hypothesis 4: Open source developments that have a strong core of developers but never achieve large numbers of contributors beyond that core will be able to create new functionality but will fail because of a lack of resources devoted to finding and repairing defects.
Hypothesis 5: Defect density in open source releases will generally be lower than commercial code that has only been feature-tested, i.e., received a comparable level of testing.
Hypothesis 6: In successful open source developments, the developers will also be users of the software.
Hypothesis 7: OSS developments exhibit very rapid responses to customer problems.
Research Questions: Resource Allocation, Decision-Making
• How do key developers decide where to allocate their resources?
  • User innovation model
  • Personal reputation model
  • Product needs model
• How do individual motivations sum to give the development its trajectory?
  • Not quite a market, not quite a hierarchy, perhaps a network
Research Questions: Understanding Current Limitations of OSS
• Product structure, architecture – comprehension and collaboration
• What does not get built?
  • Developers only meeting own needs?
  • Differences between developer/users and general users?
  • Effective ways of incorporating requirements of non-developer users?
• Effects of scale
  • With larger scale, will coordination needs force adoption of “commercial” development techniques?
  • How to collaborate on “big” features?
  • Possible to increase participation by non-core developers?
Research Questions: Adoption and Patronage
• Commercial organizations need ways to assess risk of adopting open source
• Patronage creates new forms of virtual organization
  • What effects on OSS culture, individual motivation, economic network?
• How will competitive pressures, business motivations affect development?
  • Cause branching, fragmentation?
  • Evolve toward joint ventures, away from community ownership?