Does Distributed Development Affect Software Quality? An Empirical Case Study of Windows Vista

Christian Bird, Premkumar Devanbu, Harald Gall, Brendan Murphy
In ICSE ’09: Proceedings of the 2009 IEEE 31st International Conference on Software Engineering

Distributed software development is riskier and more challenging than collocated development
• Reasons for distributed software development: skill set availability, acquisitions, government restrictions, increased code size, cost, and complexity
• Challenges faced: delayed feedback, restricted communication, less shared project awareness, difficulty of synchronous communication, inconsistent development and build environments
• Post-release failure: the inability of a system or component to perform its required functions within specified performance requirements
• Binaries: individual executables and libraries

Key Questions
• Who or what is distributed, and at what level?
• Are the people or the artifacts distributed?
• Are people dispersed individually or in groups?
• In what way are the developers and other entities distributed?
• Distribution can be across geographical, organizational, temporal, or stakeholder boundaries

Study Involves
• Distributed development at multiple levels of separation
• Large-scale software development: thousands of binaries and developers
• Complexity and maintenance characteristics of the distributed and collocated binaries
• All sites involved in the study are part of the same company

Popular Beliefs

Difficulties in Distributed Development
• Communication
• Coordination breakdowns
• Diversity in operating environments
• Distance reduces team cohesion
• Organizational and national cultural barriers

Testable Hypotheses
• Binaries developed by distributed teams of engineers will have more post-release failures than those developed by collocated engineers
• Binaries that are distributed will be less complex and have fewer dependencies

Effects on Bug Resolution
• Time to resolution of Modification Requests (MRs)
  - Single site: 5 days
  - Distributed sites: 12.7 days
• After controlling for factors such as the number of people working on the MR, severity, and size of the change, the negative effect of distributed development was less significant
• Distributed development indirectly introduces delay through correlated factors such as team size and breadth of changes required
• Feasible decisions must be made for the project
• Testable hypotheses about productivity
  - People who are assigned work from many sources have lower productivity
  - MRs that require work in multiple modules take more time

Effects on Quality and Productivity
• Relationship between dispersion, development productivity, and conformance quality
• Inference: projects with more dispersion had lower levels of productivity and conformance quality
• Key actions necessary for success with global development:
  - Distribute entire things for the entire life cycle
  - Plan to accommodate time and distance
• Reduce intensive collaboration
• Reduce national and organizational cultural distance
• Reduce temporal distance

Methods and Analysis

Data Collection
• Windows Vista
  - 3300 binaries
  - Tens of MLOC
  - 59 buildings and 21 campuses in Asia, Europe, and North America
• Data collection focus
  - Code quality
  - Geographical location

Geographical Location-Based Separation
• Hierarchy
  - Building
  - Cafeteria
  - Campus
  - Locality
  - Continent
  - World
• Assignment to a level of the hierarchy
  - Each binary is assigned to the lowest level that covers a threshold percentage of its commits
• Threshold percentage
  - 75% of commits made
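The assignment rule above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual tooling: the record layout, the `site` helper, the `assign_level` function, and every location name are hypothetical. It walks the hierarchy from most to least local and stops at the first level where a single location accounts for at least 75% of a binary's commits.

```python
from collections import Counter

# Levels from most to least local, in the order listed on the slide.
LEVELS = ["building", "cafeteria", "campus", "locality", "continent", "world"]


def site(building, cafeteria, campus, locality, continent):
    """Build a hypothetical location record for one commit."""
    return {
        "building": building,
        "cafeteria": cafeteria,
        "campus": campus,
        "locality": locality,
        "continent": continent,
        "world": "world",  # every commit shares the top level
    }


def assign_level(commits, threshold=0.75):
    """Return (level, location) for the lowest level at which one
    location covers at least `threshold` of the commits."""
    if not commits:
        raise ValueError("need at least one commit")
    total = len(commits)
    for level in LEVELS:
        counts = Counter(c[level] for c in commits)
        location, top = counts.most_common(1)[0]
        if top / total >= threshold:
            return level, location
    # Not reached while every record carries the same "world" value.
    return "world", "world"


# Made-up locations, for illustration only.
B1 = site("B1", "Cafe-A", "Campus-1", "City-X", "North America")
B2 = site("B2", "Cafe-A", "Campus-1", "City-X", "North America")
B9 = site("B9", "Cafe-Z", "Campus-7", "City-Y", "Asia")

collocated = [B1] * 8 + [B2] * 2   # 80% of commits from one building
same_cafe = [B1] * 5 + [B2] * 5    # split across buildings sharing a cafeteria
global_bin = [B1] * 5 + [B9] * 5   # split across continents

print(assign_level(collocated))
print(assign_level(same_cafe))
print(assign_level(global_bin))
```

A binary whose commits never reach the threshold at any narrower level falls through to the world level, which is why the top-level value is shared by every record in this sketch.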

Experiments and Results
• Objective
  - Test the hypothesis that code quality differs between distributed and collocated binaries
• Measure of quality: number of post-release failures per binary

Results cont’d
• Linear regression analysis to examine the effect of distributed development on the number of failures:
  - 9.2% increase in failures when distributed
  - 4.6% increase in failures when distributed but team size is controlled
• Result 1: The increase in failures for geographic distribution is small, though statistically significant
• Result 2: The effect of geographic separation can be controlled to some extent by controlling team size
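Why controlling for team size can shrink the apparent effect of distribution is easier to see with a toy regression. The sketch below is illustrative only. The data are invented, not taken from the study, and the model is a plain ordinary least squares fit rather than the exact model the authors used. The helper names `solve` and `ols` are assumptions. In the synthetic data, failures depend only on team size, but distributed binaries happen to have larger teams.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Ordinary least squares via the normal equations.
    An intercept column is prepended; returns [intercept, coef1, ...]."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic data: failures are driven by team size alone,
# and the distributed binaries have larger teams.
team_size = [2, 4, 6, 8, 6, 8, 10, 12]
distributed = [0, 0, 0, 0, 1, 1, 1, 1]
failures = [2 + 0.5 * t for t in team_size]

# Model 1: failures explained by distribution only.
naive = ols([[d] for d in distributed], failures)

# Model 2: failures explained by distribution and team size.
controlled = ols([[d, t] for d, t in zip(distributed, team_size)], failures)

print("distribution effect, no control:", round(naive[1], 3))
print("distribution effect, team size controlled:", round(controlled[1], 3))
```

In this constructed case the distribution coefficient is about 2 failures without the control and about 0 with it, because team size carries the whole effect. The study's real data show a partial reduction instead (9.2% to 4.6%).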

Analysis of Results
• Conclusion reached: “In the context in which Windows Vista was developed, teams that were distributed wrote code that had virtually the same number of post-release failures as those that were collocated”
• Factors that may be responsible
  - Are collocated and distributed binaries different?
  - Differences may come from size and complexity, code churn, test coverage, dependencies, and people
• Finding: no significant difference except a small (not statistically significant) difference in the team size metric

How Is It Possible to Conduct Distributed Development Without Hampering Quality?
• Relationship between sites: all sites work together, with the same pay and benefits
• Cultural barriers: engineers visit each other and work together, which builds trust
• Communication: maintain core working hours
• Consistent use of tools: same source code management, documentation, and defect tracking tools

Conclusion
• Distributed development can work for large software projects
• An organizationally compact but geographically distributed project is better than a geographically local but organizationally distributed one
• Windows Vista is an example that contradicts the popular belief about distributed development

Thank you! Questions?
