MSc Software Maintenance, Lectures 17 & 18
Does Code Decay?
Dr Andy Brooks
“As part of our experience with the production of software for a large telecommunications system, we have observed a nearly unanimous feeling among developers of the software that the code degrades through time and maintenance becomes increasingly difficult and expensive.” Eick et al, 1998
Case Study
Reference: Does Code Decay? Assessing the Evidence from Change Management Data, Stephen G Eick, Todd L Graves, Alan F Karr, J S Marron, and Audris Mockus, NISS-TR-81 (1998), National Institute of Statistical Sciences, 19 T. W. Alexander Drive, PO Box 14006, Research Triangle Park, NC 27709-4006, USA. http://www.niss.org/technicalreports/tr81.pdf
“Whether this code decay is real, how it can be characterized, and the extent to which it matters are the questions we address in this paper.” Eick et al, 1998
Previous Work
“Early investigations of aging in large software systems by Belady and Lehman [2], [3], [4] reported the near impossibility of adding new code to an aged system without introducing faults.” Eick et al, 1998
Access To Large Data Set
• Entire change management history of a 15-year-old, real-time software system for telephone switches:
  • 100,000,000 lines of code
  • C, C++, and a proprietary state description language
  • 100,000,000 lines of header and make files
  • Some 50 major subsystems and 5,000 modules
    • Here, a module is a directory containing several files.
  • Each release is some 20,000,000 lines of code.
  • 10,000 developers have been involved.
Categories Of Change
• Adaptive
  • new functionality (e.g. caller ID)
  • adaptations to new hardware or other changes in the environment
• Corrective
  • fixing faults
• Perfective
  • improving the maintainability of the software
  • reengineering (refactoring)
Change Process
• A new feature (e.g. call waiting) involves hundreds of Initial Modification Requests (IMRs).
  • Each IMR results in a number of Modification Requests (MRs).
• Developers open MRs, perform the changes, and make limited checks that the changes are satisfactory.
  • Inspections, integration tests, and system tests follow.
• An editing change to a single file is captured as a delta.
  • Lines added and deleted are tracked separately.
  • A line edit is recorded as a deletion followed by an addition.
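The "deletion then addition" convention for line edits can be illustrated with Python's standard difflib module. This is a minimal sketch, not the version management system used in the study: it assumes a delta is available as the old and new contents of one file, and the function name `count_delta` is hypothetical.

```python
import difflib


def count_delta(old_lines, new_lines):
    """Count (added, deleted) lines for one delta, i.e. one edit to one file.

    An edited line appears in the diff as one deletion ("- ") plus one
    addition ("+ "), so it contributes 1 to each count.
    """
    added = deleted = 0
    for diff_line in difflib.ndiff(old_lines, new_lines):
        if diff_line.startswith("+ "):
            added += 1
        elif diff_line.startswith("- "):
            deleted += 1
        # "  " marks an unchanged line; "? " lines are intraline hints and are skipped.
    return added, deleted


old = ["int x = 0;", "x++;", "return x;"]
new = ["int x = 0;", "x += 2;", "return x;"]
print(count_delta(old, new))  # (1, 1): one edited line = 1 deleted + 1 added
```

Counting edits this way means a change that only rewrites lines still shows up in both ADD(c) and DEL(c), which is why the two are tracked separately.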
Data Tracked By The Version Management System
• 89 fields, including priority, date opened, date closed, problem, and solution (change & reasons)
Answering Questions About Change Data
• Example questions: What files were changed? How many modules, files, and lines were affected? ...
• D: answered directly from the version management database
• A: answered by aggregation over constituent parts
• D*: problematic aspects
What Is Code Decay?
• “Code is decayed if it is more difficult to change than it used to be.”
• But an increase in the difficulty of making changes may be the result of an increase in the inherent difficulty of the requested changes.
• Decayed code does not mean that the software fails to meet current requirements.
• Decayed code means it is difficult to add new functionality or make other changes.
What Is Code Decay? (continued)
• Decayed code may have increased value.
  • The changes that caused the decay mean more functionality for the customer.
• A code unit can decay as a result of changes elsewhere in the software.
• A code unit can be inherently complex, so attributing the difficulty of making a change to decay can be misleading.
Andy says: a complex application will result in complex software.
Individual Ability
• Making changes is less difficult for a more able software maintainer.
• Making changes is more difficult for a junior software maintainer.
• “A definitive adjustment for developer ability has not been devised and usually we must relegate developer variability to ‘noise’ terms in our models.”
Causes Of Decay
• Inappropriate architecture
  • changes have wide scope
• Violation of original design principles
  • e.g. fixed phone -> mobile/fixed phone
• Imprecise requirements
  • ‘crisp code’ not produced
• Time pressure
  • short-cuts, sloppy code, kludges
  • limited code understanding
Causes Of Decay (continued)
• Inadequate programming tools
• Organizational environment
  • excessive staff turnover
  • developers fail to communicate properly
• Programmer variability
  • weak programmers may not understand complex code written by more able colleagues
• Inadequate change process
  • missing version control
  • handling changes in parallel
Medical Metaphor
• The software is a patient with a disease called code decay.
• What are the causes of the disease?
  • changes made to the code
• What are the symptoms of the disease?
• What is the prognosis for a patient with the disease?
• What are the relevant risk factors for the disease?
Andy says: I hope you do not smoke.
Symptoms Of Code Decay
• Excessively complex code
  • useful metrics:
    • standard software complexity metrics?
    • number of loops and conditionals enclosing a line?
• A history of frequent changes
  • also known as ‘code churn’
• A history of faults
  • fault fixes themselves may not be examples of good programming
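The second candidate metric, the number of loops and conditionals enclosing a line, can be sketched with Python's standard ast module. The sketch analyses Python source purely for illustration (the system in the study was written in C, C++ and a proprietary language), and the function name `enclosing_control_depth` is hypothetical. Note that an `elif` is parsed as an `if` nested in the `else` branch, so its body counts one level deeper.

```python
import ast

# Node types treated as loops or conditionals.
CONTROL = (ast.For, ast.AsyncFor, ast.While, ast.If)


def enclosing_control_depth(source: str) -> dict[int, int]:
    """Map each line number to the number of loops/conditionals enclosing it.

    A line counts as enclosed by a construct if it lies after the
    construct's header line and no later than its last line.
    """
    tree = ast.parse(source)
    spans = [
        (node.lineno, node.end_lineno)
        for node in ast.walk(tree)
        if isinstance(node, CONTROL)
    ]
    n_lines = len(source.splitlines())
    return {
        line: sum(1 for start, end in spans if start < line <= end)
        for line in range(1, n_lines + 1)
    }


sample = """\
x = 0
for i in range(3):
    if i:
        x += i
print(x)
"""
print(enclosing_control_depth(sample))  # {1: 0, 2: 0, 3: 1, 4: 2, 5: 0}
```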
Symptoms Of Code Decay (continued)
• Widely dispersed changes
  • Changes to well-engineered code tend to be local (e.g. within a class).
• Kludges
  • Changes made knowing they could have been done more elegantly or more efficiently.
• Numerous interfaces (entry points)
  • Possible side effects of changes made elsewhere.
Risk Factors For Code Decay
Risk factors increase the chance of decay or worsen its effect.
• Size of module m
  • NCSL(m), the number of noncommentary source lines
• Age of code
  • but very stable code might never be changed
  • variability of age within a code unit may be the key characteristic
• Inherent complexity
  • real-time software is more likely to decay
• Organizational churn
  • company knowledge base degraded
  • inexperienced developers make changes
Risk Factors For Code Decay (continued)
• Ported or reused code
  • The Ariane 5 crash was caused by code reused from Ariane 4.
  • http://edition.cnn.com/WORLD/9606/04/rocket.explode/
• Requirements load
  • very many requirements are difficult to understand and implement
• Inexperienced developers
  • lack of knowledge
  • lack of understanding of the system architecture (3-tier?)
Code Decay Indices (CDIs): Notation
• c for changes (MRs)
• l for lines of code
• f for files
• m for modules
• c -> m means ‘c touches m’
  • Part of m is changed by c.
• 1{A}
  • equals 1 if event A occurs
  • equals 0 otherwise
Code Decay Indices (CDIs): Notation (continued)
• DELTAS(c): number of deltas associated with c
• ADD(c): number of lines added by c
• DEL(c): number of lines deleted by c
• DATE(c): date on which c is completed
• INT(c): calendar time required to implement c
• DEV(c): number of developers implementing c
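The notation above maps naturally onto a small record type. The sketch below is a hypothetical layout, not the study's database schema: the class and field names are invented for illustration, module paths and developer names are made up, and it assumes (following the earlier data set slide) that a module is the directory containing a file.

```python
import posixpath
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Delta:
    """One editing change to a single file (hypothetical record layout)."""
    file: str        # path of the edited file
    developer: str   # login name of whoever made the edit
    added: int       # lines added by this delta
    deleted: int     # lines deleted by this delta


@dataclass
class Change:
    """A change c (an MR) and its deltas; field names are hypothetical."""
    opened: date
    completed: date                      # DATE(c)
    deltas: list[Delta] = field(default_factory=list)

    @property
    def n_deltas(self) -> int:           # DELTAS(c)
        return len(self.deltas)

    @property
    def lines_added(self) -> int:        # ADD(c)
        return sum(d.added for d in self.deltas)

    @property
    def lines_deleted(self) -> int:      # DEL(c)
        return sum(d.deleted for d in self.deltas)

    @property
    def interval_days(self) -> int:      # INT(c), in calendar days
        return (self.completed - self.opened).days

    @property
    def n_developers(self) -> int:       # DEV(c)
        return len({d.developer for d in self.deltas})


def touches(c: Change, module: str) -> int:
    """Indicator 1{c -> m}, assuming a module is the directory holding a file."""
    return int(any(posixpath.dirname(d.file) == module for d in c.deltas))


c = Change(
    opened=date(1996, 3, 1),
    completed=date(1996, 3, 15),
    deltas=[
        Delta("sw/callwait/timer.c", "dev1", 12, 4),
        Delta("sw/callwait/timer.c", "dev1", 3, 3),
        Delta("sw/billing/rate.c", "dev2", 5, 0),
    ],
)
print(c.n_deltas, c.lines_added, c.lines_deleted, c.interval_days, c.n_developers)
# 3 20 7 14 2
print(touches(c, "sw/callwait"), touches(c, "sw/ui"))  # 1 0
```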
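Historical Count Of Changes
• The number of changes to a module m in the time interval I:
  $$\mathrm{CHNG}(m, I) = \sum_{c \to m} 1\{\mathrm{DATE}(c) \in I\}$$
• With |I| indicating the length of time interval I, the frequency of changes is:
  $$\mathrm{FREQ}(m, I) = \frac{\mathrm{CHNG}(m, I)}{|I|}$$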
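The historical count and frequency of changes can be computed directly from a list of completed changes. This is a minimal sketch under stated assumptions: each change is given as a hypothetical (completion date, files touched) pair, a module is the directory containing a file, the interval I is taken as half-open [start, end), and |I| is measured in days.

```python
import posixpath
from datetime import date


def chng(changes, module, start, end):
    """CHNG(m, I): number of changes c with c -> m and DATE(c) in I.

    changes: list of (completion date, files touched) pairs (hypothetical input).
    I is taken as the half-open interval [start, end).
    """
    return sum(
        1
        for completed, files in changes
        if start <= completed < end
        and any(posixpath.dirname(f) == module for f in files)
    )


def freq(changes, module, start, end):
    """FREQ(m, I) = CHNG(m, I) / |I|, with |I| measured in days."""
    return chng(changes, module, start, end) / (end - start).days


changes = [
    (date(1995, 1, 10), ["sw/callwait/timer.c"]),
    (date(1995, 6, 2), ["sw/callwait/timer.c", "sw/billing/rate.c"]),
    (date(1996, 2, 1), ["sw/callwait/queue.c"]),
]
print(chng(changes, "sw/callwait", date(1995, 1, 1), date(1996, 1, 1)))  # 2
```

The module names and dates are illustrative only; any time unit could be used for |I| as long as it is applied consistently.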
Span Of Changes (Scope Of Changes)
• The span is the number of files touched by a change:
  $$\mathrm{FILES}(c) = \sum_{f} 1\{c \to f\}$$
• Changes touching more files are more difficult because:
  • the maintainer might have to spend time understanding unfamiliar files
  • code interfaces might have to be modified.
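The span differs from DELTAS(c): several deltas to the same file count once towards FILES(c). The sketch below, with the hypothetical function name `span`, makes that distinction explicit and also counts the modules touched, assuming a module is the directory containing a file.

```python
import posixpath


def span(delta_files):
    """Return (DELTAS(c), FILES(c), modules touched) for a change c.

    delta_files lists the file edited by each delta of c; a file edited
    by several deltas counts only once towards FILES(c).
    """
    files = set(delta_files)
    modules = {posixpath.dirname(f) for f in files}
    return len(delta_files), len(files), len(modules)


print(span(["sw/callwait/timer.c", "sw/callwait/timer.c", "sw/callwait/queue.c"]))
# (3, 2, 1)
```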
Size
• The size of a module m is NCSL(m), obtained by summing over all files f in m:
  $$\mathrm{NCSL}(m) = \sum_{f \in m} \mathrm{NCSL}(f)$$
• “most standard software complexity metrics are almost perfectly correlated with NCSL in our data sets”
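Module size is an example of a quantity answered by aggregation over constituent parts. The sketch below assumes per-file NCSL counts are already available (counting noncommentary lines in C or C++ needs a comment-aware parser, which is out of scope here) and that a module is the directory containing a file; the function name and paths are hypothetical.

```python
import posixpath
from collections import defaultdict


def module_ncsl(file_ncsl):
    """NCSL(m) = sum of NCSL(f) over files f in module m (module = directory)."""
    totals = defaultdict(int)
    for path, ncsl in file_ncsl.items():
        totals[posixpath.dirname(path)] += ncsl
    return dict(totals)


print(module_ncsl({
    "sw/callwait/timer.c": 420,
    "sw/callwait/queue.c": 180,
    "sw/billing/rate.c": 300,
}))
# {'sw/callwait': 600, 'sw/billing': 300}
```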
Age
• AGE(m): the average age of the constituent lines of module m
• Variability in line ages is also of interest.
• The tool SeeSoft produces a visualization of the variability in line ages:
  • files are represented by boxes
  • the lengths of lines in the boxes are proportional to the number of characters
  • files that change little are mostly a single colour
  • files that have been changed a lot are multi-coloured
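AGE(m) and the variability of line ages can be summarized numerically as a mean and a standard deviation. This is a sketch under assumptions: the input is a hypothetical list giving, for each current line of the module, the date on which it was last added (for example, from an annotate or blame pass), and the population standard deviation is one possible choice of spread measure, not necessarily the one used in the study.

```python
from datetime import date
from statistics import mean, pstdev


def line_age_summary(line_dates, as_of):
    """Return (AGE(m), spread) in days for the lines of module m.

    line_dates: for each current line, the date it was last added
    (hypothetical input). AGE(m) is the mean line age; the spread is the
    population standard deviation of the line ages.
    """
    ages = [(as_of - d).days for d in line_dates]
    return mean(ages), pstdev(ages)


as_of = date(1997, 1, 1)
stable = [date(1990, 1, 1)] * 4                        # mostly one colour in SeeSoft terms
churned = [date(1990, 1, 1)] * 3 + [date(1996, 1, 1)]  # mixed line ages
print(line_age_summary(stable, as_of))
print(line_age_summary(churned, as_of))
```

The uniformly old module has zero spread, while the module with one recently added line has a lower mean age and a non-zero spread, which corresponds to a multi-coloured box in SeeSoft.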
[Figure: SeeSoft view of one module]
Subsystem Under Analysis
• 100 modules
• 2,500 files
• 6,000 IMRs
• 27,000 MRs
• 130,000 deltas
• 500 different login names made code changes to the subsystem
Temporal Behavior Of The Span Of Changes (different window widths)
• The probability that a change will touch more than one file more than doubles, from less than 2% in 1989 to more than 5% in 1996.
• Ripples in the high-resolution smooth are not statistically significant.
[Figure: span of changes plotted against date, 1989 to 1996, with the initial development period marked]
Breakdown In Modularity?
• On its own, the increase in the span of changes does not imply a breakdown in the modularity of the subsystem.
• The increase could simply reflect the growth of the subsystem, and changes with a wide span need not cross module boundaries.
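The distinction between changes that touch several files and changes that cross module boundaries can be made concrete by computing both rates per year. This sketch is an assumption-laden illustration: the input is a hypothetical list of (year, files touched) pairs, a module is taken to be the directory containing a file, and raw yearly proportions are used instead of the smoothing with different window widths used in the study.

```python
import posixpath
from collections import defaultdict


def yearly_span_rates(changes):
    """For each year, the fraction of changes touching >1 file and >1 module.

    changes: iterable of (year, files_touched) pairs (hypothetical input).
    """
    totals = defaultdict(int)
    multi_file = defaultdict(int)
    multi_module = defaultdict(int)
    for year, files in changes:
        files = set(files)
        modules = {posixpath.dirname(f) for f in files}
        totals[year] += 1
        multi_file[year] += int(len(files) > 1)
        multi_module[year] += int(len(modules) > 1)
    return {
        year: (multi_file[year] / n, multi_module[year] / n)
        for year, n in sorted(totals.items())
    }


changes = [
    (1989, ["a/x.c"]),
    (1989, ["a/x.c", "a/y.c"]),   # wide span, but within one module
    (1996, ["a/x.c", "b/z.c"]),   # crosses a module boundary
    (1996, ["b/z.c"]),
]
print(yearly_span_rates(changes))  # {1989: (0.5, 0.0), 1996: (0.5, 0.5)}
```

In this toy input the multi-file rate is the same in both years, yet only the 1996 changes cross module boundaries, which is exactly the case the slide warns cannot be read off the span alone.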
Network Visualization Tool: NicheWorks
• Each tadpole shape corresponds to a module.
• The tadpole's tail indicates the module's position in the picture at the end of the previous year.
• Pairs of modules are placed near each other if they have been changed together, as part of the same MRs, a large number of times.
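The layout is driven by how often pairs of modules are changed together. NicheWorks itself is not reproduced here; the sketch below only computes the co-change counts that such a layout could use as edge weights. It assumes each MR is given as a hypothetical list of touched file paths and that a module is the directory containing a file.

```python
import posixpath
from collections import Counter
from itertools import combinations


def co_change_counts(changes):
    """Count, for each pair of modules, how many changes touched both.

    changes: iterable of lists of file paths, one list per MR (hypothetical input).
    """
    pairs = Counter()
    for files in changes:
        modules = sorted({posixpath.dirname(f) for f in files})
        for a, b in combinations(modules, 2):
            pairs[(a, b)] += 1
    return pairs


counts = co_change_counts([
    ["a/x.c", "b/y.c"],
    ["a/z.c", "b/y.c", "c/w.c"],
    ["a/x.c"],
])
print(counts.most_common())
```

Sorting the module names before forming pairs ensures that (a, b) and (b, a) are counted as the same pair.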
NicheWorks View Of The Subsystem Modules
[Figure: NicheWorks views of the subsystem modules for 1988, 1989 and 1996]
• The architecture that separated the functionality of two clusters of modules is breaking down.
Alternative Interpretation
• Examples of requested changes: implement caller ID; provide an extra area-code digit.
• The inherent difficulty of the desired changes could have been increasing.
• The modification request data were not examined independently from this perspective.
Prediction Of Faults (Quality Prognosis)
• The best model derived from the data predicts numbers of faults from the changes made to the module in the past.
• Large, recent changes add most to the fault potential.
• The parameter 0.75 was determined by statistical analysis.
• The number of times a module has been changed is a better predictor than its size.
• The number of developers working on a module had no effect on fault potential.
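The slide does not reproduce the model's formula. Purely as an illustration of "large recent changes add most", the sketch below assumes that each past change contributes exp(-0.75 × its age in years) multiplied by the logarithm of its size in deltas. The decay form, the size term, the reading of 0.75 as a yearly decay rate, and the function name `fault_potential` are all assumptions, not the fitted model from the paper.

```python
import math

ALPHA = 0.75  # assumed yearly decay rate; the slide only reports a parameter of 0.75


def fault_potential(changes, now):
    """Sum of time-damped, size-weighted contributions of past changes.

    changes: iterable of (time_in_years, n_deltas) pairs for one module.
    now: the current time, in the same units (years).
    Older changes are damped by exp(-ALPHA * age); larger changes
    contribute more through log(1 + n_deltas).
    """
    return sum(
        math.exp(-ALPHA * (now - t)) * math.log(1 + n_deltas)
        for t, n_deltas in changes
        if t <= now
    )


recent = fault_potential([(4.0, 10)], now=5.0)
old = fault_potential([(1.0, 10)], now=5.0)
print(recent > old)  # True: the same change counts for more when it is recent
```

Under this assumed form, a change's contribution shrinks by a factor of e^(-0.75) ≈ 0.47 each year, so its weight roughly halves annually.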
Prediction Of Effort (Effort Prognosis)
• “Can the effort required to implement changes be predicted from symptoms and risk factors for decay?”
• Effort data, available only at the feature level, displayed extreme variability, so the results are only suggestive:
  • A dependency on FILES(c) was discovered, supporting the idea that the span of changes is a symptom of decay.
  • Some changes involved a small number of deltas but required close to maximum effort.
Summary (Eick et al)
Four analyses demonstrate:
• “The increase over time in the number of files touched per change to the code.
• The decline in modularity of a subsystem of the code, as measured by changes touching multiple modules.
• Contributions of several factors (notably, frequency and recency of change) to fault rates in modules of code, and
• That span and size of changes are important predictors (at the feature level) of the effort to implement a change.”
Summary (Eick et al, continued)
• The system studied showed no evidence of dramatic, widespread decay:
  • In seven years, the probability of a change touching more than one file increased only from 2% to 5%.
• The architecture that separated the functionality of two clusters of modules is breaking down.
• Can code decay prove fatal?
  • “there are anecdotal reports of systems that have reached a state from which further change is not possible”
Modification Request Difficulty
• The nature of the modification requests over time was not analysed, so alternative interpretations of the data set cannot be ruled out.
• How can you measure the inherent difficulty of a modification request?
  • By the span of changes?
  • By the complexity of the textual description and justification?
• The temporal behaviour of the span of changes could be due to the inherent difficulty of modification requests increasing with time.
Andy says: we do not know for this data.