This article discusses foundational concepts in programming, including abstraction, hierarchy of abstractions, OOP metrics, and the UML process. It explores the challenges and limitations of current programming languages and proposes a roadmap for developing abstraction mechanisms suitable for large software modules.
Roadmap • Early Concepts & Thoughts • Metrics • UML Process • Time is Money
The Babylonian Tower Principle: programming languages still fail, and probably always will fail, to produce abstraction mechanisms suitable for large software modules.
Babylonian Tower (cont.) • Each language has a hierarchy of abstractions • E.g., Java has 5 levels: • Methods • Classes • Files • Packages • Jar files • Number of children (at any level) should be 7+/-2 • => Total number of methods in a manageable Java program should be < 10^5
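The bound on this slide can be checked with a few lines of arithmetic. The sketch below assumes that every node at each of the five levels has at most 7+2 = 9 children; the function name `max_leaves` is made up for illustration.

```python
# Upper bound on the number of leaf items (methods) in a hierarchy where
# every node has at most `max_children` children and there are `levels`
# levels of abstraction.
def max_leaves(levels: int, max_children: int) -> int:
    return max_children ** levels

# The five Java levels named on the slide, from outermost to innermost.
JAVA_LEVELS = ["jar files", "packages", "files", "classes", "methods"]

upper = max_leaves(len(JAVA_LEVELS), 7 + 2)   # 9 ** 5
typical = max_leaves(len(JAVA_LEVELS), 7)     # 7 ** 5

print(upper, typical)
```

With nine children per level the bound is 9^5, which stays below 10^5.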
Brooks: MMM • X9 Factor • Surgical Team • Adding people to a late project makes it later • People and Months are not interchangeable • Flow diagrams are obsolete • No silver bullet
Recap: High Quality Design (interim definition) • A design that • Minimizes the number of bugs • Minimizes the effort for adding new features
OOP Metrics • Chidamber & Kemerer 1994 • Some require full source code • Others require only the relationships between classes • Motivation: Objective measurement of quality from the program itself
MCC: McCabe Cyclomatic complexity • # branches in the method • Typical value: ~5
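A rough way to see what this metric counts is to walk a method's syntax tree. The sketch below uses Python's standard `ast` module and the usual "decision points plus one" formulation; the set of node types treated as decision points is an assumption of this sketch, and real tools differ in the details.

```python
import ast

# Node types treated as decision points (an assumption of this sketch).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)

def cyclomatic_complexity(source: str) -> int:
    """Return 1 + the number of decision points found in `source`."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, _BRANCH_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds one extra path per additional operand.
            decisions += len(node.values) - 1
    return decisions + 1

# Hypothetical method used only to exercise the counter.
SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "other"
'''

print(cyclomatic_complexity(SAMPLE))
```

The sample has two `if` statements, one `for` loop and one `and`, so it lands at the typical value quoted on the slide.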
NOC: Number Of Children • #direct subclasses • Typical value: unbounded • E.g.: java.util.Iterator • High value means • Reuse (good) • Coupling (bad) • Low value means the opposite
DIT: Depth of Inheritance Tree • #Ancestors • Typical value: 1-2 • High value means • Hard to understand • High cohesion
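NOC and DIT need only the inheritance relation, so they can be sketched with plain class introspection. The example below uses Python classes as a stand-in for Java ones; the `Shape` hierarchy is hypothetical, and leaving the universal root `object` out of the depth count is an assumption of this sketch.

```python
def number_of_children(cls: type) -> int:
    """NOC: number of direct subclasses only."""
    return len(cls.__subclasses__())

def depth_of_inheritance(cls: type) -> int:
    """DIT: longest chain of ancestors, not counting `object`."""
    bases = [b for b in cls.__bases__ if b is not object]
    if not bases:
        return 0
    return 1 + max(depth_of_inheritance(b) for b in bases)

# Hypothetical hierarchy used only for illustration.
class Shape: ...
class Circle(Shape): ...
class Square(Shape): ...
class UnitCircle(Circle): ...

print(number_of_children(Shape))         # direct subclasses: Circle, Square
print(depth_of_inheritance(UnitCircle))  # ancestors: Circle, Shape
```

Note that `UnitCircle` does not raise the NOC of `Shape`, because only direct subclasses count.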
CBO: Coupling Between Objects • #Classes on which a class directly depends • Typical value: 30 • Low value means • Low coupling (good) • The code is using mostly primitive types (bad)
LCOM: Lack of Cohesion of Methods • For each class build an undirected graph • A node for each method, field • An edge between a method and a field if the method accesses the field • An edge between two methods if one of them is calling the other • LCOM = #Connected components in this graph • Good value: 1
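The graph construction above can be sketched directly. The code below takes the access and call relations as plain dictionaries (an assumed input format; a real tool would extract them from source or bytecode) and counts the connected components of the undirected graph. Fields that no method touches are not represented in this sketch.

```python
from collections import defaultdict

def lcom(accesses, calls):
    """accesses: method -> set of fields it uses.
    calls: method -> set of methods it calls.
    Returns the number of connected components of the method/field graph."""
    graph = defaultdict(set)

    def link(a, b):
        graph[a].add(b)
        graph[b].add(a)

    for method, fields in accesses.items():
        graph[("method", method)]          # keep isolated methods as nodes
        for field in fields:
            link(("method", method), ("field", field))
    for method, callees in calls.items():
        graph[("method", method)]
        for callee in callees:
            link(("method", method), ("method", callee))

    seen, components = set(), 0
    for start in list(graph):
        if start in seen:
            continue
        components += 1
        stack = [start]                    # depth-first search from `start`
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(graph[node] - seen)
    return components

# Hypothetical class: x-related and y-related methods never touch each other.
split = lcom(
    accesses={"get_x": {"x"}, "set_x": {"x"}, "get_y": {"y"}},
    calls={"reset": {"set_x"}},
)
# Same class, but `reset` now also writes `y`, which joins the two halves.
cohesive = lcom(
    accesses={"get_x": {"x"}, "set_x": {"x"}, "get_y": {"y"}, "reset": {"y"}},
    calls={"reset": {"set_x"}},
)
print(split, cohesive)
```

A result above 1 suggests the class could be split along the component boundaries.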
Shortcoming of Metrics • Easy to game the system • No correlation to Quality • Because quality cannot be measured • Not normalized
Package Cycles: Summary I • Destructive rather than Constructive • Based on negative points • Blind spots • => The more blind spots you have the better your score is !?
Package Cycles: Summary II • Work only on statically typed languages • Negative points • => Dynamic languages will score very high
Hard Data: Defect Fixing Costs • Source: Code Complete II (McConnell)
Approach: UML Process • Philosophy: • Software has a top-down structure • An optimal solution at stage n requires careful examination of all factors at stage n-1 • Human readable documents are less prone to errors than source code • A picture is worth a thousand words • Values • Measure twice cut once • Strive to prevent future defects • Principles • Top-down • Divide & Conquer via careful design of interfaces • Abstraction: Each stage concentrates on a specific kind of information • Practices • Analysis: Requirement gathering/Use cases • Architecture • Design • Implementation • Testing
Discussion: UML Process • Distinguishes between design and programming • Promotes formats for describing programs • Documents: SRS, TDD • Visual models: UML, BON • These representations abstract away statement-level details • Considered to have a minor effect on overall quality • “Waterfall”: Measure twice cut once
UML Tools • Diagrams • Class • Part • State-chart • Activity • Sequence • Deployment • Use case • Code Generation from the UML model • Round-tripping • Highlight: • Multiple abstractions • Use case diagrams are a formal description of the informal notion of requirements
Iterative Waterfall • Motivation: Requirements change • Develop some of the program in UML process • All stages: Analysis, design, impl., … • Repeat for some other part of the program • Challenge: which part to choose in each iteration?
Design Documents: The Manufacturing Analogy • Customer needs something new • Medicine, air-plane, yogurt, mobile-phone, … • Experts prepare a rough sketch • Engineers prepare a detailed blue-print • Workers manufacture the product by executing the blue-print • Analogy • Engineers are the software designers • The blue-print is the UML model/design documents • Workers are the programmers
Software is not Manufactured • A medicine will be reproduced billions of times • Each instance must be identical to the other • => There’s a need for a precise blue-print • Programmers need to “manufacture” a program only once • Reproduction is automatic (copying the executable) • A fully detailed blue print is not really needed • If the programmer understands the designer’s intent, a simple phone call is enough • Formalization may be a waste of time • The code is the blue print • The executable is the product
UML: The Building Architecture Analogy • Customer wants to build a house • Meets an architect, explains his needs • Architect prepares a model • Customer approves • Engineer prepares a construction plan • Addresses lower level issues, e.g., drainage, structures, material • Contractor executes the construction plan
Software is not a Building • In building, the costs of “undoing” are prohibitive • Hence “measure twice, cut once” • In software, “measure twice” may be more expensive than “cut twice” • The building model provides the customer with a faithful description of the building • In software, use case diagrams and req. documents do not come close to a faithful description of the final system • Customer cannot provide effective feedback • Chances of developing the wrong program are high
Criticism of the UML Process • How do you know when to stop? • Even UML supporters agree it is not adequate for coding methods and low-level classes • => There is a level where a plain-old compiler is better • => Optimal results require a mixture • => How do you know where the break-even point is? • How do you know which classes you need? • You start implementing in your head • Is it really more cost effective than implementing the real code? • Much easier to express classes than state • Tendency to yield designs w/ many similar classes even if these differences can be easily expressed via state • Over-engineering – Build a lot of flexibility into software • To prevent going back to the early stages (see next slide) • Traceability
Over-Engineering • Simple: • A class that traverses (pre-order) a tree of files/folders • Computes total size of all files • Over-engineering • Compute something else • Iterate in a different order • Ignore certain files • Iterate over something other than files • Iterate over something that is not hierarchical • YAGNI: You Are not Going to Need It
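For scale, the "simple" version on this slide fits in a handful of lines. The sketch below is one possible reading of it, written as a function rather than a class and built on the standard `os.walk`, which visits each directory before its subdirectories when `topdown=True`; the temporary files exist only to exercise it.

```python
import os
import tempfile

def total_size(root: str) -> int:
    """Sum the sizes (in bytes) of all files under `root`, pre-order."""
    total = 0
    # topdown=True yields a directory before any of its subdirectories.
    for dirpath, _dirnames, filenames in os.walk(root, topdown=True):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total

# Throwaway tree: root/a.txt (5 bytes) and root/sub/b.txt (3 bytes).
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    with open(os.path.join(root, "a.txt"), "wb") as f:
        f.write(b"12345")
    with open(os.path.join(root, "sub", "b.txt"), "wb") as f:
        f.write(b"123")
    demo = total_size(root)

print(demo)
```

Each item in the over-engineering list (pluggable computation, traversal order, filters, non-file or non-hierarchical sources) would add a parameter or an abstraction layer to this function before anyone has asked for it.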
The Mathematics of YAGNI • A tree of height 3, degree 3 • Every third child is redundant (incl. its subtree) • Total nodes: 13 • Redundant nodes: 1+1+4=6 (46%) • A tree of height h, degree d • Total nodes: s(h) = d*s(h-1) + 1, s(0) = 0 • Redundant nodes: r(h) = s(h-1) + (d-1)*r(h-1), r(0) = 0 • => r(h) is O(s(h))
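The two recurrences can be evaluated directly to confirm the numbers for the height-3, degree-3 example. The base case r(0) = 0 is an assumption of this sketch, chosen to match s(0) = 0; the function names are illustrative.

```python
def total_nodes(h: int, d: int) -> int:
    """s(h) = d * s(h-1) + 1, with s(0) = 0."""
    if h == 0:
        return 0
    return d * total_nodes(h - 1, d) + 1

def redundant_nodes(h: int, d: int) -> int:
    """r(h) = s(h-1) + (d-1) * r(h-1), with r(0) = 0 (assumed base case).

    One child subtree is redundant in full; the other d-1 children each
    contain their own redundant nodes."""
    if h == 0:
        return 0
    return total_nodes(h - 1, d) + (d - 1) * redundant_nodes(h - 1, d)

s3 = total_nodes(3, 3)
r3 = redundant_nodes(3, 3)
print(s3, r3, round(100 * r3 / s3))
```

For h = 3 and d = 3 this reproduces the 13 total nodes and 6 redundant nodes quoted on the slide.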
The Agile Manifesto …we have come to value: Individuals and interactions over processes and tools Working software over comprehensive documentation Customer collaboration over contract negotiation Responding to change over following a plan
The Importance of Time • A hypothetical programming task • Approach one: 5 days • Approach two: 1 day • Same interface • Code is not well structured (high coupling, low cohesion) • First approach • Minimal effort: 5 days • Second approach • Effort: 1 (best case) – 6 (worst case) days • Prefer the second approach • Tests will stay • Other team members can work on their parts • Sad scenario: you lost 1 day • Happy scenario: you earned at least 4 days • So time is a key factor. Can we estimate development time?
Time Estimates: Physician Appointments • My physician has an accurate schedule for at least in advance • Method: Compute time per appointment • Evidence based estimation • Based on gathered statistical data, law of large numbers • Properties of appointments • Countable • Identifiable end • Abundance
Time Estimates: A Software Project • Time per class? • Not countable • Time per sub-system? • Not abundant • Features? • Countable (breakdown of the big task) • Identifiable end (write tests) • Abundant (by definition)
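Because features are countable, have an identifiable end and are abundant, the physician-style estimate carries over directly: average the time of the finished ones and multiply by what is left. The sketch below assumes durations are recorded in days; the function name and the sample numbers are made up for illustration.

```python
from statistics import mean

def estimate_remaining_days(completed_durations, remaining_features):
    """Average time per finished feature times the number of open features."""
    if not completed_durations:
        raise ValueError("need at least one completed feature")
    return mean(completed_durations) * remaining_features

# Hypothetical history: four finished features, five still open.
history = [2, 3, 1, 2]
estimate = estimate_remaining_days(history, remaining_features=5)
print(estimate)
```

The estimate improves as more features are completed, which is the law-of-large-numbers argument applied to the project's own data.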
Burn Charts • Time is important • (As shown in previous slide) • So, let’s describe our progress vs. time • Vertical axis: tasks completed • Horizontal axis: time line • Two variants: burn-up, burn-down
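The two variants differ only in what is plotted on the vertical axis. A minimal sketch, assuming progress is recorded as the number of tasks finished on each day (the sample data is hypothetical):

```python
def burn_up(completed_per_day):
    """Cumulative number of tasks completed at the end of each day."""
    total, series = 0, []
    for done in completed_per_day:
        total += done
        series.append(total)
    return series

def burn_down(completed_per_day, total_tasks):
    """Number of tasks still open at the end of each day."""
    return [total_tasks - done for done in burn_up(completed_per_day)]

# Hypothetical five-day log for a project with 10 tasks.
daily = [2, 1, 3, 0, 2]
up = burn_up(daily)
down = burn_down(daily, total_tasks=10)
print(up)
print(down)
```

Plotting either list against the day index gives the corresponding chart; the flat step on day four shows up in both.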
Quality in Software (new definition) • High-quality software is software whose burn curve is linear • Similar to Big-O notation of algorithms • Does not distinguish between two linear curves • Differences in domain, languages, … • States that a flattening is the #1 risk • Can be experienced even in student assignments • Result oriented
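One crude way to watch for flattening is to compare the rate of progress in the second half of a burn-up curve with the rate in the first half. The heuristic below, including the `threshold` of 0.5, is an illustrative assumption and not part of the definition on the slide.

```python
def is_flattening(burn_up_series, threshold=0.5):
    """True if the second half of the curve rises much slower than the first.

    `threshold` is the fraction of the early rate below which the late rate
    counts as flattening (an arbitrary choice for this sketch)."""
    n = len(burn_up_series)
    if n < 4:
        raise ValueError("need at least four data points")
    mid = n // 2
    first_rate = (burn_up_series[mid] - burn_up_series[0]) / mid
    second_rate = (burn_up_series[-1] - burn_up_series[mid]) / (n - 1 - mid)
    if first_rate <= 0:
        return False  # no early progress to compare against
    return second_rate < threshold * first_rate

# Hypothetical curves: steady progress versus progress that stalls.
linear = [1, 2, 3, 4, 5, 6]
stalled = [2, 4, 6, 8, 8.5, 9]
print(is_flattening(linear), is_flattening(stalled))
```

In line with the Big-O analogy, the check does not care how steep a linear curve is, only whether its slope holds up over time.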
Summary • Time to completion is a key factor • Time estimation by features is practical • Burn up charts show progress • Quality: Linear burn curve