Towards a Theory of Programming

Roadmap • Early Concepts & Thoughts • Metrics • UML Process • Time is Money

Abstraction – Underground Map

The Babylonian Tower Principle • Programming languages still fail, and probably always will fail, to provide abstraction mechanisms suitable for large software modules.

Babylonian Tower (cont.) • Each language has a hierarchy of abstractions • E.g., Java has 5 levels: • Methods • Classes • Files • Packages • Jar files • Number of children (at any level) should be 7±2 • => Total number of methods in a manageable Java program should be < 10^5
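The bound follows from compounding the fan-out across the five levels. A minimal sketch, assuming the program holds at most 9 jar files, each jar at most 9 packages, and so on down to methods (the names `LEVELS` and `max_methods` are illustrative only):

```python
# Java's abstraction levels as listed on the slide (jar files at the top).
LEVELS = ["jar files", "packages", "files", "classes", "methods"]


def max_methods(fan_out: int, levels: int = len(LEVELS)) -> int:
    """Leaves (methods) of a hierarchy in which every node has `fan_out` children.

    The program holds `fan_out` jars, each jar `fan_out` packages, and so on
    down to methods, so the count is fan_out ** levels."""
    return fan_out ** levels


for fan_out in (5, 7, 9):  # the 7 +/- 2 range
    print(f"fan-out {fan_out}: at most {max_methods(fan_out):,} methods")
```

Even at the upper end of the range, 9^5 = 59,049 methods, which stays below 10^5.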

Brooks: The Mythical Man-Month (MMM) • X9 Factor • Surgical Team • Adding people to a late project makes it later • People and Months are not interchangeable • Flow diagrams are obsolete • No silver bullet

Recap: High-Quality Design (interim definition) • A design that • Minimizes the number of bugs • Minimizes the effort of adding new features

OOP Metrics • Chidamber & Kemerer 1994 • Some require the full source code • Others require only the relationships between classes • Motivation: Objective measurement of quality from the program itself

MCC: McCabe Cyclomatic Complexity • #branches (decision points) in the method + 1 • Typical value: ~5
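Counting decision points is mechanical once the code is parsed. The sketch below uses Python's standard `ast` module as a stand-in for a Java parser, and applies a simplified counting rule (an assumption, real tools differ in the details): each `if`/`elif`, loop, conditional expression, `except` clause, comprehension loop and filter, and each extra operand of `and`/`or` adds one branch.

```python
import ast

# Simplified rule: each of these constructs adds one decision point.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity of `source`: decision points + 1."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` can short-circuit twice: len(values) - 1 branches
            decisions += len(node.values) - 1
        elif isinstance(node, ast.comprehension):
            # the comprehension loop itself plus each `if` filter
            decisions += 1 + len(node.ifs)
    return decisions + 1


SAMPLE = """
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "prime-or-one"
"""

print(cyclomatic_complexity(SAMPLE))  # 6
```

The sample has five decision points (two `if`s for the `if`/`elif`, one `for`, one inner `if`, one `and`), so its complexity is 6, slightly above the typical value on the slide.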

NOC: Number Of Children • #direct subclasses • Typical value: unbounded • E.g.: java.util.Iterator • High value means • Reuse (good) • Coupling (bad) • Low value means the opposite
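NOC needs only the relationships between classes, not method bodies. A minimal Python sketch, assuming the hierarchy is loaded in the running interpreter (in Java one would instead scan the whole program); the class names are hypothetical:

```python
def number_of_children(cls: type) -> int:
    """NOC: number of direct subclasses of `cls` known to the interpreter."""
    return len(cls.__subclasses__())


# Hypothetical hierarchy, used only for illustration.
class Shape: ...
class Circle(Shape): ...
class Square(Shape): ...
class RoundedSquare(Square): ...  # grandchild of Shape: not counted for Shape


print(number_of_children(Shape))   # 2
print(number_of_children(Square))  # 1
```

`type.__subclasses__()` returns only immediate subclasses, which matches the "direct subclasses" wording of the metric.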

DIT: Depth of Inheritance Tree • #Ancestors • Typical value: 1-2 • High value means • Hard to understand • High cohesion
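DIT can be read off the inheritance graph by walking up to the root. A minimal Python sketch; it assumes the Java-like convention that the universal root (`object` here, `java.lang.Object` in Java) counts as an ancestor, so a class with no explicit superclass has DIT 1 (tools differ on this). The class names are hypothetical:

```python
def depth_of_inheritance(cls: type) -> int:
    """DIT: length of the longest inheritance path from `cls` up to `object`.

    Counting `object` as the root mirrors Java, where every class
    ultimately extends java.lang.Object."""
    if cls is object:
        return 0
    return 1 + max(depth_of_inheritance(base) for base in cls.__bases__)


# Hypothetical hierarchy, used only for illustration.
class Vehicle: ...
class Car(Vehicle): ...
class SportsCar(Car): ...


print(depth_of_inheritance(Vehicle))    # 1
print(depth_of_inheritance(SportsCar))  # 3
```

With single inheritance, as in Java classes, the longest path equals the number of ancestors; taking the maximum only matters under multiple inheritance.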

CBO: Coupling Between Objects • #Classes on which a class directly depends • Typical value: 30 • Low value means • Low coupling (good) • The code is using mostly primitive types (bad)

LCOM: Lack of Cohesion of Methods • For each class build an undirected graph • A node for each method, field • An edge between a method and a field if the method accesses the field • An edge between two methods if one of them is calling the other • LCOM = #Connected components in this graph • Good value: 1
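The graph-based definition above (which differs from the pair-counting LCOM in the Chidamber & Kemerer paper and is close to the variant often labelled LCOM4) reduces to counting connected components. A minimal sketch using union-find, assuming the method/field accesses and intra-class calls have already been extracted into dictionaries; the classes in the example are hypothetical:

```python
def lcom(accesses, calls, fields=()):
    """Number of connected components in the method/field graph of one class.

    accesses: method name -> set of field names the method uses
    calls:    method name -> set of method names it calls (same class)
    fields:   all field names, so that unused fields count as components too
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        parent[find(a)] = find(b)

    for field in fields:
        find(("field", field))
    for method, used in accesses.items():
        find(("method", method))  # a method touching no field is still a node
        for field in used:
            union(("method", method), ("field", field))
    for caller, callees in calls.items():
        for callee in callees:
            union(("method", caller), ("method", callee))
    return len({find(node) for node in list(parent)})


# Hypothetical classes, used only for illustration.
point = lcom(
    accesses={"get_x": {"x"}, "get_y": {"y"}, "norm": set()},
    calls={"norm": {"get_x", "get_y"}},
)
account = lcom(
    accesses={"deposit": {"balance"}, "withdraw": {"balance"}, "log": {"log_file"}},
    calls={},
)
print(point, account)  # 1 2
```

Tagging nodes as `("method", …)` or `("field", …)` keeps a method and a field with the same name apart. A result of 2 for `account` suggests the logging part could live in a class of its own.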

Shortcomings of Metrics • Easy to game the system • No correlation with quality • Because quality cannot be measured • Not normalized

Package Cycles: FindBugs 0.72

Package Cycles: FindBugs 1.35

Package Cycles: Ant

Package Cycles: Antlr
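A package-cycle check starts from the package dependency graph and searches it for cycles. A minimal sketch using depth-first search, assuming the dependencies have already been extracted (e.g., from imports) into a dictionary; the package names are hypothetical:

```python
from __future__ import annotations


def find_cycle(deps: dict[str, set[str]]) -> list[str] | None:
    """Return one package dependency cycle, e.g. ["a", "b", "a"],
    or None if the dependency graph is acyclic.

    deps maps each package to the set of packages it depends on."""
    on_path: list[str] = []     # packages on the current DFS path
    state: dict[str, str] = {}  # "active" while on the path, "done" afterwards

    def visit(pkg: str) -> list[str] | None:
        state[pkg] = "active"
        on_path.append(pkg)
        for dep in sorted(deps.get(pkg, ())):
            if state.get(dep) == "active":  # back edge closes a cycle
                return on_path[on_path.index(dep):] + [dep]
            if dep not in state:
                cycle = visit(dep)
                if cycle:
                    return cycle
        on_path.pop()
        state[pkg] = "done"
        return None

    for pkg in sorted(deps):
        if pkg not in state:
            cycle = visit(pkg)
            if cycle:
                return cycle
    return None


# Hypothetical dependencies, used only for illustration.
deps = {
    "app.ui": {"app.model"},
    "app.model": {"app.util"},
    "app.util": {"app.ui"},  # closes the cycle ui -> model -> util -> ui
    "app.io": {"app.util"},
}
print(find_cycle(deps))  # ['app.util', 'app.ui', 'app.model', 'app.util']
```

Note that the check can only see edges that made it into `deps`: a dependency the extractor cannot detect never produces a cycle.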

Package Cycles: Summary I • Destructive rather than Constructive • Based on negative points • Blind spots • => The more blind spots you have, the better your score!?

Package Cycles: Summary II • Work only on statically typed languages • Negative points • => Dynamic languages will score very high

Hard Data: Defect Fixing Costs • Source: Code Complete II (McConnell)

Think Ahead

Approach: UML Process • Philosophy: • Software has a top-down structure • An optimal solution at stage n requires careful examination of all factors at stage n-1 • Human-readable documents are less prone to errors than source code • A picture is worth a thousand words • Values • Measure twice, cut once • Strive to prevent future defects • Principles • Top-down • Divide & Conquer via careful design of interfaces • Abstraction: Each stage concentrates on a specific kind of information • Practices • Analysis: Requirements gathering/Use cases • Architecture • Design • Implementation • Testing

Discussion: UML Process • Distinguishes between design and programming • Promoted formats for describing programs • Documents: SRS, TDD • Visual models: UML, BON • These representations abstract away statement-level details • Those details are considered to have a minor effect on overall quality • “Waterfall”: Measure twice, cut once

UML Tools • Diagrams • Class • Part • State-chart • Activity • Sequence • Deployment • Use case • Code Generation from the UML model • Round-tripping • Highlight: • Multiple abstractions • Use case diagrams are a formal description of the informal notion of requirements

Iterative Waterfall • Motivation: Requirements change • Develop part of the program using the UML process • All stages: Analysis, design, impl., … • Repeat for another part of the program • Challenge: which part to choose in each iteration?

Design Documents: The Manufacturing Analogy • A customer needs a new product • Medicine, airplane, yogurt, mobile phone, … • Experts prepare a rough sketch • Engineers prepare a detailed blueprint • Workers manufacture the product by executing the blueprint • Analogy • Engineers are the software designers • The blueprint is the UML model/design documents • Workers are the programmers

Software is not Manufactured • A medicine will be reproduced billions of times • Each instance must be identical to all the others • => There’s a need for a precise blueprint • Programmers need to “manufacture” a program only once • Reproduction is automatic (copying the executable) • A fully detailed blueprint is not really needed • If the programmer understands the designer’s intent, a simple phone call is enough • Formalization may be a waste of time • The code is the blueprint • The executable is the product

UML: The Building Architecture Analogy • A customer wants to build a house • Meets an architect and explains his needs • Architect prepares a model • Customer approves • Engineer prepares a construction plan • Addresses lower-level issues, e.g., drainage, structures, materials • Contractor executes the construction plan

Software is not a Building • In construction, the cost of “undoing” is prohibitive • Hence “measure twice, cut once” • In software, “measure twice” may be more expensive than “cut twice” • The building model provides the customer with a faithful description of the building • In software, use case diagrams and requirements documents do not come close to a faithful description of the final system • Customer cannot provide effective feedback • Chances of developing the wrong program are high

Criticism of the UML Process • How do you know when to stop? • Even UML supporters agree it is not adequate for coding methods and low-level classes • => There is a level where a plain-old compiler is better • => Optimal results require a mixture • => How do you know where the break-even point is? • How do you know which classes you need? • You start implementing in your head • Is it really more cost-effective than implementing the real code? • Much easier to express classes than state • Tendency to yield designs with many similar classes, even if their differences can easily be expressed via state • Over-engineering: build a lot of flexibility into the software • To prevent going back to the early stages (see next slide) • Traceability

Over-Engineering • Simple: • A class that traverses (pre-order) a tree of files/folders • Computes the total size of all files • Over-engineering • Compute something else • Iterate in a different order • Ignore certain files • Iterate over something other than files • Iterate over something that is not hierarchical • YAGNI: You Aren't Gonna Need It
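For comparison, the "simple" version fits in a few lines. A minimal Python sketch, assuming a local file system and choosing (as an assumption) to skip symbolic links; the name `total_size` is illustrative. None of the over-engineered variation points appear, which is the YAGNI point:

```python
import os


def total_size(path: str) -> int:
    """Sum the sizes of all regular files under `path`.

    Walks the tree depth-first, entering each folder as soon as it is met
    (pre-order). Symbolic links are skipped."""
    if not os.path.isdir(path):
        return os.path.getsize(path)
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                total += total_size(entry.path)
            elif entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total
```

Calling `total_size(".")` reports the size of the current folder. Adding a filter, a different traversal order, or a non-file data source would each be a new parameter or abstraction, and each is a candidate for "not going to need it".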

The Mathematics of YAGNI • A tree of height 3, degree 3 • Each third child is redundant (incl. subtree) • Total nodes: 13 • Redundant nodes: 1+1+4=6 (46%) • A tree of height h, degree d • Total nodes: s(h) = d*s(h-1) + 1, s(0) = 0 • Redundant nodes: r(h) = s(h-1) + (d-1)*r(h-1) • => r(h) is O(s(h))
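The two recurrences can be evaluated directly. A minimal sketch, with `s` and `r` named after the slide's functions and `d` passed explicitly; the loop prints the redundant share for degree 3:

```python
def s(h: int, d: int) -> int:
    """Total nodes in a complete tree with h levels and degree d:
    s(h) = d*s(h-1) + 1, s(0) = 0."""
    return 0 if h == 0 else d * s(h - 1, d) + 1


def r(h: int, d: int) -> int:
    """Redundant nodes when one child of every node (with its whole subtree)
    is never needed: r(h) = s(h-1) + (d-1)*r(h-1), r(0) = 0."""
    return 0 if h == 0 else s(h - 1, d) + (d - 1) * r(h - 1, d)


for h in range(1, 8):
    print(f"h={h}: total={s(h, 3)}, redundant={r(h, 3)}, share={r(h, 3) / s(h, 3):.0%}")
```

For h = 3 this reproduces the slide's 13 nodes, 6 of them redundant. The share keeps climbing with height: a node is needed only if none of its ancestors is a third child, so just 2^h - 1 nodes are needed (one root, two needed children, four needed grandchildren, and so on). Hence r(h) is not only O(s(h)) but Θ(s(h)).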

The Agile Manifesto • …we have come to value: • Individuals and interactions over processes and tools • Working software over comprehensive documentation • Customer collaboration over contract negotiation • Responding to change over following a plan

The Importance of Time • A hypothetical programming task • Approach one: 5 days • Approach two: 1 day • Same interface • Code is not well structured (high coupling, low cohesion) • First approach • Minimal effort: 5 days • Second approach • Effort: 1 (best case) to 6 (worst case) days • Prefer the second approach • Tests will stay • Other team members can work on their parts • Sad scenario: you lost 1 day • Happy scenario: you gained at least 4 days • So time is a key factor. Can we estimate development time?
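The same comparison can be phrased as an expected value. A minimal sketch; `p_redo`, the probability that the quick version must be thrown away and rewritten with the first approach, is a hypothetical parameter that does not appear on the slide:

```python
FIRST_APPROACH_DAYS = 5
SECOND_APPROACH_DAYS = 1


def expected_days_second(p_redo: float) -> float:
    """Expected effort of the quick approach if, with probability p_redo,
    it has to be redone with the first approach (1 day + 5 days)."""
    return SECOND_APPROACH_DAYS + p_redo * FIRST_APPROACH_DAYS


# The quick approach wins in expectation while 1 + 5p < 5, i.e. p < 0.8.
break_even = (FIRST_APPROACH_DAYS - SECOND_APPROACH_DAYS) / FIRST_APPROACH_DAYS
print(break_even)
```

Unless a rewrite is more than 80% likely, the second approach is expected to finish sooner, and that is before counting the benefits listed on the slide (tests stay, teammates are unblocked).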

Time Estimates: Physician Appointments • My physician has an accurate schedule for at least in advance • Method: Compute time per appointment • Evidence based estimation • Based on gathered statistical data, law of large numbers • Properties of appointments • Countable • Identifiable end • Abundance

Time Estimates: A Software Project • Time per class? • Not countable • Time per sub-system? • Not abundant • Features? • Countable (breakdown of the big task) • Identifiable end (write tests) • Abundant (by definition)
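Once features are the unit, the physician's method carries over: time per completed feature, multiplied by the number of open features. A minimal sketch, assuming completed features are representative of the remaining ones; the history data and function name are hypothetical:

```python
from __future__ import annotations

from statistics import mean


def estimate_remaining_days(done_durations: list[float], remaining: int) -> float:
    """Evidence-based estimate: average time per completed feature,
    multiplied by the number of features still open."""
    if not done_durations:
        raise ValueError("need at least one completed feature to estimate from")
    return mean(done_durations) * remaining


# Hypothetical history: days spent on each feature finished so far.
history = [0.5, 1.0, 2.0, 0.5, 1.5, 1.0]
print(estimate_remaining_days(history, remaining=20))
```

The estimate improves as `history` grows, which is where the "abundant" property matters: the law of large numbers needs many samples.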

Burn Charts • Time is important • (As shown earlier) • So, let’s describe our progress vs. time • Vertical axis: tasks completed • Horizontal axis: time line • Two variants: burn-up, burn-down
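Both variants come from the same daily record of completed tasks. A minimal sketch with hypothetical data (20 tasks in scope): burn-up is the running total of completed tasks, burn-down is the scope minus that total:

```python
from itertools import accumulate

# Hypothetical project: 20 tasks in scope, progress recorded daily.
TOTAL_TASKS = 20
completed_per_day = [2, 3, 1, 4, 2]

# Burn-up: cumulative number of completed tasks (rises toward the total).
burn_up = list(accumulate(completed_per_day))
# Burn-down: tasks still remaining (falls toward zero).
burn_down = [TOTAL_TASKS - done for done in burn_up]

for day, (up, down) in enumerate(zip(burn_up, burn_down), start=1):
    print(f"day {day}: done={up:2d} remaining={down:2d}")
```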

Burn Down

Burn Up

Burn Up Example

Quality in Software (new definition) • High-quality software is software whose burn curve is linear • Similar to Big-O notation of algorithms • Does not distinguish between two linear curves • Differences in domain, languages, … • States that flattening is the #1 risk • Can be experienced even in student assignments • Result-oriented
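One way to spot flattening is to compare the slope of the late part of the burn-up curve with the slope of the early part. A minimal sketch; the half-split and least-squares slopes are assumptions of this example, not a prescribed method, and the series needs at least four equally spaced points with some early progress:

```python
from __future__ import annotations


def slope(xs: list[float], ys: list[float]) -> float:
    """Least-squares slope of ys against xs (needs at least two distinct xs)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def flattening_ratio(burn_up: list[float]) -> float:
    """Late slope divided by early slope of a burn-up series sampled at equal intervals.

    About 1.0 for a linear curve; well below 1.0 when progress flattens."""
    days = list(range(len(burn_up)))
    mid = len(burn_up) // 2
    early = slope(days[:mid], burn_up[:mid])
    late = slope(days[mid:], burn_up[mid:])
    return late / early


# Hypothetical burn-up data (tasks completed at the end of each day).
steady = [0, 2, 4, 6, 8, 10, 12, 14]
stalling = [0, 3, 6, 9, 10, 11, 11, 12]
print(flattening_ratio(steady), flattening_ratio(stalling))
```

Because the ratio compares a curve with itself, a fast linear project and a slow linear project both score 1.0. Like Big-O, it ignores the constant and flags only the change in shape.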

Summary • Time to completion is a key factor • Time estimation by features is practical • Burn-up charts show progress • Quality: Linear burn curve
