Software Engineering Evidence

Software Engineering Evidence Dieter Rombach (rombach@informatik.uni-kl.de) Technical University of Kaiserslautern Computer Science Department Software Engineering Chair Kaiserslautern, Germany wwwagse.informatik.uni-kl.de Fraunhofer Institute for Experimental Software Engineering (IESE) Kaiserslautern, Germany www.iese.fhg.de

Contents • Experimental Software Engineering • Software Engineering Evidence • Towards a Theory of Software Engineering Evidence • Empirical Methods • Existing Body of Knowledge • Experimental Software Engineering in Kaiserslautern • Practical Examples • Agenda for Research, Tech Transfer & Teaching • Outlook

Experimental Software Engineering (1 of 3) • Computer Science is one of the scientific base disciplines for the “engineering of large (software) systems” [Figure labels: Systems Engineering; Software Engineering; Mechanical Engineering; Psychology; Physics; Economics; Computer Science; …; Mathematics]

Experimental Software Engineering (2 of 3) G== f(P,C) • All research in computer science & software engineering should be application oriented • basic research has a longer time-scale than applied research • non-application oriented research is only justified in natural sciences! • application needs (in engineering) are defined via • goals (e.g., reliability of >= 0.99) • engineering constraints (e.g., staff experience levels, development technology such as OO/UML) Which testing technique promises/guarantees at least 95% defect detection effectiveness for large OO/UML-based systems?

Experimental Software Engineering (3 of 3) • Software Engineering comprises • (formal) methods (e.g., modeling techniques, description languages) • system technology (e.g., architecture, modularization, OO, product lines) • process technology (e.g., life-cycle models, processes, management, measurement, organization, planning, quality assurance) • empiricism (e.g., experimentation, experience capture, experience reuse) • Experimental Software Engineering • recognizes the nature of our field (engineering in a human-based design domain) • applies empirical studies in order to determine engineering effects (f) in such a design domain [Figure labels: Experimental SE; Formal Methods; Process Technology; System Theory; Empiricism]

Software Engineering Evidence (1 of 2) • Empirical studies aim to capture quantitative evidence regarding (P) • product characteristics (definition, behavior) • “What is the complexity of a product?” • “What is the performance of a system?” • process characteristics (definition, behavior) • “What is the inherent degree of parallelism?” • “How much effort does it take?” • process-product relationships • “How does design complexity affect test effort?” • Issues • How deterministic are studies? • How easy/hard is it to test results via replication? G== f(P,C)

G== f(P,C) Software Engineering Evidence • Traditional (quantitative) empirical evidence • controlled experiments (variation in C is controlled) • case studies (C is a constant, reflecting some environment) • Questionnaires, Action Research, … (mostly qualitative) • Expert consensus (like in medicine) Statistical Significance increases Practical Acceptance increases Scientists (aiming at testable cause-effect relations) prefer controlled experiments! Practitioners (aiming at low-risk technology infusion) prefer case studies & expert consensus!

Towards a Theory of SE Evidence (1 of 3) <var> • Elements of Evidence (G== f(P,C)) • Review Technique P has effectiveness (80%, 50%) for (experienced, inexperienced) developers • Life cycle model P consumes resources (effort distribution model 1/model 2) for (large/small x functional/OO design based) projects • G: any quality, cost & time model (e.g., reliability, effectiveness) • P: any SE process, method or technique (e.g., testing, design) • C: any project environment characteristic that defines the validity of G • f: mapping from (P,C) onto G • ==: mapping is defined empirically (because of the lack of physical laws!) • <var>: some measure of certainty (e.g., +- X%) All evidence should be packaged like (G x P x C) in order to be accessible for engineering decisions! This is also important for scientific publications!
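One way to read the (G x P x C) packaging rule is as a record type plus a context-aware lookup. The following is a minimal sketch under stated assumptions: the class `Evidence`, the function `lookup`, and the +-10 certainty values are hypothetical illustrations, while the 80%/50% effectiveness figures are the ones quoted on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    goal: str           # G: quality, cost or time model
    process: str        # P: process, method or technique
    context: frozenset  # C: environment characteristics bounding validity
    value: float        # empirically observed value of G (here: percent)
    var: float          # <var>: certainty, here +- percentage points


def lookup(base, goal, process, project_context) -> Optional[Evidence]:
    """Return the first record matching G and P whose context C is
    fully covered by the characteristics of the project at hand."""
    for e in base:
        if e.goal == goal and e.process == process and e.context <= project_context:
            return e
    return None


# Effectiveness values from the slide; the +-10 certainty is an assumption.
base = [
    Evidence("effectiveness", "review", frozenset({"experienced"}), 80.0, 10.0),
    Evidence("effectiveness", "review", frozenset({"inexperienced"}), 50.0, 10.0),
]

hit = lookup(base, "effectiveness", "review", frozenset({"experienced", "OO/UML"}))
print(hit.value if hit else "no evidence for this context")
```

A project whose context does not include any recorded C gets no answer, which mirrors the point that evidence is only valid within its stated context.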

Towards a Theory of SE Evidence (2 of 3) • Aggregation (vertical: P constant) • to increase significance (i.e., reduce <var>) • to increase generality, i.e., variation of C := C1 x C2 x C3 x C4 by increasing the range of each Ci • Significance increase • experiment replication (e.g., inspection area) • formal aggregation depends on measure for significance (e.g., # replications are incremented) • Variation increase • experiment variation (e.g., applications, experiences, …) • complexity: simple coverage for 5 variables with 4 values each requires “4 to the power of 5” = 1024 experiments??? • new hidden context variables HC appear in meta analysis! • E.g., (G1, P, C) & (G2, P, C) -> (G1/G2, P, C x (HC)) Complexity Problem (C coverage): Maturing (observations -> laws -> theories) based on empirical studies ALONE takes too long! Expert consensus “when observations are considered laws” is necessary!
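The two aggregation concerns above can be made concrete with a short sketch. The coverage count reproduces the slide's 4^5 = 1024 figure; the replication values fed to `standard_error` are invented purely for illustration, and the standard error of the mean is only one possible choice for <var>.

```python
from itertools import product
from math import sqrt
from statistics import stdev


def full_coverage(levels_per_variable):
    """Enumerate every combination of context-variable values."""
    return list(product(*levels_per_variable))


def standard_error(observations):
    """Standard error of the mean, one possible reading of <var>."""
    return stdev(observations) / sqrt(len(observations))


# 5 context variables with 4 values each, as on the slide.
n_experiments = len(full_coverage([range(4)] * 5))
print(n_experiments)  # 1024

# Hypothetical effectiveness results (percent) from four replications.
replications = [78.0, 82.0, 80.0, 80.0]
print(round(standard_error(replications), 3))
```

Adding a sixth variable with four values would multiply the count by four again, which is the complexity problem the slide points at.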

Towards a Theory of SE Evidence (3 of 3) • Aggregation (horizontal: P increases) • to scale up to “larger” processes P (e.g., Cleanroom software development process) • perform controlled experiments in “key elements” (e.g., unit inspections vs. testing) • perform integration case studies • acceptance of scaled-up evidence must be confirmed by expert consensus (organization or community) Complexity Problem (P size): Performing controlled experiments on large processes is practically impossible! Expert consensus is required to accept evidence!

Empirical Methods (1 of 3) • Science in general involves • modeling of software product & process artifacts • empirical validation of hypotheses regarding their characteristics & behavior in testable form • Empirical foundation includes methods for • relating goals to measurements (GQM) • piggy-backing empirical studies on real projects (QIP) • organizing empirical observations for reuse (EF) • specific activities such as experimental design, data analysis • importance of combining quantitative & qualitative analysis There exists a comprehensive body of empirical methods! (ISERN Workshops & ISESE Conference in Los Angeles, week of 16 August 2004) No excuse for not applying it!
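Relating goals to measurements in the GQM style can be pictured as a small tree: a goal is refined into questions, and each question into metrics. The sketch below assumes hypothetical class names (`Goal`, `Question`, `Metric`) and an invented example goal; it is not taken from the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    name: str
    unit: str


@dataclass
class Question:
    text: str
    metrics: list = field(default_factory=list)


@dataclass
class Goal:
    statement: str
    questions: list = field(default_factory=list)

    def metrics(self):
        """Collect each metric once, in the order questions mention it."""
        seen = {}
        for question in self.questions:
            for metric in question.metrics:
                seen.setdefault(metric.name, metric)
        return list(seen.values())


# Hypothetical goal, questions and metrics for illustration only.
found = Metric("defects found in inspection", "count")
total = Metric("defects found in total", "count")
effort = Metric("inspection effort", "person-hours")

goal = Goal(
    "Analyze unit inspections to evaluate effectiveness and cost",
    [
        Question("What share of defects do inspections find?", [found, total]),
        Question("What does one found defect cost?", [found, effort]),
    ],
)

print([m.name for m in goal.metrics()])
```

Deriving the metric list from the goal, rather than starting from available data, is the design choice that keeps measurement tied to a stated purpose.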

Empirical Methods (2 of 3) <var> G== f(P,C,HC) • Generic form of empirical SE contributions: for non-deterministic processes, function f can only be determined empirically • G: any quality aspect such as reliability or maintainability • P: process, technique, method, tool under investigation • C: context factors such as experience, lifecycle model • HC: hidden context factors that only show up at certain levels of generalization • Variance of function f depends on • sufficient # data points • inclusion of relevant C • stability of non-included C • important use of qualitative analysis: assure that there are no non-identified context factors • Variance may increase when repeated in/included with new environment • reason: hidden context factor (i.e., factor which was constant in one or both environments) becomes influential • consequence: new empirical cycle OR keep two different domains (i.e., identical recommendation exists for product lines!) • What is our equivalent to physical laws? • cognitive “laws”? • empirically validated? • always context dependent?
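The effect of a hidden context factor on variance can be illustrated with a toy calculation. All numbers below are invented: two environments each show stable results for the same P, but a factor that was constant within each environment differs between them, so pooling the data inflates the variance.

```python
from statistics import variance

# Hypothetical effectiveness observations (percent) for the same technique P.
env_a = [80.0, 82.0, 78.0]  # hidden factor HC held at one constant value
env_b = [50.0, 52.0, 48.0]  # HC held at a different constant value

within_a = variance(env_a)
within_b = variance(env_b)
pooled = variance(env_a + env_b)

print(within_a, within_b, pooled)

# A pooled variance far above both within-environment variances hints
# that some unmodeled factor separates the two environments.
suspect_hidden_factor = pooled > 10 * max(within_a, within_b)
print(suspect_hidden_factor)
```

The factor of 10 is an arbitrary threshold chosen for the sketch; in practice the slide's two options apply, either start a new empirical cycle that includes the factor or keep the two domains separate.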

Empirical Methods (3 of 3) • Physics offers laws for electrical eng. • precise • not circumventable • Computer Science & … offer “laws” for SE • empirically precise • circumventable (e.g., you may increase the complexity of any system and it still may work!) • is this really true? • not if one includes evolution! • what defines bounds? • models that capture the negative consequences if you exceed complexity bounds Physical laws Cognitive Laws (Com <= Co Test_Effort <= To ! Test_Effort > To)

Existing Body of Knowledge • There exists more knowledge than we typically recognize • mostly in terms of context-specific empirical observations • rarely in terms of generalized “laws” • There already exist more empirical “laws” than we typically recognize • book (Endres/Rombach, Addison-Wesley, 2003) • inspections • design principles • More studies need to be done • repeat (with variation) • generalize

Fraunhofer IESE Series • Editorial Board includes • Dave Parnas • Dieter Rombach • Ian Sommerville • Marv Zelkowitz • Handbook capturing existing body of knowledge • Students can learn about existing body of knowledge • Practitioners can avoid negligence of due diligence • Additions are welcome for next edition of book

Experimental Software Engineering in Kaiserslautern (1 of 8) • Empirical study driven software engineering research requires a laboratory environment for • in vitro: controlled student experiments • in vivo: case studies on real industry projects • “Mother” of all lab environments • NASA SEL (GSFC, CSC & UMD) • combination of stake-holders • long-standing trusting relationship • sustained accomplishments • reduction of cost (50%) & increase in quality (no slipping defects) • sustained via empirically modeled “laws” • regular use of “formal” methods (e.g., functional semantics in Cleanroom) • overcoming of development “factoids” (e.g., testing is best defect detection approach) None of the SEL success stories would have happened without empirical studies!

Experimental Software Engineering in Kaiserslautern (2 of 8) [Figure labels: Large Companies; SMEs; Fraunhofer IESE; RL (Bosch); RL State; SW & Sys. Eng.; Univ. Kaiserslautern; DFG; Res. Institutes]

Experimental Software Engineering in Kaiserslautern (3 of 8) • Experimentation (EXP) (Experimental designs, QIP/GQM/EF, Analysis techniques) • Requirements and Usability Engineering (RUE)(Requirements Engineering, Usability Engineering) • Component-based Software Engineering (CBE)(Component-based software development mainly in the field of embedded and realtime systems) • Software Product Lines (SPL) (Architecture recovery, Architecture development & evaluation, Scoping & modeling of domains & decisions)

ESE in Kaiserslautern (IESE, Research Competences) (4 of 8) • IT Security & Safety (ITS)(Security Assessments, EF for Security, Automated checking of infrastructures) • Quality and Process Engineering (QPE) (Goal-oriented measurement GQM, Descriptive process modeling, Goal-oriented product/process assessment, Subcontracting, COTS Acquisition, Risk Management) • Systematic Learning & Improvement (SLI)(QIP-based SPI planning & setup, EF buildup and Organizational Learning) • Certifiable Education & Training in SE (CET) (Evaluation and Certification, Technology-enabled Learning, Simulation-based learning and decision support)

ESE in KL (IESE, Maintain intellectual Control over System Complexity) (5 of 8) • Avoiding deterioration of system complexity over time yields linear cost curve! • SE principles & technical engineering processes [Figure: cost over product functions (releases 1, 2, 3) over time; actual cost curve vs. desired cost curve]

ESE in KL (IESE, Maximize Software Reuse) (6 of 8) • Exploiting reuse potential with proper a priori scoping (tool support) yields constant cost curve! • Product Line Reuse [Figure: cost over product functions (releases 1, 2, 3) over time; PL cost curve vs. desired cost curve]

ESE in KL (IESE, Achieve Predictability for Software Development) (7 of 8) • Measurement-based project management models yield small bandwidth prediction capabilities! • Management Principles [Figure: cost over product functions (releases 1, 2, 3) over time (size); maximum oscillation around the desired cost curve]

ESE in KL (IESE, Business Areas) (8 of 8)

Practical Examples (Inspections @ NASA) (1 of 4) • NASA SEL Experience (see Basili, JSS, 1997) • stepwise abstraction code reading vs. testing (Basili/Selby, TSE, 1987) • controlled experiment at UMD & NASA/CSC • effectiveness & cost (SAR > testing) • self-assessment (SAR > testing) • stepwise abstraction code reading in regular SEL project • case study at NASA/CSC • SAR did not show any benefits • diagnosis: people did not perform stepwise abstraction code reading as well as they should have, as they believed that testing would make up for their mistakes • Cleanroom vs. standard SEL software development • controlled experiment at UMD • more effective application of reading, less effort and more schedule adherence • stepwise abstraction code reading in SEL Cleanroom projects • case study(ies) at NASA/SEL • improved failure rates (-25%) and productivity (+30%)

Industrial Success Stories • NASA/SEL: code inspections are effective (-30/50% code defects) • NASA/SEL: inspections were not lived! • Allianz: requirements inspections are effective (-50% defects) • … Research - Studies - Methods • Defect Cost study (Boehm) • Cleanroom Study (Basili et al) • Selby Study (inspection > testing) • PBR studies (MD & IESE) • Functional Semantics (Mills) • Stepwise Abstraction Reading (Mills, Basili) • Cleanroom Process (Mills et al) • Perspective-based Reading (Basili et al)

Practical Examples (Inspections @ IESE) (3 of 4) • Further IESE work on inspections • investigation of effects in OO/UML environment (Laitenberger) • defined PBR for OO/UML (packaging of reading unit across views) • controlled experiments • students at UKL (SE class) • PBR of requirements spec (UML) vs CBR • effectiveness & cost (PBR > CBR) • replication of existing (see NASA/SEL) studies in varying contexts (application domains, technology domains) • variation of existing studies to address new questions • optimal effort for preparation phase in inspection process (exists as demonstrated at Bosch; is used to manage inspection process) • Industrial relevance • helped establish inspections with sustained success in several companies (e.g., Allianz, Bosch) • focus on inspections (with measurement-based feedback) matures development organizations (e.g., Bosch unit with inspections went from CMM 1 to CMM 3 in one step!)

Practical Examples (Design & Documentation @ IESE) (4 of 4) • IESE Studies on OO/UML (Briand, Bunse, Daly) • operationalized good design principles such as coupling, information hiding & cohesion • hypotheses: • #1: “Good” OO designs are better understood • measured by the correctness of answers to a set of questions • #2: Impact analysis on “good” OO designs is performed better and faster • measured by the time & correctness of all changes to perform a set of given change requests • controlled experiments at UKL • 2 systems (“good”, “bad”); 2x2 factorial design • results • all results significantly in favor of “good” design • students gained important self-experience regarding a set of engineering principles
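A 2x2 factorial design crosses two factors with two levels each, giving four treatment cells. The sketch below enumerates the cells and assigns participants in a balanced way. The slide names only the design-quality factor; the second factor (`task_order`), the participant labels, and both function names are assumptions made for illustration.

```python
from itertools import cycle, product


def factorial_cells(*factors):
    """Cross all factor levels into the treatment cells of the design."""
    return list(product(*factors))


def assign_round_robin(participants, cells):
    """Deal participants to cells in turn so that cell sizes stay balanced."""
    assignment = {cell: [] for cell in cells}
    for person, cell in zip(participants, cycle(cells)):
        assignment[cell].append(person)
    return assignment


design_quality = ["good", "bad"]  # factor named on the slide
task_order = ["first", "second"]  # hypothetical second factor

cells = factorial_cells(design_quality, task_order)
participants = [f"S{i:02d}" for i in range(1, 13)]  # 12 made-up student IDs
groups = assign_round_robin(participants, cells)

for cell, members in groups.items():
    print(cell, members)
```

A real experiment would shuffle the participant list before dealing, so that assignment to cells is random rather than determined by list order.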

Agenda (for Research) (1 of 3) • SE Research results require “some form of evidence” • notations, techniques, methods & tools w/o evidence are not accepted (e.g., PhD theses) • collaboration with SE/CS experts • Current research focus (SE techniques) • notations like Java, UML, … • architecture (specifically “product line architectures”) • processes such as inspection & testing, design requirements FROM • quality (reliability, security, safety) perspective • efficiency (cost, time) & effectiveness (“dependable SE”) • Current research focus (ESE methods) • Theory for “Evidence” Partners for EU-NoE on “SE Evidence” (SE, ESE)? Partners for EU-STREP on “Theory for Evidence”?

Agenda (for Tech Transfer) (2 of 3) • Apply “ESE” as transfer vehicle to create sustained improvements • Use empirical studies to • evaluate major process-product relations prior to offering to industry (e.g., in vitro controlled experiments) • method prototyping: evaluate new methods together with industry experts in order to provide ROI potential insight for decision makers (e.g., Ricoh, Bosch, German Telecom) • motivate candidate pilot project (developers & managers) with semi-controlled training experiment • evaluate pilot project (in vivo case studies) in order to adapt & motivate • continuously evaluate wide-spread use in order to motivate & optimize Without empirical evidence, no human-based process is lived! This has contributed to the growing gap between research & practice!

Agenda (for Teaching & Training) (3 of 3) • Learning is based on • reading • doing • experiencing • Human-based engineering activities depend heavily on the latter approaches to learning • Teaching must reflect this • first analysis, then construction • perform “self-experience” experiments • At Technical University of KL/CS department • 1st semester: NO programming (just reading & changing) • SE experiments (final UG class) • #1: Unit inspection more efficient than testing • #2: Traceable design documentation reduces effort & risk of change • #3: Informal (req) documents can be inspected efficiently (> 90%) • practical semester-long team projects with “data collection & process improvements”

Outlook • SE has the chance to become a respected engineering discipline • automotive companies have more software than hardware engineers (since 2000) • mature software engineering includes empiricism (to create evidence) • system & service engineering require mature software engineering • Technical University of Kaiserslautern • has leading laboratory settings for empirically driven software engineering research • is ranked #6 in “Software & Systems Eng Research” world-wide by JSS (12/03) • maintains international network (USA, Asia, Europe) • looks for international partners in SE & ESE (provided they accept experimental framework!) THANK YOU for your attention! QUESTIONS?
