Software Reliability Research Pankaj Jalote Professor, CSE, IIT Kanpur, India
System Reliability • System – an entity that provides defined behavior at interfaces • System is a hierarchy of subsystems, each subsystem being a system • Reliability of a system – its ability to provide failure-free operation • Failure – the system behaves incorrectly or not as expected; failures are random phenomena
Reliability Quantification • Reliability of a system is defined through the probability of failure in a time period: R(t) = probability that the system has not failed by time t • In reliability work, the form of R(t) is often specified by assuming a distribution for the time to failure (e.g., exponential)
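Reliability Quantification.. • Reliability can also be quantified by Mean Time to Failure (MTTF) • Also by failure rate (number of failures per unit time) • From R(t), both MTTF and the failure rate can be determined; MTTF is the integral of R(t) over all t ≥ 0 • Under some assumptions, such as a constant failure rate λ (exponentially distributed time to failure), failure rate and MTTF are inversely related: MTTF = 1/λ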
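To make the relations among R(t), MTTF and failure rate concrete, here is a minimal Python sketch that assumes a constant failure rate (exponential time to failure). The rate value 0.5 and all function names are illustrative choices, not from the slides. It integrates R(t) numerically to get MTTF, and estimates the failure (hazard) rate as -R'(t)/R(t).

```python
import math

def reliability_exponential(t, rate):
    """R(t) = exp(-rate * t): probability of no failure by time t,
    assuming a constant failure rate (exponential time to failure)."""
    return math.exp(-rate * t)

def mttf_from_reliability(R, t_max, steps=100_000):
    """MTTF = integral of R(t) over [0, inf), approximated by the trapezoidal
    rule on [0, t_max]; t_max must be large enough that R(t_max) is ~0."""
    step = t_max / steps
    total = 0.5 * (R(0.0) + R(t_max))
    for k in range(1, steps):
        total += R(k * step)
    return total * step

def hazard_rate(R, t, dt=1e-6):
    """Failure (hazard) rate h(t) = -R'(t) / R(t), via a central difference."""
    derivative = (R(t + dt) - R(t - dt)) / (2 * dt)
    return -derivative / R(t)

lam = 0.5  # illustrative constant failure rate (failures per unit time)

def R(t):
    return reliability_exponential(t, lam)

mttf = mttf_from_reliability(R, t_max=80.0)
print(f"numerical MTTF = {mttf:.4f}, 1/lambda = {1 / lam:.4f}")
print(f"hazard rate at t = 3: {hazard_rate(R, 3.0):.4f}")
```

Under the exponential assumption the numerical MTTF matches 1/λ and the hazard rate is the same at every t. For other distributions (e.g., Weibull with shape parameter other than 1) the hazard rate varies with t, and the simple inverse relation no longer holds.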
Software Reliability • Software (un)reliability is caused not by aging but by bugs • The more bugs, the lower the reliability of the software • Still, failures appear random, so reliability theory can be applied
Software Reliability Research • Two main threads • Software reliability modeling – how to model and predict software reliability • Improving software reliability – by removing defects through program checking, verification, testing,… • Some of the work being done here on both threads is discussed next
Software Reliability • Software systems are often one-off • Measuring reliability in the lab is not practical: it needs a lot of failure data, which takes a long time to collect • Failures often result in fault removal, leading to reliability improvement • Predicting future reliability from measured reliability is therefore harder • Hence different models are needed
Software Reliability Growth Models • Assume that reliability is a function of the defect level, and that as defects are removed, reliability improves • Model the failure-fix process of software evolution • Many models have been proposed in the last 3 decades • Model parameters are determined from past data on failures and fixes
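As an illustration of how an SRGM captures "reliability improves as defects are removed", the sketch below uses the Goel-Okumoto model. This is one classical SRGM chosen here only as an example, not necessarily one used in the work described. The parameter values a and b are hypothetical and not fitted to any data.

```python
import math

# Goel-Okumoto NHPP model (an example SRGM):
#   a = expected total number of faults that will eventually be detected
#   b = per-fault detection rate
def expected_failures(t, a, b):
    """Mean value function m(t) = a * (1 - exp(-b t)): expected failures by time t."""
    return a * (1.0 - math.exp(-b * t))

def failure_intensity(t, a, b):
    """lambda(t) = dm/dt = a * b * exp(-b t); it falls as faults are found and fixed."""
    return a * b * math.exp(-b * t)

def conditional_reliability(x, t, a, b):
    """Probability of no failure in (t, t + x] after testing up to time t."""
    return math.exp(-(expected_failures(t + x, a, b) - expected_failures(t, a, b)))

a, b = 100.0, 0.05  # hypothetical parameter values
for t in (0, 20, 40, 80):
    print(f"t = {t:3d}: intensity = {failure_intensity(t, a, b):.3f}, "
          f"R(1 | t) = {conditional_reliability(1.0, t, a, b):.3f}")
```

In practice a and b would be estimated from the observed failure and fix history, as the slide notes.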
Reliability of Software Products • For software products, a large population of units exists in the field, and faults are not removed as failures occur • According to SRGMs, the reliability should therefore remain the same • I.e., the failure rate should be constant • Yet the observed failure rate of products decreases with time
Reasons for this Phenomenon • Users learn with time and avoid failure-causing situations • Users initially explore more, then limit themselves to some parts of the product • Most users use only a few product features • Configuration-related failures are much more frequent at the start • These failures decrease with time
A New Model for Product Rel. • For a user, there is a transient failure rate, which decays by a constant factor each period • With time the transient dies out, and the failure rate reaches a steady state • The steady-state failure rate represents the reliability of the product
Failure Rate of a Unit • Failure rate of one unit in period i of its use is $\lambda(i) = \lambda_0 \alpha^{i} + \lambda_f$ • $\lambda_0$ is the initial transient rate • $\lambda_f$ is the final steady-state rate • $\alpha$ is the decay factor ($0 < \alpha < 1$)
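A minimal Python sketch of the per-unit model λ(i) = λ0·α^i + λf, using the parameter values reported for the product on the following slide. The function names, the convention that i = 0 is a unit's first period, and the 10% "near steady state" tolerance are illustrative assumptions, not from the slides.

```python
def unit_failure_rate(i, lam0, lamf, alpha):
    """lambda(i) = lam0 * alpha**i + lamf for a unit in period i of its use
    (i = 0 is its first period)."""
    return lam0 * alpha ** i + lamf

def periods_to_steady_state(lam0, lamf, alpha, tolerance=0.1):
    """Smallest i at which the transient lam0 * alpha**i is no more than
    tolerance * lamf, i.e. the rate is within (1 + tolerance) * lamf."""
    if not 0 < alpha < 1:
        raise ValueError("decay factor alpha must lie in (0, 1)")
    i = 0
    while lam0 * alpha ** i > tolerance * lamf:
        i += 1
    return i

# Parameter values reported for the product (per month)
lam0, lamf, alpha = 0.04, 0.008, 0.4
for i in range(6):
    rate = unit_failure_rate(i, lam0, lamf, alpha)
    # 1/rate reads as an MTTF only if the rate is treated as constant within the period
    print(f"period {i}: rate = {rate:.4f} failures/month, 1/rate = {1 / rate:.0f} months")
print("periods until within 10% of steady state:",
      periods_to_steady_state(lam0, lamf, alpha))
```

Because the transient shrinks geometrically, the rate approaches λf within a handful of periods whenever α is well below 1.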
Applying it to a Product • We considered the failure and sales data of a real product for MS • Applying the model to the data and determining the parameters, we get • λ0 = 0.04 failures/month • λf = 0.008 failures/month • α = 0.4 (i.e., each month the transient part of the rate falls to 40% of its previous value, a 60% decay)
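The slides do not say how the parameters were determined. One simple possibility, sketched here as a hypothetical approach: for a fixed α the model is linear in λ0 and λf, so those can be found by ordinary least squares, and α by a grid search over (0, 1). The rates used below are synthetic, generated from the model with the reported values; they are not the real product data.

```python
def fit_transient_model(rates, alpha_grid=None):
    """Fit rates[i] ~ lam0 * alpha**i + lamf.
    For each candidate alpha, solve for (lam0, lamf) by least squares;
    keep the alpha with the smallest sum of squared errors."""
    n = len(rates)
    if n < 2:
        raise ValueError("need at least two observed periods")
    if alpha_grid is None:
        alpha_grid = [k / 100 for k in range(1, 100)]
    y_mean = sum(rates) / n
    best = None
    for alpha in alpha_grid:
        xs = [alpha ** i for i in range(n)]
        x_mean = sum(xs) / n
        sxx = sum((x - x_mean) ** 2 for x in xs)
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, rates))
        lam0 = sxy / sxx              # slope of rate against alpha**i
        lamf = y_mean - lam0 * x_mean  # intercept = steady-state rate
        sse = sum((lam0 * x + lamf - y) ** 2 for x, y in zip(xs, rates))
        if best is None or sse < best[0]:
            best = (sse, lam0, lamf, alpha)
    _, lam0, lamf, alpha = best
    return lam0, lamf, alpha

# Synthetic, noise-free monthly rates generated from the model (not real data)
observed = [0.04 * 0.4 ** i + 0.008 for i in range(12)]
lam0_hat, lamf_hat, alpha_hat = fit_transient_model(observed)
print(f"lam0 = {lam0_hat:.4f}, lamf = {lamf_hat:.4f}, alpha = {alpha_hat:.2f}")
```

With real field data the per-period rates would first have to be derived from failure counts and the number of units in use, and the fit would not be exact.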
Example… • The steady-state failure rate is 1/6th of the average failure rate in month 2, and 1/3rd of the average rate in month 4 • I.e., the initial MTTF could be as low as 1/6th of the steady-state MTTF • Steady state is reached quite soon – in two to three months
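For a product, the average failure rate in a given month mixes units of different ages, since units are sold over several months. The sketch below shows one plausible way to compute such an average from the per-unit model. It assumes no unit is retired once sold, and the monthly sales figures are made up for illustration, so its output is not meant to reproduce the ratios quoted above.

```python
def unit_failure_rate(i, lam0, lamf, alpha):
    """Per-unit rate in period i of the unit's use: lam0 * alpha**i + lamf."""
    return lam0 * alpha ** i + lamf

def product_failure_rate(month, sales, lam0, lamf, alpha):
    """Average per-unit failure rate over all units in the field in a calendar
    month. sales[s] = units put into use in month s (0-based); a unit sold in
    month s is in period (month - s) of its use. Assumes no unit is retired."""
    in_field = 0
    weighted = 0.0
    for s in range(min(month, len(sales) - 1) + 1):
        in_field += sales[s]
        weighted += sales[s] * unit_failure_rate(month - s, lam0, lamf, alpha)
    if in_field == 0:
        raise ValueError("no units in the field yet")
    return weighted / in_field

lam0, lamf, alpha = 0.04, 0.008, 0.4          # values reported for the product
sales = [1000, 3000, 5000, 5000, 4000, 3000]  # hypothetical monthly sales
for m in range(8):
    avg = product_failure_rate(m, sales, lam0, lamf, alpha)
    print(f"month {m}: average rate = {avg:.4f}, steady state / average = {lamf / avg:.2f}")
```

While sales continue, new units keep adding transient failures, so the product-level average approaches λf more slowly than a single unit does.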
Sw Architecture • Architecture is the set of components in the system and how they are connected • It is decided very early in a software project • If reliability and performance can be modeled from the architecture, the architecture can be improved early • Some work is going on in architecture-based performance and reliability modeling
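One well-known architecture-based reliability model is Cheung's user-oriented Markov model. It is shown here only as an example of the idea and is not necessarily the approach used in the work mentioned. Each component i has a reliability R[i], and control passes from component i to j with probability P[i][j]. With Q[i][j] = R[i]·P[i][j] and S = (I − Q)^-1, system reliability is S[0][n−1]·R[n−1], where component 0 is the entry and n−1 the exit. The four-component architecture and its numbers are hypothetical.

```python
def cheung_reliability(R, P):
    """System reliability under Cheung's Markov model.
    R[i]: reliability of component i; P[i][j]: transfer probability i -> j.
    Component 0 is the entry, component n-1 the exit (row of zeros in P)."""
    n = len(R)
    # Augmented matrix [I - Q | e_{n-1}], where Q[i][j] = R[i] * P[i][j]
    A = [[(1.0 if i == j else 0.0) - R[i] * P[i][j] for j in range(n)]
         + [1.0 if i == n - 1 else 0.0] for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution: x is column n-1 of S = (I - Q)^-1
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x[0] * R[n - 1]  # S[0][n-1] * R[n-1]

# Hypothetical architecture: 0 = entry, 3 = exit, with a loop back from 1 to 0
P = [
    [0.0, 0.6, 0.4, 0.0],
    [0.2, 0.0, 0.0, 0.8],
    [0.0, 0.3, 0.0, 0.7],
    [0.0, 0.0, 0.0, 0.0],  # exit component: no outgoing transfers
]
R = [0.99, 0.98, 0.97, 0.995]
print(f"system reliability = {cheung_reliability(R, P):.4f}")
```

Such a model lets one ask, before the system is built, which component's reliability or which usage path most affects the overall figure.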
Program Verification • Basic goal – to ensure that the program is as free of defects (bugs) as possible • Good program verification leads to higher reliability
Program Verification Techniques • Testing – program is executed with test data to find bugs • Static analysis – program source code is analyzed without executing it • Dynamic analysis – program is run on some data and its behavior is checked, e.g., against assertions • Model checking • Formal verification
Techniques • Most techniques are applied in isolation • Sometimes they are complementary in their defect detection capability • Combining techniques meaningfully can improve reliability • We are working on techniques for combining testing and static analysis
Testing • Testing remains the main verification activity, the one relied on most • It can consume as much as half of the total effort for a software product • Testing involves test case design, execution, checking the results, then debugging, fixing and retesting • Each step is expensive
Test Automation • Test automation can help reduce cost and make testing more effective • Most test automation approaches focus on data collection and re-testing • Little effort has gone into complete end-to-end automation • We are working on automating object-oriented (OO) testing using state-based models
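To illustrate what state-based test automation can look like, here is a minimal sketch that generates one test per transition of a state model (transition coverage) and runs the tests against a class. The bounded-stack model, the `BoundedStack` class and its `abstract_state` mapping are hypothetical examples, not the approach being developed in the work mentioned.

```python
from collections import deque

# Hypothetical state model of a bounded stack with capacity 2:
# (source state, method) -> expected target state
MODEL = {
    ("Empty", "push"): "Partial",
    ("Partial", "push"): "Full",
    ("Partial", "pop"): "Empty",
    ("Full", "pop"): "Partial",
}
INITIAL = "Empty"

def shortest_paths(model, initial):
    """BFS over the model: shortest method sequence reaching each state."""
    paths = {initial: []}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for (src, method), dst in model.items():
            if src == state and dst not in paths:
                paths[dst] = paths[state] + [method]
                queue.append(dst)
    return paths

def transition_tests(model, initial):
    """One test per transition: reach its source state, then fire it."""
    paths = shortest_paths(model, initial)
    return [(paths[src] + [method], dst)
            for (src, method), dst in model.items() if src in paths]

class BoundedStack:
    """Hypothetical class under test."""
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.items = []

    def push(self, x=0):
        if len(self.items) >= self.capacity:
            raise OverflowError("stack full")
        self.items.append(x)

    def pop(self):
        return self.items.pop()

    def abstract_state(self):
        """Map the concrete object state onto a model state."""
        if not self.items:
            return "Empty"
        return "Full" if len(self.items) == self.capacity else "Partial"

def run_test(sequence, expected):
    """Execute a method sequence on a fresh object and check the final state."""
    obj = BoundedStack()
    for method in sequence:
        getattr(obj, method)()
    return obj.abstract_state() == expected

suite = transition_tests(MODEL, INITIAL)
for seq, expected in suite:
    print(seq, "->", expected, "PASS" if run_test(seq, expected) else "FAIL")
```

This oracle checks only the final abstract state; a stricter one would compare the abstract state after every call in the sequence.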
Summary • Software reliability is a rich and wide area • Exciting work is going on across the world in modeling, analysis, program checking, testing, etc. • Many issues remain open