Failure Analysis and Requirements Maintenance
Anticipating Failure • We cannot engineer away all possible failures • A system has only partial control over its environment • A system is made up of components which may themselves fail (especially Web services!) • Unavoidable risks must be anticipated and planned for • Estimate both likelihood and severity (expected cost = likelihood × severity) • Choose whether to ignore each risk or plan for it
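A minimal sketch of the expected-cost estimate, assuming likelihood is a probability over some fixed period and severity is a cost in one consistent unit. The `Risk` class, the `triage` helper, the single fixed threshold and all the numbers are hypothetical and only illustrate the "ignore or plan for" choice:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Risk:
    name: str
    likelihood: float  # probability of occurrence over the period considered (0..1)
    severity: float    # cost if the failure occurs, in one consistent unit

    @property
    def expected_cost(self) -> float:
        # Expected cost = likelihood x severity
        return self.likelihood * self.severity


def triage(risks: List[Risk], threshold: float) -> Tuple[List[Risk], List[Risk]]:
    """Split risks into (plan_for, ignore) by expected cost.

    The single fixed threshold is an illustrative assumption; a real
    decision would also weigh the cost of mitigating each risk.
    """
    plan_for = [r for r in risks if r.expected_cost >= threshold]
    ignore = [r for r in risks if r.expected_cost < threshold]
    return plan_for, ignore


# Hypothetical figures, for illustration only
risks = [
    Risk("web service unavailable", likelihood=0.2, severity=500.0),
    Risk("disk failure", likelihood=0.01, severity=2000.0),
]
plan_for, ignore = triage(risks, threshold=50.0)
```

Here the frequent but moderate risk (expected cost 100) is planned for, while the rarer but more severe one (expected cost 20) is accepted.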
Assessing Risks • Traditional approach • Failure is a system state • Logically analyse the events leading to the failure state • Assume that failure is catastrophic: the state persists until repair action is taken • Special characteristics of software failure • Not necessarily a bad state • May instead be an incorrect sequence of events • Not necessarily catastrophic • Example: user inserts coins into a vending machine and receives the item, but no change is given
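The vending-machine example can be phrased as a check over a trace of events rather than a test for a bad state. This is a sketch under assumed conditions: the event names (`insert`, `vend`, `change`), amounts in cents and the `missing_change` function are all hypothetical:

```python
from typing import List, Tuple

# (event kind, amount in cents); event names are hypothetical
Event = Tuple[str, int]


def missing_change(trace: List[Event]) -> bool:
    """True if the trace ends with change still owed to the customer.

    Every individual event is legitimate on its own; the failure is a
    property of the sequence (coins in, item out, no change back).
    """
    credit = 0  # money inserted but not yet spent
    owed = 0    # change due to the customer but not yet returned
    for kind, amount in trace:
        if kind == "insert":
            credit += amount
        elif kind == "vend":
            owed += credit - amount  # amount is the item price
            credit = 0
        elif kind == "change":
            owed -= amount
    return owed > 0


bad = [("insert", 200), ("vend", 150)]
good = [("insert", 200), ("vend", 150), ("change", 50)]
```

No single state in `bad` is wrong; only the sequence as a whole reveals the failure, and the machine keeps working afterwards.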
Kinds of Risk • Failure may be due to the interaction between human and machine • Example: Therac-25 radiation therapy machine • Software error: backspace not registered correctly • User error: typo • Usability error: value not presented for confirmation • End result: radiation overdoses
Fault Tree Analysis • Another kind of AND/OR tree • Single root: the failure state, with its severity estimate • Leaf events labelled with probabilities • Probabilities propagated upward through the AND/OR gates • Be careful: simple propagation assumes the events are independent! • Can be structured hierarchically • Make a leaf the root of its own tree
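A minimal sketch of upward probability propagation, assuming all events are statistically independent: an AND gate multiplies its children's probabilities, and an OR gate gives 1 − ∏(1 − pᵢ). The `Leaf`/`Gate` types and the example events are hypothetical:

```python
import math
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Leaf:
    name: str
    probability: float


@dataclass
class Gate:
    kind: str  # "AND" or "OR"
    children: List["Node"] = field(default_factory=list)


Node = Union[Leaf, Gate]


def probability(node: Node) -> float:
    """Propagate leaf probabilities up to this node.

    ASSUMES all child events are independent; with dependent events
    (e.g. a shared power supply) these formulas give wrong answers.
    """
    if isinstance(node, Leaf):
        return node.probability
    ps = [probability(c) for c in node.children]
    if node.kind == "AND":
        # All children must occur: product of probabilities
        return math.prod(ps)
    if node.kind == "OR":
        # At least one child occurs: 1 - P(none occur)
        return 1.0 - math.prod(1.0 - p for p in ps)
    raise ValueError(f"unknown gate kind: {node.kind}")


# Hypothetical tree: failure if A occurs, or if both B and C occur
root = Gate("OR", [
    Leaf("event A", 0.1),
    Gate("AND", [Leaf("event B", 0.2), Leaf("event C", 0.5)]),
])
```

`probability(root)` is 1 − 0.9 × 0.9 = 0.19. Because a `Gate` is evaluated recursively, any subtree can be analysed as the root of its own tree, which matches the hierarchical structuring in the slide.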
Caveat Ratiocinator • ... let the analyst beware! • Probabilities of component failure are relatively easy to determine (given, say, MTBF) • Probabilities of operator error must be based on statistics, and may depend on many factors • Work environment, experience, ... • Meaningless probabilities give a meaningless analysis
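To see why MTBF makes component probabilities comparatively easy, here is a sketch that turns an MTBF into a leaf probability for a given operating time. It assumes a constant failure rate (the exponential model), so P(failure by t) = 1 − e^(−t/MTBF); the function name is hypothetical. Operator error has no comparable formula:

```python
import math


def failure_probability(mission_time_h: float, mtbf_h: float) -> float:
    """Probability of at least one failure within mission_time_h hours.

    ASSUMES a constant failure rate (exponential model):
    rate = 1 / MTBF, so P(fail by t) = 1 - exp(-t / MTBF).
    """
    if mtbf_h <= 0:
        raise ValueError("MTBF must be positive")
    return 1.0 - math.exp(-mission_time_h / mtbf_h)
```

For example, running a component for exactly one MTBF gives 1 − e^(−1), roughly 0.63, not certainty. The result is only as good as the constant-rate assumption.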
Why-Because Analysis • Ignore probability; focus on causal relationships only • The flow of events in time is made more explicit • Asks: “Why is some goal not fulfilled?” • Designed for open, complex, heterogeneous systems • Open: the environment has a significant effect • Heterogeneous: different types of components (digital, analog, human, business logic)
Example: Lufthansa A320 Accident, Warsaw, 1993 • Aircraft landed at Warsaw Airport during a thunderstorm • None of the braking systems worked for 9 seconds • Aircraft ran off the end of the runway • Collided with an earth bank and caught fire: 2 deaths • Initial report cited only the failure of the braking systems as a cause • But the presence of the earth bank was also an original cause! • That is: no other factor in the accident contributed to its presence
WB-Graphs • Show the cause-effect relations of all states and events contributing to a failure • Two steps: • List all significant events and states (empirical) • Determine the causal relations • A causes B if, in the nearest 'possible world' where A did not happen, B did not happen either • For instance: 'my office door is closed because I closed it' • It is imaginable that, had I not closed it, some other cause (wind, another person) would have closed it; but that world is not the nearest possible alternative
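A WB-graph can be stored as a list of (cause, effect) edges, each of which is assumed to have passed the counterfactual test. The sketch below finds the original causes (nodes with no causes of their own) and all causes of a given node. The graph is a deliberately truncated, hypothetical rendering of the Warsaw example built only from the facts on the previous slide, not the published analysis; in particular, the braking delay shows up as an original cause only because its own causes are left out:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (cause, effect): each edge is assumed to pass the counterfactual test
Edge = Tuple[str, str]


def original_causes(edges: List[Edge]) -> Set[str]:
    """Nodes that have no causes of their own within this graph."""
    nodes = {n for edge in edges for n in edge}
    effects = {effect for _, effect in edges}
    return nodes - effects


def causes_of(edges: List[Edge], node: str) -> Set[str]:
    """All direct and indirect causes of node (walks edges backwards)."""
    parents: Dict[str, List[str]] = defaultdict(list)
    for cause, effect in edges:
        parents[effect].append(cause)
    seen: Set[str] = set()
    stack = [node]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


# Hypothetical, truncated graph for illustration only
warsaw = [
    ("braking systems inactive for 9 s", "ran off end of runway"),
    ("ran off end of runway", "collision with earth bank"),
    ("earth bank at end of runway", "collision with earth bank"),
    ("collision with earth bank", "fire"),
]
```

Even in this small graph the earth bank appears as an original cause next to the braking failure, which is the point the initial report missed.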
Finding All the Facts • The Method of Difference • Let F be a significant fact • How would behaviour have been different if F were not the case? Call this (counterfactual) behaviour B • What is the first place where B differs from the actual behaviour? • This event or state contains a causal factor of F • Try to identify it, label it G, and continue
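Imagining the counterfactual behaviour B is a matter of human judgement, but once the actual and counterfactual behaviours are written down as event sequences, locating "the first place where B differs" is mechanical. A sketch, assuming behaviours are recorded as lists of event labels; the function name is hypothetical:

```python
from typing import Optional, Sequence


def first_difference(actual: Sequence[str], counterfactual: Sequence[str]) -> Optional[int]:
    """Index of the first event where the counterfactual behaviour departs
    from the actual behaviour, or None if the two are identical."""
    for i, (a, b) in enumerate(zip(actual, counterfactual)):
        if a != b:
            return i
    if len(actual) != len(counterfactual):
        # One behaviour is a prefix of the other: they diverge where the shorter ends
        return min(len(actual), len(counterfactual))
    return None
```

The event at the returned index in the actual behaviour is the candidate to examine and label G before repeating the procedure.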
Analyzing Human Actions • Sequence of stages in human decision-making is abbreviated PARDIA • Perception • Attention • Reasoning • Decision • Intention • Action • Human error may occur at any of these stages • System design flaws can be identified as contributing factors to human error
Three Responses to Requirements Change • Add new requirements during development • But avoid 'feature creep' • Modify requirements during development • Prototypes help reveal which changes are necessary • Remove requirements during development • As their feasibility or business importance drops
Elements of Change Management • Configuration Items • Each configuration item is a distinct product during development • Has its own requirements and version control • Baselines • Stable version of a document, used for sharing • Change Management Process • All proposed changes are submitted as change requests • A review board reviews them periodically and considers how they interact • If agreed, a change becomes part of the next baseline
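The change management process can be sketched as a small workflow: changes enter only as change requests, the board decides a whole batch at each periodic review, and accepted requests form the next baseline. This is a simplified model under stated assumptions. All class names are hypothetical, a baseline is reduced to a version number plus the changes accepted into it, and the board's judgement (including interactions between requests) is stood in for by an `approve` callable:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUBMITTED = "submitted"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class ChangeRequest:
    item: str          # configuration item affected
    description: str
    status: Status = Status.SUBMITTED


@dataclass
class Baseline:
    version: int
    changes: List[ChangeRequest] = field(default_factory=list)


class ReviewBoard:
    def __init__(self) -> None:
        self.pending: List[ChangeRequest] = []
        self.baselines: List[Baseline] = [Baseline(version=1)]

    def submit(self, cr: ChangeRequest) -> None:
        # Every proposed change enters as a change request; nothing is edited directly
        self.pending.append(cr)

    def review(self, approve: Callable[[ChangeRequest, List[ChangeRequest]], bool]) -> Baseline:
        """Periodic review: decide every pending request, then cut the next baseline.

        approve(cr, batch) sees the whole batch so that interactions
        between requests can be taken into account.
        """
        accepted = []
        for cr in self.pending:
            cr.status = Status.ACCEPTED if approve(cr, self.pending) else Status.REJECTED
            if cr.status is Status.ACCEPTED:
                accepted.append(cr)
        self.pending = []
        nxt = Baseline(version=self.baselines[-1].version + 1, changes=accepted)
        self.baselines.append(nxt)
        return nxt
```

Usage: after `submit` calls for two requests against a hypothetical "SRS" item, `review(lambda cr, batch: "audit" in cr.description)` produces baseline 2 containing only the accepted request; the rejected one stays recorded with its status.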