Dependable Software Development Lecture 7
System dependability • For many computer-based systems, the most important system property is the dependability of the system. • The dependability of a system reflects: • The user’s degree of trust in that system. • The extent of the user’s confidence that it will operate as users expect • That it will not ‘fail’ in normal use. • Dependability covers the related systems attributes of reliability, availability and security. These are all inter-dependent. Adv Software Engg, by Asst Prof Athar Mohsin, MSCS 19, MCS-NUST
Importance of dependability • System failures may have widespread effects with large numbers of people affected by the failure. • Systems that are not dependable and are unreliable, unsafe or insecure may be rejected by their users. • The costs of system failure may be very high if the failure leads to economic losses or physical damage. • Undependable systems may cause information loss with a high consequent recovery cost. • Causes of failure: • Hardware failure: Poor design and manufacturing errors • Software failure: errors in its specification, design or implementation. • Operational failure: perhaps the largest single cause of system failures in socio-technical systems
Principal dependability properties
Principal properties • Availability • The probability that the system will be up and running and able to deliver useful services to users. • Reliability • The probability that the system will correctly deliver services as expected by users. • Safety • A judgment of how likely it is that the system will cause damage to people or its environment. • Security • A judgment of how likely it is that the system can resist accidental or deliberate intrusions.
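Because availability and reliability are defined as probabilities, they can be estimated from operational data. The sketch below is illustrative only: it assumes the standard steady-state estimate of availability from mean time to failure (MTTF) and mean time to repair (MTTR), and a simple per-demand estimate of reliability. The function names and the figures in the usage note are hypothetical, not taken from the lecture.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is up.

    Assumes availability = MTTF / (MTTF + MTTR).
    """
    if mttf_hours < 0 or mttr_hours < 0 or mttf_hours + mttr_hours == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf_hours / (mttf_hours + mttr_hours)


def reliability_on_demand(failures: int, demands: int) -> float:
    """Observed probability that a service request is handled correctly."""
    if demands <= 0 or not 0 <= failures <= demands:
        raise ValueError("need demands > 0 and 0 <= failures <= demands")
    return 1 - failures / demands
```

For example, a system that runs 999 hours between failures and takes 1 hour to repair has an estimated availability of 0.999, and 2 failed requests out of 1000 give an estimated per-demand reliability of about 0.998.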
Other dependability properties • Repairability • Reflects the extent to which the system can be repaired in the event of a failure • Maintainability • Reflects the extent to which the system can be adapted to new requirements; • Survivability • Reflects the extent to which the system can deliver services whilst under hostile attack; • Error tolerance • Reflects the extent to which user input errors can be avoided and tolerated.
Software dependability • Software fault avoidance approaches include: • Formal or precise specification practices • Programming disciplines such as information hiding and encapsulation • Extensive and repeated reviews and formal analyses during the development process • Rigorous testing • Software fault detection and removal approaches include: • Verification and validation, software testing, and proof methodology • In general, software customers expect all software to be dependable. • However, for non-critical applications, they may be willing to accept some system failures. • Some applications have very high dependability requirements, and special software engineering techniques may be used to achieve this. • Dependability achievement • Fault avoidance • The system is developed in such a way that human error is avoided and thus system faults are minimised. • The development process is organised so that faults in the system are detected and repaired before delivery to the customer. • Fault detection • Verification and validation techniques are used to discover and remove faults in a system before it is deployed. • Fault tolerance • The system is designed so that faults in the delivered software do not result in system failure. • Formal methods are fault avoidance techniques that aim to increase dependability by eliminating errors at the requirements specification and design stages of development. • Fault tolerance techniques try to keep the system operational despite the presence of faults. • Since complete fault avoidance or elimination is not possible, a critical system always employs fault tolerance techniques to achieve high system reliability and availability.
Critical Systems • Software failure is common. Sometimes a failure causes inconvenience but no serious damage; sometimes it endangers human life. • Systems whose failure can cause serious harm or loss are known as “critical systems”. • Three types of critical systems are: • Safety-critical systems • Failure may result in loss of life, injury or damage to the environment • Example: chemical plant protection system • Mission-critical systems • Failure results in failure of some goal-directed activity • Example: spacecraft navigation system • Business-critical systems • Failure results in high economic losses • Example: customer accounting system in a bank • For critical systems, the most important system property is the dependability of the system.
Safety-critical systems • Systems whose failure could result in loss of life, cause significant property damage or cause damage to the environment. • These systems must be designed to guarantee system stability during all of the system's operational modes. • When a fatal fault occurs, the system shuts down safely. • Applications • Computer-based systems used in avionics, chemical process plants and nuclear power plants. • A failure in such a system endangers human lives directly or through environmental pollution, and its impact is on a large economic scale.
Safety-Critical Systems - Present • Transportation systems, from aircraft to automobiles • New airplanes contain advanced avionics such as inertial guidance systems and GPS receivers that also have considerable safety requirements. • Automobiles, electric vehicles and hybrid vehicles are increasingly using embedded systems to maximize efficiency and reduce pollution. • Other automotive safety systems include anti-lock braking systems, Electronic Stability Control and automatic four-wheel drive. • Medical equipment continues to advance with more embedded systems • Vital signs monitoring • Electronic stethoscopes for amplifying sounds • Various medical imaging systems for non-invasive internal inspection
Can We Trust the Computer? Case Study: The Therac-25 Based on Article in IEEE-Computer, July 1993.
Opening the case • One of the most widely reported accidents involved the Therac-25 radiation therapy machine • Between June 1985 and January 1987 • Six known accidents involving massive overdoses • causing deaths and serious injuries • Worst accidents in the 35-year history of medical accelerators • “A significant amount of SW for life-critical systems comes from small firms, especially in the medical industry; firms that fit the profile of those resistant to or uninformed of the principles of either system safety or software engineering.”
Therac-25 • Medical accelerator used to treat tumors • Massive overdoses of radiation were given • 6 known accidents resulting in death or serious injury • June 1985 – January 1987 • Caused severe and painful injuries and the death of three patients • Compare the airbag sensory system in automobiles: “--- this thing will probably have to work only once in 10 years, but it better work then, otherwise the result will be catastrophic.”
Background of the case • Medical linear accelerators accelerate electrons to create high-energy beams that can destroy tumors with minimal impact on surrounding healthy tissue • Shallow tissue is treated with accelerated electrons • Deeper tissue requires converting the electron beam into X-ray photons • The Therac-25 is a medical linear accelerator. • A linear accelerator ("linac") is a particle accelerator, a device that increases the energy of electrically charged atomic particles. • The charged particles are accelerated by the introduction of an electric field, producing beams of particles which are then focused by magnets.
Case study – Therac-25 • Linacs are used to treat cancer patients. • A patient is exposed to beams of particles, or radiation, in doses designed to kill a tumor. • Since malignant tissues are more sensitive than normal tissues to radiation exposure, a treatment plan can be developed that permits the absorption of an amount of radiation that is fatal to tumor cells but causes relatively minor damage to normal tissue. • Shallow tissue is treated with electrons, but to reach deeper tissue, X-ray photons are needed.
Development of Therac-25 • Developed from the Therac-6 • A 6-MeV accelerator producing only X-rays • Evolved into the Therac-20 • A 20-MeV dual-mode (X-rays or electrons) accelerator • SW functionality was limited in both machines; it added convenience to existing hardware • Industry-standard hardware safety features and interlocks were retained • Therac-25 • Dual-mode linear accelerator • More compact and versatile than the Therac-20 • The Therac-25 takes advantage of computer control from the outset, while the Therac-6 and Therac-20 were designed around machines that already had histories of clinical use without computer control • Therac-25 SW has more responsibility for maintaining safety than the SW in the previous machines
Therac-25's software • One programmer, over several years, revised the Therac-6 software into the Therac-25 software. • An important difference between the Therac-20 software and the Therac-25 software is the overall role that each plays in the machine. • In the Therac-20, the role of software is limited. • The software simply adds convenience to the hardware. • In the Therac-25, software exclusively performs many of the critical safety checks of the system; • these safety checks are also included in the hardware of the Therac-20, but were not included in the Therac-25 hardware.
How it Operates • SW is responsible for monitoring machine status • accepts input about the treatment desired and sets the machine up for treatment • turns the beam on when activated by operator command • turns the beam off when treatment is completed, when the operator commands it, or when a malfunction is detected • Unit has an interlock system designed to remove power to the unit when there is a HW malfunction • Computer monitors the interlock system and provides diagnostic messages • depending on the fault, the computer either prevents a treatment from starting or, if treatment is in progress, creates a pause or suspension of treatment
The Safety Analysis Report (before release of product) • Programming errors have been reduced by extensive testing on a HW simulator and under field conditions on teletherapy units. • Any residual SW errors are not included in the analysis • Program SW does not degrade due to wear, fatigue, or reproduction process • Computer execution errors are caused by faulty HW components and by “soft” (random) errors induced by alpha particles and electromagnetic noise. • The fault tree does include computer failure but only hardware failures
Therac-25 SW Testing • Manufacturer said the HW and SW were “tested and exercised separately or together over many years” • In a deposition, the QA manager explained that testing was done in two parts • “small amount” of SW testing done on a simulator • most done on the system • Reports indicate that unit and SW testing was minimal • Most testing efforts were directed to the integrated system test • The same QA manager stated at a Therac-25 users meeting that the SW was tested for 2,700 hours • Under questioning by users, this was clarified as “2700 hours of use” • The programmer left AECL in 1986; little is known about the programmer • AECL employees could not provide any information about the programmer's educational background or experience
Therac-25 • Software was carried over from earlier projects where it had seemingly worked well • Therac-6, Therac-20 • Computer control added to earlier machines • Still capable of stand-alone (no computer) operation • All standard hardware safety mechanisms • Therac-25 • Software defects in earlier machines were hidden by hardware safeguards • No real software development process • Apparently no serious evaluation of the risks involved in using software in lieu of hardware safeguards • Single programmer • Operating system was developed by one programmer using assembly language in the 1970s. • SW “evolved” from the Therac-6 (which was started in 1972) • Very little SW documentation produced during development • When designing dependable systems we must deal with dependability issues from the beginning, by addressing fault-tolerance mechanisms within the system design and by employing appropriate fault-avoidance approaches in the design process. • Adding dependability later on can be expensive and may be less effective than designing it in from the beginning. • Fault avoidance, fault removal and fault tolerance represent three successive lines of defense against faults in software systems and their impact on system reliability.
The Software Errors • Each bug contained in the Therac-25 software was also found in the software of the Therac-20. • However, the hardware safety interlocks in the Therac-20 prevented any accidents from occurring in that machine. • The Therac-25 software errors that caused radiation overexposures can be reduced to interface errors.
Fault-free software • Fault-free software means software which conforms to its specification. • It does NOT mean software which will always perform correctly as there may be specification errors. • Therac-25 • 1983 safety analysis, in effect, assumed that software had no errors! • “Programming errors have been reduced by extensive testing ... Any residual software errors are not included in the analysis.” • “Computer execution errors are caused by faulty hardware components and by ‘soft’ (random) errors induced by alpha particles and electromagnetic noise.”
Diversity and Redundancy • Redundancy - Where availability is critical • e.g. in e-commerce systems, companies normally keep backup servers and switch to these automatically if failure occurs. • Keep more than 1 version of a critical component available so that if one fails then a backup is available. • Diversity - To provide flexibility against external attacks • Different servers may be implemented using different operating systems (e.g. Windows and Linux) • Provide the same functionality in different ways so that they will not fail in the same way. • However, adding diversity and redundancy adds complexity and this can increase the chances of error.
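The automatic switch to a backup server can be sketched as a loop over redundant replicas. This is a minimal illustration, not a production failover mechanism: `call_with_failover`, `primary` and `backup` are hypothetical names, and each replica is modelled as a plain function that either returns a response or raises an exception.

```python
from typing import Callable, Sequence


def call_with_failover(replicas: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each redundant replica in order and return the first successful reply."""
    errors = []
    for replica in replicas:
        try:
            return replica(request)
        except Exception as exc:  # a failed replica triggers the switch to the next one
            errors.append(exc)
    raise RuntimeError(f"all {len(replicas)} replicas failed: {errors}")


def primary(request: str) -> str:
    # Hypothetical primary server that is currently down.
    raise ConnectionError("primary is down")


def backup(request: str) -> str:
    # Hypothetical backup server providing the same service.
    return f"handled {request}"
```

Calling `call_with_failover([primary, backup], "order-1")` returns the backup's reply. The extra loop and error handling are themselves new code that can contain faults, which is the added complexity the slide warns about.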
Rigorous Software Development • Addresses quality and productivity by emphasizing the early stages in the development process • concentrates on developing an early, precise understanding of the required behavior of the system • Think carefully about what you want to do and get it right the first time. • Underlying the rigorous approach are formal specification languages • These are mathematically based languages that provide support for abstract and precise descriptions of software systems.
Therac-25 • Overconfidence in Software • Safety analysis did not include software, even though it was responsible for the safety of the system • When problems did occur, they were assumed to be hardware failures • Software was designed for a small memory footprint • Self-checks, error detection, error handling and auditing were left out • Risk assessment did not include software
Static and Dynamic verification • Software inspections (static verification) • Concerned with analysis of the static system representation to discover problems • May be supplemented by tool-based document and code analysis • Software testing (dynamic verification) • Concerned with exercising and observing product behaviour • The system is executed with test data and its operational behaviour is observed
Stages of static analysis • Control flow analysis. • Checks for loops with multiple exit or entry points, finds unreachable code, etc. • Data use analysis. • Detects uninitialized variables, variables written twice without an intervening use, variables which are declared but never used, etc. • Interface analysis. • Checks the consistency of routine and procedure declarations and their use
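One of the data use checks, finding variables that are assigned but never used, can be sketched with Python's standard `ast` module. This is a deliberately simplified illustration: the hypothetical `unused_variables` function ignores scopes, attributes and imports, which a real static analyser would have to handle, and the sample program is invented for the example.

```python
import ast


def unused_variables(source: str) -> set[str]:
    """Return names that are assigned somewhere in `source` but never read."""
    tree = ast.parse(source)  # the program is analysed without being executed
    assigned, used = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                assigned.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                used.add(node.id)
    return assigned - used


# Hypothetical program under analysis.
SAMPLE = """
dose = 5
limit = 10
unused = 42
print(dose + limit)
"""
```

Here `unused_variables(SAMPLE)` reports `unused`, because that name is written but never read.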
Therac-25 • The Operator Interface • At first, the operator needed to enter information at the treatment table, and then re-enter it at a console in the control room • Operators complained; the safeguard was removed • Error codes are reported on the screen with no English explanation • Example: (East Texas Cancer Center) “Malfunction 54” reported, caused by “dose input 2”. • An AECL technician testified that “dose input 2” means the dose delivered was either too high or too low (!) • “Treatment Pause” after a non-critical error, which the operator can ignore by pressing “P” • Causes operators to become insensitive to errors
Therac-25 • Example Bugs • Data Entry Bug • Setting the bending magnets takes 8 seconds • “Delay” subroutine uses shared memory with the data entry subroutine • So data changes within 8 seconds will be wiped out when Delay exits! • Causes bugs that only show up with proficient users who do data entry in <8 seconds • Set-Up Test Bug • On every 256th pass through Set-Up (one-byte counter), the upper collimator is not checked • Problem if operator hits “set” exactly when counter rolls over to 0 • These kinds of bugs are notoriously difficult to track down
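The Set-Up Test bug can be reproduced in a few lines. The sketch below is a simplified model, not the original assembly code: it assumes a one-byte variable that is incremented on every pass and that the collimator check runs only when the variable is non-zero. The function name and the `fixed_flag` option (one possible repair, setting the variable to a fixed non-zero value instead of incrementing it) are illustrative.

```python
def skipped_checks(passes: int, fixed_flag: bool = False) -> list[int]:
    """Return the pass numbers on which the collimator check would be skipped."""
    flag = 0  # models a one-byte variable: values 0..255 only
    skipped = []
    for n in range(1, passes + 1):
        if fixed_flag:
            flag = 1                   # repaired version: never wraps to zero
        else:
            flag = (flag + 1) & 0xFF   # buggy version: wraps to 0 on every 256th pass
        if flag == 0:
            skipped.append(n)          # check is bypassed when the flag reads zero
    return skipped
```

With the incrementing variable, `skipped_checks(600)` returns `[256, 512]`: the check is silently bypassed on every 256th pass. With `fixed_flag=True` no pass is skipped. Because the failure needs an operator action to coincide with the rollover, ordinary testing is unlikely to expose it.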
Level of Concern • For critical systems, the major, moderate and minor safety concerns must be identified • Therac-25 • Major: • Device directly affects the patient or operator and failure could result in death or serious injury • Moderate: • Device directly affects the patient and failure could result in non-serious injury • Minor: • Failures will not result in injury
Levels of Concern • Does the software • Control life support device? • Control delivery of harmful energy? • Control treatment delivery? • Provide diagnosis as basis for treatment? • Monitor vital signs? • If no to all these questions, then concern is minor
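The checklist amounts to a simple rule: the concern is minor only if every answer is "no". A minimal sketch of that rule follows; the class and field names are hypothetical, and the sketch does not attempt to distinguish major from moderate concern, which the questions alone do not determine.

```python
from dataclasses import dataclass, fields


@dataclass
class ConcernChecklist:
    """One yes/no answer per question on the slide (names are illustrative)."""
    controls_life_support: bool = False
    controls_harmful_energy: bool = False
    controls_treatment_delivery: bool = False
    provides_diagnosis_for_treatment: bool = False
    monitors_vital_signs: bool = False


def is_minor_concern(checklist: ConcernChecklist) -> bool:
    """Concern is minor only when the answer to every question is 'no'."""
    return not any(getattr(checklist, f.name) for f in fields(checklist))
```

A radiation therapy machine answers "yes" to at least the harmful-energy question, so `is_minor_concern(ConcernChecklist(controls_harmful_energy=True))` is `False`.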
Safety • System property that reflects the system’s ability to operate (normally or abnormally) without danger to the system environment • As more devices become software controlled, safety becomes a greater concern • Safety requirements are exclusive (they exclude undesirable situations rather than specify required system services) • Safety Criticality • Primary safety-critical systems • embedded software systems whose failure can cause associated hardware to fail and directly threaten people • Secondary safety-critical systems • systems whose faults can cause other systems to fail, which in turn threaten people
Safety and Reliability • They are related, but not identical • Reliability • concerned with conformance to a specification and delivery of a service • Safety • concerned with ensuring a system cannot cause damage, regardless of its conformance (or nonconformance) to its specification • Safety Achievements • Hazard Avoidance • the system is designed so that some classes of hazard cannot arise • Hazard Detection and Removal • the system is designed so that hazards are detected and removed before they result in an accident • Damage Limitation • the system includes protection features that minimize the damage that may result from an accident
Case study Insulin Pump • The system measures the level of blood sugar every 10 minutes and if this level is above a certain value and is increasing then the dose of insulin to counteract the increase is computed and injected into the diabetic • The system can also detect abnormally low levels of blood sugar and, if these occur, an alarm is sounded to warn the diabetic that they should take some action.
Dependability requirements • The system shall be available to deliver insulin when required to do so. • The system shall perform reliably and deliver the correct amount of insulin to counteract the current level of blood sugar. • The essential safety requirement is that excessive doses of insulin should never be delivered as this is potentially life threatening.
Dependability attributes • Availability • The pump should have a high level of availability but the nature of diabetes is such that continuous availability is unnecessary • Reliability • Intermittent demands for service are made on the system • Safety • The key safety requirements are that the operation of the system should never result in a very low level of blood sugar. A fail-safe position is for no insulin to be delivered • Security • Not really applicable in this case
Sample Requirement Specifications
General dependability requirements • SR1: • The system shall not deliver a single dose of insulin that is greater than a specified maximum dose for a system user. • SR2: • The system shall not deliver a daily cumulative dose of insulin that is greater than a specified maximum for a system user. • SR3: • The system shall include a hardware diagnostic facility that should be executed at least 4 times per hour. • SR4: • The system shall include an exception handler for all of the exceptions that are identified in Table ….. • SR5: • The audible alarm shall be sounded when any hardware anomaly is discovered and a diagnostic message as defined in Table ……. should be displayed.
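SR1 and SR2 translate directly into a guard that every computed dose must pass before delivery. The sketch below is one possible way to enforce them, under stated assumptions: the `DoseLimiter` class, its method names and the limits in the usage note are hypothetical, doses are whole units, and the daily total is reset by an external call (for example from the clock every 24 hours).

```python
class DoseLimiter:
    """Clamp requested doses to the single-dose (SR1) and daily (SR2) maxima."""

    def __init__(self, max_single_dose: int, max_daily_dose: int) -> None:
        if max_single_dose <= 0 or max_daily_dose <= 0:
            raise ValueError("limits must be positive")
        self.max_single_dose = max_single_dose
        self.max_daily_dose = max_daily_dose
        self.delivered_today = 0

    def authorise(self, requested: int) -> int:
        """Return the dose that may safely be delivered (possibly 0)."""
        if requested <= 0:
            return 0  # fail-safe: never deliver a non-positive or invalid dose
        remaining_today = max(self.max_daily_dose - self.delivered_today, 0)
        dose = min(requested, self.max_single_dose, remaining_today)
        self.delivered_today += dose
        return dose

    def reset_daily_total(self) -> None:
        """Start a new 24-hour accounting period."""
        self.delivered_today = 0
```

With `DoseLimiter(max_single_dose=4, max_daily_dose=10)`, successive requests for 6, 4, 4 and 1 units are authorised as 4, 4, 2 and 0 units: the first is capped by SR1, the third and fourth by SR2. Clamping rather than rejecting is a design choice; a real device might instead refuse the dose and raise an alarm.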
Insulin Pump System Design • This section covers the important design decisions made during the production of the insulin pump software and the simulator. • The approach used to produce the insulin pump software was to emulate the hardware organization by producing separate software objects (classes) for each distinguishable hardware object • Controller • Clock • Display • Simulator • [Figure: System architecture, insulin pump components]
The software objects • The controller object: the bulk of the computation within the system is carried out here • It is within the controller that the dose of insulin to be delivered is computed and where the self tests are performed • The clock object: works together with the controller object • Constantly determines how much time has elapsed since the software was started or the timer was reset (which happens every 24 hours) • At every specified interval, the clock triggers certain events that the system is required to perform
The software objects • The display object: used to create a graphical user interface (GUI) • The data is then presented to the user via text boxes positioned on the GUI • The remaining software objects model the peripheral hardware units • The software contained within these objects simply records the current state of the hardware unit and, for the purpose of simulation, provides the functionality to change that state.
The software objects • The simulator software object: • Provides the user with the functionality to perform a simulation of real-world events that would affect the pump software in differing manners • The simulator facilitates the testing process • making it quicker and easier to perform the necessary testing required in order to determine whether the insulin pump system is adequately safe.
Object Interaction – Object classes
Object Interaction – Sequence Diagrams
Object Interaction – Sequence Diagrams
Insulin delivery system • Data flow model of software-controlled insulin pump
Concept of operation • Using readings from the embedded sensor, the system automatically measures the level of glucose in the sufferer’s body • Consecutive readings are compared and, if they indicate that the level of glucose is rising then insulin is injected to counteract this rise • The ideal situation is a consistent level of sugar that is within some ‘safe’ band
Sugar levels • Unsafe • A very low level of sugar (arbitrarily, we will call this below 3 units) is dangerous and can result in hypoglycaemia, which can lead to a diabetic coma and ultimately death. • Safe • Between 3 units and about 7 units, the levels of sugar are ‘safe’ and are comparable to those in people without diabetes. This is the ideal band. • Undesirable • Above 7 units of sugar is undesirable, but high levels are not dangerous in the short term. Continuous high levels, however, can result in long-term side effects.
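The three bands can be expressed as a small classification function. This is a sketch under stated assumptions: the thresholds 3 and 7 are the arbitrary units used on the slide, the names `Band`, `SAFE_MIN`, `SAFE_MAX` and `classify` are illustrative, and treating the boundary values 3 and 7 as "safe" is a choice made here, since the slide does not say which band they belong to.

```python
from enum import Enum


class Band(Enum):
    UNSAFE = "unsafe"            # dangerously low sugar level
    SAFE = "safe"                # the ideal band
    UNDESIRABLE = "undesirable"  # too high, harmful only in the long term


SAFE_MIN = 3.0  # arbitrary units, as on the slide
SAFE_MAX = 7.0


def classify(level: float) -> Band:
    """Map a blood sugar reading to one of the three bands."""
    if level < SAFE_MIN:
        return Band.UNSAFE
    if level <= SAFE_MAX:
        return Band.SAFE
    return Band.UNDESIRABLE
```

For example, `classify(2.5)` is `Band.UNSAFE`, `classify(5)` is `Band.SAFE` and `classify(12)` is `Band.UNDESIRABLE`.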
Injection scenarios • Level of sugar is in the unsafe band • Do not inject insulin; • Initiate warning for the sufferer. • Level of sugar is falling • Do not inject insulin if in safe band. Inject insulin if rate of change of level is decreasing. • Level of sugar is stable • Do not inject insulin if level is in the safe band; • Inject insulin if level is in the undesirable band to bring down glucose level; • Amount injected should be proportionate to the degree of undesirability, i.e. inject more if level is 20 rather than 10.
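Some of these scenarios can be sketched as a decision function over two consecutive readings. This is an illustration, not the pump's actual algorithm: the names, the stability tolerance and the dose-per-unit factor are hypothetical, the band limits 3 and 7 are the arbitrary units used for the sugar bands, and cases that the scenarios above do not fully specify (a rising level, or a falling level outside the safe band, which needs rate-of-change history) are reported as not covered rather than guessed.

```python
SAFE_MIN = 3.0          # lower edge of the safe band (arbitrary units)
SAFE_MAX = 7.0          # upper edge of the safe band
STABLE_TOLERANCE = 0.1  # hypothetical: smaller changes count as "stable"
DOSE_PER_UNIT = 0.5     # hypothetical: insulin per unit of sugar above SAFE_MAX


def decide(previous: float, current: float) -> tuple[str, float]:
    """Return (action, dose) for two consecutive sugar readings."""
    if current < SAFE_MIN:
        return ("warn", 0.0)  # unsafe band: never inject, warn the sufferer

    change = current - previous
    stable = abs(change) <= STABLE_TOLERANCE
    falling = change < -STABLE_TOLERANCE
    in_safe_band = current <= SAFE_MAX

    if in_safe_band and (stable or falling):
        return ("none", 0.0)  # safe band, stable or falling: no injection

    if stable and not in_safe_band:
        # Undesirable band: dose proportional to the excess over the safe band.
        return ("inject", DOSE_PER_UNIT * (current - SAFE_MAX))

    return ("not_covered", 0.0)  # scenario not specified in this sketch
```

With these assumed constants, a stable level of 10 gives `("inject", 1.5)` and a stable level of 20 gives `("inject", 6.5)`, so more is injected at 20 than at 10. Returning a zero dose for every case that is not explicitly covered follows the fail-safe position that no insulin is delivered.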