Response to Undesired Events In Software Systems Presented by Joe Piccioni Kim Ushe Mupfumira Senthil Ramanathan Smitha Chunduri
Overview • Definition of Undesired Events (UEs) • How to handle UEs • Effects of UEs on code complexity • Impossible Abstractions • Direction of Propagation of UEs • Common Error Indications • Suggestions for a proper UE handling mechanism • Degrees of UE • Factors which determine the degree of UE • Conclusions
What are Undesired Events? • A deviation from normal behavior • Errors should be corrected rather than handled; UEs are the events a running system must handle • Even with programs proven to be correct, UEs at run time will continue to be a problem • Reliable systems must provide routines that respond to UEs
Why should we expect UEs? • Programs written to demonstrate structured programming are written with the assumption that they will always perform correctly • Incorrect or inconsistent data may be supplied to the system • Programs are changed from time to time, and new errors may appear
What should we do about UEs? • Programs can be defined to take corrective action when UEs occur • Often such programs can only be added after a period of use • Structure of the system should allow for such a likely change or addition to the program to enhance overall system reliability
A program’s response to UEs • Attempt self-diagnosis • Print diagnostic information • Save partial results • Retry • Use alternative resources • Send a message to the user
Leveled structure • A UE is detected at a lower level • Information available elsewhere (usually at higher levels) determines the appropriate action • The UE should be communicated to the higher levels, where diagnosis and recovery are attempted
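A minimal Python sketch of the leveled structure, not taken from the slides: the lower level only detects the UE, and the higher level, which knows that an alternative resource exists, chooses the response. The names `TapeWriteFailed`, `write_block`, and `save_results`, and the dictionary-based "devices", are hypothetical.

```python
class TapeWriteFailed(Exception):
    """Hypothetical UE raised by the lower level when a write fails."""


def write_block(device, data):
    # Lower level: detects the UE but lacks the context to choose a response.
    if not device.get("online", False):
        raise TapeWriteFailed(device["name"])
    device.setdefault("blocks", []).append(data)


def save_results(data, primary, backup):
    # Higher level: knows an alternative resource exists, so it decides
    # how to recover (here: use the backup device).
    try:
        write_block(primary, data)
        return primary["name"]
    except TapeWriteFailed:
        write_block(backup, data)
        return backup["name"]
```

If the backup device also fails, the exception leaves `save_results` and travels to the next level up, which is the behavior the slide describes.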
Effects of UEs on code complexity • The probability of UEs in I/O modules is higher • The straightforward machine code to write to a tape is usually simple • The code needed for error detection and correction makes the program quite complex • As a result, changing the normal-case code is difficult
Solution • Parnas proposes the use of a software analog of the trap used in hardware systems • Traps simplify code and decrease the probability of UEs going undetected • The code concerned with recovery from a UE is called by means of a trap • This organization achieves a lexical separation of normal use, detection, and correction procedures, thereby easing changes
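One way to picture a software trap, as a sketch under assumptions rather than Parnas’s actual mechanism: the module only detects the UE and then calls a routine supplied by its user through a table. The `Stack` class, the dictionary used as the table, and the `on_overflow` and `on_underflow` routines are all hypothetical.

```python
class Stack:
    """Hypothetical bounded stack whose UEs are routed to trap routines."""

    def __init__(self, capacity, traps):
        self._items = []
        self._capacity = capacity
        self._traps = traps  # maps a UE name to a routine supplied by the user

    def push(self, item):
        if len(self._items) >= self._capacity:
            # Detection only: the response lives in the trap routine.
            return self._traps["overflow"](self, item)
        self._items.append(item)

    def pop(self):
        if not self._items:
            return self._traps["underflow"](self)
        return self._items.pop()


log = []


def on_overflow(stack, item):
    # Trivial trap routine: record what happened and drop the item.
    log.append(f"overflow: dropped {item!r}")


def on_underflow(stack):
    log.append("underflow")
    return None
```

The normal-case code in `push` and `pop` stays short, and the correction code can be replaced without touching the module.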
Separating Error-Handling Code from “Regular” Code • In traditional programming, error detection, reporting, and handling often lead to confusing spaghetti code. • For example, pseudocode for a function that reads an entire file into memory might look like this:
readFile {
    open the file;
    determine its size;
    allocate that much memory;
    read the file into memory;
    close the file;
}
This function looks simple enough but it ignores all of the following errors: • What happens if the file can’t be opened? • What happens if the length of the file can’t be determined? • What happens if enough memory can’t be allocated? • What happens if the read fails? • What happens if the file can’t be closed?
To answer these questions within your readFile function, your code would end up looking like this:
errorCodeType readFile {
    initialize errorCode = 0;
    open the file;
    if (theFileIsOpen) {
        determine the length of the file;
        if (gotTheFileLength) {
            allocate that much memory;
            if (gotEnoughMemory) {
                read the file into memory;
                if (readFailed) errorCode = -1;
            } else errorCode = -2;
        } else errorCode = -3;
        close the file;
        if (theFileDidntClose && errorCode == 0) errorCode = -4;
        else errorCode = errorCode and -4;
    } else errorCode = -5;
    return errorCode;
}
With error detection built in, your original 7 lines have been inflated to 17 lines of code • Worse, there is so much error detection, reporting, and returning that the original 7 lines of code are lost in the clutter • Java provides an easy solution to the problem of error management • Exceptions enable you to write the main flow of your code and deal with the, well, exceptional cases elsewhere
If the readFile function used exceptions instead of traditional error management techniques, it would look like this:
readFile {
    try {
        open the file;
        determine its size;
        allocate that much memory;
        read the file into memory;
        close the file;
    }
    catch (fileOpenFailed) doSomething;
    catch (sizeDeterminationFailed) doSomething;
    catch (memoryAllocationFailed) doSomething;
    catch (readFailed) doSomething;
    catch (fileCloseFailed) doSomething;
}
Note that exceptions do not spare you the work of detecting, reporting, and handling errors • What exceptions do is separate the details of what to do when a UE happens from the normal case • The code also becomes smaller and its structure simpler
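The readFile pseudocode can be approximated in runnable form. The sketch below uses Python rather than Java; the choice of exception classes and the `return None` placeholders (standing in for `doSomething`) are assumptions made for illustration.

```python
import os


def read_file(path):
    """Read a whole file; the normal flow is not interleaved with checks."""
    try:
        with open(path, "rb") as handle:               # open (closed on exit)
            size = os.fstat(handle.fileno()).st_size   # determine its size
            buffer = bytearray(size)                   # allocate that much memory
            handle.readinto(buffer)                    # read the file into memory
            return bytes(buffer)
    except FileNotFoundError:
        return None  # placeholder response: the file could not be opened
    except MemoryError:
        return None  # placeholder response: the allocation failed
    except OSError:
        return None  # placeholder response: size, read, or close failed
```

The five steps of the normal case read top to bottom inside the `try` block, while every response to a UE sits in the handlers below it.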
Impossible Abstractions • The need to make an appropriate response often severely limits the abstractions we set up. • Programs become less clear when the user can’t write all of their code in terms of the abstract model. • For practical reasons, one must compromise the abstraction and include a set of degraded designs. • Parnas’ second suggestion is not to specify a module as having properties which UEs frequently violate. • Interfaces must include the operations needed to communicate the occurrence of a UE.
The Direction of Propagation of Undesired Events • Downward: violates the specified restrictions on the virtual machine. Represents an “Error of Usage”. • Upward: failure of a properly used mechanism, or reflection of an Undesired Event which was previously sent downward. Represents an “Error of Mechanism”. • Job abortion occurs only as a last resort. • A program should either recover, or adjust its external state and report the UE upward.
Continuation After UE “Handling” • The meta-structure previously described has four advantages: • It doesn’t violate the principles of information hiding. • The Uses definition remains valid. • It allows evolution in the direction of increased reliability. • Trivial trap routines generally simplify debugging as the system is integrated: these routines may only print their own name, but they can also indicate which module is at fault, which in turn indicates who should study the problem.
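The two kinds of UE can be sketched with two exception classes. This is an illustrative reading of the slide, not code from the source: `UsageError`, `MechanismError`, `read_sector`, `read_record`, and the list standing in for a disk are hypothetical.

```python
class UsageError(Exception):
    """An invalid request was sent downward; the lower level reflects it."""


class MechanismError(Exception):
    """A properly used mechanism failed; the failure is reported upward."""


def read_sector(disk, index):
    # Lowest level: rejects bad requests and reports failed hardware.
    if index < 0 or index >= len(disk):
        raise UsageError(f"sector {index} out of range")
    if disk[index] is None:
        raise MechanismError(f"sector {index} unreadable")
    return disk[index]


def read_record(disk, record_no):
    # Middle level: a record occupies two consecutive sectors.
    first = 2 * record_no
    try:
        return read_sector(disk, first) + read_sector(disk, first + 1)
    except UsageError as exc:
        # Report in this level's own terms (records, not sectors).
        raise UsageError(f"record {record_no} does not exist") from exc
    except MechanismError as exc:
        raise MechanismError(f"record {record_no} unavailable") from exc
```

Each level restates the UE in the vocabulary of its own abstraction before passing it on, so the caller of `read_record` never needs to know about sectors.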
Modular Design • Head Coach: he or she acts as the interface between the modules. • Offense: O-Line, Backs, Receivers • Defense: D-Line, Linebackers, Corners, Safeties • Special Teams: Punters, Kickers, Rest
Systematic Approach = Game Plan • Basic operations of the offense: running, throwing, catching, blocking • Basic operations of the defense: tackling, batting, cover, pursuit • Types of UEs: injury, performance, penalties, equipment, time management, drastic game situations, etc.
Responsibility Levels: Event Types and Handlers • Player level (lower): minor injury = tough it out; poor performance = try harder; few penalties = play smarter • Staff level (middle): these errors may be detected at the player level, but the staff is responsible for taking the corrective action. Severe injury = substitute player; continued poor performance = switch formation; equipment = replacement
Head Coach Level • Handles problems which the staff cannot solve • Has information from each module as well as game situation information • Time management: run out the clock, stop the clock, call a time out, spike the football, run out of bounds • Drastic game situations: onside kick attempt (the Special Teams coach can relay the costs of such an attempt)
Points applied to the Example • Impossible Abstractions • Quarterbacks can run (injury and complexity risk) • Trick plays • Interfaces contain operations to communicate UEs • Head Coach can call plays through a microphone directly to the quarterback. • Staff has complete field snapshots where they can detect important information and bring it to the head coach. • Information Hiding Principles still Valid • Coaches concerned with only the team they manage. • Players concentrate on their own position.
Common error indications • A list of general conditions under which a UE could occur. • Aids in constructing a list which specifies the limitations of the program and the UEs which are bound to occur when a limitation is violated. • Aimed at improving one’s anticipation of the types of UEs. • It is not a comprehensive list of UEs.
Common error indications (contd.) • Limitations on the values of parameters. Examples: 1. Entering a speed value on a stationary bike. 2. Entering your address in a web form. • Capacity limitations. Examples: 1. Exceeding the maximum weight an elevator can carry. 2. Uploading attachments to your e-mail.
Common error indications (contd.) • Requests for undefined information. Example: trying to open a file which doesn’t exist. • Restrictions on the order of operations. Examples: 1. A banking module which provides operations such as inserting, deleting, and displaying a customer account (an account must be inserted before it can be displayed or deleted). 2. Trying to access a file before opening it.
Common error indications (contd.) • Detection of actions which are likely to be unintentional. Examples: 1. A car door is not locked properly. 2. The door of an elevator is not locked properly (for elevators with manual doors). 3. Trying to open a file which is already open.
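Several of these indications can be shown in one small module. The sketch below is an assumption-laden illustration built on the elevator example: the `Elevator` class, its 600 kg limit, and the choice of built-in exception types are all invented for the purpose.

```python
class Elevator:
    """Hypothetical module that checks its limitations before acting."""

    MAX_LOAD_KG = 600  # assumed capacity, for illustration only

    def __init__(self):
        self.door_closed = False
        self.load_kg = 0

    def board(self, weight_kg):
        if weight_kg <= 0:
            # Limitation on the value of a parameter.
            raise ValueError("weight must be positive")
        if self.load_kg + weight_kg > self.MAX_LOAD_KG:
            # Capacity limitation.
            raise OverflowError("capacity exceeded")
        self.load_kg += weight_kg

    def close_door(self):
        self.door_closed = True

    def move(self, floor):
        if not self.door_closed:
            # Restriction on the order of operations.
            raise RuntimeError("door must be closed before moving")
        return floor
```

Listing the checks this way doubles as a list of the UEs the module can signal, which is the purpose of the common error indications.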
Suggestions on building a proper UE handling mechanism • Sufficiency • Priority of traps: a single erroneous call may violate several of the applicability conditions. Trapping to several UE routines is not efficient, so traps should be prioritized. Example: entering a credit card number when making a purchase on the web.
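Trap priority can be sketched as an ordered list of checks in which only the first violated condition is reported. The conditions below are hypothetical stand-ins for the credit card example (a real form would apply its own rules), and `CHECKS` and `first_violation` are invented names.

```python
# Hypothetical applicability conditions, from highest to lowest priority.
CHECKS = [
    ("empty number", lambda n: len(n) == 0),
    ("non-digit characters", lambda n: not n.isdigit()),
    ("wrong length", lambda n: len(n) != 16),
]


def first_violation(card_number):
    """Return the highest-priority violated condition, or None."""
    for name, violated in CHECKS:
        if violated(card_number):
            return name
    return None
```

An input such as `"12ab"` violates two conditions, yet the user sees a single, most relevant report instead of one trap per violation.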
Suggestions (contd.) • Size of the “trap vector” • Influence of the state of a function on the occurrence of a trap. Example: a doctor who diagnoses a patient before providing any treatment.
Suggestions (contd.) • Providing accurate information about the UE to the user. This is difficult because the accurate information about the UE lies in design details that are hidden from the user. • Two extreme approaches: 1. Use of a single trap to report failure. Disadvantage: it is very hard for the user to diagnose the failure.
Suggestions (contd.) 2. A fully detailed report, where a predicate is associated with each function. The predicate is set to true if the associated function is affected by the failure, and a master predicate is set to true in case of a catastrophic failure. Disadvantage: a true or false value is returned for each function call, which is highly redundant.
Suggestions (contd.) • An optimized approach: failure trap routines pass a parameter which classifies the type of error. Example: errno and strerror(errno) in the C language. • Redundancy and efficiency: the fully detailed extreme provides a highly insulated module.
Suggestions (contd.) • Redundant checks should be eliminated when UEs are rare. • Retaining the upper-level checks: UEs can be detected before any irreversible change. • Retaining the lower-level checks: usually preferred, except when backing up is difficult.
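Python exposes the same error numbers through its `errno` module and `os.strerror`, so the optimized approach can be sketched there. The three classes returned by `classify` ("missing", "forbidden", "other") are an arbitrary grouping chosen for illustration, and `try_open` is a hypothetical helper.

```python
import errno
import os


def classify(exc):
    """Map an OSError to a coarse class plus the system's message text."""
    if exc.errno in (errno.ENOENT, errno.ENOTDIR):
        kind = "missing"
    elif exc.errno in (errno.EACCES, errno.EPERM):
        kind = "forbidden"
    else:
        kind = "other"
    return kind, os.strerror(exc.errno)


def try_open(path):
    # One failure path, with a parameter that classifies the type of error.
    try:
        with open(path, "rb"):
            return ("ok", "")
    except OSError as exc:
        return classify(exc)
```

The caller receives one classification value instead of either a bare "it failed" or a predicate per function, which sits between the two extremes.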
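The advantage of upper-level checks can be sketched with a transfer between accounts: every condition is tested before the first change, so a detected UE leaves nothing to undo. The `transfer` function and the dictionary of balances are hypothetical.

```python
def transfer(accounts, src, dst, amount):
    """Move `amount` between two balances held in a dict."""
    # Upper-level checks: run before any state changes, so a UE detected
    # here requires no backing up.
    if src not in accounts or dst not in accounts:
        raise KeyError("unknown account")
    if amount <= 0 or accounts[src] < amount:
        raise ValueError("invalid amount")
    # Only now are the changes made.
    accounts[src] -= amount
    accounts[dst] += amount
```

If the same conditions were checked only inside the lower-level update steps, a failure on the second step would require restoring the first.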
Incidents vs. Crashes • Incidents are events that, although undesired, were expected and for which the recovery attempts were successful. • All other UEs are crashes. • This distinction is required to allow several degrees of undesired events. • Each degree is defined by a set of predicates; recovery at that degree is considered successful if they are satisfied. • If the requirements of degree i can’t be met, the system attempts to satisfy degree i+1.
Example: an error playing a CD • Check whether the case is properly closed. • Check whether the power cord is properly connected. • Look for an internal problem which can be repaired. • Look for an internal part which needs replacement. • Serious damage which can’t be repaired or replaced.
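The fallback from degree i to degree i+1 can be sketched as a loop over (action, predicate) pairs. This is one possible reading of the slides, and the `recover` function and its argument format are assumptions.

```python
def recover(degrees):
    """Try each degree in order; return the index of the first one met.

    `degrees` is a list of (action, satisfied) pairs of callables supplied
    by the designer. Returns None when no degree can be met (a crash).
    """
    for i, (action, satisfied) in enumerate(degrees):
        action()          # attempt the recovery belonging to degree i
        if satisfied():   # the predicates that define degree i
            return i      # an incident, handled at degree i
    return None           # every degree failed: a crash
```

Ordering the list from the cheapest action to the most drastic one mirrors the CD example, where checking the case comes long before replacing parts.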
Degrees of UE allow a programmer to define: • What he expects his program to do. • What he wants to treat as an incident and how he is prepared to handle it. • What he means by correct UE handling.
Factors which determine the degree of a UE • Basic cause: find the cause by trying recovery actions. Start with the simplest or cheapest, and when it fails try the next one. • Situation: the degree of an undesired event depends on the situation at the time the UE occurred.
Order of Degrees • The ordering of degrees can be determined by: • Order of aims • Order of actions
Order of Aims • The situation achieved by degree i is less desirable than the aims of lower degrees. • “Less desirable” depends on the goal and purpose of the user, and may be different for different users.
Order of Actions • The order of degrees may differ even if all degrees lead to the same situation, because they use different methods at different costs. • The decision as to which degree should be tried must be left to the user. • Recovery from a UE requires the cooperation of both levels.
Solutions • Provide different versions of the system (the difference lies in their preparation for and recovery from UEs). • Provide recovery actions as operations of the abstract machine.
Dependable, Feel-Good Software • Systematic approach throughout the system. • Abstract interfaces not excessively restrictive. • Pass failures upward, reflect downward-traveling UEs. • UE consideration requires half (or more) of the programmer’s effort. • The TRAP function should be a separate module containing the details of inter-level communication. • This communication is hidden from each level. • Information about UEs is defined in the level’s abstract terms. • The Uses hierarchy is maintained. • Costs are low as long as no UE occurs.
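The second solution can be sketched as a module whose interface includes the recovery action itself, leaving the decision to call it with the user. The `Document` class and its `checkpoint` and `restore` operations are hypothetical.

```python
class Document:
    """Hypothetical abstract machine that exposes recovery as an operation."""

    def __init__(self):
        self._text = ""
        self._checkpoint = ""

    def append(self, fragment):
        self._text += fragment

    def checkpoint(self):
        # Remember the current state as a known-good point.
        self._checkpoint = self._text

    def restore(self):
        # Recovery action offered to the user, who decides when to use it.
        self._text = self._checkpoint

    def text(self):
        return self._text
```

Because `restore` is an ordinary operation, the upper level can choose between retrying, restoring, or giving up without knowing how the state is stored.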