Safety Critical Software 3: Evidence
Contents
• Purpose and Types of Software Safety Evidence
  • link to system safety evidence and DSRs
• Testing
  • concept
  • types of analysis
• Static Analysis
  • concept
  • types of automated analysis
  • reviews
• Software Safety Evidence
Software Safety Evidence
• In general, we are trying to show that software is safe, not that it is correct
• Therefore need evidence that
  • all safety requirements (including DSRs) are valid
  • all safety requirements (including DSRs) have been met
  • low level DSRs are traceable to higher level requirements (and vice versa)
  • no other functionality compromises the design / evidence
• Approaches
  • testing (dynamic analysis)
    • assessing the behaviour of the program whilst running
  • static analysis
    • investigation of properties of programs without running them
  • reviews (a form of static analysis)
    • human assessment of programs and specifications
Verification and Validation
• Technical usage rather different to dictionary definitions
• Verification
  • showing that a system meets its requirements or specifications
  • informally "building the system right (correctly)"
• Validation
  • showing that a system (or specification) is what was intended
  • informally "building the right system"
• Need both verification and validation
  • to show that a system is fit for purpose
• For safety properties, mainly validation:
  • covered all credible hazardous situations
  • low level requirements sufficient to control the hazards
  • nothing else compromises low level requirements
V&V in the Process
• Validation
  • needs to be undertaken for requirements, and
  • at every stage where derived requirements (DRs) are produced
  • including checking derived safety requirements (DSRs)
• Verification
  • needs to be undertaken between each level of specification
  • verification down to the implementation
    • unless can show, say, code generators trustworthy
• Many techniques have value for both verification and validation
  • e.g. testing may show that code does not meet its specification
  • analysis may show that the specification is in error
  • so we won't over-emphasise the distinction
Problems with V&V
• V&V is expensive
  • perhaps half the development costs for safety critical software
  • from production of software requirements to end of unit test
• V&V is imperfect
  • flaws ("bugs") often end up in the finished product
  • most safety critical code has about 1 flaw per kLoC
  • Space Shuttle code has about 0.1 flaws per kLoC
• V&V often finds problems late
  • so it is expensive to make corrections
  • circa 10-100 times the cost of fixing them during development
  • so do it early, and keep doing it!
• Motivation in V&V research
  • find a greater proportion of flaws earlier, and more cheaply
Testing
• Testing involves
  • executing a program and
  • stimulating it with expected inputs
  • checking the outputs to see if they conform to expectations (validation)
  • or specifications (verification)
• Testing is carried out at different "levels"
  • unit testing (sketch below)
    • lowest level program element, e.g. a module or package
    • results checked against component specification
  • integration testing
    • combination of program elements, between unit and system
    • results checked against intermediate level specification
  • system testing
    • testing the complete integrated system
    • results checked against software/system requirements
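A minimal unit-test sketch in C of checking outputs against a component specification; the function clamp_speed and its specification ("never return a value above the limit") are invented for illustration, not taken from the slides.

    /* Unit-test sketch: clamp_speed and its specification are hypothetical. */
    #include <assert.h>
    #include <stdio.h>

    /* Unit under test: clamp a demanded speed to a safe limit. */
    static int clamp_speed(int demanded, int limit)
    {
        return (demanded > limit) ? limit : demanded;
    }

    int main(void)
    {
        /* Each assertion checks the output against the component specification. */
        assert(clamp_speed(50, 100) == 50);    /* below the limit: unchanged  */
        assert(clamp_speed(150, 100) == 100);  /* above the limit: clamped    */
        assert(clamp_speed(100, 100) == 100);  /* at the limit: boundary case */
        puts("unit tests passed");
        return 0;
    }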
Low Level Test Criteria
• Usually test against some criterion
  • objective measure of "goodness", or "completeness", of tests
• proportion of statements
  • approximately 60% for Windows 2000!
• proportion of paths, e.g. for the fragment:

    If A > 10 Then
        statement 1
    Else
        If B > 20 Then
            statement 2
        End If
    End If

  • three paths, but no statements on the path where neither condition holds (sketch after this slide)
• also data based criteria, but less widely used
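A C sketch of the same fragment, showing how two tests achieve statement coverage while path coverage needs a third test for the "empty" path; the input values are illustrative.

    /* Statement vs. path coverage sketch for the fragment above. */
    #include <stdio.h>

    static void fragment(int a, int b)
    {
        if (a > 10) {
            printf("statement 1\n");
        } else if (b > 20) {
            printf("statement 2\n");
        }
        /* Third path (a <= 10 and b <= 20) executes no statement at all. */
    }

    int main(void)
    {
        /* Two tests cover every statement ...                             */
        fragment(11, 0);   /* path 1: statement 1 */
        fragment(0, 21);   /* path 2: statement 2 */
        /* ... but full path coverage needs a third test for the empty path. */
        fragment(0, 0);    /* path 3: no statements executed                 */
        return 0;
    }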
Criteria and Strategies
• Specification level test criteria
  • number of distinct partitions
    • where a partition is a part of the input space which should give rise to different behaviour from other parts
    • e.g. input space A and input space B below
• strategies determine how to generate test data, e.g.
  • random
  • boundary value (looking for errors in defining the partition) - sketch below

[Figure: input spaces A and B separated by a boundary at 90 cm; test values at 89 cm, 90 cm, 91 cm and 95 cm distinguish the correct boundary from an incorrect one]
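A boundary-value testing sketch in C for a partition boundary at 90 cm; the requirement and the partition function are invented to match the figure.

    /* Boundary-value sketch: partition() and the 90 cm rule are hypothetical. */
    #include <assert.h>

    /* Assumed requirement: inputs of 90 cm or more belong to partition B,
     * anything smaller to partition A.                                      */
    static char partition(int height_cm)
    {
        return (height_cm >= 90) ? 'B' : 'A';
    }

    int main(void)
    {
        /* Boundary-value tests cluster around 90 cm: an off-by-one error in
         * the boundary (e.g. > instead of >=) is caught by the 90 cm case.  */
        assert(partition(89) == 'A');
        assert(partition(90) == 'B');
        assert(partition(91) == 'B');
        assert(partition(95) == 'B');   /* a value well inside partition B */
        return 0;
    }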
Automating Test Generation
• Automation is achieved by using re-usable proof tactics
  • but the proof can be hidden from the engineer,
  • or recorded to verify the integrity of the test generator
• Automation gives the opportunity to generate many test sets to evaluate the effectiveness of different criteria (sketch below)
• Drivers for test automation
  • Cost
    • automation can reduce the cost of testing
  • Time
    • particularly for regression testing (as the system is modified)
  • Quality
    • improve consistency of testing practices
    • allow more/wider range of tests to be run in the same budget
    • giving a higher probability of error detection
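A much simplified sketch of mechanically generated tests: inputs are produced automatically and checked against a simple oracle. The function and oracle are the hypothetical ones used earlier; real generators work from specifications or proof tactics as described above.

    /* Automated (random) test generation sketch; clamp_speed is hypothetical. */
    #include <assert.h>
    #include <stdlib.h>

    static int clamp_speed(int demanded, int limit)
    {
        return (demanded > limit) ? limit : demanded;
    }

    int main(void)
    {
        srand(42);                        /* fixed seed: repeatable test set */
        for (int i = 0; i < 1000; i++) {
            int demanded = rand() % 400;
            int limit    = rand() % 200;
            int out = clamp_speed(demanded, limit);
            /* Oracle: output never exceeds the limit or the demanded value. */
            assert(out <= limit && out <= demanded);
        }
        return 0;
    }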
Purpose of Static Analysis
• Static analysis evaluates properties of the program without execution
  • inevitably "white box"
• Aim is to determine properties of the program
  • structural and quality
  • non-functional
  • functional
• More subtly, aim is to identify
  • flaws in the program
  • areas which will be difficult to test
    • and which might therefore conceal flaws
  • properties that cannot be evaluated by testing
    • e.g. functional correctness (correspondence to specification)
Code Quality
• Evaluate structural and quality criteria, e.g.
  • data defined before use
  • all code reachable from program start
  • program does not raise exceptions
  • these may have a bearing on safety, but the links are indirect (sketch below)
• Check for conformance to programming standards
  • e.g. language subsets
• Check for testability (cost of testing)
  • e.g. McCabe's cyclomatic complexity
    • graph-theoretic measure of the control flow complexity of the program
  • many test tools use static analysis in code coverage analysis
• LDRA Testbed: http://www.ldra.co.uk/
• McCabe Test: http://www.mccabe.com/
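A deliberately flawed C sketch of the kinds of structural defect these checks target: a variable that may be read before it is defined, and code that is unreachable. The code is invented for illustration; it compiles, and a data-flow or reachability analysis (or a compiler with warnings enabled) would flag both issues.

    /* Deliberately flawed, for illustration of static quality checks. */
    #include <stdio.h>

    int flawed(int mode)
    {
        int result;                 /* flaw 1: may be used before definition */

        if (mode == 1) {
            result = 10;
        } else if (mode == 2) {
            result = 20;
        }
        /* If mode is neither 1 nor 2, 'result' is read while undefined. */
        return result;

        printf("done\n");           /* flaw 2: unreachable code after return */
    }

    int main(void)
    {
        return flawed(3);           /* mode 3 triggers the undefined read */
    }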
Non-Functional Properties
• Non-functional properties of interest
  • worst (best) case execution time (WCET or BCET)
  • maximum memory usage
• In principle WCET can be derived by static analysis, using information in the processor manual
  • analyse the object code to find all the paths
  • add up the time of all the instructions on each path
  • the biggest number is the longest path (sketch below)
  • this is feasible for "simple" processors, e.g. M68020
  • progressively more difficult for modern processors
    • cache and pipeline analysis difficult
• Series of analysis tools developed at York
  • current work linking analysis and testing (Rapita)
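A worked sketch in C of the path-based WCET calculation just described: sum the execution times of the blocks on each path and take the largest total. The block times are invented; a real analysis would take them from the processor manual or from measurement.

    /* Path-based WCET/BCET sketch with assumed block execution times. */
    #include <stdio.h>

    int main(void)
    {
        /* Assumed execution times (microseconds) for four basic blocks. */
        const double t_entry = 2.0, t_then = 5.0, t_else = 3.0, t_exit = 1.0;

        /* Two paths through a simple if-then-else structure. */
        double path_then = t_entry + t_then + t_exit;   /* 8.0 us */
        double path_else = t_entry + t_else + t_exit;   /* 6.0 us */

        double wcet = (path_then > path_else) ? path_then : path_else;
        double bcet = (path_then < path_else) ? path_then : path_else;

        printf("WCET estimate: %.1f us, BCET estimate: %.1f us\n", wcet, bcet);
        return 0;
    }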
Functional Properties
• There is a range of functional properties
  • data flow, e.g. outputs derived from inputs
  • information flow, e.g. data flow, plus data used in conditions which affect calculations (sketch below)
  • semantic analysis
    • extracting program "function" from code, i.e. given a set of input values, what output values are derived?
  • correctness
    • program behaves as specified
• Safety properties
  • analysis against safety specification
  • so does the software meet what is in the System Safety Specification?
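A small C sketch of the distinction between data flow and information flow; the variables are invented for illustration.

    /* Data flow vs. information flow sketch. */
    #include <stdio.h>

    int main(void)
    {
        int sensor = 120;   /* input                          */
        int mode   = 1;     /* input used only in a condition */
        int output;

        if (mode == 1) {          /* 'mode' never flows into 'output' as data, */
            output = sensor * 2;  /* but information flow analysis records it  */
        } else {                  /* because it decides which calculation runs */
            output = sensor / 2;
        }

        printf("output = %d\n", output);
        return 0;
    }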
Example Tool – SPARK Examiner
• The SPARK Examiner provides various analyses
  • quality, e.g. data and information flow
  • absence of run-time exceptions, e.g. no "divide by zero" (sketch below)
    • proof conditions generated from the program
    • no need for a specification
    • process largely automatic
  • conformance to a specification
    • specification defined using annotations ("formal comments")
      • put in manually by the software developer
    • verification conditions are generated from the code and annotations
    • these, if proven, show that the program corresponds to its specification
  • uses automated static analysis and proof techniques
• SPARK Examiner: http://www.praxis-his.com/
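Not SPARK itself, but a C sketch of the kind of proof obligation such a tool generates for "no divide by zero": to discharge it, the tool must show that the divisor is non-zero on every path reaching the division, here guaranteed by the guard. The function and data are invented.

    /* Sketch of a run-time-exception-freedom proof obligation. */
    #include <stdio.h>

    static int average(const int *readings, int count)
    {
        if (count <= 0) {
            return 0;               /* guard: division below never sees zero */
        }
        int sum = 0;
        for (int i = 0; i < count; i++) {
            sum += readings[i];
        }
        return sum / count;         /* proof obligation: count is non-zero   */
    }

    int main(void)
    {
        int r[] = { 10, 20, 30 };
        printf("average = %d\n", average(r, 3));
        return 0;
    }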
Reviews, Inspections and Walkthroughs
• Manual processes of checking the consistency and validity of specifications, programs, etc.
  • considerable evidence that reviews are effective
  • they also seem to be cost-effective, i.e. they find relatively large numbers of problems relatively cheaply
• Reviews
  • step-by-step reading of the product, with each step checked against a pre-determined list of criteria
  • e.g. review conformance to a set of (coding) standards
• Inspections
  • a form of review, based on ideas of statistical process / quality control
• Walkthroughs
  • manual simulation of the system's execution, based on test data
Getting Value from Reviews
• Item being reviewed needs to be considered complete by the presenter prior to review
• Recognise limitations and compensate
  • participants tend to scrutinise what is there, not what ought to be there
  • high-level errors, e.g. wrong concept, are difficult to spot
• Item being reviewed should be small enough to be reviewed in 2 hours, otherwise concentration is difficult
• Useful to keep metrics
  • errors found
  • types/categories of error (examples later)
  • where errors were introduced
  • this enables process problems to be identified and rectified
Software Safety Evidence
• An alternative approach is to consider providing direct evidence that safety requirements have been met
  • e.g. omission of "detect train entering block" will not occur more than once every million events
• This involves
  • identifying every possible cause of the failure and either
    • showing that it can't arise (by formal methods/static analysis), or
    • showing that it will occur sufficiently improbably to meet the overall requirements (summed over all causes - sketch below)
• Different types of evidence are relevant for different failure modes
  • timing analysis relates to timing failures (early, late)
  • formal verification relates to value domain failures
• Producing an effective strategy involves combining different types of analysis and evidence
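A worked sketch in C of the "summed over all causes" argument: each identified cause is either shown impossible or assigned a residual rate, and the rates are summed against the overall target of one failure per million events. All the per-cause figures are invented for illustration.

    /* Summing residual per-cause rates against an overall failure target. */
    #include <stdio.h>

    int main(void)
    {
        const double target = 1.0e-6;           /* failures per event       */

        /* Assumed per-cause rates; causes eliminated by proof contribute 0. */
        const double causes[] = { 0.0,           /* eliminated by proof      */
                                  3.0e-7,        /* residual timing failures */
                                  4.0e-7 };      /* residual value failures  */

        double total = 0.0;
        for (int i = 0; i < 3; i++) {
            total += causes[i];
        }

        printf("claimed rate %.1e vs target %.1e: %s\n",
               total, target, (total <= target) ? "met" : "not met");
        return 0;
    }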
Combining Evidence
• Two primary criteria for combining evidence
  • to address different causes of failures
    • e.g. static timing analysis (over-runs) and schedulability analysis
  • to try to compensate for limitations in the techniques
    • e.g. static timing analysis for WCET (upper bound, pessimistic) and timing testing (lower bound, optimistic) - sketch below
    • can also think of this as giving diversity, and reducing the effects of errors in techniques and tools
• In practice, will require a large number of techniques
  • to be useful, need to be able to isolate small parts of the code to focus techniques
    • e.g. use information flow analysis to show modules independent
    • then do, say, black box testing with fault injection
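A small sketch of combining the two timing techniques: the pessimistic static bound and the optimistic measured maximum bracket the true worst case, and a measurement above the static bound would indicate an error in one of the techniques. The numbers are invented.

    /* Bracketing the true WCET between static and measured figures. */
    #include <stdio.h>

    int main(void)
    {
        const double static_bound = 4.2;   /* ms, static analysis (upper bound) */
        const double measured_max = 3.8;   /* ms, timing tests (lower bound)    */

        if (measured_max <= static_bound) {
            printf("true WCET lies in [%.1f, %.1f] ms\n",
                   measured_max, static_bound);
        } else {
            printf("inconsistent evidence: re-examine analysis and tests\n");
        }
        return 0;
    }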
Software Reliability
• An alternative approach
  • can't we just measure failure rates in undesired failure modes?
  • e.g. Softrel in use by Boeing (http://www.softrel.com/)
• Yes, but
  • realistic limit of statistical testing is 10^-3 or 10^-4 per hour or per demand (sketch below)
  • assumes realistic input data, no time-dependent failures ...
  • in practice, very hard to conduct meaningful tests
  • very few examples of this in practice
    • Sizewell B protection system came close to this, but didn't do full statistical analysis
• Reliability growth models
  • consider how reliability improves with (effective) fault correction
  • help to make predictions, but
  • often make unrealistic assumptions, e.g. programs stateless
  • unlikely to be sufficient to claim safety on its own
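A sketch of why 10^-3 to 10^-4 per demand is a realistic limit: assuming independent demands drawn from a realistic input profile, claiming a failure rate p with confidence (1 - alpha) after failure-free testing needs about n >= ln(alpha) / ln(1 - p) demands, which grows impractically large for smaller rates. The confidence level chosen below is an illustrative assumption.

    /* Failure-free demands needed to claim a given rate (compile with -lm). */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double alpha = 0.01;                    /* 99% confidence */
        const double rates[] = { 1.0e-3, 1.0e-4, 1.0e-6 };

        for (int i = 0; i < 3; i++) {
            double n = log(alpha) / log(1.0 - rates[i]);
            printf("claim %.0e per demand: about %.0f failure-free demands\n",
                   rates[i], n);
        }
        return 0;
    }

For 10^-3 this is a few thousand demands, for 10^-4 tens of thousands, and for 10^-6 several million, which is why higher claims are rarely supportable by testing alone.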
Conclusions
• Various ways of obtaining safety evidence
  • many, e.g. testing, have value for verification and validation
• Major classes of technique
  • reviews - static, human analysis
  • static analysis - automated code analysis (various types)
  • dynamic analysis - testing
• Humans are very effective
  • we should never underplay review
• Static analysis is effective (see attached C130J paper)
  • even on well-tested code
  • is it cost-effective?
• Beginning of a trend towards evidence requirements