Safety Critical Software 3: Evidence

  1. Safety Critical Software 3: Evidence

  2. Contents • Purpose and Types of Software Safety Evidence • link to system safety evidence and DSRs • Testing • concept • types of analysis • Static Analysis • concept • types of automated analysis • reviews • Software Safety Evidence

  3. Software Safety Evidence • In general, we are trying to show the software is safe, not correct • Therefore need evidence that • all safety requirements (including DSRs) are valid • all safety requirements (including DSRs) have been met • low level DSRs are traceable to higher level ones (and vice versa) • no other functionality compromises the design / evidence • Approaches • testing (dynamic analysis) • assessing the behaviour of the program whilst running • static analysis • investigation of properties of programs without running them • reviews (a form of static analysis) • human assessment of programs and specifications

  4. Verification and Validation • Technical usage rather different to the dictionary definitions • Verification • showing that a system meets its requirements or specifications • informally “building the system right (correctly)” • Validation • showing that a system (or specification) is what was intended • informally “building the right system” • Need both verification and validation • to show that a system is fit for purpose • For safety properties the emphasis is mainly on validation • covered all credible hazardous situations • low level requirements sufficient to control the hazards • nothing else compromises the low level requirements

  5. V&V in the Process • Validation • needs to be undertaken for requirements, and • at every stage where derived requirements (DRs) are produced • including checking derived safety requirements (DSRs) • Verification • needs to be undertaken between each level of specification • verification down to the implementation • unless can show, say, code generators trustworthy • Many techniques have value for both verification and validation • e.g. testing may show that code does not meet its specification • analysis may show that the specification is in error • won’t over-emphasise the distinction

  6. Problems with V&V • V&V is expensive • perhaps half the development costs for safety critical software • from production of software requirements to end of unit test • V&V is imperfect • flaws (“bugs”) often end up in the finished product • most safety critical code has about 1 flaw per kLoC • Space Shuttle code has about 0.1 flaws per kLoC • V&V often finds problems late • so it is expensive to make corrections • circa 10-100 times the cost of fixing them during development • so do it early, and keep doing it! • Motivation in V&V research • find a greater proportion of flaws earlier, and more cheaply

  7. Testing • Testing involves • executing a program and • stimulating it with expected inputs • checking the outputs to see if they conform to expectations (validation) • or specifications (verification) • Testing is carried out at different “levels” • unit testing • lowest level program element, e.g. a module or package • results checked against the component specification • integration testing • combination of program elements, between unit and system • results checked against an intermediate level specification • system testing • testing the complete integrated system • results checked against software/system requirements
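
  A minimal sketch of unit testing, assuming a hypothetical component over_speed with the specification “return True exactly when the measured speed exceeds the limit”; expected inputs stimulate the unit and its outputs are checked against the component specification.

    import unittest

    def over_speed(speed_kph: float, limit_kph: float) -> bool:
        # hypothetical unit under test
        return speed_kph > limit_kph

    class OverSpeedUnitTest(unittest.TestCase):
        # outputs checked against the component specification (verification)
        def test_below_limit(self):
            self.assertFalse(over_speed(79.0, 80.0))

        def test_at_limit(self):
            self.assertFalse(over_speed(80.0, 80.0))

        def test_above_limit(self):
            self.assertTrue(over_speed(80.1, 80.0))

    if __name__ == "__main__":
        unittest.main()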

  8. Low Level Test Criteria • Usually test against some criterion • an objective measure of “goodness”, or “completeness”, of tests • proportion of statements executed • approximately 60% for Windows 2000! • proportion of paths executed, e.g. for the fragment:

    If A > 10 Then
        statement 1
    Else
        If B > 20 Then
            statement 2
        End If
    End If

  There are three paths, but no statements at all on the path where A <= 10 and B <= 20 (shown in red on the slide) • also data based criteria, but less widely used
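
  To make the statement-versus-path distinction concrete, here is a small Python rendering of the fragment above (function name and test values are assumed for illustration): two tests reach every statement, yet the path where A <= 10 and B <= 20 is never exercised.

    def fragment(a: int, b: int) -> str:
        if a > 10:
            return "statement 1"
        else:
            if b > 20:
                return "statement 2"
        return "no statement executed"   # the path with A <= 10 and B <= 20

    tests = [(15, 0), (5, 25)]           # 100% statement coverage...
    print({fragment(a, b) for a, b in tests})
    # ...but only 2 of the 3 paths: inputs such as (5, 5) take the untested path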

  9. Criteria and Strategies • Specification level test criteria • number of distinct partitions • where a partition is a part of the input space which should give rise to different behaviour from other parts • e.g. input spaces A and B in the slide figure, which contrasts a correct boundary with an incorrectly placed one (test values at 89, 90, 91 and 95 cms) • strategies determine how to generate test data, e.g. • random • boundary value (looking for errors in defining the partition)
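
  A minimal sketch of the two strategies, assuming a boundary at 90 cms as in the slide figure and a hypothetical partition function (whether 90 cms itself falls in A or B is an assumption here):

    BOUNDARY_CM = 90.0

    def partition(height_cm: float) -> str:
        # hypothetical function under test: A below the boundary, B at or above it
        return "A" if height_cm < BOUNDARY_CM else "B"

    # boundary-value strategy: probe just below, at, and just above the boundary,
    # where an incorrectly defined boundary (e.g. <= instead of <) shows up
    boundary_tests = [89.0, 90.0, 91.0]
    # random strategy: also sample the interior of each partition, e.g. 70 and 95
    for h in boundary_tests + [70.0, 95.0]:
        print(h, "->", partition(h))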

  10. Automating Test Generation • Automation is achieved by using re-usable proof tactics • but the proof can be hidden from the engineer • or recorded to verify the integrity of the test generator • Automation gives the opportunity to generate many test sets to evaluate the effectiveness of different criteria • Drivers for test automation • Cost • automation can reduce the cost of testing • Time • particularly for regression testing (as the system is modified) • Quality • improve consistency of testing practices • allow more, and a wider range of, tests to be run in the same budget • giving a higher probability of error detection
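
  A minimal sketch of automated test generation, using random generation checked against an oracle derived from the specification; the unit clamp and its specification are assumed for illustration, and the proof-tactic machinery mentioned above is not reproduced here. Cheap generation of large test sets is what makes regression testing and criteria comparison affordable.

    import random

    def clamp(x: float, lo: float, hi: float) -> float:
        # hypothetical unit under test
        return max(lo, min(hi, x))

    def spec_holds(x: float, lo: float, hi: float, y: float) -> bool:
        # oracle taken from the specification: result lies in [lo, hi]
        # and is unchanged whenever x was already in range
        return (lo <= y <= hi) and (y == x if lo <= x <= hi else True)

    random.seed(0)                       # repeatable generated test set
    for _ in range(1000):
        x = random.uniform(-100.0, 100.0)
        assert spec_holds(x, -10.0, 10.0, clamp(x, -10.0, 10.0))
    print("1000 generated tests passed")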

  11. Purpose of Static Analysis • Static analysis evaluates properties of the program without execution • inevitably “white box” • Aim is to determine properties of program • structural and quality • non-functional • functional • More subtly, aim is to identify • flaws in the program • areas which will be difficult to test • and which might therefore conceal flaws • properties that cannot be evaluated by testing • e.g. functional correctness (correspondence to specification)

  12. Code Quality • Evaluate structural and quality criteria, e.g. • data defined before use • all code reachable from program start • program does not raise exceptions • these may have a bearing on safety, but the links are indirect • Check for conformance to programming standards • e.g. language subsets • Check for testability (cost of testing) • e.g. McCabe’s cyclomatic complexity • a graph-theoretic measure of the control flow complexity of the program • many test tools use static analysis in code coverage analysis • example tools: LDRA Testbed: http://www.ldra.co.uk/ and McCabe Test: http://www.mccabe.com/
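
  As an illustration only (not a stand-in for tools such as LDRA Testbed or McCabe Test), a minimal Python sketch that approximates McCabe’s cyclomatic complexity by counting decision points in a function’s abstract syntax tree; for structured code the complexity is the number of decisions plus one.

    import ast

    SOURCE = """
    def fragment(a, b):
        if a > 10:
            return 1
        else:
            if b > 20:
                return 2
        return 0
    """

    decision_nodes = (ast.If, ast.While, ast.For, ast.BoolOp)
    tree = ast.parse(SOURCE)
    decisions = sum(isinstance(n, decision_nodes) for n in ast.walk(tree))
    print("approximate cyclomatic complexity:", decisions + 1)   # 3 for this fragment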

  13. Non-Functional Properties • Non-functional properties of interest • worst (best) case execution time (WCET or BCET) • maximum memory usage • In principle WCET can be derived by static analysis, using information in the processor manual • analyse the object code to find all the paths • add up the time of all the instructions on each path • the largest total corresponds to the longest (worst case) path • this is feasible for “simple” processors, e.g. the M68020 • progressively more difficult for modern processors • cache and pipeline analysis is difficult • A series of analysis tools has been developed at York • current work is linking analysis and testing (Rapita)
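
  A minimal sketch of the path-summing idea, with made-up cycle counts for a simple processor and an assumed set of paths through a small piece of object code; real analysis of modern processors must also model caches and pipelines.

    CYCLES = {"load": 4, "store": 4, "add": 2, "cmp": 2, "branch": 3}

    # each path is the instruction sequence executed along one route through the code
    paths = {
        "A > 10":           ["load", "cmp", "branch", "load", "add", "store"],
        "A <= 10, B > 20":  ["load", "cmp", "branch", "load", "cmp", "branch", "store"],
        "A <= 10, B <= 20": ["load", "cmp", "branch", "load", "cmp", "branch"],
    }

    path_times = {name: sum(CYCLES[i] for i in seq) for name, seq in paths.items()}
    print(path_times)
    print("WCET =", max(path_times.values()), "cycles")   # the longest path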

  14. Functional Properties • There are a range of functional properties • data flow, e.g. outputs derived from inputs • information flow, e.g. data flow, plus data used in conditions which affect calculations • semantic analysis • extracting program “function” from code, i.e. given a set of input values, what output values are derived? • correctness • program behaves as specified • Safety properties • analysis against safety specification • so does the software meet what is in the System Safety Specification?
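
  A small assumed example of the difference between data flow and information flow: out is never assigned directly from mode, yet information flows from mode to out because mode decides which calculation is performed.

    def controller(demand: float, mode: int) -> float:
        if mode == 1:            # mode appears only in the condition
            out = demand * 2.0   # data flow: out derived from demand
        else:
            out = 0.0
        return out               # information flow: out derived from demand and mode

    print(controller(3.0, 1), controller(3.0, 0))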

  15. Example Tool – SPARK Examiner • The SPARK Examiner provides various analyses • quality, e.g. data and information flow • absence of run-time exceptions, e.g. no “divide by zero” • proof conditions are generated from the program itself • no need for a specification • process largely automatic • conformance to a specification • specification defined using annotations (“formal comments”) • put in manually by the software developer • verification conditions are generated from the code and annotations • these, if proven, show that the program corresponds to its specification • uses automated static analysis and proof techniques SPARK Examiner: http://www.praxis-his.com/

  16. Reviews, Inspections and Walkthroughs • Manual processes of checking the consistency and validity of specifications, programs, etc. • considerable evidence that reviews are effective • they also seem to be cost-effective, i.e. they find relatively large numbers of problems relatively cheaply • Reviews • step-by-step reading of the product, with each step checked against a pre-determined list of criteria • e.g. review conformance to a set of (coding) standards • Inspections • a form of review, based on ideas of statistical process / quality control • Walkthroughs • manual simulation of the system based on test data

  17. Getting Value from Reviews • The item being reviewed needs to be considered complete by the presenter prior to review • Recognise limitations and compensate • participants tend to scrutinise what is there, not what ought to be there • high-level errors, e.g. the wrong concept, are difficult to spot • The item being reviewed should be small enough to be reviewed in 2 hours; otherwise concentration is difficult • Useful to keep metrics • errors found • types/categories of error (examples later) • where errors were introduced • this enables process problems to be identified and rectified

  18. Software Safety Evidence • An alternative approach is to consider providing • direct evidence that safety requirements have been met • e.g. omission of “detect train entering block” will not occur more than once every million events • this involves • identifying every possible cause of the failure and either • showing that it can’t arise (by formal methods/static analysis), or • showing that it will occur sufficiently improbably to meet the overall requirement (summed over all causes) • different types of evidence are relevant for different failure modes • timing analysis relates to timing failures (early, late) • formal verification relates to value domain failures • producing an effective strategy involves combining different types of analysis and evidence
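
  A minimal sketch of the budget argument, with made-up figures: each identified cause of the omission is either discharged outright (e.g. by formal proof) or assigned a claimed probability from the relevant analysis, and the claims summed over all causes must stay within the requirement.

    REQUIREMENT = 1e-6   # omission of "detect train entering block", per event

    cause_claims = {
        "value fault in detection algorithm": 0.0,   # discharged by formal proof
        "deadline overrun in detection task": 2e-7,  # bounded by timing analysis
        "message loss on the sensor bus":     5e-7,  # bounded by protocol testing
    }

    total = sum(cause_claims.values())
    print("summed claim:", total, "meets requirement:", total <= REQUIREMENT)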

  19. Types of Evidence

  20. Combining Evidence • Two primary criteria for combining evidence • to address different causes of failures • e.g. static timing analysis (over-runs) and schedulability analysis • to try to compensate for limitations in the techniques • e.g. static timing analysis for WCET (upper bound, pessimistic) and timing testing (lower bound, optimistic) • can also think of this as giving diversity, and reducing the effects of errors in techniques and tools • In practice, this will require a large number of techniques • to be useful, need to be able to isolate small parts of the code to focus the techniques • e.g. use information flow analysis to show modules are independent • then do, say, black box testing with fault injection
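
  A minimal sketch (assumed numbers) of how the two timing techniques compensate for each other: static analysis gives a pessimistic upper bound on the WCET, timing tests give an optimistic lower bound, the true WCET lies between them, and a large gap signals either over-pessimistic analysis or inadequate tests.

    static_upper_bound_us = 950    # from static WCET analysis (pessimistic)
    measured_max_us = 780          # longest execution observed in testing (optimistic)
    deadline_us = 1000

    assert measured_max_us <= static_upper_bound_us, "evidence inconsistent"
    print("deadline met if the static bound holds:", static_upper_bound_us <= deadline_us)
    print("gap between the bounds (us):", static_upper_bound_us - measured_max_us)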

  21. Software Reliability • An alternative approach • can’t we just measure failure rates in the undesired failure modes? • e.g. Softrel in use by Boeing (http://www.softrel.com/) • yes, but • the realistic limit of statistical testing is 10^-3 or 10^-4 per hour or per demand • assumes realistic input data, no time-dependent failures … • in practice, very hard to conduct meaningful tests • very few examples of this in practice • the Sizewell B protection system came close to this, but didn’t do a full statistical analysis • reliability growth models • consider how reliability improves with (effective) fault correction • help to make predictions, but • often make unrealistic assumptions, e.g. that programs are stateless • unlikely to be sufficient to claim safety on its own
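
  A short worked sketch of why the 10^-3 to 10^-4 per hour limit arises, assuming a constant failure rate and failure-free statistical testing with representative inputs: claiming a rate no worse than a target with confidence (1 - alpha) needs roughly -ln(alpha)/target hours of testing.

    import math

    alpha = 0.05                        # i.e. a 95% confidence claim
    for target in (1e-3, 1e-4, 1e-6):
        hours = -math.log(alpha) / target
        print(f"rate <= {target:g}/hr needs ~{hours:,.0f} failure-free test hours")
    # about 3,000 and 30,000 hours for 10^-3 and 10^-4, but around 3 million
    # hours for 10^-6, which is why statistical testing alone cannot support
    # the most demanding safety claims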

  22. Conclusions • Various ways of obtaining safety evidence • many, e.g. testing, have value for verification and validation • Major classes of technique • reviews - static, human analysis • static analysis - automated code analysis (various types) • dynamic analysis - testing • Humans are very effective • we should never underplay review • Static analysis is effective (see attached C130J paper) • even on well-tested code • is it cost-effective? • Beginning of a trend towards evidence requirements
