Workshop discussing the importance of fault protection systems in aviation software, focusing on reliability through redundancy, fault containment mechanisms, and system monitoring models.
Software Fault Protection
Allen Goldberg
Kestrel Technology
System Engineering
• System engineers build reliable systems from less reliable components.
• Redundancy is a primary means of achieving reliability.
• Systems are monitored for anomalies.
• Fault containment mechanisms (e.g. firewalls) limit damage.
Workshop on Aviation Software, Oct. 2006
What About Software?
Software is assumed to be perfect, with little accommodation for failure, even though perfection is rarely achievable.
Can we make reliable software systems from less reliable software components?
IVHM Fault Protection Systems
[Diagram labels: Fault Protection System; monitoring; model; fault response; System under control]
Software Fault Protection (SFP)
[Diagram labels: Software Fault Protection System; monitoring; Model of software; fault response; SUT is software]
Software Redundancy
• Redundancy: different representations of software behavior
  • code
  • test case
  • model
  • …
• Redundancy is expensive
• How should you invest your “redundancy” dollars?
Effective Redundancy at Runtime
• software “model”
• “1.2” version programming
  • 1 full-featured, efficient, complex version
  • 0.2 backup version performs essential functions
[Diagram labels: Software Fault Protection System; monitoring; Model of software; fault response; software]
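The “1.2” version idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the presenter's implementation: `run_with_backup`, `fancy_filter`, `last_sample`, and `plausible_altitude` are hypothetical names, and the acceptance check stands in for whatever the monitoring model would decide.

```python
from __future__ import annotations

from typing import Callable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_with_backup(
    primary: Callable[[T], R],
    backup: Callable[[T], R],
    is_acceptable: Callable[[R], bool],
    arg: T,
) -> tuple[R, str]:
    """Run the full-featured version; fall back to the small backup on failure.

    Returns the result together with the name of the version that produced it.
    """
    try:
        result = primary(arg)
        if is_acceptable(result):
            return result, "primary"
    except Exception:
        # A crash in the primary is handled the same way as a failed check.
        pass
    # The backup ("0.2" version) covers only the essential function.
    return backup(arg), "backup"


# Hypothetical "1" version: a weighted filter over the last three samples.
def fancy_filter(samples: list[float]) -> float:
    weights = [0.5, 0.3, 0.2]
    # Latent defect: raises IndexError when fewer than three samples exist.
    return sum(w * samples[-1 - i] for i, w in enumerate(weights))


# Hypothetical "0.2" version: just report the most recent sample.
def last_sample(samples: list[float]) -> float:
    return samples[-1]


# Hypothetical reasonableness check on the output.
def plausible_altitude(value: float) -> bool:
    return 0.0 <= value <= 20000.0
```

With three samples the primary answers; with a single sample the primary raises and the backup's value is used instead. The backup is deliberately trivial so that it is far less likely to share the primary's defects.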
Software Model
• When software fails it is usually “obviously” wrong
• Simple models can detect errors
  • interface behavior
  • data reasonableness
  • resource usage
• Our model extends the ARINC 653 configuration file
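A simple model of this kind might look like the sketch below, which checks data reasonableness and resource usage for one sample. Every name and field here (`ComponentModel`, `OutputLimits`, `check_sample`, the limits themselves) is an assumption made for illustration; it does not reproduce the ARINC 653 configuration schema or the presenter's extension of it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OutputLimits:
    """Reasonableness bounds for one output value (hypothetical model entry)."""
    low: float
    high: float
    max_step: float  # largest believable change between consecutive samples


@dataclass(frozen=True)
class ComponentModel:
    """A small behavioral model of one component; all fields are illustrative."""
    name: str
    output: OutputLimits
    max_exec_ms: float
    max_memory_kb: int


def check_sample(
    model: ComponentModel,
    previous: Optional[float],
    value: float,
    exec_ms: float,
    memory_kb: int,
) -> list[str]:
    """Return a list of anomalies for one observation; empty means 'looks fine'."""
    anomalies: list[str] = []
    limits = model.output
    # Data reasonableness: absolute range.
    if not (limits.low <= value <= limits.high):
        anomalies.append(f"{model.name}: value {value} outside [{limits.low}, {limits.high}]")
    # Data reasonableness: rate of change since the previous sample.
    if previous is not None and abs(value - previous) > limits.max_step:
        anomalies.append(f"{model.name}: step {abs(value - previous)} exceeds {limits.max_step}")
    # Resource usage: execution time and memory budgets.
    if exec_ms > model.max_exec_ms:
        anomalies.append(f"{model.name}: ran {exec_ms} ms, budget {model.max_exec_ms} ms")
    if memory_kb > model.max_memory_kb:
        anomalies.append(f"{model.name}: used {memory_kb} KB, budget {model.max_memory_kb} KB")
    return anomalies
```

The checks are intentionally crude: the point made on the slide is that failures are usually “obviously” wrong, so a model much simpler than the monitored code can still catch them.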
Failure Responses
• safe modes: terminate non-essential activities
• component reset (supported by ARINC 653)
  • transient errors lead to bad state
• component replacement (supported by ARINC 653)
  • “1.2” version programming
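One way to combine these responses is an escalation policy: reset first, then replace, then fall back to a safe mode. The slide does not prescribe any ordering, so the policy below, the `FaultResponder` class, and the `max_resets` parameter are all assumptions chosen for illustration.

```python
from enum import Enum


class Response(Enum):
    RESET = "reset component"
    REPLACE = "replace component with backup version"
    SAFE_MODE = "enter safe mode"


class FaultResponder:
    """Chooses a response for each detected fault in one component.

    Assumed policy (not from the source): reset up to max_resets times,
    then switch to the backup version, then enter safe mode if the
    backup also faults.
    """

    def __init__(self, max_resets: int = 2) -> None:
        self.max_resets = max_resets
        self.resets = 0
        self.on_backup = False

    def on_fault(self) -> Response:
        if self.on_backup:
            # Even the backup is failing: shed non-essential activity.
            return Response.SAFE_MODE
        if self.resets < self.max_resets:
            # A reset clears bad state left behind by a transient error.
            self.resets += 1
            return Response.RESET
        # Repeated faults suggest a persistent defect: swap in the backup.
        self.on_backup = True
        return Response.REPLACE
```

A responder built with `max_resets=2` answers five consecutive faults with two resets, one replacement, and then safe mode for every fault after that.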
Fault Containment
• Eliminate “non-logical” software dependencies
  • error propagation (crash)
  • resource contention
• ARINC 653
• Fault containment is essential to fault isolation
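As a loose analogy only, the sketch below contains a component by running it in its own operating-system process with a time budget, so that a crash or a runaway loop does not take the caller down with it. This is not how ARINC 653 partitioning is implemented; `run_contained` is a hypothetical helper built on Python's standard `subprocess` module.

```python
import subprocess
import sys


def run_contained(source: str, timeout_s: float) -> str:
    """Run component code in a separate process and report what happened.

    The caller survives a crash in the component (no error propagation)
    and regains control if the component exceeds its time budget
    (bounded contention for CPU time).
    """
    try:
        completed = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return "overran time budget"
    if completed.returncode != 0:
        return "crashed"
    return "ok"
```

Because each outcome is attributed to exactly one contained component, the monitor also learns where the fault occurred, which is the link between containment and isolation noted on the slide.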
Future Work
• relate SFP to multi-string flight computers and system fault protection
• relate SFP to treatment of radiation-induced SEUs
• generate SFP models from software design artifacts
• generate SFP implementations from SFP models