Complexity and Stability in Modern Avionics
Lui Sha, lrs@cs.uiuc.edu
October 3, 2006
University of Illinois at Urbana-Champaign
Real and Present Dangers
“As a Malaysia Airlines jetliner cruised from Perth, Australia, to Kuala Lumpur, Malaysia, one evening last August, it suddenly took on a mind of its own and zoomed 3,000 feet upward. The captain disconnected the autopilot and pointed the Boeing 777's nose down to avoid stalling, but was jerked into a steep dive. …. A defective software program had provided incorrect data about the aircraft's speed and acceleration, confusing flight computers.” (WSJ, 5/30/2006)
The FAA’s emergency airworthiness directive (AD 2005-18-51) regarding this safety incident notes, “These anomalies could result in high pilot workload, deviation from the intended flight path, and possible loss of control of the airplane.”
A study of the airworthiness directives (ADs) and service difficulty reports (SDRs) for large aircraft (1984-1994) found 33 avionics ADs, 13 of which were software related.
• From a study on Avionics Software Occurrence Rate, http://doi.ieeecomputersociety.org/10.1109/ISSRE.1996.558695
High Interactive Complexity
In some systems, the interactions between distributed concurrent activities have become so complex that they are often impossible to model accurately, let alone analyze.
Reduced interactive complexity design:
• Reference architectures and design rules that simplify interactions and reduce non-determinism
• Provable isolation mechanisms in time and space (ARINC 653 has holes in I/O and cache isolation)
• Verifiable safe sharing of logical services and plant controls
• End-to-end hardware and software co-scheduling for predictable timing behavior
• Static analysis for design rule compliance
• V&V-coverage-driven instrumentation and fault-injection test generation
During development, “avionics have failed or shut down during numerous tests … The shutdowns have occurred when the pilot attempts to use the radar, communication, navigation, identification and electronic warfare systems concurrently.” http://www.gao.gov/new.items/d03603t.pdf
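To make "static analysis for design rule compliance" and time isolation concrete, here is a minimal sketch of a checker for a static, ARINC 653-style partition schedule. Each partition receives fixed windows inside a repeating major frame, and the design rules checked are that every window stays inside the frame and that no two windows overlap. The `Window` type, its field names, and the example partitions are hypothetical and do not follow the ARINC 653 configuration format. The check covers only time partitioning, not the I/O and cache holes noted above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Window:
    """One time window granted to a partition (hypothetical schema)."""
    partition: str
    offset_ms: int    # start of the window, relative to the start of the major frame
    duration_ms: int  # length of the window


def check_schedule(major_frame_ms: int, windows: List[Window]) -> List[str]:
    """Return the design-rule violations found; an empty list means the table passes."""
    errors: List[str] = []
    for w in windows:
        if w.duration_ms <= 0:
            errors.append(f"{w.partition}: non-positive duration")
        if w.offset_ms < 0 or w.offset_ms + w.duration_ms > major_frame_ms:
            errors.append(f"{w.partition}: window exceeds major frame")
    # After sorting by start time, some pair of windows overlaps exactly when
    # some pair of neighbors overlaps, so checking neighbors is sufficient.
    ordered = sorted(windows, key=lambda w: w.offset_ms)
    for prev, cur in zip(ordered, ordered[1:]):
        if prev.offset_ms + prev.duration_ms > cur.offset_ms:
            errors.append(f"{prev.partition} overlaps {cur.partition}")
    return errors


if __name__ == "__main__":
    table = [Window("FlightControl", 0, 20), Window("Navigation", 20, 15), Window("Display", 35, 15)]
    print(check_schedule(50, table))  # [] : no violations
```

Because the table is static, a check like this can run at build time on every configuration change rather than relying on testing to expose timing interference.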
Architecture and Implementation Divergence
It is easy to specify properties such as consistent use of units in architecture documents. Ensuring that source code complies with the architecture specification is a major scientific and technological challenge.
“NASA lost a $125 million Mars orbiter because one engineering team used metric units while another used English units.” http://www.cnn.com/TECH/space/9909/30/mars.metric/
Annotating design intent in the code and combining it with static analysis can solve a subset of these problems.
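One way to annotate design intent in code is to give each unit its own type, so that mixing units becomes a type error instead of a silent numeric mistake. The following is a minimal sketch. The class names are hypothetical, and impulse is used only as an illustrative quantity. Because of the annotation on `__add__`, a static checker such as mypy should reject `NewtonSeconds(...) + PoundForceSeconds(...)`, and the `isinstance` check catches the same mistake at run time in unannotated callers. Conversion is possible only through an explicit method that uses the exact definition 1 lbf = 4.4482216152605 N.

```python
from dataclasses import dataclass

# Exact by definition: 1 lbf = 0.45359237 kg * 9.80665 m/s^2
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class NewtonSeconds:
    """Impulse in SI units (hypothetical unit type)."""
    value: float

    def __add__(self, other: "NewtonSeconds") -> "NewtonSeconds":
        # Refuse to add anything that is not already in newton seconds.
        if not isinstance(other, NewtonSeconds):
            return NotImplemented  # Python then raises TypeError
        return NewtonSeconds(self.value + other.value)


@dataclass(frozen=True)
class PoundForceSeconds:
    """Impulse in English units (hypothetical unit type)."""
    value: float

    def to_newton_seconds(self) -> NewtonSeconds:
        """The only sanctioned path between the two unit systems."""
        return NewtonSeconds(self.value * LBF_S_TO_N_S)


if __name__ == "__main__":
    total = NewtonSeconds(10.0) + PoundForceSeconds(2.0).to_newton_seconds()
    print(total)
    try:
        NewtonSeconds(10.0) + PoundForceSeconds(2.0)  # unit mix-up
    except TypeError as err:
        print("rejected:", err)
```

This type-based approach addresses only the "consistent units" property. Other architectural rules need their own annotations or checks.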
Lack of Coherent Protocols
Protocols for a specific attribute are developed by domain experts, who may not be experts in other domains. Pathological interactions caused by incompatible protocols are a serious interoperability challenge. We need ways to model and reason about cross-domain properties.
A pathological interaction between real-time scheduling and synchronization protocols caused repeated resets on Mars Pathfinder until the priority inheritance protocol was activated. http://research.microsoft.com/~mbj/Mars_Pathfinder/Mars_Pathfinder.html
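The scheduling/synchronization interaction can be reproduced with a small simulator. This minimal sketch makes several simplifying assumptions: a single CPU, integer time ticks, fixed-priority preemptive scheduling, one mutex, tasks whose work is either entirely inside or entirely outside the critical section, and basic priority inheritance, in which the lock owner runs at the highest priority of the tasks it blocks. The task set is hypothetical and does not reflect Pathfinder's actual configuration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Task:
    name: str
    priority: int     # larger number = higher priority
    arrival: int      # tick at which the task is released
    work: int         # ticks of execution needed
    uses_lock: bool   # True if all of the task's work is inside one critical section
    remaining: int = 0
    finish: Optional[int] = None


def simulate(tasks: List[Task], inheritance: bool, horizon: int = 100) -> Dict[str, Optional[int]]:
    """Return each task's finish time (the tick after its last unit of work)."""
    for t in tasks:
        t.remaining, t.finish = t.work, None
    owner: Optional[Task] = None  # current holder of the single mutex
    for now in range(horizon):
        ready = [t for t in tasks if t.arrival <= now and t.remaining > 0]
        if not ready:
            if all(t.remaining == 0 for t in tasks):
                break
            continue  # CPU idle this tick
        # A task needing the lock cannot run while another task holds it.
        runnable = [t for t in ready if not (t.uses_lock and owner is not None and owner is not t)]

        def effective(t: Task) -> int:
            # With inheritance, the owner is boosted to the highest priority it blocks.
            if inheritance and t is owner:
                blocked = [b.priority for b in ready if b.uses_lock and b is not owner]
                return max([t.priority] + blocked)
            return t.priority

        task = max(runnable, key=effective)
        if task.uses_lock and owner is None:
            owner = task  # acquire the mutex on first execution
        task.remaining -= 1
        if task.remaining == 0:
            task.finish = now + 1
            if owner is task:
                owner = None  # release the mutex
    return {t.name: t.finish for t in tasks}


def example_tasks() -> List[Task]:
    # Low holds the lock, High needs it, Medium never touches it.
    return [Task("L", 1, 0, 3, True), Task("M", 2, 2, 5, False), Task("H", 3, 1, 1, True)]


if __name__ == "__main__":
    print(simulate(example_tasks(), inheritance=False))
    print(simulate(example_tasks(), inheritance=True))
```

Without inheritance, H arrives at tick 1 but finishes only at tick 9, because M preempts the lock holder L. With inheritance, L is boosted, releases the lock sooner, and H finishes at tick 4.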
Implicit Assumptions: A Major Cause of Interoperability Problems
Ariane 5 reused a module developed for Ariane 4, which assumed that the horizontal velocity component would not overflow a 16-bit variable. The assumption held for Ariane 4 but not for Ariane 5, and the result was the vehicle's self-destruction roughly 40 seconds after launch.
There must be a verifiable proxy that characterizes the external environment. All the assumptions in each component must be made explicit and machine checkable. http://www.ima.umn.edu/~arnold/disasters/ariane5rep.html
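Here is a minimal sketch of making the Ariane 4 range assumption explicit and checkable. The conversion to a 16-bit signed integer refuses any input outside the int16 range, including NaN and infinities, so it no longer depends on the environment happening to satisfy the assumption. `to_int16_checked` and `AssumptionViolation` are hypothetical names. A run-time check is only one way to make an assumption machine checkable; a static analyzer that proves the bound for a given environment model would be stronger.

```python
INT16_MIN, INT16_MAX = -32768, 32767


class AssumptionViolation(ValueError):
    """Raised when an input violates a component's documented environmental assumption."""


def to_int16_checked(value: float, name: str = "value") -> int:
    """Convert a float to a 16-bit signed integer, enforcing the range assumption.

    The comparison is False for NaN, so NaN is rejected along with out-of-range values.
    Python's round() uses round-half-to-even.
    """
    if not (INT16_MIN <= value <= INT16_MAX):
        raise AssumptionViolation(
            f"{name}={value} outside int16 range [{INT16_MIN}, {INT16_MAX}]"
        )
    return int(round(value))


if __name__ == "__main__":
    print(to_int16_checked(1234.4, "horizontal_velocity"))
    try:
        to_int16_checked(40000.0, "horizontal_velocity")  # an Ariane 5-like value
    except AssumptionViolation as err:
        print("assumption violated:", err)
```

Failing loudly at the boundary does not decide what the system should do next. The handler for `AssumptionViolation` is itself a design decision that must be made explicitly.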
Co-Stability of Software-Controlled Physical Systems
Large and complex embedded software has already emerged as a new source of safety hazards in practice. Large and complex real-time systems can
• neither be exhaustively tested,
• nor be completely verified.
The complexity of verifying temporal logic specifications is exponential. We can completely verify some modest-size systems, or parts of a large and complex system. We even knowingly use components that have bugs, e.g., ARINC 653 and certified RTOSs. Unless we can build stable plants with provably stable software systems, there is a great danger.
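Even a toy model shows why exhaustive verification grows exponentially. This hypothetical sketch performs explicit-state exploration by breadth-first search, as a simple explicit-state model checker would, over n independent two-state components, any one of which may toggle per step. The number of reachable global states is exactly 2^n, so each added component doubles the work. The sketch illustrates state-space explosion only; it is not a temporal-logic model checker.

```python
from collections import deque
from typing import Deque, Set, Tuple

State = Tuple[int, ...]


def reachable_states(n: int) -> int:
    """Count global states reachable from all-zeros when any one of n bits may toggle per step."""
    start: State = (0,) * n
    seen: Set[State] = {start}
    frontier: Deque[State] = deque([start])
    while frontier:
        state = frontier.popleft()
        for i in range(n):
            # Interleaving semantics: exactly one component moves per transition.
            nxt = state[:i] + (1 - state[i],) + state[i + 1:]
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen)


if __name__ == "__main__":
    for n in (4, 8, 12):
        print(n, reachable_states(n))  # 16, 256, 4096
```

Real components interact, which can prune some states, but they also carry far more than one bit of state each. This growth is one reason complete verification is practical only for modest-size systems or parts of larger ones.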
Software Stability: An Example
Requirements can often be decomposed into
• Critical (correctness) requirements
» Sorting: output numbers in the correct order
» TSP: visit every city exactly once
» Control: stable and controllable
• Performance optimization
» Sorting: faster
» TSP: shorter path
» Control: less time/error/energy
[Figure: Heap Sort, Bubble Sort]
Bounded responses to errors: a stable software system is one that can maintain key properties in spite of errors in non-critical components.
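One way to read the sorting example in these terms is sketched below. A fast but more complex sorter (heap sort) serves the performance requirement, and a simple bubble sort serves as the trusted fallback. A checker enforces only the critical requirement: the output must be ordered and must be a permutation of the input. The function names are hypothetical, and pairing the two algorithms with these roles is an interpretation of the slide's figure. Errors in the fast component, whether a crash or a wrong answer, are bounded because the critical property is always restored.

```python
import heapq
from collections import Counter
from typing import Callable, List


def is_sorted_permutation(original: List[int], result: List[int]) -> bool:
    """Critical requirement: result is nondecreasing and has the same multiset of items."""
    in_order = all(result[i] <= result[i + 1] for i in range(len(result) - 1))
    return in_order and Counter(original) == Counter(result)


def bubble_sort(items: List[int]) -> List[int]:
    """Simple, easy-to-verify sort used as the trusted fallback."""
    a = list(items)
    n = len(a)
    for i in range(n):
        swapped = False
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:
            break
    return a


def heap_sort(items: List[int]) -> List[int]:
    """High-performance O(n log n) sort built on the standard heapq module."""
    heap = list(items)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]


def stable_sort(items: List[int], fast: Callable[[List[int]], List[int]] = heap_sort) -> List[int]:
    """Use the fast component, but fall back if it fails or violates the critical requirement."""
    original = list(items)
    try:
        result = fast(list(original))  # give the fast component its own copy
    except Exception:
        return bubble_sort(original)
    if isinstance(result, list) and is_sorted_permutation(original, result):
        return result
    return bubble_sort(original)


if __name__ == "__main__":
    def buggy(xs: List[int]) -> List[int]:
        return sorted(xs)[:-1]  # drops the largest element

    print(stable_sort([3, 1, 2]))
    print(stable_sort([5, 4, 4, 1], fast=buggy))
```

The checker is much simpler than the component it guards. That asymmetry is what makes it practical to trust the check while the fast component remains unverified.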
Example: Co-Stability
Given a reliable controller, we identify the recovery region within which that controller can operate successfully.
[Figure: Simplex Architecture: decision logic, a high-assurance control subsystem, and a high-performance control subsystem, all connected to the plant]
The largest recovery region can be found using linear matrix inequalities (LMIs), and the approach applies to any linearizable system. Let $A$ be the closed-loop system matrix under the high-assurance controller, and write the state constraints as $c_k^{T} X \le 1$, $k = 1,\dots,m$ (the rows of $C$):
$$\dot{X} = AX$$
$$\min\ \log\det Q^{-1} \quad \text{subject to} \quad AQ + QA^{T} \prec 0,\quad Q \succ 0,\quad c_k^{T} Q\, c_k \le 1$$
Lyapunov function: $V(X) = X^{T} Q^{-1} X$
Safety switching rule: $X^{T} Q^{-1} X < 1$
See “Using Simplicity to Control Complexity”, www-rtsl.cs.uiuc.edu/~lrs/Simplicity.pdf
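The safety switching rule can be sketched as a decision module that evaluates the Lyapunov function in each control period. This sketch rests on three assumptions. First, `P` is the positive-definite Lyapunov matrix computed offline from the LMI (P = Q^{-1} in the standard formulation). Second, both controllers are plain functions of the state. Third, the optional `margin` is a hypothetical parameter that shrinks the region to leave room for sampling delay. The matrix arithmetic is written out by hand so the sketch has no dependencies. This is an illustration of the switching rule, not the Simplex implementation itself.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def lyapunov_value(p: Matrix, x: Vector) -> float:
    """V(X) = X^T P X for a symmetric positive-definite P."""
    n = len(x)
    return sum(x[i] * p[i][j] * x[j] for i in range(n) for j in range(n))


def select_controller(
    p: Matrix,
    x: Vector,
    high_performance: Callable[[Vector], object],
    high_assurance: Callable[[Vector], object],
    margin: float = 0.0,
) -> object:
    """Use the high-performance controller only while V(X) < 1 - margin;
    otherwise hand control to the high-assurance controller."""
    if lyapunov_value(p, x) < 1.0 - margin:
        return high_performance(x)
    return high_assurance(x)


if __name__ == "__main__":
    P = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical Lyapunov matrix for a 2-state plant

    def hp(x: Vector) -> str:
        return "high-performance"

    def ha(x: Vector) -> str:
        return "high-assurance"

    print(select_controller(P, [0.5, 0.5], hp, ha))  # V = 0.5, inside the region
    print(select_controller(P, [1.0, 0.5], hp, ha))  # V = 1.25, outside the region
```

The decision logic is deliberately trivial compared with either controller. That keeps it small enough to verify, which is the point of the architecture.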
Summary
Large and complex embedded software has already emerged as a new source of safety hazards in practice. I have listed some of the common problems encountered in practice; none of them is easy to solve. The writing is on the wall: large and complex avionics systems can
• neither be exhaustively tested,
• nor be completely verified.
The complexity of verifying temporal logic specifications is exponential.
Two of the most important issues are interaction complexity reduction and co-stability of software and hardware:
» A verifiable reduced-complexity core and well-formed dependencies
» Strong isolation mechanisms in time, storage, I/O, network, system services, and the plant
» Ensuring that failure semantics do not violate the preconditions of recovery mechanisms
» Stability of the controlled plant when it is under disturbance and during software recovery