Applying RAMS to Design of Safety- and Mission-Critical Java Standards

Applying RAMS to Design of Safety- and Mission-Critical Java Standards Kelvin Nilsen, Ph.D., CTO

The “Lost Art” of Programming Language Design • Brief History of Programming Language Introductions • 1957: Fortran • 1959: LISP, COBOL • 1960: Algol • 1962: APL, SNOBOL • 1972: C, Prolog • 1975: Pascal, Basic, Scheme • 1980: Smalltalk-80, Modula-2, C++ precursors • 1983: Ada • 1986: C++, Smalltalk-V • Expressive Power (enable writing of code) • Abstraction and Encapsulation (make code readable) • Object Orientation, Maintenance (make code extensible) • Where are we today with real-time Java?

What is Real Time? • Soft Real-Time sometimes connotes uncertainty regarding deadlines, resource requirements, budgeting, and enforcement. Here, we assume it means: • Awareness of resource requirements and timing constraints • Disciplined approach to allocating resources so as to satisfy timing constraints • Resource needs analysis, budgeting, and enforcement may use empirical and/or heuristic techniques • Hard Real-Time means resource needs are determined analytically and budgets enforced algorithmically, guaranteeing 100% compliance with timing constraints.

Our Focus • In this talk, we are specifically considering DO-178B levels A, B, and C. • For levels A and B, we are primarily considering hard real-time problems • RAMS is a European Space Agency (ESA) term representing Reliability, Availability, Maintainability, and Safety • ESA expects subcontractors to apply RAMS principles in all software developed for use within the European space program • Note that developers of “real-time software” are responsible for analyzing and proactively managing more details than traditional IT developers (i.e. memory, CPU time, blocking times, and real time)

Not RAMS, But Also Very Relevant • Generality • java.util collections and other standard Java APIs • Interrupt handling • Is the technology relevant and interesting to more than a small specialized niche? • Efficiency • How much memory is required? • How much CPU performance is required? • How long will it run on a single battery? • How many cents does it add to the cost of each unit manufactured?

Preliminary Performance Metrics for JRTK • Note: most other real-time Java technologies (including PERC) run slower than traditional Java

Reliability Concerns • Traditional Java does not fully constrain the initialization of “global” (static) variables, so that initial (and final) values of these variables may depend on various race conditions • Can we be sure that global variables are initialized before they are used? • Can we be sure that global “final” variables are constants? • The Ravenscar Java proposal suggests that during initialization, we apply different operating semantics than during the “mission phase” • How long will it take to complete initialization? • How much memory will be required to perform initialization? • How much memory will be available for execution of mission code after initialization has completed?
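The first two questions can be made concrete with a small sketch in standard Java. The class and method names below are hypothetical and chosen only for illustration. Two classes initialize their static final fields from each other, so whichever class is touched first sees the other's field at its default value of 0.

```java
// Hypothetical demo classes; the names are illustrative and not from the talk.
class InitOrderDemo {
    static class A {
        // Not a compile-time constant: evaluating it triggers initialization of B.
        static final int X = B.Y + 1;
    }

    static class B {
        // Reads A.X; if A is still mid-initialization, this sees the default 0.
        static final int Y = A.X + 1;
    }

    /** Touches A first, so B's initializer runs while A is only partly initialized. */
    static int[] observeStartingWithA() {
        int x = A.X;
        int y = B.Y;
        return new int[] { x, y };
    }

    public static void main(String[] args) {
        int[] v = observeStartingWithA();
        System.out.println("A.X = " + v[0] + ", B.Y = " + v[1]); // prints "A.X = 2, B.Y = 1"
    }
}
```

If B were touched first instead, the values would swap (A.X = 1, B.Y = 2). The "final" values therefore depend on which class happens to be initialized first, which is the kind of dependency a link-time analysis would need to reject.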

Reliability Solutions • The Scalable Java proposal requires that: • All initialization of global variables adhere to particular style guidelines that allow the initialization expressions to be analyzed prior to run time • “Intelligent linking” tools precompute the initial values of (most) global variables so that these initial values are encoded as part of the load image, and placed in ROM for final variables. • Static analysis tools enforce that “every” static variable is initialized before use, and that initialization expressions are free of race conditions and circular dependencies.

Reliability Concerns • The RTSJ programming abstractions are difficult to use and error prone. Consider: • MemoryAccessError thrown if NoHeapRealtimeThread fetches a reference to heap memory • IllegalAssignmentError if RealtimeThread attempts to write a ScopedMemory reference into a heap object • IllegalAssignmentError if RealtimeThread or NoHeapRealtimeThread attempts to write ScopedMemory reference to ImmortalMemory object • IllegalAssignmentError if RealtimeThread or NoHeapRealtimeThread attempts to write reference to inner-nested ScopedMemory object to an outer-nested ScopedMemory object • Note: code that is “perfectly valid” in one context is considered erroneous in other contexts, but there are no syntactic markers on the code to indicate where it is valid
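The scoped-reference rules can be summarized in a toy model. This is a sketch in plain Java, not the javax.realtime API: the AreaModel type, its factory methods, and mayStore are all hypothetical. It assumes every scope lies on a single nesting chain, and it treats references to heap and immortal objects as always storable (the NoHeapRealtimeThread restriction is not modeled).

```java
// Toy model only: these types are hypothetical and are NOT the javax.realtime API.
final class AreaModel {
    enum Kind { HEAP, IMMORTAL, SCOPED }

    final Kind kind;
    final int depth; // nesting depth of a scoped area (1 = outermost); 0 for heap and immortal

    private AreaModel(Kind kind, int depth) {
        this.kind = kind;
        this.depth = depth;
    }

    static AreaModel heap() { return new AreaModel(Kind.HEAP, 0); }
    static AreaModel immortal() { return new AreaModel(Kind.IMMORTAL, 0); }
    static AreaModel scoped(int depth) {
        if (depth < 1) throw new IllegalArgumentException("depth must be >= 1");
        return new AreaModel(Kind.SCOPED, depth);
    }

    /**
     * May an object living in 'holder' store a reference to an object living in 'target'?
     * Assumes all scopes lie on one nesting chain, so depth alone orders them.
     */
    static boolean mayStore(AreaModel holder, AreaModel target) {
        if (target.kind != Kind.SCOPED) {
            return true; // assumption: such references never dangle when a scope exits
        }
        // A scoped object may be referenced only from the same scope or a more deeply nested one.
        return holder.kind == Kind.SCOPED && holder.depth >= target.depth;
    }

    public static void main(String[] args) {
        System.out.println(mayStore(immortal(), scoped(1))); // false: scoped ref into immortal
        System.out.println(mayStore(scoped(1), scoped(2)));  // false: inner ref into outer scope
        System.out.println(mayStore(scoped(2), scoped(1)));  // true: outer ref into inner scope
    }
}
```

Note that the same assignment statement is legal or illegal depending only on where the two objects happen to live, which is exactly the information that is missing from the source text of an RTSJ program.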

Reliability Concerns • More RTSJ run-time exceptions: • InaccessibleAreaException: if I attempt to allocate memory within an area that this thread has not “entered” • MemoryScopeException: if a wait-free queue is constructed with the ends of the queue in “inappropriate” areas • ScopedCycleException: if a thread’s attempt to enter a memory scope would violate the single-parent rule • ThrowBoundaryError: if a thrown exception attempts to propagate beyond departure from the scope within which it was allocated • In an ideal world, programmers would not be allowed to write programs that are vulnerable to these problems, but enforcement of this guideline is “intractable”

Reliability Solutions • The Scalable Java proposal forbids all behaviors that could result in MemoryAccessError, InaccessibleAreaException, MemoryScopeException, and ScopedCycleException. • The Scalable Java proposal requires syntactic markers and checked exception handling in the rare contexts that might possibly throw IllegalAssignmentError. Further, it enables system managers to forbid code that might throw IllegalAssignmentError, and to enforce prohibition with standard tools. • Open issue: The Scalable Java proposal may prohibit throwing of scoped exceptions, or may establish standards for tools that would provide project-specific optional enforcement of this restriction.

Reliability Concerns • The RTSJ does not guarantee constant-time entry into a new inner-nested LTMemory scope, and does not guarantee that there will exist sufficient defragmented memory to allow entry into that scope • The RTSJ is not able to guarantee that a previously entered, exited, and subsequently re-entered memory scope is in its virgin state when the scope is re-entered. Consequently, reliability of subsequent allocations performed within the scope is compromised

Reliability Solution • The Scalable Java proposal requires LIFO entry and exit of memory scopes and forbids overriding of Object’s finalize() method. This guarantees: • absence of fragmentation, and • constant-time entry and exit of each nested memory scope, and • all newly entered scopes are in their virgin state
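Why LIFO discipline yields these three guarantees can be sketched with a bump-pointer model. The ScopeStack class below is hypothetical and is not taken from the Scalable Java proposal; it tracks byte offsets within one preallocated region rather than real objects.

```java
// Hypothetical sketch: nested scopes carved from one preallocated region in LIFO order.
final class ScopeStack {
    private final int capacity;  // total bytes in the region
    private final int[] marks;   // allocation pointer saved at each scope entry
    private int depth = 0;       // number of currently open scopes
    private int top = 0;         // bump pointer: next free offset

    ScopeStack(int capacityBytes, int maxDepth) {
        this.capacity = capacityBytes;
        this.marks = new int[maxDepth];
    }

    /** Constant time: remember where this scope starts. */
    void enter() {
        if (depth == marks.length) throw new IllegalStateException("scope nesting too deep");
        marks[depth++] = top;
    }

    /** Constant time: returns the offset of the new block within the region. */
    int allocate(int bytes) {
        if (depth == 0) throw new IllegalStateException("no scope entered");
        if (bytes < 0 || bytes > capacity - top) throw new IllegalStateException("region exhausted");
        int offset = top;
        top += bytes;
        return offset;
    }

    /** Constant time: everything allocated since the matching enter() is released at once. */
    void exit() {
        if (depth == 0) throw new IllegalStateException("no scope to exit");
        top = marks[--depth];
    }

    int used() { return top; }

    public static void main(String[] args) {
        ScopeStack s = new ScopeStack(1024, 8);
        s.enter();
        s.allocate(16);
        s.enter();
        System.out.println(s.allocate(8)); // prints 16
        s.exit();
        s.enter();
        System.out.println(s.allocate(8)); // prints 16 again: the re-entered scope starts empty
    }
}
```

Because a scope can only be released when it is the innermost one, free memory is always one contiguous block, so there is nothing to defragment. Forbidding finalize() matters because a finalizer could otherwise make scope exit take unbounded time.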

Reliability Concerns • The RTSJ provides no standard mechanisms for determining worst-case CPU time or memory requirements • Ad hoc techniques require significant tedious effort that is prone to human error • Code that is “properly configured” for a particular execution environment will not run properly in other environments • If resource needs are underestimated, system reliability suffers

Reliability Solution • The Scalable Java proposal introduces standard annotations to enable static analysis of memory and CPU time requirements for particular software components • System managers can use standard tools to enforce that particular components are fully analyzable • Open issue: must every compliant implementation of the Scalable Java “standard” provide CPU-time and memory analyzers?

General Strategies to Improve Availability • Improved reliability extends MTBF • Reliability issues already discussed • Minimize downtime: • Fast, deterministic restart • Fast, deterministic reconfiguration • Hot-swap failed hardware components (requires support for dynamic reconfiguration of software, device drivers) • Support for redundant computation
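The two levers on this slide, extending MTBF and minimizing downtime, correspond to the two terms of the standard steady-state availability formula. The symbol MTTR (mean time to repair or restart) is introduced here for illustration and does not appear in the talk.

```latex
A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}
```

As a purely illustrative calculation, MTBF = 1000 hours with MTTR = 1 hour gives A = 1000/1001, or roughly 0.999. Halving the restart time improves availability about as much as doubling MTBF, which is why fast, deterministic restart is listed alongside reliability.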

Availability Concerns • Some proposals for safety-critical Java suggest that “initialization” involves dynamic class loading, byte-code verification, JIT compilation, garbage collection, etc. • This is neither fast, nor deterministic

Availability Solutions • The Scalable Java proposal enforces that initialization order be fully deterministic • Byte-code verification rejects program components that introduce circular dependencies, and rejects components that fail to initialize static globals • Most initialization is performed at link time, and initial values are stored in the load image • There is no garbage collection during initialization; initialization code implements the same virtual machine model as mission code

Availability Concerns (Level C or lower) • In the case that system field maintenance requires replacement of certain hardware devices, some proposals for safety-critical Java require that the effort to “rebuild the kernel” involve cumbersome trial-and-error experimentation with source code and/or searches for new static analysis techniques in order to “certify” that the revised system configuration satisfies the definition of “legal program” • “Regarding only the memory management, a program that can be proven not to contain memory-related runtime errors is such a "legal safety-critical Java program". This definition leaves open what tools may be used for this analysis, since very different analysis techniques exist. These techniques have very different characteristics with respect to the degree to which the tools can operate automatically and the accuracy of the results.” – Fridtjof Siebert, Feb. 2, 2005 • This is neither fast, nor deterministic

Availability Solutions • The Scalable Java proposal provides a definition of “legal program” that is enforced by the byte-code verifier of every compliant implementation • Enforcement of these rules is modular, in the traditional sense of object orientation • Method implementations are independently verified to conform with their declared interfaces • Method compositions (invoking one method from another) are verified compatible by examining only the interface declarations • A verified Scalable Java device driver that satisfies the interface requirements defined for the device driver is guaranteed to integrate cleanly into an existing safety-critical system

Availability Concerns • In many high-availability systems, it is necessary to hot-swap failed hardware components. In the most general case, this requires that we: • Unload the classes that represent the device driver for the failed components • Load new classes that represent the device driver for the replacement component • Problem: the RTSJ and some proposals for safety-critical Java don’t specifically allow class unloading and don’t require support for deterministic and reliable integration of new classes

Availability Solutions • We are designing into the Scalable Java proposal the ability to dynamically unload classes that have been loaded into scoped memory with custom class loaders • The Scalable Java proposal allows deterministic loading and reliable integration of independently verified classes into a running system

Availability Concerns • The redundancy of computation and information that is required to achieve high availability in the face of occasional hardware failures requires network communication, but the memory model restrictions proposed by certain safety-critical Java proposals make it very difficult, if not impossible, to implement network stacks, RMI, or CORBA. • In some proposals, temporary objects must be “periodically” destroyed. How do we represent domain name server caches, routing tables, RMI handles, temporaries for serialization and deserialization, etc.?

Availability Solutions • The Scalable Java proposal supports reliable and very efficient allocation and reclamation of temporary objects using stacked scope abstractions • We are designing into the Scalable Java proposal the ability to support statically analyzable collections (to represent loaded classes, domain name server information caches, routing data structures, etc.)
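One way to picture a "statically analyzable collection" is a table whose storage is allocated once and whose operations have an obvious worst-case bound. The BoundedTable class below is a hypothetical sketch, not an API from the Scalable Java proposal. It uses int keys and values to avoid boxing, and it reports failure when full instead of growing.

```java
// Hypothetical sketch of a fixed-capacity table; not an API from the Scalable Java proposal.
final class BoundedTable {
    private final int[] keys;
    private final int[] values;
    private int size = 0;

    BoundedTable(int capacity) {
        if (capacity < 1) throw new IllegalArgumentException("capacity must be >= 1");
        keys = new int[capacity];   // all storage is allocated here, once
        values = new int[capacity];
    }

    /** Inserts or updates; returns false (instead of growing) when the table is full. */
    boolean put(int key, int value) {
        int i = indexOf(key);
        if (i >= 0) {
            values[i] = value;
            return true;
        }
        if (size == keys.length) return false;
        keys[size] = key;
        values[size] = value;
        size++;
        return true;
    }

    int get(int key, int fallback) {
        int i = indexOf(key);
        return i >= 0 ? values[i] : fallback;
    }

    boolean remove(int key) {
        int i = indexOf(key);
        if (i < 0) return false;
        size--;
        keys[i] = keys[size];     // move the last entry into the hole
        values[i] = values[size];
        return true;
    }

    int size() { return size; }

    // Linear scan: at most 'capacity' comparisons, so the worst case is known up front.
    private int indexOf(int key) {
        for (int i = 0; i < size; i++) {
            if (keys[i] == key) return i;
        }
        return -1;
    }
}
```

A routing table or name cache built this way has a memory footprint fixed by its constructor argument, and every operation is bounded by the capacity, so both figures could be derived without running the program.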

Availability Concerns • In fault tolerant systems, it is sometimes necessary to migrate redundant computations to new computation servers when particular servers or networking infrastructure fails, but certain proposals for safety-critical Java do not support reliable deterministic class loading and unloading

Availability Solutions • As discussed above, the Scalable Java proposal will support both dynamic class loading and dynamic class unloading

Maintainability • Maintenance consists of activities such as: • Minor modifications to existing software in order to fix a bug, improve performance, or add incremental new functionality • Combining an existing collection of software with an independently developed separate collection of software to yield a new composite system that combines the capabilities of both of the smaller systems • Porting an existing software system to a new OS or CPU platform

Maintainability • Maintainability is primarily an economic consideration, but there is subtle interplay with reliability, availability, and safety: • What is the probability that a maintenance action will reduce system reliability? • If maintenance upgrades are required (e.g. space shuttle), how long will this impact availability to fulfill mission objectives? • Can the impact of an incremental maintenance change be addressed with an incremental change to safety certification artifacts, or will I need to completely recertify all aspects of the system? • Clear encapsulation of control and data, properly abstracted by interface definitions • Portability

Maintainability Concerns • Some proposals for safety-critical Java suggest that the definition of “legal program” depends on which tools you use from which vendor to analyze your program. • Inherent in this approach: the programmer cannot proactively create “legal programs”. He must discover which code is legal by simply trying to “compile” it. • Furthermore, code that is “legal” in one context will be illegal in others. • This makes it very difficult to port a large system to a new tool set, or to make incremental refinements to an existing system, or to combine one complex subsystem with another

Maintainability Solutions • The Scalable Java proposal carefully defines the notion of “legal program” in terms that a programmer can readily understand • Enforcement of legality is performed one method at a time, so programmers receive immediate feedback if they write code that is considered illegal • It is straightforward to determine through examination of interface declarations (and annotations) whether it is appropriate to invoke particular methods from specific contexts

Maintainability of Object-Oriented Code • The motivation for object orientation is to combine strong encapsulation of control and data with ease of extensibility through polymorphism and inheritance • Critical to satisfying these objectives: all of the semantic information required to determine whether particular components compose must be readily available through examination of the interface declarations

Maintainability Concerns • If I am called upon to “modify” an existing RTSJ component (method), it is essential that I understand: • Whether my incoming reference arguments point to immortal or scoped memory • If the incoming scoped arguments are known to nest in a particular order • Which externally created ScopedMemory region sizes must be adjusted if I need to allocate new scope-compatible temporary objects, or if I find it possible to decrease my need for temporary object allocations • Which worst-case CPU time “calculations” must be recomputed if my modifications alter the execution time of this method • Unfortunately, none of this information is represented in the interface specification • Furthermore, the “technique” of searching for all invocations of this method, studying the contexts from which the method is invoked, and tracing the possible global impact of any changes made in each of those contexts, scales exponentially with program size

Maintainability Concerns • If I am called upon to combine two independently developed components (invoke one method from another), it is essential that I understand: • Whether I can “safely” pass references to scoped objects • Whether I can “safely” pass references to ImmortalMemory objects • Whether the invoked method might perform operations that would cause the thread to block • Whether the invoked method might perform ImmortalMemory allocation • Whether the invoked method might perform temporary memory allocations in the current “scope” • Whether the invoked method is known to execute in bounded time and memory • Unfortunately, none of this information is represented in the interface specification

Maintainability Concerns • If I am called upon to extend an existing class, overriding one of the existing methods, it is essential that I understand: • Whether I can assume that my incoming reference arguments point to immortal memory, scoped memory • Whether I can assume that incoming scoped arguments are known to nest in a particular order • Whether I am allowed to allocate memory in the current scope, in ImmortalMemory, in a newly created scope • Whether I am allowed to invoke services that might cause the current thread to block • Whether I am required to restrict myself to control structures that are bounded in execution time • Which worst-case CPU time “calculations” must be recomputed if my new method has different execution time than the overridden method • Which ScopedMemory sizes need to be adjusted if my new method allocates different amounts of memory than the overridden method • Unfortunately, none of this information is represented in the overridden method’s interface specification

Maintainability Solutions • The Scalable Java proposal introduces standard annotations to represent the required information in method interface specifications • The Scalable Java proposal requires byte-code verification tools to assure consistency between interface description and method implementation, and between interface requirements and interface invocations
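A rough picture of interface-level annotations, using only standard Java annotation and reflection facilities. The @ScopeBudget annotation, its elements, and the BudgetChecker class are hypothetical and do not come from the proposal. The proposal describes enforcement by byte-code verification; this sketch checks at run time via reflection purely for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical annotation: what a method may do, stated on its interface.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ScopeBudget {
    int maxBytes();                    // worst-case bytes allocated in the caller's current scope
    boolean mayBlock() default false;  // whether the method may block the calling thread
}

interface Sensor {
    @ScopeBudget(maxBytes = 64)
    int read();
}

class FastSensor implements Sensor {
    @ScopeBudget(maxBytes = 32)
    public int read() { return 42; }
}

class GreedySensor implements Sensor {
    @ScopeBudget(maxBytes = 256, mayBlock = true)
    public int read() { return 7; }
}

final class BudgetChecker {
    /** True if impl's method promises no more than the interface method allows. */
    static boolean conforms(Class<?> iface, Class<?> impl, String methodName) {
        try {
            Method declaredMethod = iface.getMethod(methodName);
            Method actualMethod = impl.getMethod(methodName);
            ScopeBudget declared = declaredMethod.getAnnotation(ScopeBudget.class);
            ScopeBudget actual = actualMethod.getAnnotation(ScopeBudget.class);
            if (declared == null || actual == null) return false; // unannotated code is rejected
            boolean memoryOk = actual.maxBytes() <= declared.maxBytes();
            boolean blockingOk = declared.mayBlock() || !actual.mayBlock();
            return memoryOk && blockingOk;
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(conforms(Sensor.class, FastSensor.class, "read"));   // true
        System.out.println(conforms(Sensor.class, GreedySensor.class, "read")); // false
    }
}
```

The check looks only at the two declarations, never at call sites, so a caller that was verified against Sensor remains valid when any conforming implementation is substituted.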

Safety • Note that the “safest” airplane is the one that never leaves the hangar, but this is not … • Very reliable in fulfilling its mission • Very available for doing useful work • Sometimes, I wonder if our safety-critical Java standardization efforts will succeed only in this regard.

Safety Fundamentals • In many regards, safety is a combination of reliability and availability (once you’re in the air) • However, it is special in that regulatory authorities impose certain certification practices that must be followed • Look specifically at DO-178B Level A certification requirements

Traceability Analysis • All high-level requirements map to low-level requirements • All low-level requirements map to design, source code, and test plan • Trace each line of source code to corresponding object code • Run all tests and perform coverage analysis • If object code is not 100% covered, you’ve got dead code, unstated requirements, or incomplete test plan. Fix the problem! • Note: test plan must derive from high-level requirements, not language semantics (more later)

Level-A Testing Requirements • Code coverage analysis must be performed on machine language translation of programs • Must provide MCDC (modified condition/decision coverage, also written MC/DC) • every condition in a decision in the program has taken all possible outcomes at least once, • every decision in the program has taken all possible outcomes at least once, and • each condition in a decision has been shown to independently affect that decision’s outcome. A condition is shown to independently affect a decision’s outcome by varying just that condition while holding fixed all other possible conditions.
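The independence requirement can be checked mechanically for a small decision. The McdcCheck class below is a hypothetical sketch of the "vary one condition, hold the others fixed" rule (often called unique-cause MC/DC). It works on source-level conditions rather than object code, so it only illustrates the definition.

```java
import java.util.function.Predicate;

// Hypothetical helper: does a set of test vectors show every condition's independent effect?
final class McdcCheck {
    /** True if each condition has a pair of tests differing only in it, with different outcomes. */
    static boolean achievesMcdc(Predicate<boolean[]> decision, boolean[][] tests, int nConditions) {
        for (int c = 0; c < nConditions; c++) {
            if (!hasIndependencePair(decision, tests, c)) return false;
        }
        return true;
    }

    private static boolean hasIndependencePair(Predicate<boolean[]> decision,
                                               boolean[][] tests, int c) {
        for (boolean[] t1 : tests) {
            for (boolean[] t2 : tests) {
                if (differsOnlyIn(t1, t2, c) && decision.test(t1) != decision.test(t2)) {
                    return true;
                }
            }
        }
        return false;
    }

    // True when a and b disagree at index c and agree everywhere else.
    private static boolean differsOnlyIn(boolean[] a, boolean[] b, int c) {
        for (int i = 0; i < a.length; i++) {
            boolean differ = a[i] != b[i];
            if (differ != (i == c)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Predicate<boolean[]> and3 = v -> v[0] && v[1] && v[2];
        boolean[][] tests = {
            { true, true, true }, { false, true, true }, { true, false, true }, { true, true, false }
        };
        System.out.println(achievesMcdc(and3, tests, 3)); // true
    }
}
```

For a three-way conjunction, four vectors suffice: all true, plus one vector per condition with only that condition false. If any condition can never be driven to one of its values, no independence pair exists for it and the check fails.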

Sample Program (in C)

/* Return the maximum of its 4 integer arguments */
int max(int a, int b, int c, int d) {
    if ((a > b) && ((a > c) & (a > d)))
        return a;
    else if ((b >= a) && (b >= c) && (b >= d))
        return b;
    else if ((c >= b) && ((c >= a) & (c >= d)))
        return c;
    else
        return d;
}

Test Vectors

Control Flow of Test Vectors

Control Flow of Test Vectors Consider branch condition at instructions 16-17. TT and TF demonstrate alternative branches. To satisfy MCDC requirements, we must demonstrate that FT branches differently than TT. But there’s no way to “deliver” this condition to label Lz. This code cannot be tested to Level-A certification requirements.

Safety (Certification) Concerns • How do we perform MCDC testing of assignment checks that, by design, always succeed? • How do we perform MCDC testing of stack overflow tests that we’ve (hopefully) arranged to always fail? • How do we perform MCDC testing of class initialization tests that always succeed following the first test for each class, which always fails? • How do we perform MCDC testing of array subscript out-of-bounds checks that always fail? • And so on…
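The difficulty can be seen by writing one of these implicit checks out by hand. In the hypothetical sketch below, the explicit test stands in for the array bounds check that a Java implementation performs implicitly, and two counters record which way it goes.

```java
// Hypothetical demo: an explicit stand-in for an implicit array bounds check.
final class DeadCheckDemo {
    int checkPassed = 0; // times the index was found to be in bounds
    int checkFailed = 0; // times the index was found to be out of bounds

    int sum(int[] data) {
        int total = 0;
        for (int i = 0; i < data.length; i++) {
            // The loop condition already guarantees 0 <= i < data.length.
            if (i >= 0 && i < data.length) {
                checkPassed++;
                total += data[i];
            } else {
                checkFailed++; // unreachable in a correct program
                throw new ArrayIndexOutOfBoundsException(i);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        DeadCheckDemo d = new DeadCheckDemo();
        System.out.println(d.sum(new int[] { 1, 2, 3 })); // prints 6
        System.out.println(d.checkFailed);                // prints 0
    }
}
```

No input to sum() can drive the check down its failing branch, so a requirements-based test plan cannot cover that branch in the object code. Either the check is proven unnecessary and omitted, or the code contains a branch that cannot be tested.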

Safety Solutions • The Scalable Java proposal establishes guidelines that enable static analysis tools to guarantee absence of many common error conditions that would need to be tested at run-time in a traditional fully compliant RTSJ implementation • Static initialization is performed by the intelligent static linker rather than by run-time checks • Having proven through static analysis that no run-time checks are necessary, there is no need to emit untestable run-time checks

Summary • Language design must address a breadth of important issues • While ability to achieve safety certification is important, the question of whether this technology appeals to industry users depends also on ease of software development and maintenance, cost of deployment, etc. • Many of these issues are addressed (at least partially) in the Scalable Java proposal
