This article discusses the major difficulties faced in building real-time embedded applications, including handling concurrent events, timing control, and temporal dependence in program behavior. It also explores the challenges of modeling, analyzing, testing, and reproducing non-deterministic and time-dependent behavior in software systems.
Background
• Major difficulties of building real-time embedded applications
  • handling concurrent events (real-world events occur in parallel)
  • timing control and temporal dependence in program behavior
  • asynchronous operations
• Non-deterministic operation, time-dependent behavior, and race conditions
  • difficult to model, analyze, test, and reproduce
• Example: NASA Pathfinder spacecraft
  • Total system resets in Mars Pathfinder
  • An overrun of the data collection task → a priority inversion on a mutex semaphore → failure of the communication task → a system reset
  • Took 18 hours to reproduce the failure in a lab replica; once reproduced, the problem became obvious and a fix was installed
Background (Cont’d)
• Other examples
  • select(2)/accept(2) race condition in TCP servers of NetBSD
    • the bug depends on a specific event and is sometimes difficult to reproduce, particularly if the server is very fast and the network is relatively slow
  • The Delphi Bug Report 459
    • difficult to reproduce the bug since the timing of the two threads (one being destroyed and one being created) has to be “right” for it to occur
• It is easy to identify the faults and fix them once the failing sequences are reproduced (or observed)
• The failures are rooted in the interaction of multiple concurrent operations/threads and are based on timing dependencies
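To make the timing dependence concrete, here is a minimal sketch, not taken from either bug report. Two simulated threads perform a non-atomic read-modify-write, and the outcome depends only on the order in which they are scheduled. The names `increment` and `run` are hypothetical, and generators stand in for real threads so that the interleaving is explicit and repeatable.

```python
def increment(shared):
    """One simulated thread: a non-atomic read-modify-write of shared['n']."""
    local = shared["n"]      # read
    yield                    # a context switch may happen here
    shared["n"] = local + 1  # write
    yield


def run(schedule):
    """Run two simulated threads in the order given by schedule (thread ids)."""
    shared = {"n": 0}
    threads = {1: increment(shared), 2: increment(shared)}
    for tid in schedule:
        next(threads[tid], None)  # advance that thread by one step
    return shared["n"]


serial = run([1, 1, 2, 2])       # thread 1 finishes before thread 2 starts
interleaved = run([1, 2, 1, 2])  # both threads read before either writes
print(serial, interleaved)
```

With the serial schedule both increments take effect; with the interleaved schedule one update is lost. Once the failing schedule is known the fault is easy to see, which is the point made above.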
Deterministic Replay
• Can we reproduce the exact execution behavior with additional delays in a controlled environment?
  • the delays may be caused by instrumentation and breakpoints
• For multiple purposes:
  • Test analysis
  • Debugging
  • Recovery
[Figure: diagram boxes labeled "Execution / Instrumentation", "Execution, D. replay / Instrumentation", "Execution / Observation / Assertion", "Execution, D. replay / Observation / Assertion", "Execution / Checkpointing / Msg logging", and "Rollback / D. replay"]
Deterministic Replay (Cont’d)
• Programs read in the same input values (timer, DAQ, status, etc.)
• Interrupts occur at the same program execution instances
• Need to log external events during real-time execution and re-submit the events during replay
  • recording and replaying stages
[Figure: two timelines, "real-time execution" and "deterministic replay", each with interrupt_1 at PC=1000 and interrupt_2 at PC=2000; labels: intrusions, time]
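The record and replay stages for input values can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool described in these slides: `Recorder`, `Replayer`, and `controller` are hypothetical names, and a random number generator stands in for a timer or DAQ read.

```python
import random


class Recorder:
    """Record stage: pass real inputs through and log each value."""

    def __init__(self, source):
        self.source = source
        self.log = []

    def read(self):
        value = self.source()
        self.log.append(value)
        return value


class Replayer:
    """Replay stage: re-submit the logged values in the original order."""

    def __init__(self, log):
        self._values = iter(log)

    def read(self):
        return next(self._values)


def controller(io, steps=5):
    """Hypothetical program under test; its behavior depends only on its inputs."""
    total = 0
    trace = []
    for _ in range(steps):
        sample = io.read()
        total += sample
        trace.append(total if sample % 2 == 0 else -total)
    return trace


rng = random.Random()  # stands in for a non-repeatable external input
recorder = Recorder(lambda: rng.randint(0, 100))
recorded_trace = controller(recorder)
replayed_trace = controller(Replayer(recorder.log))
print(recorded_trace == replayed_trace)
```

Because the program sees the same values in the same order, the replayed run follows the same path even if instrumentation slows it down. Interrupts need more than a value log: they must also be delivered at the same execution instance.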
Testing Analysis and Timing Intrusion
• Software quality analysis and test coverage
• Instrumentation of source programs
  • program behavior may be changed due to timing intrusion
  • example: testing a robotic controller in the target system, with hardware and human-in-the-loop operations
  • some solutions:
    • hardware-based trace collection (Applied Microsystems)
    • special data logging, monitoring, and test facility (SVF for NASA ISS)
• Apply instrumentation during deterministic replay
  • if the overhead of logging external events can be minimized
Our Approach -- A Two-stage Instrumentation
• Instrumentation based on RTOS -- for context switches, interrupts, events, and task communication
• Annotation for device drivers
• Synchronize program execution with external events
  • cannot rely on program counter alone
    • an interrupt during a loop (need loop count and program counter)
  • simulated time
    • must be adjusted to match the real execution time
• Determine when an event occurs
  • if no data dependence, it can occur at any instance during a block execution
  • else, need to know the corresponding statement
Software Instruction Counter
• Exact instance in program execution
  • specified by program counter (PC) plus software instruction counter (SIC)
• Software instruction counter (SIC)
  • incremented on a backward jump or procedure call
  • software or hardware implemented
• Has been applied to recovery and debugging
[Figure: polling loop with repeated steps "read I/O" and "check value"; annotation "I/O status changed"]
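Why the PC needs a counter beside it can be shown with a small simulation. This is a sketch under assumptions: `run_loop` is a hypothetical helper, the loop body sits at a made-up address 1000, and the SIC is modeled as a plain integer incremented at the backward jump that closes each iteration.

```python
def run_loop(iterations, deliver_at, use_sic):
    """Simulate a loop whose body is at PC 1000.

    Returns the iteration in which the replayed interrupt is delivered,
    or None if the stamp is never matched.
    """
    sic = 0
    for i in range(iterations):
        pc = 1000  # same PC on every pass through the loop body
        stamp = (pc, sic) if use_sic else (pc,)
        if stamp == deliver_at:
            return i
        sic += 1  # backward jump at the end of the body increments the SIC
    return None


# Suppose the interrupt was recorded in iteration 3 of the loop.
pc_only = run_loop(10, (1000,), use_sic=False)    # matches on the first pass
with_sic = run_loop(10, (1000, 3), use_sic=True)  # matches the recorded pass
print(pc_only, with_sic)
```

With the PC alone the interrupt is replayed in iteration 0, the first time the address is reached. The (PC, SIC) pair picks out iteration 3, where it was recorded.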
Current Status
[Figure: tool-chain diagram with the elements: source program; code analyzer; code instrumentation; ESIC, system, and event instrumentation; ESIC and replay instrumentation; instrumented program_1; instrumented program_2; target record environment; target replay environment; event trace_1; PC stamp converter; event trace_2; execution trace]
Current Status (Cont’d)
• Works for a single execution thread in the whole system (VxWorks + MPC860)
• There are kernel and non-instrumented threads
  • test analysis of one program in a multitasking environment
  • debug a program which calls library routines
  • system calls to RTOS
• Can we still reach deterministic replay if the execution of the instrumented thread is interleaved with other threads?
• If interrupts (input) → thread_1 → thread_2, then both threads must be instrumented
[Figure: diagram with the elements: instrumented program; RTOS; semTake(); the other thread; ISR; interrupt; semGive()]
Current Status (Cont’d)
• If interrupts (input) → thread_2 and thread_1 → thread_2,
  • thread_1 doesn’t need to be instrumented
  • however, interrupts can occur while thread_1 is running (i.e. execution is not in the instrumentation region due to a blocked system call or library call)
• Solution:
  • check thread id when an interrupt occurs
  • if the interrupted instruction is in the instrumentation region, use PC+SIC for replay
  • else, replay the interrupt just before the call (RTOS or library)
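The decision rule in the solution above can be written down as a small function. This is only a sketch: the record layout `InterruptEvent`, the function `replay_point`, and the idea of logging the stamp of the instrumented thread's most recent RTOS or library call are assumptions made for illustration, not details given in the slides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Stamp = Tuple[int, int]  # (PC, SIC)


@dataclass
class InterruptEvent:
    """One logged interrupt (hypothetical record layout)."""
    thread_id: int          # thread that was running when the interrupt arrived
    stamp: Optional[Stamp]  # (PC, SIC) if inside the instrumentation region
    last_call_stamp: Stamp  # (PC, SIC) of the instrumented thread's latest RTOS/library call


def replay_point(event: InterruptEvent, instrumented_tid: int):
    """Decide where to re-submit the interrupt during replay."""
    in_region = event.thread_id == instrumented_tid and event.stamp is not None
    if in_region:
        return ("at", event.stamp)  # exact instance: PC + SIC
    return ("before_call", event.last_call_stamp)  # just before the call


INSTRUMENTED = 2  # assumed id of the instrumented thread
inside = replay_point(InterruptEvent(2, (1000, 7), (900, 5)), INSTRUMENTED)
outside = replay_point(InterruptEvent(1, None, (900, 5)), INSTRUMENTED)
print(inside, outside)
```

The first event hits the instrumented thread inside the instrumentation region, so it is replayed at its own stamp. The second arrives while another thread is running, so it is replayed just before the pending call.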
Current Tasks
• Tool integration and GUI
• Experiments
  • joystick program with input and timer
  • DC motor controller with a LabVIEW-based simulator
• Applications in JSC
  • X38
  • AERCam
• Porting
  • VxWorks and Suds on MBX860 embedded controller
  • porting to RT-Linux and other platforms
• Documentation and dissemination