Debugging of Real-Time Embedded Systems: Experiences from SEESCOA

Debugging of Real-Time Embedded Systems: Experiences from SEESCOA Michiel Ronsse RUG-ELIS

Problem description • embedded systems have become increasingly sophisticated • debugging becomes a major problem • types of bugs • computational errors • synchronisation errors • performance errors • testing and debugging are responsible for a huge part of the development time • => definite need for appropriate debugging tools

Debugging embedded systems • two test/debug phases • during the development • during the actual use • an embedded system • typically runs continuously => interactive debugging? • can have hard and soft real-time constraints => Heisenbugs • parallel and/or distributed systems: absence of a global clock • uses lots of input => repeatability? => cyclic debugging? • hardware related problems: code in ROM, limited amount of RAM • Solution: hardware add-ons for on-chip debugging techniques

On-chip debugging techniques (I) • Logic Analyser • ROM monitor • ROM emulator • In-Circuit Emulator • Background Debug Mode • JTAG

On-chip debugging techniques (II) • almost all contemporary embedded processors have add-ons for debugging • as embedded processors get faster, these add-ons get closer to (or in) the processor • these add-ons take up valuable chip area (up to 10%) • these add-ons are also available in the consumer products, making on-site testing possible • hardware manufacturers believe in design for debuggability

Embedded software • Nowadays, software engineering methods designed for 'business' applications are also used for embedded systems: • Use of 'higher' languages (C, C++, Java) • Use of reusable components • Multithreaded applications • ... • Debugging/maintaining these complex applications becomes a difficult task => Design for Debuggability

Debugging software • Most important notions: • Repeatability => cyclic debugging • Observability => what is going on? • These are two problems for embedded systems: • Non-determinism present in most embedded systems • Low observability (real-time constraints, ...)

Causes of non-determinism • Sequential programs: input (keyboard, disk, network), certain system calls (e.g. gettimeofday()), … • Parallel programs: race conditions: • two threads • accessing the same shared variable (memory location) • in an unsynchronised way • and at least one thread modifies the variable

Execution Replay • Goal: make repeated equivalent re-executions possible • Method: two phases • Record phase: record all non-deterministic events during an execution in a trace file • Replay phase: use trace file to produce the same execution • Question: what & where to trace?
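The two phases can be illustrated for a single non-deterministic system call. The sketch below is not RecPlay code: `struct trace`, `enum mode`, `traced_seconds()` and the in-memory array standing in for the trace file are all hypothetical names chosen for illustration. During the record phase the result of `gettimeofday()` is logged; during the replay phase the logged value is returned instead of calling the system.

```c
#include <stddef.h>
#include <sys/time.h>

#define TRACE_CAP 64

/* Hypothetical in-memory trace; a real tool would write a trace file. */
enum mode { RECORD, REPLAY };

struct trace {
    long   values[TRACE_CAP]; /* recorded results of non-deterministic calls */
    size_t count;             /* number of valid entries */
    size_t next;              /* next entry to hand out during replay */
};

/* RECORD: query the clock, log the seconds, return them.
 * REPLAY: return the next logged value without touching the clock.
 * Returns -1 if the trace is full, exhausted, or the call fails. */
long traced_seconds(struct trace *t, enum mode m)
{
    if (m == RECORD) {
        struct timeval tv;
        if (t->count == TRACE_CAP || gettimeofday(&tv, NULL) != 0)
            return -1;
        t->values[t->count++] = (long)tv.tv_sec;
        return t->values[t->count - 1];
    }
    if (t->next == t->count)
        return -1;
    return t->values[t->next++];
}
```

A program that calls `traced_seconds(&t, RECORD)` in one run and `traced_seconds(&t, REPLAY)` in the next sees the same values both times, which is what makes cyclic debugging possible for this source of non-determinism.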

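Example code

#include <pthread.h>
#include <stdio.h>

unsigned global = 5;

/* Both threads update global without synchronisation (a data race). */
void *thread2(void *arg) { global = global + 6; return NULL; }
void *thread3(void *arg) { global = global + 7; return NULL; }

int main(void)
{
    pthread_t t2, t3;
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_create(&t3, NULL, thread3, NULL);
    pthread_join(t2, NULL);
    pthread_join(t3, NULL);
    printf("global=%u\n", global);
    return 0;
}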

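Possible executions (L = load, A = add, S = store) • thread2: L(5) A S(11), thread3: L(5) A S(12), S(11) performed last => global=11 • thread2: L(5) A S(11), then thread3: L(11) A S(18) => global=18 • thread2: L(5) A S(11), thread3: L(5) A S(12), S(12) performed last => global=12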

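Example code II

#include <pthread.h>
#include <stdio.h>

unsigned global = 5;
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

/* lock()/unlock() wrap one mutex, so each update is a critical section. */
void lock(void)   { pthread_mutex_lock(&m); }
void unlock(void) { pthread_mutex_unlock(&m); }

void *thread2(void *arg) { lock(); global = global + 6; unlock(); return NULL; }
void *thread3(void *arg) { lock(); global = global + 7; unlock(); return NULL; }

int main(void)
{
    pthread_t t2, t3;
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_create(&t3, NULL, thread3, NULL);
    pthread_join(t2, NULL);
    pthread_join(t3, NULL);
    printf("global=%u\n", global);
    return 0;
}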

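Possible executions II • thread2 takes the lock first: L(5) A S(11), then thread3: L(11) A S(18) => global=18 • thread3 takes the lock first: L(5) A S(12), then thread2: L(12) A S(18) => global=18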

Race conditions • Three types of conflicts: • load/store • store/load • store/store • Two types of races: • synchronisation races: • do not allow the use of cyclic debugging techniques • are not a bug: desired non-determinism • data races: • do not allow the use of cyclic debugging techniques • are a bug: undesired non-determinism
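The three conflict types share one rule: two accesses conflict when they come from different threads, touch the same location, and at least one of them is a store. A minimal sketch of that test follows; `struct access` and `conflicts()` are illustrative names, not part of any tool mentioned here. Note that a conflict is only a race if the two accesses are also unordered by synchronisation, which this function does not check.

```c
#include <stdbool.h>
#include <stdint.h>

enum access_kind { LOAD, STORE };

struct access {
    int              thread; /* id of the thread performing the access */
    uintptr_t        addr;   /* memory location accessed */
    enum access_kind kind;
};

/* True for load/store, store/load and store/store pairs on the same
 * location issued by different threads; false for load/load. */
bool conflicts(struct access a, struct access b)
{
    return a.thread != b.thread
        && a.addr == b.addr
        && (a.kind == STORE || b.kind == STORE);
}
```

For the first example program, the load of `global` in one thread and the store to `global` in the other form a conflicting pair, while two loads of `global` do not.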

Tracing all memory operations • Introduces an intolerable overhead, both in time and space • Due to out-of-order stores, the order in which threads see events can differ • Will replay both synchronization and data races.

Tracing synchronization operations • The only events that are assured to be seen in the same order are synchronisation operations • Synchronization operations are a subset of all memory operations => lower overhead • Will replay only synchronization races. • Will fail if data races occur. • Solution • Record: trace synchronization operations • Replay: check for data races (once)
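The slides do not spell out how the order of synchronisation operations is recorded. One common choice, assumed here, is Lamport-style logical clocks: every thread and every synchronisation object carries a counter, and each operation gets a timestamp larger than both. The names `sync_object`, `thread_log` and `record_sync()` are hypothetical, and the sketch only shows the bookkeeping, not the locking itself.

```c
#include <stddef.h>

#define MAX_EVENTS 32

struct sync_object { unsigned clock; }; /* e.g. one per mutex */

struct thread_log {
    unsigned clock;              /* logical clock of the thread */
    unsigned stamps[MAX_EVENTS]; /* timestamps written to the trace */
    size_t   count;
};

/* Intended to be called while the thread holds the synchronisation
 * object, so the update cannot interleave with another user of the
 * same object. Returns the timestamp assigned to the operation. */
unsigned record_sync(struct thread_log *t, struct sync_object *o)
{
    unsigned ts = (t->clock > o->clock ? t->clock : o->clock) + 1;
    t->clock = ts;
    o->clock = ts;
    if (t->count < MAX_EVENTS)
        t->stamps[t->count++] = ts;
    return ts;
}
```

Only one small integer is logged per synchronisation operation, which is consistent with the low overhead claimed for tracing synchronisation operations rather than all memory operations.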

Record phase Main: 1,2,3,4,5,13,14,... T2: 3,6,7,8,9,... T3: 4,5,6,8,10,11,12,...

Replay phase Main: 1,2,3,4,5,13,14,... T2: 3,6,7,8,9,... T3: 4,5,6,8,10,11,12,...
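Given per-thread timestamp lists like the ones above, a replay can be driven by always letting the thread with the smallest pending timestamp perform its next synchronisation operation. The sketch below assumes that simplification; `replay_log` and `next_thread()` are hypothetical names, and a real replay mechanism could let unrelated operations run concurrently instead of serialising them.

```c
#include <stddef.h>

struct replay_log {
    const unsigned *stamps; /* recorded timestamps of one thread */
    size_t          count;
    size_t          next;   /* index of the next operation to replay */
};

/* Returns the index of the thread whose pending operation carries the
 * smallest timestamp (ties go to the lower index), or -1 when every
 * log is exhausted. */
int next_thread(const struct replay_log *logs, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (logs[i].next == logs[i].count)
            continue;
        if (best < 0 ||
            logs[i].stamps[logs[i].next] < logs[best].stamps[logs[best].next])
            best = (int)i;
    }
    return best;
}
```

A replay loop would call `next_thread()`, let that thread execute one synchronisation operation, advance its `next` index, and repeat until the function returns -1. Operations are then replayed in non-decreasing timestamp order, which respects the order seen during the record phase.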

Implementation • RecPlay for Solaris (SPARC) and Linux (x86) • Uses JiTI for dynamic instrumentation • Record overhead: 1.6%

Instrumenting Java classes

The RecPlay system for Java [Figure: Machine 1 ... Machine 2; Component System on a JVM; class files instrumented for record or replay; trace I/O]

Current Situation • Record/replay for synchronization operations works • To be added: • Race detection • Tracing of input
