Debugging Components Koen De Bosschere RUG-ELIS
Problem description
• Components are loosely coupled and do not have a common notion of time
• Components have contracts (e.g. timing contracts)
• Components are activated asynchronously by the scheduler
• Components can be replaced at run-time
Traditional debugging techniques are not adequate
Why is traditional debugging inadequate?
• Execution is non-deterministic: no two runs can be guaranteed to be identical (scheduling, timing differences, replacing components, …), so cyclic debugging is not applicable
• Timing is part of correctness: the intrusion caused by the debugger might violate the contracts
• Input might not be repeatable if it is generated by an external device (e.g. camera or microphone)
Debugging is often a matter of trial and error, and a good portion of luck and experience is needed; the use of multithreading only adds to that.
Two approaches
• On-chip debugging techniques
• Software debugging techniques
On-chip debugging techniques
• Logic Analyser
• ROM monitor
• ROM emulator
• In-Circuit Emulator
• Background Debug Mode
• JTAG
These add-ons take up valuable chip area (up to 10%)
Hardware manufacturers believe in design for debuggability
Software debugging techniques
• Execution must be repeatable to allow for cyclic debugging
    • Program flow must be identical
    • Input must be identical
• Execution must be observable to allow for debugging
    • We must be able to use breakpoints, watch points, etc. without altering the program flow
Re-execution must be deterministic
Example code
class G {
    public static int global = 5;
}

class Thread1 extends Thread {
    public void run() { G.global += 2; }   // unsynchronized load, add, store
}

class Thread2 extends Thread {
    public void run() { G.global *= 3; }   // unsynchronized load, multiply, store
}

class Main {
    // join() can throw InterruptedException, so main has to declare it
    public static void main(String[] args) throws InterruptedException {
        Thread1 t1 = new Thread1();
        Thread2 t2 = new Thread2();
        G.global = 5;
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("global = " + G.global);
    }
}
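Possible executions (L(x) = load of value x, S(x) = store of value x; G.global=5 at the start)
• Thread1: L(5) +2 S(7); Thread2: L(5) *3 S(15), stored last: G.global=15
• Thread1: L(5) +2 S(7), stored last; Thread2: L(5) *3 S(15): G.global=7
• Thread1: L(5) +2 S(7); then Thread2: L(7) *3 S(21): G.global=21
• Thread2: L(5) *3 S(15); then Thread1: L(15) +2 S(17): G.global=17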
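The four results can be checked mechanically by enumerating every interleaving of the two threads' load and store steps. The following is a minimal sketch, not part of the original slides: the class name InterleavingEnumerator and its methods are hypothetical, and each thread is modeled as just two steps (a load into a private register, then a store of the computed value).

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.IntUnaryOperator;

class InterleavingEnumerator {
    // The two thread bodies from the example: Thread1 adds 2, Thread2 multiplies by 3.
    private static final IntUnaryOperator[] OPS = { v -> v + 2, v -> v * 3 };

    // Returns every final value of the shared variable over all interleavings.
    static Set<Integer> outcomes(int initial) {
        Set<Integer> results = new TreeSet<>();
        explore(initial, new int[2], new int[2], results);
        return results;
    }

    // pc[t] is the next step of thread t: 0 = load, 1 = store, 2 = finished.
    // reg[t] is the value thread t loaded into its private register.
    private static void explore(int global, int[] pc, int[] reg, Set<Integer> results) {
        if (pc[0] == 2 && pc[1] == 2) {
            results.add(global);
            return;
        }
        for (int t = 0; t < 2; t++) {
            if (pc[t] == 2) {
                continue; // this thread has already finished
            }
            int[] nextPc = pc.clone();
            int[] nextReg = reg.clone();
            int nextGlobal = global;
            if (pc[t] == 0) {
                nextReg[t] = global;                    // L(global)
            } else {
                nextGlobal = OPS[t].applyAsInt(reg[t]); // S(op(register))
            }
            nextPc[t]++;
            explore(nextGlobal, nextPc, nextReg, results);
        }
    }

    public static void main(String[] args) {
        System.out.println(outcomes(5)); // prints [7, 15, 17, 21]
    }
}
```

There are six interleavings of the four steps, but several of them end in the same value, which is why only four distinct results appear.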
Causes of non-determinism
• Sequential programs:
    • Input
    • Certain system calls (time)
    • …
• Parallel programs:
    • Race conditions on shared variables
    • Load balancing
    • …
Execution Replay
• Goal: make repeated equivalent re-executions possible
• Method: two phases
    • Record phase: record all non-deterministic events during an execution in a trace file
    • Replay phase: use the trace file to produce the same execution
• Question: what & where to trace?
    • Synchronization Replay
    • Input Replay
    • Data race detection
Requirements for execution replay
• Record phase must have low intrusion
• Replay must be accurate
• Record phase must be space efficient
• Replay phase must be time efficient
Synchronization Replay
[Figure: Execution 1 is recorded into a trace file (the happens-before relation); Execution 2 is replayed from that trace file.]
Input replay
[Figure labels: application, IO-instructions, system calls, kernel]
Example code
class G {
    public static int global = 5;
    public static Object s = new Object();   // lock object shared by both threads
}

class Thread1 extends Thread {
    public void run() { synchronized (G.s) { G.global += 2; } }
}

class Thread2 extends Thread {
    public void run() { synchronized (G.s) { G.global *= 3; } }
}

class Main {
    // join() can throw InterruptedException, so main has to declare it
    public static void main(String[] args) throws InterruptedException {
        Thread1 t1 = new Thread1();
        Thread2 t2 = new Thread2();
        G.global = 5;
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
Possible executions (G.global=5 at the start)
• Thread2: L(5) *3 S(15); then Thread1: L(15) +2 S(17): G.global=17
• Thread1: L(5) +2 S(7); then Thread2: L(7) *3 S(21): G.global=21
The interleaved executions that ended in G.global=15 and G.global=7 cannot occur once both updates are inside synchronized(G.s).
Record phase
[Figure: the two synchronized executions (G.global=17 and G.global=21) annotated with logical timestamps. Timestamp sequences shown in the figure: 3,4,5,6; 3,7,8,9; 4,6,7,8; 4,5,6,7; 1,2,3,7,9,10; 1,2,3,10,11,12.]
Replay phase
[Figure: the same two executions (G.global=21 and G.global=17) replayed against a logical time axis running from 1 to 12.]
Execution Replay in Java
• Recording requires logging the choices made by synchronization constructs such as synchronized, wait, signal, etc.
• During replay, the synchronization operations are replaced by operations waitforlogicaltime(t).
[Figure labels: T, component system]
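The replay idea can be sketched in a few lines of Java. This is an illustration under stated assumptions, not the JaReC implementation: only the name waitForLogicalTime comes from the slide, the class LogicalTimeReplay and its other methods are hypothetical, and the recorded timestamps are passed in directly instead of being read from a trace file. Each thread blocks until the logical clock reaches its recorded timestamp, performs its update, and then advances the clock.

```java
import java.util.function.IntUnaryOperator;

class LogicalTimeReplay {
    private int now = 0;    // current logical time
    private int global = 5; // the shared variable from the example

    // Block the calling thread until the logical clock reaches t.
    private synchronized void waitForLogicalTime(int t) throws InterruptedException {
        while (now != t) {
            wait();
        }
    }

    // Advance the clock and wake up threads waiting for a later timestamp.
    private synchronized void tick() {
        now++;
        notifyAll();
    }

    // Replayed critical section: runs only when its recorded timestamp is due.
    void replayedUpdate(int recordedTime, IntUnaryOperator op) throws InterruptedException {
        waitForLogicalTime(recordedTime);
        global = op.applyAsInt(global);
        tick();
    }

    private static void run(LogicalTimeReplay r, int time, IntUnaryOperator op) {
        try {
            r.replayedUpdate(time, op);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Re-executes the example with the given (assumed, pre-recorded) timestamps.
    static int replay(int timeOfThread1, int timeOfThread2) throws InterruptedException {
        LogicalTimeReplay r = new LogicalTimeReplay();
        Thread t1 = new Thread(() -> run(r, timeOfThread1, v -> v + 2));
        Thread t2 = new Thread(() -> run(r, timeOfThread2, v -> v * 3));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return r.global;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(replay(1, 0)); // Thread2 first: (5 * 3) + 2 = 17
        System.out.println(replay(0, 1)); // Thread1 first: (5 + 2) * 3 = 21
    }
}
```

Whatever order the scheduler picks for starting the threads, the result depends only on the timestamps, which is the property that makes cyclic debugging possible again.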
Input Replay
• Execution will only yield the same results if the input is repeatable too
• Solution: record the input by capturing all I/O events and regenerate them during replay
• Input replay generates a huge amount of data…
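As a small illustration of capturing input and regenerating it, the sketch below records every byte an application reads from a stream and later feeds the same bytes back. It works at the level of a Java InputStream, which is a simplification chosen for the example; it does not capture I/O at the system-call level. The class RecordingInputStream and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class RecordingInputStream extends FilterInputStream {
    // In-memory trace; a real tool would write this to a trace file.
    private final ByteArrayOutputStream trace = new ByteArrayOutputStream();

    RecordingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            trace.write(b); // record the byte that was delivered
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            trace.write(buf, off, n); // record exactly the bytes that were delivered
        }
        return n;
    }

    // Replay phase: a stream that regenerates the recorded input.
    InputStream replayStream() {
        return new ByteArrayInputStream(trace.toByteArray());
    }

    // Stand-in for the application: any computation that depends on its input.
    static int checksum(InputStream in) throws IOException {
        int total = 0;
        for (int b = in.read(); b >= 0; b = in.read()) {
            total += b;
        }
        return total;
    }

    // Runs the application once on live input and once on the replayed trace.
    static boolean recordThenReplay(byte[] liveInput) throws IOException {
        RecordingInputStream rec = new RecordingInputStream(new ByteArrayInputStream(liveInput));
        int recorded = checksum(rec);
        int replayed = checksum(rec.replayStream());
        return recorded == replayed;
    }

    public static void main(String[] args) throws IOException {
        byte[] live = "camera frame".getBytes(StandardCharsets.US_ASCII);
        System.out.println(recordThenReplay(live)); // prints true
    }
}
```

The trace grows by one byte for every byte read, which makes the size concern on the slide concrete for devices such as cameras. Bytes skipped with skip() are not recorded in this sketch.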
Data race detection
[Figure: two executions starting from G.global=5 in which both threads load 5 before either stores; the results are G.global=15 and G.global=7.]
• A data race occurs if two threads perform a store/store, load/store or store/load pair in parallel on the same location.
• Automatic data race detection: check the data race condition on all load/store pairs that are not ordered.
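The race condition above can be written as a small predicate. The sketch below assumes that the ordering between accesses is represented with vector clocks, which is one common way to encode a happens-before relation; the slides do not say which representation is used. The names RaceCheck, Access, happensBefore and isRace are hypothetical.

```java
class RaceCheck {
    // One memory access, stamped with the vector clock of the thread that made it.
    static final class Access {
        final int thread;
        final String location;
        final boolean isStore;
        final int[] clock;

        Access(int thread, String location, boolean isStore, int[] clock) {
            this.thread = thread;
            this.location = location;
            this.isStore = isStore;
            this.clock = clock;
        }
    }

    // a happens before b if a <= b in every component and a < b in at least one.
    // Both clocks are assumed to have the same length (one entry per thread).
    static boolean happensBefore(int[] a, int[] b) {
        boolean strictlySmaller = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) {
                return false;
            }
            if (a[i] < b[i]) {
                strictlySmaller = true;
            }
        }
        return strictlySmaller;
    }

    // Two accesses race if they conflict and are not ordered in either direction.
    static boolean isRace(Access x, Access y) {
        boolean conflicting = x.thread != y.thread
                && x.location.equals(y.location)
                && (x.isStore || y.isStore);
        boolean ordered = happensBefore(x.clock, y.clock) || happensBefore(y.clock, x.clock);
        return conflicting && !ordered;
    }

    public static void main(String[] args) {
        // Unsynchronized example: the two stores have incomparable clocks.
        Access s1 = new Access(0, "G.global", true, new int[] {1, 0});
        Access s2 = new Access(1, "G.global", true, new int[] {0, 1});
        System.out.println(isRace(s1, s2)); // prints true

        // Synchronized example: Thread2 acquired the lock after Thread1 released it,
        // so its clock already includes Thread1's update.
        Access s3 = new Access(1, "G.global", true, new int[] {1, 1});
        System.out.println(isRace(s1, s3)); // prints false
    }
}
```

Load/load pairs never count as a race, since neither access changes the location.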
Implementation
• RecPlay for Solaris (SPARC) and Linux (x86)
    • Uses JiTI for dynamic instrumentation
    • Record overhead: 1.6%
• JaReC for Java (on top of the JVM)
    • Uses JVMPI for dynamic instrumentation
    • Record overhead: 25% on average
• Input-Replay for Linux (Tornado)
    • Uses ptrace
Performance modeling JVM
• Java workload is separable into different components
    • Virtual Machine (SUN, IBM, JikesRVM, JRockit, …)
    • Java application (SPECjvm98, SPECjbb2000, …)
    • Input to the application
• Measure execution characteristics (AMD Duron)
    • IPC, branch & cache behavior, …
• Statistical analysis
    • Principal Components Analysis
    • Cluster Analysis
• Quantify the difference between SPECcpu2000 and Java workloads
JVM Results
• Java workloads are mostly clustered by
    • benchmark for large workloads
    • VM for small workloads
• SPECjvm98: the small input set is not significant for the execution behavior of the large input set
• Comparing Java vs. C:
    • No significant difference in IPC, number of branches, data TLB
    • Significant difference in data cache behaviour, instruction TLB, return stack usage
Conclusions
• Debugging multithreaded/distributed systems is not an easy task
• Faithful record/replay requires extra resources (time + space)
• Record/replay enables the developer to effectively debug a complex multithreaded program
• The choice of Java VM has an impact on the low-level behavior of the processor. Java benchmarks should be large enough to be realistic.
Output
• 14 refereed conference papers (OOPSLA, ParCo, WBT, …)
• 12 workshop papers
• 5 journal publications (FGCS, CACM, Parallel Computing, …)
• 1 PhD
• 12 master theses
• Java and Embedded Systems Symposium, Nov 2002 [150 people]
• AADEBUG 2003 workshop, Sept 2003 [60 people]