Debugging of Distributed Systems • Example of a tool for distributed systems • Approach to fault search during testing • Control and inspection of internal program runtime
Debugging of Distributed Systems • Requirements • User-friendliness • Problem orientation (symbolic debugging): String c = "xyz" instead of "LOC FF2243 AC32..." • Reproducibility (quasi-deterministic) • Presentation of state information (variables, registers, ports, etc.: "show c") • Modification of system state (set c = "ABC") • Supervision mechanisms • [Figure: the user sends queries and modifications to the debugger; the debugger returns state information from the tested program]
Special problems • Parallel processing • Nondeterminism • Absence of a global state • Absence of a common clock • Interference between debugger and system • Resulting information flooding • Semantics of special constructs (breakpoints, break conditions) • Improved functionality (inter-process communication)
Inter-process communication • State information includes, in addition to the process/object state, the communication state • Manipulating intervention is desirable • Separation into an intra-process layer (conventional) and an inter-process layer (special) • Functionality of the inter-process layer • Access to messages: • insert <m> in <port> • read <m> from <port> • extract <m> from <port> • forward <m> to <port>
Inter-process communication • Breakpoints • set break <port> <mtype> [send | receive] • set break <port1> ... <portn> • Statistical accounting records • Access to operating system objects (semaphores, processes)
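The message-access commands listed on the previous slide can be sketched as operations on a queue. This is a minimal illustration only: the `Port` class, its method names, and the dictionary message format are assumptions made for the example and do not come from the tool described in the slides.

```python
from collections import deque


class Port:
    """Hypothetical message queue that a debugger can inspect and modify."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()

    def insert(self, message):
        # insert <m> in <port>: append a message supplied by the tester
        self.queue.append(message)

    def read(self, index=0):
        # read <m> from <port>: inspect a message without removing it
        return self.queue[index]

    def extract(self, index=0):
        # extract <m> from <port>: remove a message and return it
        message = self.queue[index]
        del self.queue[index]
        return message

    def forward(self, target, index=0):
        # forward <m> to <port>: move a message to another port
        target.insert(self.extract(index))


a, b = Port("a"), Port("b")
a.insert({"type": "req", "body": 1})
a.insert({"type": "ack", "body": 2})
peeked = a.read()   # look at the first message, queue unchanged
a.forward(b)        # move the first message from port a to port b
```

Keeping `read` non-destructive and `extract` destructive mirrors the distinction the command list draws between inspecting and manipulating the communication state.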
Consistent state representations • Problem: no common clock and no shared memory, hence no consistent state representation • Approaches • Clock synchronization (accuracy in the range of milliseconds) • Logical ordering of events • Basis: Lamport approach • Partial order ("pre-relation", i.e. happened-before) • Events are ordered by causal relationship (sending before receiving) • Events are unordered if they are independent
Consistent state representations • Rules • a and b in the same process, a before b: a → b • a is the sending and b the receiving of the same message: a → b • a → b and b → c imply a → c (transitivity) • All events essential for distributed processing can be ordered (consistent logical "snapshots")
Lamport approach • Realization via the following algorithm • Each process has an event counter Z (initially zero) • Each inter-process event E has a number N(E); each message carries the number of its sending event • Sending: • increment Z (Z := Z + 1) • mark the sending event: N(E) := Z • mark the message: message number := Z • Receiving a message: • if message number > Z (at the receiver), set Z := message number + 1 • otherwise set Z := Z + 1 • mark the receiving event: N(E) := Z • Intra-process event: • Z := Z + 1 • N(E) := Z
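The counter rules above can be sketched as follows. The class and method names are assumptions made for this example; only the update rules are taken from the slide.

```python
class Process:
    """Process with a Lamport-style event counter Z."""

    def __init__(self, name):
        self.name = name
        self.z = 0  # event counter Z, initially zero

    def local_event(self):
        # Intra-process event: Z := Z + 1, N(E) := Z
        self.z += 1
        return self.z

    def send(self):
        # Sending: Z := Z + 1; the event and the message both get the number Z
        self.z += 1
        return self.z

    def receive(self, message_number):
        # Receiving: jump past the message number if it is ahead of Z
        if message_number > self.z:
            self.z = message_number + 1
        else:
            self.z += 1
        return self.z


p1, p2 = Process("P1"), Process("P2")
e1 = p1.local_event()      # first event in P1
m1 = p1.send()             # P1 sends, message carries m1
r1 = p2.receive(m1)        # P2 was behind, so it jumps past m1
e2 = p2.local_event()
m2 = p2.send()
r2 = p1.receive(m2)        # P1 was behind, so it jumps past m2
r3 = p1.receive(m1)        # stale number: plain increment
```

The two branches in `receive` are equivalent to `Z := max(Z, message number) + 1`, which guarantees that every receiving event gets a larger number than the corresponding sending event.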
Lamport approach • [Figure: event timelines of processes P1, P2 and P3 with Lamport event numbers] • Causally related events are completely ordered • Causally unrelated events remain unordered (for instance, No. 12 in P2 and in P3)
Semantics of breakpoints • Problem: when is a breakpoint with distributed conditions satisfied? • Approach: • simple predicates (one process, "call proc") • disjunctive predicates ("P1: call proc | P2: call xy") • conjunctive predicates ("P1: call proc & P1: x=1"), only within a single process • joint predicates: coupling of events via the pre-relation • [Figure: event timelines of Process 1 (t11, t12), Process 2 (t21, t22, t23) and Process 3 (t31, t32, t33); t31 and t22 are ordered, t11 and t21 are unordered]
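The first three predicate kinds can be sketched as small combinators over a table of local process states. The function names and the dictionary layout of the states are assumptions made for this illustration; joint predicates, which need the event ordering, are not covered here.

```python
def simple(process, condition):
    """Simple predicate: a condition on the local state of one process."""
    return lambda states: condition(states[process])


def disjunctive(*predicates):
    """Disjunctive predicate: satisfied as soon as any part is satisfied."""
    return lambda states: any(p(states) for p in predicates)


def conjunctive(process, *conditions):
    """Conjunctive predicate: all conditions refer to the same process,
    so they can be checked against one well-defined local state."""
    return lambda states: all(c(states[process]) for c in conditions)


# Hypothetical local states: the routine being called and one variable.
states = {
    "P1": {"call": "proc", "x": 0},
    "P2": {"call": "xy", "x": 7},
}

calls_proc = lambda s: s["call"] == "proc"
calls_xy = lambda s: s["call"] == "xy"
x_is_one = lambda s: s["x"] == 1

bp_simple = simple("P1", calls_proc)
bp_or = disjunctive(simple("P1", calls_proc), simple("P2", calls_xy))
bp_and = conjunctive("P1", calls_proc, x_is_one)

hit_simple = bp_simple(states)
hit_or = bp_or(states)
hit_and = bp_and(states)
```

Restricting the conjunctive form to one process sidesteps the missing global state: a conjunction across processes would require deciding whether both conditions held "at the same time".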
Consistent stopping of processes • Problem: time delay after a halt command is issued • Approach: backtracking to the consistent state directly before the stopping event ("reset line") • Procedure: backtracking along the causal relationships given by the pre-relation of the messages • [Figure: event timelines of Process 1 (t11 to t14), Process 2 (t21 to t24) and Process 3 (t31 to t34); t12 is the stop point event, Process 2 backtracks to t23, Process 3 backtracks to t32]
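One way to compute such a reset line is to roll every process back until no message is received before it was sent. The sketch below is an assumption-laden illustration, not the tool's procedure: events are identified by their 1-based position within a process, and the two messages in the example are invented.

```python
def consistent_cut(positions, stop_process, stop_index, messages):
    """Return, per process, how many events lie before the reset line.

    positions: events each process has executed when the halt arrives
    messages:  tuples (sender, send_index, receiver, receive_index)
    """
    cut = dict(positions)
    cut[stop_process] = stop_index
    changed = True
    while changed:
        changed = False
        for sender, send_idx, receiver, recv_idx in messages:
            # A receive inside the cut whose send lies outside it is
            # inconsistent: roll the receiver back to just before it.
            if send_idx > cut[sender] and recv_idx <= cut[receiver]:
                cut[receiver] = recv_idx - 1
                changed = True
    return cut


# Hypothetical run: every process has executed four events.
positions = {"P1": 4, "P2": 4, "P3": 4}
messages = [
    ("P2", 4, "P3", 3),  # sent at the 4th event of P2, received at the 3rd of P3
    ("P1", 3, "P2", 4),  # sent at the 3rd event of P1, received at the 4th of P2
]
cut = consistent_cut(positions, "P1", 2, messages)
```

With these invented messages, stopping P1 after its second event forces P2 back to three events and P3 back to two, because each rollback can invalidate a later receive in another process.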
Distributed trace-steps • Basis: step mode of sequential debuggers (interactive) • One trace-step means execution up to the next interaction point (inter-process event) • Local calculations form a unit • Sending operations are carried out on all participating processes • Receiving operations are carried out only if a message exists (possibly after a sending step) • [Figure: trace-steps 1, 2, 3, each consisting of a calculation phase followed by an interaction point]
Indeterminism handling • Nondeterministic program behavior: race conditions • Options: • Testing of different possible execution sequences via distributed single-stepping • Re-execution / replay based on recorded output • Approach: • recording of all inter-process events • control of the repeated execution based on this record (re-execution) • high storage requirements, but these can be reduced via checkpoints, which make earlier events unnecessary • Replay is also possible for a single process (also important in technical processes)
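The record-and-replay idea can be sketched in a few lines. Everything here is an assumption made for illustration: the race is simulated by shuffling the delivery order, each sender contributes exactly one message, and the receiver's "state" is just a number built from the values in arrival order.

```python
import random


def run(messages, rng, log):
    """Deliver messages in a nondeterministic order and record that order."""
    pending = list(messages)
    rng.shuffle(pending)  # stands in for a race between the senders
    state = 0
    for sender, value in pending:
        log.append(sender)           # record the inter-process event
        state = state * 10 + value   # result depends on the arrival order
    return state


def replay(messages, log):
    """Re-execute the receiver, forcing the recorded delivery order."""
    by_sender = dict(messages)  # valid because each sender sends once
    state = 0
    for sender in log:
        state = state * 10 + by_sender[sender]
    return state


messages = [("P1", 1), ("P2", 2), ("P3", 3)]
log = []
first_result = run(messages, random.Random(), log)
replayed_result = replay(messages, log)
```

Only the order of inter-process events is logged, not the receiver's internal computation, which is what keeps the record small relative to a full state trace.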
Handling of information flooding • Requirement: the recorded / output information must be reduced • Limitation to inter-process events • Limitation to relevant time intervals • Abstraction forms for • process groups • execution (timing diagram) • ports (abstract message flow) • Graphics support (control windows, animation tools)
Distributed debugging: concepts • Graded levels of intervention • Level 1: "Free runtime" • no modification, only trace recording • minimal interference • Level 2: "Self-responsibility" • freely modifiable execution • strong interference • full responsibility of the tester for execution control • Level 3: "Pseudo-real-time" • the best possible compensation for strong interference • "private clock" per process • the "private clock" runs except while debugger code is executing • "private clocks" are synchronized via, for instance, the Lamport algorithm on the partial order
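The "private clock" of Level 3 can be pictured as a stopwatch that is paused whenever debugger code runs. The sketch below is a single-process illustration under assumed names (`PrivateClock`, `enter_debugger`, `leave_debugger`); it does not cover the synchronization between the clocks of different processes.

```python
import time


class PrivateClock:
    """Per-process clock that does not advance while debugger code runs."""

    def __init__(self, now=time.monotonic):
        self._now = now            # time source, replaceable for testing
        self._elapsed = 0.0        # program time accumulated so far
        self._started = now()      # None while the debugger is active

    def enter_debugger(self):
        # Freeze the clock: bank the time spent in the program so far.
        if self._started is not None:
            self._elapsed += self._now() - self._started
            self._started = None

    def leave_debugger(self):
        # Resume counting from the current instant.
        if self._started is None:
            self._started = self._now()

    def read(self):
        if self._started is None:
            return self._elapsed
        return self._elapsed + (self._now() - self._started)


# Demonstration with a hand-driven time source instead of the real clock.
fake_time = [0.0]
clock = PrivateClock(now=lambda: fake_time[0])
fake_time[0] = 5.0
clock.enter_debugger()      # 5 time units of program execution so far
fake_time[0] = 100.0        # long pause inside the debugger
during_debug = clock.read()
clock.leave_debugger()
fake_time[0] = 103.0
after_resume = clock.read()
```

From the program's point of view the 95 units spent in the debugger never happened, which is the compensation for interference that this level aims at.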
Architecture principles • Alternatives: • 1. Separate processes: program / debugger • 2. Separate processes with common data (also lightweight processes) • 3. Integrated processes with direct instrumentation • As a rule, alternatives 2 and 3 are the most common
Architecture proposal • [Figure: Computer A hosts Process 1, Process 2 and a local debugging control; Computer B hosts Process 3, Process 4 and a local debugging control; a centralized dialogue process is connected to the local debugging controls]