This guide covers debugging of distributed systems: locating faults, and controlling and inspecting programs at runtime. It discusses requirements such as user-friendliness, symbolic debugging and reproducibility, as well as the specific problems of distributed debugging. Topics include inter-process communication, consistent state representations, the Lamport approach, the semantics of breakpoints, and the handling of nondeterminism. Special attention is given to strategies for coping with information flooding and for stopping processes consistently despite time delays.
Debugging of Distributed Systems
• Example of a tool for distributed systems
• Approach to locating faults during testing
• Control and inspection of the internal program runtime
Debugging of Distributed Systems
• Requirements
  • User-friendliness
  • Problem orientation (symbolic debugging: String c = "xyz" instead of "LOC FF2243 AC32...")
  • Reproducibility (quasi-deterministic)
  • Presentation of state information (variables, registers, ports, etc.: "show c")
  • Modification of system state (set c = "ABC")
  • Supervision mechanisms
[Figure: the user sends queries and modifications to the debugger, which acts on the tested program and returns its state information]
Special problems
• Parallel processing
• Nondeterminism
• Absence of a global state
• Absence of a common clock
• Interference between the debugger and the tested system
• Resulting information flooding
• Semantics of special constructs (breakpoints, break conditions)
• Improved functionality (inter-process communication)
Inter-process communication
• In addition to the process/object state, the state information also includes the communication state; targeted intervention is preferable
• Separation into an intra-process layer (conventional) and an inter-process layer (special)
• Functionality of the inter-process layer
  • Access to messages:
    • insert <m> in <port>
    • read <m> from <port>
    • extract <m> from <port>
    • forward <m> to <port>
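The four message-access commands can be pictured as operations on a port's queue of pending messages. The sketch below is a minimal illustration under assumptions: the `Port` class and its method names are hypothetical, a port is modeled as a FIFO queue, "read" is taken to inspect a message without consuming it, and "extract" to remove it so the receiver never sees it.

```python
from collections import deque


class Port:
    """Hypothetical message port: a FIFO queue of pending messages."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()

    def insert(self, msg):
        # Inject a message as if some process had sent it to this port.
        self.queue.append(msg)

    def read(self):
        # Inspect the oldest pending message without consuming it.
        return self.queue[0] if self.queue else None

    def extract(self):
        # Remove the oldest pending message; the receiver will never get it.
        return self.queue.popleft() if self.queue else None

    def forward(self, target):
        # Move the oldest pending message to another port.
        msg = self.extract()
        if msg is not None:
            target.insert(msg)
        return msg


a, b = Port("a"), Port("b")
a.insert("req-1")
a.insert("req-2")
print(a.read())       # req-1
a.forward(b)          # moves req-1 to port b
print(a.extract())    # req-2
print(list(b.queue))  # ['req-1']
```

Keeping these operations in a separate inter-process layer means the conventional intra-process debugger does not need to know about them.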
Inter-process communication
• Breakpoints
  • set break <port> <mtype> [send | receive]
  • set break <port1> ... <portn>
• Statistical accounting records
• Access to operating system objects (semaphores, processes)
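A port breakpoint of the form `set break <port> <mtype> [send | receive]` can be read as a filter over message events. The following sketch assumes hypothetical `MessageEvent` and `PortBreakpoint` types; an omitted message type or direction is taken to match anything, and a set of ports covers the `<port1> ... <portn>` form.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MessageEvent:
    port: str
    mtype: str
    direction: str  # "send" or "receive"


@dataclass(frozen=True)
class PortBreakpoint:
    ports: frozenset                 # one or more ports
    mtype: Optional[str] = None      # None matches any message type
    direction: Optional[str] = None  # None matches both send and receive

    def matches(self, ev: MessageEvent) -> bool:
        return (ev.port in self.ports
                and (self.mtype is None or ev.mtype == self.mtype)
                and (self.direction is None or ev.direction == self.direction))


def first_hit(breakpoints, events):
    """Index of the first event that triggers any breakpoint, or None."""
    for i, ev in enumerate(events):
        if any(bp.matches(ev) for bp in breakpoints):
            return i
    return None


bps = [PortBreakpoint(frozenset({"orders"}), mtype="cancel", direction="receive")]
trace = [MessageEvent("orders", "new", "receive"),
         MessageEvent("orders", "cancel", "send"),
         MessageEvent("orders", "cancel", "receive")]
print(first_hit(bps, trace))  # 2
```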
Consistent state representations
• Problem: no common clock and no shared memory
  • hence no consistent representation of the global state
• Approaches
  • Clock synchronization (precision in the range of milliseconds)
  • Logical ordering of the events
• Basis: the Lamport approach
  • Partial order: the "happened-before" relation
  • Events are ordered by causal dependency (sending before receiving)
  • Independent events remain unordered
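Consistent state representations
• Rules
  • $a$ and $b$ in the same process, $a$ before $b$: $a \rightarrow b$
  • $a$ is the sending and $b$ the receiving of the same message: $a \rightarrow b$
  • $a \rightarrow b$ and $b \rightarrow c$ imply $a \rightarrow c$ (transitivity)
• The events essential for distributed processing can be ordered in this way (consistent logical "snapshots")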
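The three rules above define the happened-before relation constructively: local order and message edges form a graph, and transitivity is its reachability. The sketch below computes the relation that way. The function names are hypothetical, and the example events reuse the labels t11 to t33 from the breakpoint slide, with an assumed message sent at t31 and received at t22.

```python
def happened_before(process_logs, messages):
    """Return the happened-before relation as a set of (a, b) pairs.

    process_logs: dict process -> list of event names in local order (rule 1)
    messages: list of (send_event, receive_event) pairs (rule 2)
    Rule 3 (transitivity) is applied by computing graph reachability.
    """
    succ = {}
    for events in process_logs.values():
        for a, b in zip(events, events[1:]):
            succ.setdefault(a, set()).add(b)
    for s, r in messages:
        succ.setdefault(s, set()).add(r)

    relation = set()
    for evs in process_logs.values():
        for start in evs:
            stack, seen = list(succ.get(start, ())), set()
            while stack:
                e = stack.pop()
                if e in seen:
                    continue
                seen.add(e)
                relation.add((start, e))
                stack.extend(succ.get(e, ()))
    return relation


def concurrent(a, b, hb):
    # Neither event causally precedes the other.
    return (a, b) not in hb and (b, a) not in hb


logs = {"P1": ["t11", "t12"], "P2": ["t21", "t22", "t23"], "P3": ["t31", "t32", "t33"]}
msgs = [("t31", "t22")]  # assumed message from P3 to P2
hb = happened_before(logs, msgs)
print(("t31", "t23") in hb)          # True (message, then local order)
print(concurrent("t11", "t21", hb))  # True
```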
Lamport Approach
• Realization via the following algorithm
  • each process has an event counter $Z$ (initially 0)
  • each inter-process event $E$ has a number $N(E)$, and so does each message $m$ ($N(m) = N(E)$ of its send event)
  • Sending:
    • increment: $Z := Z + 1$
    • mark the send event: $N(E) := Z$
    • mark the message: $N(m) := Z$
  • Receiving a message with number $N(m)$:
    • if $N(m) > Z$ (counter of the receiver), set $Z := N(m) + 1$
    • otherwise set $Z := Z + 1$
    • mark the receive event: $N(E) := Z$
  • Intra-process event:
    • $Z := Z + 1$
    • $N(E) := Z$
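The counter rules can be transcribed almost line by line. This is a sketch, not a reference implementation: the `LamportProcess` class is hypothetical, and messages are represented only by the number $N(m)$ they carry. The two receive branches together amount to $Z := \max(Z, N(m)) + 1$, the form in which the rule is often stated.

```python
class LamportProcess:
    """Event counter Z of one process, following the rules on the slide."""

    def __init__(self, name):
        self.name = name
        self.z = 0     # event counter, initially 0
        self.log = []  # (event description, N(E)) pairs

    def local_event(self, what):
        self.z += 1                      # Z := Z + 1
        self.log.append((what, self.z))  # N(E) := Z
        return self.z

    def send(self, what):
        self.z += 1                      # Z := Z + 1
        self.log.append((what, self.z))  # N(E) := Z
        return self.z                    # N(m) := Z, carried by the message

    def receive(self, what, n_m):
        if n_m > self.z:
            self.z = n_m + 1             # Z := N(m) + 1
        else:
            self.z += 1                  # Z := Z + 1
        self.log.append((what, self.z))  # N(E) := Z
        return self.z


p1, p2 = LamportProcess("P1"), LamportProcess("P2")
p1.local_event("a")      # 1
m = p1.send("send m")    # 2
p1.local_event("b")      # 3
p2.local_event("c")      # 1
p2.receive("recv m", m)  # 3, because N(m) = 2 > Z = 1
```

Note that "b" on P1 and "recv m" on P2 both get the number 3 although they are causally unrelated: equal or smaller numbers alone do not imply a causal order.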
Lamport Approach
[Figure: space-time diagram of processes P1, P2 and P3, each event labeled with its Lamport number]
• Causally related events are always ordered
• Causally unrelated events remain unordered (for instance, no. 12 occurs in both P2 and P3)
Semantics of breakpoints
• Problem: when is a breakpoint with distributed conditions satisfied?
• Approach:
  • simple predicates (one process, e.g. "call proc")
  • disjunctive predicates ("P1: call proc | P2: call xy")
  • conjunctive predicates ("P1: call proc & P1: x=1"), restricted to a single process
  • linked predicates: coupling of events via the happened-before relation
[Figure: Process 1 with events t11, t12; Process 2 with t21, t22, t23; Process 3 with t31, t32, t33. t31 and t22 are ordered; t11 and t21 are unordered]
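The first three predicate kinds can be sketched as functions over local process states. Assumptions: the state model (a dict per process, with "call proc" modeled as a variable `call` holding `"proc"`) and the helper names are hypothetical. Conjunctive predicates are kept to one process because evaluating a conjunction across processes would require a consistent global state, which does not exist. Linked predicates additionally need the happened-before relation and are not covered here.

```python
from typing import Callable, Dict

# Local state per process, e.g. {"P1": {"call": "proc", "x": 1}}
State = Dict[str, Dict[str, object]]
Pred = Callable[[State], bool]


def simple(proc: str, var: str, value) -> Pred:
    # Holds when one variable of one process has the given value.
    return lambda s: s.get(proc, {}).get(var) == value


def disjunctive(*preds: Pred) -> Pred:
    # Holds when any of the predicates holds, possibly in different processes.
    return lambda s: any(p(s) for p in preds)


def conjunctive(proc: str, conditions: Dict[str, object]) -> Pred:
    # All conditions refer to one process, so they can be checked locally.
    return lambda s: all(s.get(proc, {}).get(k) == v for k, v in conditions.items())


bp1 = disjunctive(simple("P1", "call", "proc"), simple("P2", "call", "xy"))
bp2 = conjunctive("P1", {"call": "proc", "x": 1})
state = {"P1": {"call": "proc", "x": 0}, "P2": {}}
print(bp1(state), bp2(state))  # True False
```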
Consistent stopping of processes
• Problem: time delay after a halt command is issued, so the other processes keep running for a while
• Approach: backtracking to a consistent state directly before the stop event ("recovery line")
• Procedure: backtracking along the causal dependencies given by the happened-before relation of the messages
[Figure: Processes 1 to 3 with events t11 to t14, t21 to t24 and t31 to t34; t12 is the stop event; Process 2 is backtracked to t23 and Process 3 to t32]
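One way to compute such a recovery line is to start from what each process has already executed, cut the stopping process directly before its stop event, and then repeatedly roll back any process that has received a message whose send lies beyond the cut. The sketch below does this under assumptions: the log format and `recovery_line` are hypothetical, and the example messages are invented so that the result matches the figure (Process 2 back to t23, Process 3 back to t32).

```python
def recovery_line(logs, stop_proc, stop_index):
    """Latest consistent cut lying directly before a stop event.

    logs: dict process -> list of (kind, msg_id), kind in {"local", "send", "recv"}
    stop_proc, stop_index: the stop event; only the events before it are kept.
    Returns dict process -> number of events kept (a prefix length).
    """
    cut = {p: len(evs) for p, evs in logs.items()}
    cut[stop_proc] = stop_index
    # Where each message was sent: msg_id -> (process, position)
    sent_at = {m: (p, i) for p, evs in logs.items()
               for i, (kind, m) in enumerate(evs) if kind == "send"}
    changed = True
    while changed:  # cuts only shrink, so this terminates
        changed = False
        for p, evs in logs.items():
            for i in range(cut[p]):
                kind, m = evs[i]
                if kind != "recv":
                    continue
                sp, si = sent_at[m]
                if si >= cut[sp]:  # received but not sent within the cut
                    cut[p] = i     # roll back to just before this receive
                    changed = True
                    break
    return cut


L = ("local", None)
logs = {
    "P1": [L, L, ("send", "m1"), ("send", "m2")],  # t11 .. t14, t12 = stop
    "P2": [L, L, L, ("recv", "m1")],               # t21 .. t24
    "P3": [L, L, ("recv", "m2"), L],               # t31 .. t34
}
print(recovery_line(logs, "P1", 1))  # {'P1': 1, 'P2': 3, 'P3': 2}
```

Because the check is repeated until nothing changes, rollbacks propagate transitively: a process rolled back past a send forces the corresponding receiver back as well.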
Distributed trace-steps
• Basis: the step mode of sequential (interactive) debuggers
• One trace step means advancing to the next interaction point (inter-process event)
  • local computations form a single unit
  • send operations are carried out in all participating processes
  • receive operations are carried out only if a message is available (possibly only after a send step)
[Figure: distributed trace steps 1, 2 and 3, each consisting of a calculation phase followed by an interaction point]
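The stepping rules above can be simulated with processes given as lists of operations. This is a minimal sketch under assumptions: the operation tuples and `trace_step` are hypothetical, and within one step all sends are performed before any receive, so a message sent in a step can already be received in that step.

```python
from collections import deque


def trace_step(programs, pcs, mailboxes):
    """Advance every process by one distributed trace step.

    programs: dict process -> list of ops: ("calc", label), ("send", dest, msg), ("recv",)
    pcs: dict process -> index of the next op (updated in place)
    mailboxes: dict process -> deque of incoming messages (updated in place)
    Returns the inter-process events performed in this step.
    """
    events = []
    # 1. Local computation up to the next interaction point is one unit.
    for p, prog in programs.items():
        while pcs[p] < len(prog) and prog[pcs[p]][0] == "calc":
            pcs[p] += 1
    # 2. Pending send operations are carried out in all processes.
    for p, prog in programs.items():
        if pcs[p] < len(prog) and prog[pcs[p]][0] == "send":
            _, dest, msg = prog[pcs[p]]
            mailboxes[dest].append(msg)
            events.append((p, "send", msg))
            pcs[p] += 1
    # 3. Receives complete only if a message is available.
    for p, prog in programs.items():
        if pcs[p] < len(prog) and prog[pcs[p]][0] == "recv" and mailboxes[p]:
            events.append((p, "recv", mailboxes[p].popleft()))
            pcs[p] += 1
    return events


programs = {
    "P1": [("calc", "x"), ("send", "P2", "m1"), ("recv",)],
    "P2": [("recv",), ("calc", "y"), ("send", "P1", "m2")],
}
pcs = {"P1": 0, "P2": 0}
mailboxes = {"P1": deque(), "P2": deque()}
print(trace_step(programs, pcs, mailboxes))  # [('P1', 'send', 'm1'), ('P2', 'recv', 'm1')]
print(trace_step(programs, pcs, mailboxes))  # [('P2', 'send', 'm2'), ('P1', 'recv', 'm2')]
```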
Handling of nondeterminism
• Nondeterministic program behavior: race conditions
• Options:
  • testing different possible execution sequences via distributed single-stepping
  • re-execution/replay based on recorded output
• Approach:
  • recording of all inter-process events
  • control of repeated execution on this basis (re-execution)
  • high storage requirements, which checkpoints can reduce (events before a checkpoint need not be kept)
  • replay is also possible for a single process (important for technical processes as well)
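The core of re-execution is to record the outcome of each race, here taken to be which sender's message a nondeterministic receive picks, and to enforce that outcome on replay. The sketch assumes a hypothetical `ReplayableReceiver`; a seeded random choice stands in for the race, and checkpointing is left out.

```python
import random


class ReplayableReceiver:
    """Records which channel each nondeterministic receive picked, then replays it.

    mode "record": choose among channels with pending messages, log the choice.
    mode "replay": take the choice from the log, reproducing the earlier run.
    """

    def __init__(self, mode, log=None, rng=None):
        self.mode = mode
        self.log = log if log is not None else []
        self.pos = 0
        self.rng = rng or random.Random()

    def receive(self, channels):
        """channels: dict channel name -> list of pending messages (consumed in place)."""
        if self.mode == "record":
            ready = sorted(c for c, msgs in channels.items() if msgs)
            choice = self.rng.choice(ready)  # the race: any ready channel may win
            self.log.append(choice)
        else:
            choice = self.log[self.pos]      # enforce the recorded outcome
            self.pos += 1
        return choice, channels[choice].pop(0)


def run(receiver):
    channels = {"P1": ["a1", "a2"], "P2": ["b1"]}
    return [receiver.receive(channels) for _ in range(3)]


rec = ReplayableReceiver("record", rng=random.Random(7))
first = run(rec)
replayed = run(ReplayableReceiver("replay", log=list(rec.log)))
print(first == replayed)  # True
```

Only the choices are logged, not the message contents; this relies on the replayed processes producing the same messages again, as in the re-execution approach described above.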
Handling of information flooding
• Requirement: recorded/output information must be reduced
  • restriction to inter-process events
  • restriction to relevant time intervals
• Abstractions for
  • process groups
  • execution (timing diagram)
  • ports (abstract message flow)
• Graphics support (control windows, animation tools)
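Several of these reductions can be combined in a single pass over a trace: drop local events, keep a time window, map processes to groups, and count messages per port as an abstract message flow. The trace format, group mapping and `reduce_trace` below are hypothetical illustrations.

```python
from collections import Counter


def reduce_trace(trace, start, end, groups):
    """Keep inter-process events inside [start, end], abstracted to process groups.

    trace: list of (time, process, kind, port), kind in {"local", "send", "recv"}
    groups: dict process -> group name (unlisted processes keep their own name)
    Returns message counts per (group, kind, port): an abstract message flow.
    """
    flow = Counter()
    for time, proc, kind, port in trace:
        if kind == "local" or not (start <= time <= end):
            continue  # restriction to inter-process events and to the interval
        flow[(groups.get(proc, proc), kind, port)] += 1
    return flow


trace = [(1, "w1", "local", None), (2, "w1", "send", "jobs"),
         (3, "w2", "send", "jobs"), (4, "db", "recv", "jobs"),
         (9, "w1", "send", "jobs")]
flow = reduce_trace(trace, 0, 5, {"w1": "workers", "w2": "workers"})
print(dict(flow))  # {('workers', 'send', 'jobs'): 2, ('db', 'recv', 'jobs'): 1}
```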
Distributed debugging: concepts
• Hierarchical levels of intervention
  • Level 1: "Free runtime"
    • no modification, only trace recording
    • minimal interference
  • Level 2: "Self-responsibility"
    • execution freely modifiable
    • strong interference
    • the tester is fully responsible for execution control
  • Level 3: "Pseudo-real-time"
    • the best possible compensation for strong interference
    • a "private clock" per process
    • the private clock runs except while the process is in debugger code
    • private clocks synchronized, for instance, via the Lamport algorithm on the partial order
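A Level 3 "private clock" can be sketched as real elapsed time minus the time spent inside debugger code. The `PrivateClock` class is hypothetical; the time source is injectable so that the example is deterministic, and synchronizing the clocks of several processes (for example via the Lamport algorithm) is not shown.

```python
import time


class PrivateClock:
    """Per-process virtual clock that stops while the process is in debugger code.

    Virtual time = real elapsed time minus the time spent inside the debugger.
    """

    def __init__(self, now=time.monotonic):
        self._now = now
        self._start = now()
        self._paused_total = 0.0
        self._paused_at = None

    def enter_debugger(self):
        if self._paused_at is None:
            self._paused_at = self._now()

    def leave_debugger(self):
        if self._paused_at is not None:
            self._paused_total += self._now() - self._paused_at
            self._paused_at = None

    def read(self):
        # While paused, virtual time stays at the moment the debugger was entered.
        current = self._paused_at if self._paused_at is not None else self._now()
        return current - self._start - self._paused_total


ticks = iter([0.0, 2.0, 5.0, 6.0])    # fake real-time readings
clock = PrivateClock(now=lambda: next(ticks))
clock.enter_debugger()  # real time 2.0
clock.leave_debugger()  # real time 5.0: 3 s spent in the debugger
print(clock.read())     # 3.0 (6.0 real minus 3.0 paused)
```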
Architecture principles
• Alternatives:
  1. Separate processes for program and debugger
  2. Separate processes with shared data (also lightweight processes)
  3. Integrated processes with direct instrumentation
• As a rule, alternatives 2 or 3 are the most common
Architecture proposal
[Figure: Computer A runs Process 1 and Process 2, Computer B runs Process 3 and Process 4; each computer has a local debugging control, and both are connected to a centralized dialogue process]