Debugging of Distributed Systems

Debugging of Distributed Systems • Example of a tool for distributed systems • Approach to locating faults during testing • Control and inspection of internal program execution

Debugging of Distributed Systems • Requirements • User-friendliness • Problem orientation (symbolic debugging): String c = "xyz" instead of "LOC FF2243 AC32..." • Reproducibility (quasi-deterministic) • Presentation of state information (variables, registers, ports, etc.), e.g. "show c" • Modification of system state, e.g. set c = "ABC" • Supervision mechanisms [Figure: User, Debugger, Tested program; labels: query / modification, state information]

Special problems • Parallel processing • Nondeterminism • Absence of a global state • Absence of a common clock • Interference between debugger and system • Resulting flood of information • Semantics of special constructs (breakpoints, break conditions) • Extended functionality (inter-process communication)

Inter-process communication • In addition to process/object state, state information also includes communication state • Intervention that manipulates communication is desirable • Separation into an intra-process layer (conventional) and an inter-process layer (special) • Functionality of the inter-process layer • Access to messages: • insert <m> in <port> • read <m> from <port> • extract <m> from <port> • forward <m> to <port>
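The four message-access commands can be illustrated with a minimal sketch. The `Port` class and its method names are hypothetical and only mirror the command names on the slide; a real inter-process layer would hook into the system's actual message queues.

```python
from collections import deque


class Port:
    """Hypothetical message port that a debugger's inter-process layer can inspect."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()

    def insert(self, message):
        # insert <m> in <port>: inject a message as if it had been sent
        self.queue.append(message)

    def read(self, index=0):
        # read <m> from <port>: inspect a message without consuming it
        return self.queue[index]

    def extract(self, index=0):
        # extract <m> from <port>: remove a message so the receiver never sees it
        message = self.queue[index]
        del self.queue[index]
        return message

    def forward(self, target, index=0):
        # forward <m> to <port>: move a message to another port
        target.insert(self.extract(index))


inbox, other = Port("inbox"), Port("other")
inbox.insert("m1")
inbox.insert("m2")
peeked = inbox.read()       # "m1" stays queued
removed = inbox.extract()   # "m1" is removed
inbox.forward(other)        # "m2" moves to the other port
```

The difference between read and extract is the point of the sketch: reading only observes, while extracting and forwarding change the communication state of the tested program.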

Inter-process communication • Breakpoints • set break <port> <mtype> [send | receive] • set break <port1> ... <portn> • Statistical accounting records • Access to operating system objects (semaphores, processes)
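A message breakpoint of the form "set break <port> <mtype> [send | receive]" could be matched against communication events roughly as follows. This is a sketch under assumptions: `MessageBreakpoint`, `should_break`, and the port and message type names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MessageBreakpoint:
    port: str
    mtype: Optional[str] = None       # None matches any message type
    direction: Optional[str] = None   # "send", "receive", or None for both

    def matches(self, port, mtype, direction):
        return (
            port == self.port
            and self.mtype in (None, mtype)
            and self.direction in (None, direction)
        )


def should_break(breakpoints, port, mtype, direction):
    """Return True if any breakpoint matches the communication event."""
    return any(bp.matches(port, mtype, direction) for bp in breakpoints)


breakpoints = [
    MessageBreakpoint("p1", "REQUEST", "send"),  # set break p1 REQUEST send
    MessageBreakpoint("p2"),                     # set break p2 (any message)
]
hit_send = should_break(breakpoints, "p1", "REQUEST", "send")     # True
hit_recv = should_break(breakpoints, "p1", "REQUEST", "receive")  # False
hit_any = should_break(breakpoints, "p2", "REPLY", "receive")     # True
```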

Consistent state representations • Problem: no common clock and no common storage ⇒ no consistent state representation • Approaches • Clock synchronization (in the range of milliseconds) • Logical ordering of events • Basis: Lamport approach • Partial order ("pre-relation", i.e. happened-before) • Events are ordered by causal dependency (sending before receiving) • Events are unordered if they are independent

Consistent state representations • Rules • a and b in the same process, a before b: a → b • a sends a message, b receives it: a → b • a → b, b → c ⇒ a → c (transitivity) • ⇒ All events essential to distributed processing can be ordered (consistent logical "snapshots")
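The three rules can be checked mechanically. The following sketch assumes events are encoded as (process, local index) tuples and messages as (send event, receive event) pairs; this encoding and the function name are illustrative choices, not part of the slides.

```python
def happened_before(a, b, messages):
    """Return True if event a precedes event b under the three rules.

    Events are (process, local_index) tuples; messages is a list of
    (send_event, receive_event) pairs.
    """
    seen = {a}
    frontier = [a]
    while frontier:
        proc, idx = frontier.pop()
        # Rule 1: an event precedes every later event of the same process.
        if proc == b[0] and idx < b[1]:
            return True
        # Rule 2: a send precedes its receive. Rule 3 (transitivity)
        # follows from continuing the search at each receive event.
        for send, recv in messages:
            if send[0] == proc and send[1] >= idx and recv not in seen:
                if recv == b:
                    return True
                seen.add(recv)
                frontier.append(recv)
    return False


messages = [(("P1", 1), ("P2", 2)), (("P2", 3), ("P3", 1))]
causal = happened_before(("P1", 1), ("P3", 2), messages)
concurrent = (
    not happened_before(("P1", 2), ("P2", 2), messages)
    and not happened_before(("P2", 2), ("P1", 2), messages)
)
```

Here `causal` holds because a message chain leads from P1 to P3, while the second pair of events is unordered: neither precedes the other.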

Lamport approach • Realization via the following algorithm • Each process has an event counter Z (initially 0) • Each inter-process event E has a number N(E), and each message carries a number T (T = N(E) of its send event) • Sending: • increment Z (Z := Z + 1) • mark the send event: N(E) := Z • mark the message: T := Z • Receiving a message with number T: • if T > Z (receiver's counter), set Z := T + 1 • otherwise set Z := Z + 1 • mark the receive event: N(E) := Z • Intra-process event: • Z := Z + 1 • N(E) := Z
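The counter rules above translate almost directly into code. This is a minimal sketch: the class name `LamportProcess` and the variable `stamp` (the number carried by a message) are chosen here for illustration.

```python
class LamportProcess:
    """One process with an event counter Z, following the rules on the slide."""

    def __init__(self, name):
        self.name = name
        self.z = 0  # event counter Z, initially zero

    def local_event(self):
        # Intra-process event: Z := Z + 1, N(E) := Z
        self.z += 1
        return self.z

    def send(self):
        # Sending: Z := Z + 1; the event and the message both get number Z
        self.z += 1
        return self.z

    def receive(self, stamp):
        # Receiving: jump past the message number if it is ahead of Z
        if stamp > self.z:
            self.z = stamp + 1
        else:
            self.z += 1
        return self.z


p1, p2 = LamportProcess("P1"), LamportProcess("P2")
e1 = p1.local_event()        # 1
stamp = p1.send()            # 2
e2 = p2.receive(stamp)       # 3, because 2 > 0
e3 = p2.local_event()        # 4
stamp_back = p2.send()       # 5
e4 = p1.receive(stamp_back)  # 6, because 5 > 2
```

The two branches of the receive rule are equivalent to Z := max(Z, stamp) + 1, which is how the rule is often written.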

Lamport approach [Figure: timelines of processes P1, P2 and P3 with numbered events and messages] • Causal events are completely ordered • Non-causal events ⇒ unordered (for instance, no. 12 within P2 and P3)

Semantics of breakpoints • Problem: When is a breakpoint condition that spans several processes satisfied? • Approach: • simple predicates (one process, "call proc") • disjunctive predicates ("P1: call proc | P2: call xy") • conjunctive predicates ("P1: call proc & P1: x=1"), only within one process • linked predicates: coupling of events via the pre-relation [Figure: event timelines t11 t12 (Process 1), t21 t22 t23 (Process 2), t31 t32 t33 (Process 3); t31, t22: ordered; t11, t21: unordered]
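The first three predicate kinds can be sketched as composable functions over events. The event encoding (a dict with "process", "call" and "vars") and all helper names are assumptions made for this example; linked predicates are left out because they additionally need the pre-relation between events.

```python
def calls(name):
    """Condition: the event is a call of the named procedure."""
    return lambda event: event.get("call") == name


def var_is(var, value):
    """Condition: the event's variable snapshot has var == value."""
    return lambda event: event.get("vars", {}).get(var) == value


def simple(process, condition):
    """Simple predicate: one condition on an event of one process."""
    return lambda event: event["process"] == process and condition(event)


def disjunction(*predicates):
    """Disjunctive predicate: satisfied if any alternative matches."""
    return lambda event: any(p(event) for p in predicates)


def conjunction(process, *conditions):
    """Conjunctive predicate: all conditions hold, within one process only."""
    return lambda event: (
        event["process"] == process and all(c(event) for c in conditions)
    )


# "P1: call proc | P2: call xy" and "P1: call proc & P1: x=1"
either = disjunction(simple("P1", calls("proc")), simple("P2", calls("xy")))
both = conjunction("P1", calls("proc"), var_is("x", 1))

event_a = {"process": "P1", "call": "proc", "vars": {"x": 1}}
event_b = {"process": "P2", "call": "xy", "vars": {}}
event_c = {"process": "P1", "call": "proc", "vars": {"x": 2}}
results = [either(event_a), either(event_b), both(event_a), both(event_c)]
```

Restricting the conjunction to a single process avoids the question of what "at the same time" means across processes without a common clock.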

Consistent stopping of processes • Problem: time delay after a halt command is issued • Approach: backtracking to a consistent state directly before the stopping event ("reset line") • Procedure: backtracking along the causal dependencies given by the pre-relation of messages [Figure: event timelines t11 to t14 (Process 1), t21 to t24 (Process 2), t31 to t34 (Process 3); t12: stop point event; Process 2: backtracking to t23; Process 3: backtracking to t32]
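One possible reading of the backtracking procedure is a rollback to a consistent cut: a process is reset until it no longer holds a received message whose send lies behind the reset line. The sketch below follows that reading. The function name, the event encoding, and the message pattern are hypothetical; the messages were chosen so that the result matches the reset line named on the slide (t12, t23, t32), since the original figure's message arrows are not recoverable.

```python
def reset_line(halted_at, messages):
    """Roll processes back until no kept receive depends on a discarded send.

    halted_at maps each process to the index of the last event it should keep
    (for the process with the stop point: the stop event itself).
    messages is a list of ((send_proc, send_idx), (recv_proc, recv_idx)).
    """
    cut = dict(halted_at)
    changed = True
    while changed:
        changed = False
        for (send_proc, send_idx), (recv_proc, recv_idx) in messages:
            received = recv_idx <= cut[recv_proc]
            sent = send_idx <= cut[send_proc]
            if received and not sent:
                # The receive is causally after a discarded send: undo it.
                cut[recv_proc] = recv_idx - 1
                changed = True
    return cut


# Assumed scenario: the stop point is event 2 of P1, but the halt takes
# effect only after all processes have executed four events.
halted_at = {"P1": 2, "P2": 4, "P3": 4}
messages = [
    (("P1", 3), ("P2", 4)),  # hypothetical message t13 -> t24
    (("P2", 4), ("P3", 3)),  # hypothetical message t24 -> t33
]
line = reset_line(halted_at, messages)
```

The loop repeats until nothing changes because undoing one receive can discard a later send of the same process, which in turn invalidates receives elsewhere.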

Distributed trace-steps • Basis: step mode of sequential debuggers (interactive) • One trace-step means advancing to the next interaction point (inter-process event) • Local computations form a unit • Send operations are carried out on all participating processes • Receive operations are carried out only if a message exists (possibly after the send step) [Figure: distributed trace-steps 1, 2, 3; calculation phase; interaction point]
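A distributed trace-step can be simulated with a toy model in which each process is a list of operations. The class `SteppedProcess`, the operation tuples, and the three-phase split inside `trace_step` are assumptions for this sketch, not the design of a particular debugger.

```python
from collections import deque


class SteppedProcess:
    def __init__(self, name, program):
        self.name = name
        # Operations: ("compute",), ("send", dest, msg), ("recv",)
        self.program = deque(program)
        self.inbox = deque()
        self.received = []


def trace_step(processes):
    """Advance all processes by one distributed trace-step."""
    by_name = {p.name: p for p in processes}
    # Local computations form one unit: run up to the next interaction point.
    for p in processes:
        while p.program and p.program[0][0] == "compute":
            p.program.popleft()
    # Pending sends are carried out on all participating processes.
    for p in processes:
        if p.program and p.program[0][0] == "send":
            _, dest, msg = p.program.popleft()
            by_name[dest].inbox.append(msg)
    # A receive completes only if a message exists (possibly just sent).
    for p in processes:
        if p.program and p.program[0][0] == "recv" and p.inbox:
            p.program.popleft()
            p.received.append(p.inbox.popleft())


a = SteppedProcess("A", [("compute",), ("send", "B", "hello"), ("recv",)])
b = SteppedProcess("B", [("recv",), ("compute",), ("send", "A", "ack")])
trace_step([a, b])
after_first = (list(a.received), list(b.received))   # A is still blocked
trace_step([a, b])
after_second = (list(a.received), list(b.received))  # A now has the ack
```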

Indeterminism handling • Nondeterministic program behavior: race conditions • Options: • Testing different possible execution sequences via distributed single-stepping • Re-execution / replay based on recorded output • Approach: • Record all inter-process events • Control the repeated execution based on this recording (re-execution) • High storage requirements, reducible via checkpoints, before which events need not be kept • Replay of a single process is also possible (important for technical processes as well)
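The record-and-replay idea can be shown with a deliberately small model: one receiver whose result depends on the arrival order of messages. All names here are illustrative, and the race is modelled by a random choice rather than by real concurrency.

```python
import random


def execute(messages, pick):
    """Deliver pending messages to one receiver in the order chosen by pick.

    Returns the receiver's final state and the log of delivered senders.
    """
    pending = dict(messages)  # sender -> payload
    state, log = 0, []
    while pending:
        sender = pick(sorted(pending))
        state = state * 10 + pending.pop(sender)  # order-dependent result
        log.append(sender)                        # recorded inter-process event
    return state, log


def record_run(messages, rng):
    # Free run: the arrival order is a race, modelled by a random choice.
    return execute(messages, lambda candidates: rng.choice(candidates))


def replay_run(messages, log):
    # Re-execution: the recorded log dictates the delivery order.
    order = iter(log)
    return execute(messages, lambda candidates: next(order))


messages = {"P1": 1, "P2": 2, "P3": 3}
state, log = record_run(messages, random.Random())
replayed_state, replayed_log = replay_run(messages, log)
```

Only the order of inter-process events is logged, not the payloads or local computations; the replay reproduces the rest because the processes themselves are deterministic between interaction points.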

Handling of information flooding • Requirement: reduce the recorded / output information • Restriction to inter-process events • Restriction to relevant time intervals • Abstractions for • process groups • execution (timing diagram) • ports (abstract message flow) • Graphical support (control windows, animation tools)
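The first reductions listed above (inter-process events only, a time window, process groups) amount to a filter over the trace. The event fields and the function name below are assumptions for this sketch.

```python
def reduce_trace(trace, start, end, groups=None):
    """Keep inter-process events in [start, end] and map processes to groups."""
    groups = groups or {}
    reduced = []
    for event in trace:
        if event["kind"] not in ("send", "receive"):
            continue  # drop intra-process events
        if not start <= event["time"] <= end:
            continue  # drop events outside the relevant interval
        group = groups.get(event["process"], event["process"])
        reduced.append({**event, "process": group})
    return reduced


trace = [
    {"time": 1, "process": "P1", "kind": "local"},
    {"time": 2, "process": "P1", "kind": "send"},
    {"time": 3, "process": "P2", "kind": "receive"},
    {"time": 9, "process": "P3", "kind": "send"},
]
summary = reduce_trace(
    trace, start=0, end=5, groups={"P1": "clients", "P2": "servers"}
)
```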

Distributed debugging: concepts • Hierarchical levels of intervention • Level 1: "Free runtime" • no modification, only trace recording • minimal interference • Level 2: "Self-responsibility" • freely modifiable execution • strong interference • the tester bears full responsibility for execution control • Level 3: "Pseudo-real-time" • best possible compensation for strong interference • "private clock" per process • the private clock runs except while debugger code executes • private clocks are synchronized via, for instance, the Lamport algorithm on the partial order
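The "private clock" of level 3 can be sketched as a stopwatch that is paused whenever debugger code runs. The class and method names are hypothetical, the time source is injectable so the example is deterministic, and synchronization between the clocks of different processes is not shown.

```python
import time


class PrivateClock:
    """Per-process clock that does not advance while debugger code runs."""

    def __init__(self, now=time.monotonic):
        self._now = now
        self._elapsed = 0.0
        self._started = now()  # None while the debugger is active

    def enter_debugger(self):
        if self._started is not None:
            self._elapsed += self._now() - self._started
            self._started = None

    def leave_debugger(self):
        if self._started is None:
            self._started = self._now()

    def read(self):
        running = 0.0 if self._started is None else self._now() - self._started
        return self._elapsed + running


# Fake time source: the clock is created at t=0, the debugger runs from
# t=5 to t=9, and the clock is read at t=12.
ticks = iter([0.0, 5.0, 9.0, 12.0])
clock = PrivateClock(now=lambda: next(ticks))
clock.enter_debugger()
clock.leave_debugger()
private_time = clock.read()  # 8.0: the 4 units spent in the debugger are hidden
```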

Architecture principles • Alternatives: • 1. Separate processes: program / debugger • 2. Separate processes with common data (also lightweight processes) • 3. Integrated processes with direct instrumentation • ⇒ As a rule, alternative 2 or 3 is used

Architecture proposal • Computer A: Process 1, Process 2, local debugging control • Computer B: Process 3, Process 4, local debugging control • Centralized dialogue process
