
Debugging of Distributed Systems


bacon

Presentation Transcript


  1. Debugging of Distributed Systems

  2. Debugging of Distributed Systems
     • Example of a tool for distributed systems
     • Approach to fault finding during testing
     • Control and inspection of the program's internal runtime behavior

  3. Debugging of Distributed Systems
     • Requirements
       • User-friendliness
       • Problem orientation (symbolic debugging): String c = "xyz" instead of "LOC FF2243 AC32..."
       • Reproducibility (quasi-deterministic)
       • Presentation of state information (variables, registers, ports, etc.: "show c")
       • Modification of system state (set c = "ABC")
       • Supervision mechanisms
     [Figure: User, Debugger and Tested program; labels: Query / Modification, state information]

  4. Special problems
     • Parallel processing
     • Indeterminism
     • Absence of a global state
     • Absence of a common clock
     • Interference "Debugger ↔ System"
     • Resulting information flooding
     • Semantics of special constructs (breakpoints, break conditions)
     • Improved functionality (inter-process communication)

  5. Inter-process communication
     • State information includes the communication state in addition to the process/object state
     • The ability to intervene in and manipulate this state is desirable
     • Separation into an intra-process layer (conventional) and an inter-process layer (special)
     • Functionality of the inter-process layer
       • Access to messages:
         • insert <m> in <port>
         • read <m> from <port>
         • extract <m> from <port>
         • forward <m> to <port>
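
The four message-access commands above can be pictured as operations on a per-port message queue. The following is a minimal sketch, not the tool's actual implementation: the `Port` class, the FIFO queue, and the reading of `read` as non-destructive, `extract` as destructive and `forward` as "move the head message to another port" are all assumptions made for illustration.

```python
from collections import deque


class Port:
    """Hypothetical port holding queued messages in FIFO order."""

    def __init__(self, name):
        self.name = name
        self._queue = deque()

    def insert(self, message):
        # insert <m> in <port>: the debugger injects a message.
        self._queue.append(message)

    def read(self):
        # read <m> from <port>: inspect the head message, leave it queued.
        return self._queue[0] if self._queue else None

    def extract(self):
        # extract <m> from <port>: remove the head message and return it.
        return self._queue.popleft() if self._queue else None

    def __len__(self):
        return len(self._queue)


def forward(source, destination):
    # forward <m> to <port>: assumed here to move the head message of one
    # port to another port.
    message = source.extract()
    if message is not None:
        destination.insert(message)
    return message


p1, p2 = Port("p1"), Port("p2")
p1.insert("request")
peeked = p1.read()       # message stays in p1
moved = forward(p1, p2)  # message now waits in p2
```

Keeping `read` and `extract` separate lets a tester inspect the communication state without disturbing it, which matters for the interference problem named on slide 4.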

  6. Inter-process communication
     • Breakpoints
       • set break <port> <mtype> [send | receive]
       • set break <port1> ... <portn>
     • Statistical accounting records
     • Access to operating system objects (semaphores, processes)
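
One way to read the two `set break` forms is as a filter over message events: a set of ports, optionally narrowed by message type and direction. The sketch below assumes exactly that; `MessageEvent`, `Breakpoint` and `matches` are hypothetical names, and no command parsing is attempted.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class MessageEvent:
    port: str
    mtype: str
    direction: str  # "send" or "receive"


@dataclass(frozen=True)
class Breakpoint:
    ports: FrozenSet[str]
    mtype: Optional[str] = None      # None: any message type
    direction: Optional[str] = None  # None: both send and receive

    def matches(self, event: MessageEvent) -> bool:
        # A breakpoint fires when the port matches and every optional
        # restriction that was given is satisfied as well.
        return (
            event.port in self.ports
            and (self.mtype is None or event.mtype == self.mtype)
            and (self.direction is None or event.direction == self.direction)
        )


# Roughly "set break p1 ack receive" and "set break p1 p2".
on_ack = Breakpoint(frozenset({"p1"}), mtype="ack", direction="receive")
on_ports = Breakpoint(frozenset({"p1", "p2"}))
```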

  7. Consistent state representations
     • Problem: no common clock and no common storage
       ⇒ no consistent state representation
     • Approaches
       • Clock synchronization (in the range of milliseconds)
       • Logical ordering of events
     • Basis: Lamport approach
       • Partial order ("pre-relation")
       • Events are ordered by their causal relationship (sending before receiving)
       • Events are unordered if they are independent

  8. Consistent state representations
     • Rules
       • a and b in the same process, a before b: a → b
       • a is the sending and b the receiving of the same message: a → b
       • a → b, b → c ⇒ a → c (transitivity)
     ⇒ All essential events of distributed processing can be ordered (consistent logical "snapshots")
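
The three rules can be applied mechanically to a recorded set of events. The sketch below builds the relation as a set of ordered pairs and closes it transitively; the function names, the event labels and the single message from t31 to t22 are illustrative assumptions, not taken from the slides.

```python
from itertools import product


def pre_relation(process_events, messages):
    """Return all pairs (a, b) with a before b under the three rules."""
    before = set()
    # Rule 1: events of the same process are ordered by program order.
    for events in process_events.values():
        for i, a in enumerate(events):
            for b in events[i + 1:]:
                before.add((a, b))
    # Rule 2: sending a message precedes receiving it.
    before.update(messages)
    # Rule 3: transitive closure, repeated until nothing new is added.
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(before), repeat=2):
            if b == c and (a, d) not in before:
                before.add((a, d))
                changed = True
    return before


def ordered(before, a, b):
    # Two events are ordered if either one precedes the other.
    return (a, b) in before or (b, a) in before


events = {
    "P1": ["t11", "t12"],
    "P2": ["t21", "t22", "t23"],
    "P3": ["t31", "t32", "t33"],
}
relation = pre_relation(events, [("t31", "t22")])  # assumed message
```

With this input, t31 and t22 are ordered because of the message, while t11 and t21 stay unordered because no chain of rules connects them.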

  9. Lamport approach
     • Realization via the algorithm
       • Each process has an event counter Z (initially 0)
       • Each inter-process event E has a number N(E); each message M carries a number N(M) (= N(E) of its sending event)
     • Sending:
       • Increment Z (Z := Z + 1)
       • Mark the sending event: N(E) := Z
       • Mark the message: N(M) := Z
     • Receiving a message with number N(M):
       • If N(M) > Z (receiver), set Z := N(M) + 1
       • Otherwise set Z := Z + 1
       • Mark the receiving event: N(E) := Z
     • Intra-process event:
       • Z := Z + 1
       • N(E) := Z
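
The counter rules on this slide translate almost line by line into code. The following is a minimal sketch; the class and method names are chosen for illustration, and the number carried by a message is simply passed as an integer.

```python
class LamportProcess:
    """One process with its event counter Z, initially 0."""

    def __init__(self):
        self.z = 0

    def local_event(self):
        # Intra-process event: Z := Z + 1, the event is numbered Z.
        self.z += 1
        return self.z

    def send(self):
        # Sending: Z := Z + 1; the sending event and the message both
        # receive the number Z.
        self.z += 1
        return self.z

    def receive(self, message_number):
        # Receiving: jump past the message number if it is ahead of the
        # local counter, otherwise just advance the local counter.
        if message_number > self.z:
            self.z = message_number + 1
        else:
            self.z += 1
        return self.z


p1, p2 = LamportProcess(), LamportProcess()
first = p1.local_event()        # p1: 1
sent = p1.send()                # p1: 2, message carries 2
received = p2.receive(sent)     # p2: 3, since 2 > 0
reply = p2.send()               # p2: 4
back = p1.receive(reply)        # p1: 5, since 4 > 2
```

The receive rule guarantees that a receiving event always gets a larger number than the corresponding sending event, which is what the ordering rules on slide 8 require.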

  10. Lamport approach
     [Figure: space-time diagram of processes P1, P2 and P3 with events numbered 1 to 12 by the algorithm]
     • Causally related events are ordered completely
     • Causally unrelated events ⇒ unordered (for instance, no. 12 in P2 and no. 12 in P3)

  11. Semantics of breakpoints
     • Problem: When is a breakpoint with distributed conditions satisfied?
     • Approach:
       • Simple predicates (one process, "call proc")
       • Disjunctive predicates ("P1: call proc | P2: call xy")
       • Conjunctive predicates ("P1: call proc & P1: x=1"), only within a single process
       • Linked predicates: coupling of events via the pre-relation
     [Figure: timelines of Process 1 (t11, t12), Process 2 (t21, t22, t23) and Process 3 (t31, t32, t33); t31, t22: ordered; t11, t21: unordered]

  12. Consistent stopping of processes
     • Problem: time delay after a halt command is issued
     • Approach: backtracking to a consistent state directly before a stopping event ("reset line")
     • Procedure: backtracking along the causal relationships given by the pre-relation of messages
     [Figure: timelines of Process 1 (t11 to t14), Process 2 (t21 to t24) and Process 3 (t31 to t34); t12: stop point event; Process 2: backtracking to t23; Process 3: backtracking to t32]
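
The backtracking procedure can be sketched as a fixpoint computation: a process is rolled back whenever it has received a message whose sending lies beyond the point where the sender was stopped. Everything below is an assumption made for illustration, including the `Message` record, the representation of the stop line as "number of events kept per process", and the two example messages, which were chosen so that the result agrees with the labels in the figure.

```python
from typing import Dict, List, NamedTuple


class Message(NamedTuple):
    src: str
    send_idx: int  # 0-based index of the sending event in src
    dst: str
    recv_idx: int  # 0-based index of the receiving event in dst


def consistent_cut(halted: Dict[str, int], messages: List[Message]) -> Dict[str, int]:
    """Roll processes back until no kept receive lacks its kept send."""
    cut = dict(halted)
    changed = True
    while changed:
        changed = False
        for m in messages:
            received = m.recv_idx < cut[m.dst]
            sent = m.send_idx < cut[m.src]
            if received and not sent:
                # The receive has no cause inside the cut: undo it and
                # everything after it in the receiving process.
                cut[m.dst] = m.recv_idx
                changed = True
    return cut


# Process 1 stops after t12 (2 events kept); the others ran on to 4 events.
halted = {"P1": 2, "P2": 4, "P3": 4}
messages = [
    Message("P1", 2, "P2", 3),  # assumed: sent at t13, received at t24
    Message("P2", 3, "P3", 2),  # assumed: sent at t24, received at t33
]
cut = consistent_cut(halted, messages)
```

Under these assumed messages, Process 2 keeps its events up to t23 and Process 3 keeps its events up to t32.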

  13. Distributed trace steps
     • Basis: step mode of sequential debuggers (interactive)
     • One trace step advances to the next interaction point (inter-process event)
     • Local computations form a single unit
     • Sending operations are carried out on all participating processes
     • Receiving operations are carried out only if a message exists (possibly only after the sending step)
     [Figure: distributed trace steps 1, 2, 3; labels: calculation phase, interaction point]
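
A distributed trace step can be imitated with Python generators, where each `yield` marks an interaction point and the code between two yields is the local calculation phase. This is only a sketch of the stepping rule described above; the `DistributedStepper` class, the request tuples and the ping/pong processes are hypothetical.

```python
from collections import deque


class DistributedStepper:
    def __init__(self, processes):
        self.processes = dict(processes)  # name -> generator
        self.mailboxes = {name: deque() for name in self.processes}
        self.pending = {}                 # name -> request at interaction point
        for name in self.processes:
            self._run_to_interaction_point(name, None)

    def _run_to_interaction_point(self, name, value):
        # Resume the process; it runs its local computation as one unit
        # and stops at the next send or receive request.
        try:
            self.pending[name] = self.processes[name].send(value)
        except StopIteration:
            self.pending.pop(name, None)

    def trace_step(self):
        at_start = dict(self.pending)
        # First carry out the sends of all participating processes.
        for name, request in at_start.items():
            if request[0] == "send":
                _, destination, payload = request
                self.mailboxes[destination].append(payload)
                self._run_to_interaction_point(name, None)
        # Then carry out receives, but only where a message exists.
        for name, request in at_start.items():
            if request[0] == "recv" and self.mailboxes[name]:
                message = self.mailboxes[name].popleft()
                self._run_to_interaction_point(name, message)


received = []


def ping():
    yield ("send", "pong", "hello")
    reply = yield ("recv",)
    received.append(reply)


def pong():
    message = yield ("recv",)
    yield ("send", "ping", message.upper())


stepper = DistributedStepper({"ping": ping(), "pong": pong()})
stepper.trace_step()  # ping sends, pong receives
stepper.trace_step()  # pong sends the reply, ping receives it
```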

  14. Indeterminism handling
     • Indeterministic program behavior: race conditions
     • Options:
       • Testing of different possible execution sequences via distributed single-stepping
       • Re-execution / replay via recording of outputs
     • Approach:
       • Recording of all inter-process events
       • Control of the repeated execution based on this recording (re-execution)
       • High storage requirements, reducible via checkpoints, since events preceding a checkpoint need not be kept
       • Replay of a single process is also possible (also important for technical processes)
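
The record-and-replay idea can be shown on a single process whose result depends on the order in which messages arrive. In this sketch the race is simulated by shuffling the incoming messages; `run_recorded`, `replay` and `handler` are hypothetical names, and a real recorder would log events at the ports rather than take a prepared list.

```python
import random


def handler(state, event):
    # Deliberately order-sensitive, so different arrival orders give
    # different final states.
    _sender, payload = event
    return state * 31 + payload


def run_recorded(handler, incoming, rng):
    """Deliver messages in an indeterministic order and record that order."""
    order = list(incoming)
    rng.shuffle(order)  # stands in for the race between senders
    log = []
    state = 0
    for event in order:
        state = handler(state, event)
        log.append(event)
    return state, log


def replay(handler, log):
    """Re-execute a single process from the recorded inter-process events."""
    state = 0
    for event in log:
        state = handler(state, event)
    return state


incoming = [("P1", 1), ("P2", 2), ("P3", 3)]
final_state, log = run_recorded(handler, incoming, random.Random(7))
replayed_state = replay(handler, log)
```

Because the replay is driven only by the log, the process reaches the same state on every re-execution, whatever order the original run happened to take.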

  15. Handling of information flooding
     • Requirement: recorded / output information must be reduced
       • Limitation to inter-process events
       • Limitation to relevant time intervals
     • Forms of abstraction for
       • process groups
       • execution (timing diagram)
       • ports (abstract message flow)
     • Graphics support (control windows, animation tools)
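
The first two reductions amount to a filter over the recorded trace. A minimal sketch, assuming a hypothetical `TraceEvent` record with a logical time, a process name and an event kind:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TraceEvent:
    time: int
    process: str
    kind: str  # "send", "receive" or "local"


def reduce_trace(trace: List[TraceEvent], start: int, end: int) -> List[TraceEvent]:
    # Keep only inter-process events inside the relevant time interval.
    return [
        event
        for event in trace
        if event.kind in ("send", "receive") and start <= event.time <= end
    ]


trace = [
    TraceEvent(1, "P1", "local"),
    TraceEvent(2, "P1", "send"),
    TraceEvent(3, "P2", "receive"),
    TraceEvent(9, "P2", "send"),
]
reduced = reduce_trace(trace, 0, 5)
```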

  16. Distributed debugging: concepts
     • Graded levels of influence
     • Level 1: "Free runtime"
       • No modification, only trace recording
       • Minimal interference
     • Level 2: "Self-responsibility"
       • Freely modifiable execution
       • Strong interference
       • The tester bears full responsibility for execution control
     • Level 3: "Pseudo-real-time"
       • Best possible compensation for strong interference
       • A "private clock" per process
       • The private clock runs except while debugger code executes
       • Private clocks are synchronized on a partial order via, for instance, the Lamport algorithm
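
The "private clock" of level 3 can be sketched as a clock that subtracts all time spent in debugger code. The class below is an illustrative assumption, not the design of any particular debugger: the time source is injectable so the behavior can be checked without real waiting, and nested debugger sections are not handled.

```python
import time
from contextlib import contextmanager


class PrivateClock:
    """Per-process clock that stands still while debugger code runs."""

    def __init__(self, now=time.monotonic):
        self._now = now
        self._start = now()
        self._paused_total = 0.0
        self._paused_at = None

    def read(self):
        # While paused, report the moment at which the debugger took over.
        current = self._paused_at if self._paused_at is not None else self._now()
        return current - self._start - self._paused_total

    @contextmanager
    def debugger_code(self):
        self._paused_at = self._now()
        try:
            yield
        finally:
            self._paused_total += self._now() - self._paused_at
            self._paused_at = None


# Usage with a fake time source, so the example runs instantly.
fake_time = [0.0]
clock = PrivateClock(now=lambda: fake_time[0])
fake_time[0] = 5.0
before_debugger = clock.read()
with clock.debugger_code():
    fake_time[0] = 8.0           # three time units spent in the debugger
    inside_debugger = clock.read()
fake_time[0] = 10.0
after_debugger = clock.read()
```

The process sees seven units of elapsed time instead of ten, which is the compensation for debugger interference that this level aims at.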

  17. Architecture principles
     • Alternatives:
       1. Separate processes: program / debugger
       2. Separate processes with common data (also lightweight processes)
       3. Integrated processes with direct instrumentation
     ⇒ As a rule, alternatives 2 and 3 are the most common

  18. Architecture proposal
     [Figure: architecture diagram with the following elements]
     • Computer A: Process 1, Process 2, local debugging control
     • Computer B: Process 3, Process 4, local debugging control
     • Centralized dialogue process
