
Using Queries for Distributed Monitoring and Forensics



Presentation Transcript


  1. Using Queries for Distributed Monitoring and Forensics
     • Atul Singh, Rice University
     • Petros Maniatis, Intel Research Berkeley
     • Timothy Roscoe, Intel Research Berkeley
     • Peter Druschel, Max Planck Institute for Software Systems

  2. Building and monitoring a system
     • Building a distributed system is a complex undertaking
       • Select properties
       • Algorithms
       • Implement, deploy
     • Switch to monitoring the system
       • Testing, debugging, profiling, tuning
     • Monitoring is hard, error-prone
       • Distributed state
       • Partial faults
       • Complex interactions
       • Asynchronous
       • External factors
     EuroSys 2006

  3. Monitoring is hard!
     • Current state of the art:
       • Manual insertion of “printf”
       • Bringing logs to one place
       • Parsing/processing of logs
         • Scripts (Perl/Python)
         • Queries (Astrolabe)
       • Offline by nature
     • Expose internal state
       • Ad-hoc, error-prone
     • Probe exposed state
       • Correlate events
       • Bridge the semantic gap

  4. Declarative systems: building systems via queries
     • Declarative specification via queries
       • Execution by a distributed query processor
     • P2 [SOSP’05]: a prototype declarative system
       • Concise specifications
       • Enables rapid prototyping
     • We present a monitoring framework for P2
       • Flexible introspection
       • Retains semantics of application
       • Online execution tracing
     [Slide callouts: Probe the state; Expose internals]

  5. Overview
     • Introduction
     • P2 Background
     • Monitoring framework
     • Example applications/Performance
     • Conclusions

  6. Example: route operation in P2
     • action :- event, precondition.
     • route(B,K) :- route(A,K), nextHop(A,D,B), D == K.
     • Rule strand:
       • Join: route.A == nextHop.A
       • Select: D == K
       • Project
     • Application state: nextHop
     [Figure: dataflow graph with Network In, rule strands R0 and R1, and Network Out; a route tuple for key K passes from Router A to Router B, each router holding nextHop entries such as K -> B, K -> C, K’ -> D, K’ -> E]
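The rule strand on slide 6 can be read as a small relational pipeline. The following is a minimal Python sketch of that reading, not P2's actual dataflow implementation; the tuple layouts `Route` and `NextHop` and the function `route_strand` are hypothetical names chosen for illustration.

```python
from typing import Iterable, List, NamedTuple


class Route(NamedTuple):
    node: str  # A: node currently holding the route request (assumed layout)
    key: str   # K: key being routed


class NextHop(NamedTuple):
    node: str  # A: node that owns this table entry (assumed layout)
    dest: str  # D: key this entry applies to
    hop: str   # B: neighbor to forward to


def route_strand(event: Route, next_hops: Iterable[NextHop]) -> List[Route]:
    """Evaluate route(B,K) :- route(A,K), nextHop(A,D,B), D == K."""
    out = []
    for nh in next_hops:
        if nh.node != event.node:   # Join: route.A == nextHop.A
            continue
        if nh.dest != event.key:    # Select: D == K
            continue
        out.append(Route(node=nh.hop, key=event.key))  # Project to route(B, K)
    return out


# Hypothetical nextHop table; only the first entry matches route(A, K).
table = [NextHop("A", "K", "B"), NextHop("A", "K2", "D"), NextHop("C", "K", "E")]
result = route_strand(Route("A", "K"), table)
```

Each stage corresponds to one element of the strand, which is why tapping the first and last elements is enough to pair an input tuple with its output.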

  7. Overview
     • Introduction
     • Background
     • Monitoring framework
     • Example applications/Performance
     • Conclusions

  8. Introspection and Logging
     • Introspection at three levels
       • Application state level
       • Rule level
       • Dataflow level
     • Systematic instrumentation
       • System is built using smaller, re-usable components
       • Systematic insertion of logging statements
     • Logging data is in the form of tuples
       • Retains semantics of application logic
       • No need for translation
     [Figure: rule strand r1 made of Join, Selection, and Project elements]

  9. Tracing rule executions
     • We want to step through the execution
       • Each step corresponds to a rule
       • Do it in “online” fashion
     • For rule level tracing
       • Need to trace tuples
       • Match output tuple to input
       • Track tuples as they go over the wire
     [Figure: rules r0 and r1 running on Node A and Node B, exchanging tuples x, y, z, w]

  10. (1) Tracing rule executions
     • Matching input and output tuples of a rule
       • Tap elements at the beginning and end of a rule
     • Execution tracer: tracks rule executions
       • Execution records are stored as tuples in the exec table
     [Figure: rule strand r1 (Join, Selection, Project) with an input tap on x and an output tap on y feeding the Execution Tracer, which writes an exec record with fields ruleId r1, input x, output y, dest. d]
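The tap-and-record idea on slide 10 can be sketched as below. This is an illustrative Python model under a strong assumption: only one tuple is in flight per rule at a time, so aborted and pipelined executions are not handled. The class and method names (`ExecutionTracer`, `tap_input`, `tap_output`) are hypothetical and do not come from P2.

```python
from typing import Dict, List, NamedTuple


class ExecRecord(NamedTuple):
    rule_id: str  # ruleId
    input: str    # ID of the tuple seen at the start of the rule strand
    output: str   # ID of the tuple seen at the end of the rule strand
    dest: str     # node the output tuple is destined for


class ExecutionTracer:
    def __init__(self) -> None:
        self.exec_table: List[ExecRecord] = []
        # rule_id -> input tuple ID last seen at the head tap
        self._pending: Dict[str, str] = {}

    def tap_input(self, rule_id: str, tuple_id: str) -> None:
        """Called by the tap at the beginning of a rule strand."""
        self._pending[rule_id] = tuple_id

    def tap_output(self, rule_id: str, tuple_id: str, dest: str) -> None:
        """Called by the tap at the end of a rule strand."""
        tuple_in = self._pending.pop(rule_id, None)
        if tuple_in is not None:  # ignore outputs with no recorded input
            self.exec_table.append(ExecRecord(rule_id, tuple_in, tuple_id, dest))


tracer = ExecutionTracer()
tracer.tap_input("r1", "x")
tracer.tap_output("r1", "y", "d")
```

Because the records are themselves tuples, they could in principle be queried with the same rule language as the application state, which is the point the slides make about retaining application semantics.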

  11. (2) Tracing tuples across wire
     • Each tuple has a locally unique ID
     • Tuple ID is sent along with the tuple
     • Upon receiving, a new tuple is created with a different ID
     • Hooks in the network in/out handling subsystem
     • A record is created
       • Tuple’s local ID
       • Tuple’s remote ID
       • Node from which it came
     [Figure: tuple x leaves node A through Network Out and arrives at node B through Network In as tuple y; the tupleTable record holds y, x, A]
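A rough Python sketch of the ID mapping described on slide 11 follows. The `Node` class, its `send` and `receive` hooks, and the message layout are assumptions made for illustration; the slides only state that the sender's tuple ID travels with the tuple and that the receiver records the local ID, the remote ID, and the source node.

```python
import itertools
from typing import Any, List, NamedTuple, Tuple


class TupleRecord(NamedTuple):
    local_id: int   # ID assigned by the receiving node
    remote_id: int  # ID the tuple had on the sending node
    source: str     # node the tuple came from


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self._ids = itertools.count(1)  # IDs are unique only within this node
        self.tuple_table: List[TupleRecord] = []

    def new_id(self) -> int:
        return next(self._ids)

    def send(self, tuple_id: int, payload: Any) -> Tuple[str, int, Any]:
        """Network-out hook: the local tuple ID is shipped with the tuple."""
        return (self.name, tuple_id, payload)

    def receive(self, message: Tuple[str, int, Any]) -> Tuple[int, Any]:
        """Network-in hook: assign a fresh local ID and record the mapping."""
        source, remote_id, payload = message
        local_id = self.new_id()
        self.tuple_table.append(TupleRecord(local_id, remote_id, source))
        return local_id, payload


a, b = Node("A"), Node("B")
b.new_id()                      # B has already created one local tuple
x = a.new_id()                  # tuple x on node A
y, payload = b.receive(a.send(x, ("route", "K")))  # arrives on B as tuple y
```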

  12. Putting it all together
     • Of course in reality, it’s more complicated …
       • Aborted rule executions
       • Pipelined rule executions
     [Figure: Node A runs r0 and Node B runs r1 on tuples x, y, z, w; each node keeps its own tupleTable and exec table]
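Combining the two tables allows a tuple to be traced back across nodes. The sketch below is a simplified Python illustration, assuming each tuple has at most one producing rule execution and ignoring the aborted and pipelined cases the slide mentions. The table contents and the `trace_back` helper are hypothetical examples, not data from the talk.

```python
from typing import Dict, List, NamedTuple, Tuple


class ExecRecord(NamedTuple):
    rule_id: str
    input: str
    output: str


class TupleRecord(NamedTuple):
    local_id: str
    remote_id: str
    source: str


Tables = Tuple[List[ExecRecord], List[TupleRecord]]


def trace_back(nodes: Dict[str, Tables], node: str, tuple_id: str):
    """Walk from a tuple back to its origin as (node, rule, input, output) steps."""
    steps = []
    seen = set()
    while (node, tuple_id) not in seen:  # guard against cyclic records
        seen.add((node, tuple_id))
        exec_table, tuple_table = nodes[node]
        producer = next((e for e in exec_table if e.output == tuple_id), None)
        if producer is not None:
            # The tuple was produced locally: step to the rule's input.
            steps.append((node, producer.rule_id, producer.input, producer.output))
            tuple_id = producer.input
            continue
        arrival = next((t for t in tuple_table if t.local_id == tuple_id), None)
        if arrival is None:
            break  # no producer and no arrival record: origin reached
        # The tuple arrived over the wire: hop to the sender and its ID there.
        node, tuple_id = arrival.source, arrival.remote_id
    return steps


# Hypothetical scenario: r0 on A turns x into y; y arrives on B as z; r1 turns z into w.
nodes = {
    "A": ([ExecRecord("r0", "x", "y")], []),
    "B": ([ExecRecord("r1", "z", "w")], [TupleRecord("z", "y", "A")]),
}
path = trace_back(nodes, "B", "w")
```

In a deployed system the per-node tables live on different machines, so each hop in this loop would be a remote query rather than a dictionary lookup.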

  13. Overview
     • Introduction
     • Background
     • Monitoring framework
     • Example applications/Performance
     • Conclusions

  14. Example applications (I)
     • Distributed watchpoints: trigger an event if a condition is true
       • Possibly trace back/forward
     • Oscillation of faulty/stale information (route flaps)
       • Gossiping for stabilization or updates
     • Inconsistent routing in DHTs [Pastry, Chord, …]
       • Each node is responsible for a unique region
       • Route using distinct paths and check [Bamboo, Secure Routing]
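The routing consistency check on slide 14 can be illustrated with a small Python sketch. It assumes lookup results are already collected as `(key, first_hop, owner)` records, which is a hypothetical format; in the system described in the talk such a check would be written as queries over live state rather than over a list.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def inconsistent_keys(lookups: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Return keys whose lookups over distinct paths ended at different owners."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for key, _first_hop, owner in lookups:
        owners[key].add(owner)
    # Each key should map to exactly one responsible node.
    return {key: found for key, found in owners.items() if len(found) > 1}


# Hypothetical lookups: key k1 is routed via two different first hops.
lookups = [
    ("k1", "n2", "n7"),
    ("k1", "n5", "n9"),  # disagrees with the lookup above
    ("k2", "n2", "n4"),
    ("k2", "n3", "n4"),
]
violations = inconsistent_keys(lookups)
```

A watchpoint built on this check would raise an event for each entry in `violations`, after which execution tracing could be used to find where the two paths diverged.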

  15. Example applications (II)
     • Online execution profiling:
       • How much time is spent in each rule?
       • Where are the bottlenecks?
       • Which rule is costlier? What operation?
     • Consistent snapshots [Chandy-Lamport]:
       • Snapshot of the routing state
       • Queries on the snapshots themselves
         • What is the degree distribution?
         • How many node-disjoint paths?
     • No more than 16 rules for any of the above
     [Figure: rules r1, r2, r3]
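The profiling questions on slide 15 amount to an aggregation over execution records. The following Python sketch assumes each record carries start and end timestamps, which is an assumption for illustration; the field layout and the sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def time_per_rule(records: Iterable[Tuple[str, float, float]]) -> Dict[str, float]:
    """Sum the elapsed time of every execution, grouped by rule ID."""
    totals: Dict[str, float] = defaultdict(float)
    for rule_id, start, end in records:
        totals[rule_id] += end - start
    return dict(totals)


# Hypothetical (rule_id, start, end) execution records, times in seconds.
records = [
    ("r1", 0.0, 2.0),
    ("r2", 2.0, 2.5),
    ("r1", 3.0, 4.0),
    ("r3", 4.0, 4.25),
]
totals = time_per_rule(records)
costliest = max(totals, key=totals.get)  # rule with the largest total time
```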

  16. Performance
     • 21-node Chord overlay in P2
       • Monitored node on a separate, unloaded machine
     • Overhead of introspection
       • CPU (0.98% → 1.3%), Memory (8 MB → 13 MB)
     • Consistent distributed snapshot
     • Other results in the paper
     [Figure: % CPU utilization and transmitted packets (×1000), each plotted against rate (1/#sec)]

  17. Related Work
     • Management using database techniques [Hy+…]
     • Performance debugging [Magpie, Causeway…]
     • Configuration debugging for BGP, OSes [Time-travel…]
     • Distributed debuggers [WiDS, Pip, Replay Debugging…]
     • Deep embedded monitoring [IBM Websphere, Adaptations…]

  18. Conclusions
     • Declarative development of systems
       • Integrated approach to building and monitoring
       • Automatic execution tracing
       • Online, in-place monitoring
     • Step towards “autonomic” distributed systems
       • Fault-finding tasks evolve with the system
     • Interesting future directions
       • User interface
       • Trade-off between monitoring accuracy and overhead
     • Questions? [Thank You]

  19. Request to EuroSys
     • Please schedule my next talk on the first day
     • Move the submission deadline away from NSDI (last year: NSDI submission on 19th Oct, EuroSys on 20th)

  20. Questions? Thank You!
