
Seamless Instrumentation for Production Observability

This paper explores the motivation, features, and architecture of DTrace for dynamic instrumentation of production systems. It delves into software observability challenges and the expectations from an instrumentation solution. The text discusses DTrace features, such as aggregations and speculative tracing, and its scalable architecture. It emphasizes the shift from process-centric to system-view instrumentation, enabling real-time code modifications in production.


Presentation Transcript


  1. Cantrill, B., Shapiro, M., and Leventhal, A. 2004. Dynamic instrumentation of production systems. Proceedings of the 2004 USENIX Annual Technical Conference.
     Onur Derin, Jan 25, 2007

  2. Outline • Motivation • Expectations from a solution • DTrace Features • DTrace Architecture • Instrumentation Providers • D Language • Aggregations • Speculative Tracing • Future work • Examples • Further readings • Discussion

  3. Motivation • Software Observability Problem • Since software is not physical, the only way to observe it is with more software, e.g. if (tracing_enabled) printf("we got here"); • This creates overhead even when tracing is off (a load, a compare, and a branch). • One solution is conditional compilation. • But that creates two versions of the software: • one for development and test • another for use in production systems • How, then, do you identify a problem that occurs while the system is in use? • The problem first has to be reproduced in development. • Usually, you end up finding a solution to a different problem. An instrumentation solution should allow observability in production systems.

  4. Motivation • Software abstraction is a good thing (web application, web server, DB server, OS). • It implies that, at higher levels, less code induces more work. • A smaller misstep induces more unintended consequences. • Missteps accumulate as you go down the software layers (avalanche effect). • Therefore problems are observed first in the lowest layers, e.g. excessive memory demand, excessive I/O activity, excessive network traffic. • Seeing lowest-layer problems, a typical response is to use faster hardware, e.g. more RAM, more CPU, more bandwidth. • However, the real problem is in the higher layers of software. • The real solution is fixing the problem at the higher levels. Therefore, identifying the real problem requires a system-wide view in instrumentation rather than a process-centric one.

  5. Expectations from a Solution • Shift from development to production • Zero disabled-probe effect • Ship the product fully optimized • When it is to be observed, dynamically modify the code • Shift from programs to systems • The entire stack should be dynamically instrumentable, e.g. operating system, system libraries, high-level languages and environments. • The kernel is involved, so the observability infrastructure must be absolutely safe. • Interruptions during production are costly. • The problematic state of the system is lost on a restart. • The first time software needs to be observed, it is already running in production • The solution shouldn't require special compilation options, access to source code, or restarting components. These expectations formed the design guidelines for DTrace.

  6. DTrace Features • Dynamic Inst.: achieves zero disabled-probe effect. • Unified Inst.: instruments both user-level and kernel-level software. • Arbitrary-context Kernel Inst.: instruments all of the kernel, incl. scheduler and synchronization. • Data Integrity: reports errors in handling of data during instrumentation. • Arbitrary Actions: lets the user specify arbitrary actions safely at any inst. point. • Predicates: actions run only when the predicate is true. Allows pruning of data at the source. • A High-level Control Language: lets the user specify predicates and actions. • A Scalable Mechanism for Aggregating Data: processes data at low levels. • Speculative Tracing: defers the decision to commit data or not to a later time. • Heterogeneous Inst.: a glue framework for different providers, from I/O to scheduler to network. • Scalable Architecture: efficient classification and selection of thousands of inst. points. • Virtualized Consumers: allows multiple, concurrent consumers of the framework. How have these features been enabled by DTrace?

  7. DTrace Architecture (diagram, top to bottom) • DTrace program source files: a.d, b.d • DTrace consumers: dtrace(1M), intrstat(1M) (annotated: virtualized consumers) • API: libdtrace(3LIB) • userland / kernel boundary • dtrace(7D) • DTrace framework (annotated: scalable architecture) • Provider API (annotated: heterogeneous inst.) • DTrace providers: sysinfo, vminfo, fasttrap, syscall, profile, fbt

  8. Internals • Providers are loadable kernel modules that carry out the instrumentation task. • Providers communicate with the DTrace framework using a well-defined API.

    DTraceFramework::determineInstrumentationPoints() {
        for provider in all providers {
            provider.determineInstrumentationPoints(createProbe);
        }
    }

    Provider::determineInstrumentationPoints() {
        Generate list of all inst. points
        for instPoint in all instrumentation points {
            probeID = DTraceFramework.createProbe(instPoint.moduleName,
                                                  instPoint.funcName,
                                                  instPoint.semanticName);
            Associate probeID with instrumentation point
        }
    }

  9. Internals • libdtrace(3LIB) advertises these probes to consumers.

    libdtrace(3LIB)::enableProbe(providerName, moduleName, funcName, name) {
        probe = DTraceFramework.getProbe(providerName, moduleName, funcName, name);
        if (!probe.isEnabled()) {
            provider.enableProbe(probe.ID);
        }
        Create Enabling Control Block (i.e. ECB)
        Create per-CPU buffer associated with ECB
        Associate ECB and probe
    }

  • ECBs enable virtualized consumers. • A probe is associated with one ECB per enabling consumer. • This association is kept in DTraceFramework.

    Provider::enableProbe(probeID) {
        Dynamically modify inst. point s.t. when hit,
        it calls DTraceFramework::probeFired(probeID)
    }

  10. Internals

    DTraceFramework::probeFired(probeID) {
        Disable interrupts
        for ecb in all ECBs where ECB.probeID = probeID {
            if (ecb.predicate)
                DTraceFramework.execute(ecb.actions);
        }
        Re-enable interrupts
    }

  (diagram: each ECB carries a predicate and its actions) • ECB actions: • may store data in the per-CPU buffer associated with the ECB. • may update D variable state. • may not store to kernel memory, modify registers, or change system state.
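
The probe-firing loop on slide 10 can be simulated in a few lines. The following is a minimal sketch in Python, not DTrace's kernel code: the class names `ECB` and `Framework`, the dictionary-based probe context, and the consumer names are all assumptions made for illustration. It shows how one probe can serve two consumers, each with its own predicate, action, and buffer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ECB:
    """Enabling control block: one per (probe, consumer) enabling (hypothetical model)."""
    consumer: str
    predicate: Callable[[dict], bool]
    action: Callable[[dict], object]
    buffer: List[object] = field(default_factory=list)  # stands in for the per-CPU buffer


class Framework:
    def __init__(self) -> None:
        # probe ID -> ECBs of every consumer that enabled the probe
        self.ecbs: Dict[int, List[ECB]] = {}

    def enable_probe(self, probe_id: int, ecb: ECB) -> None:
        self.ecbs.setdefault(probe_id, []).append(ecb)

    def probe_fired(self, probe_id: int, ctx: dict) -> None:
        # The real framework disables interrupts around this loop; that is not modeled here.
        for ecb in self.ecbs.get(probe_id, []):
            if ecb.predicate(ctx):
                ecb.buffer.append(ecb.action(ctx))


fw = Framework()
a = ECB("consumer-a", lambda c: c["execname"] == "sshd", lambda c: c["pid"])
b = ECB("consumer-b", lambda c: True, lambda c: c["execname"])
fw.enable_probe(44129, a)
fw.enable_probe(44129, b)
fw.probe_fired(44129, {"execname": "sshd", "pid": 2680})
fw.probe_fired(44129, {"execname": "bash", "pid": 2827})
print(a.buffer)  # [2680]
print(b.buffer)  # ['sshd', 'bash']
```

Because each consumer's predicate and buffer live in its own ECB, consumer-a's filter prunes data at the source without affecting what consumer-b records.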

  11. Internals

    DTraceFramework::storeDataInPerCPUBuffer(ecb, data) {
        buffer = DTraceFramework.getBuffer(ecb);
        if (buffer.freeSpace() >= ecb.DATA_SIZE)
            buffer.store(data);
        else
            ecb.dropCount++;
    }

  To minimize dropCount, buffers should be read periodically. How can buffers be read such that data integrity and wait-free probe processing are assured?

  12. Buffers (diagram: consumer program, CPU0, CPU1, Buffer1, Buffer2) • The consumer program initiates a read-buffer operation with xcall(). • Interrupts are disabled. • The active and inactive buffers (Buffer1, Buffer2) swap roles. • Interrupts are re-enabled and xcall() returns. Since buffer switching and probe processing cannot be interrupted, data integrity is assured. What if interrupts were not disabled?

  13. Buffers (diagram: same setup, interrupts not disabled) • The consumer program initiates a read-buffer operation with xcall(). • A probe fires in the middle of the buffer switch, and its ECB action wants to store to the buffer. • At that moment both buffers are inactive, so neither is writable.
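
The store-or-drop rule from slide 11 and the buffer switch from slides 12 and 13 can be put together in a short simulation. This is a sketch in Python under simplifying assumptions: the class `PerCpuBuffers` and its method names are hypothetical, capacity is counted in records rather than bytes, and the switch is modeled as one uninterrupted step, which is the property that disabling interrupts provides in the real system.

```python
class PerCpuBuffers:
    """Toy model of one CPU's active/inactive buffer pair (hypothetical names)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.active: list = []
        self.inactive: list = []
        self.drops = 0

    def store(self, record) -> None:
        # Probe context: never waits. If the active buffer is full, count a drop.
        if len(self.active) < self.capacity:
            self.active.append(record)
        else:
            self.drops += 1

    def switch_and_read(self) -> list:
        # Consumer context: swap roles in one step, then drain the buffer
        # that has just become inactive. New records go to the other buffer.
        self.active, self.inactive = self.inactive, self.active
        data = list(self.inactive)
        self.inactive.clear()
        return data


bufs = PerCpuBuffers(capacity=2)
for record in (1, 2, 3):
    bufs.store(record)          # the third record does not fit and is dropped
first = bufs.switch_and_read()
bufs.store(4)
second = bufs.switch_and_read()
print(first, second, bufs.drops)  # [1, 2] [4] 1
```

Reading more often keeps the active buffer from filling up, which is why periodic reads reduce the drop count.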

  14. DIF • D Intermediate Format • Instruction set for specifying predicates and actions • Designed mainly to allow programmable actions to be executed safely in arbitrary contexts. • DIF code is checked for validity when it is loaded. • Only forward branches are allowed, which rules out infinite loops. • Illegal loads (from misaligned addresses, memory-mapped I/O devices, unmapped memory) and division by zero are handled at run-time by returning errors to the consumers. • Arbitrary stores are not allowed. • Only defined subroutines can be called at run-time.
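
The load-time checks on slide 14 can be illustrated with a toy validator. The sketch below is in Python and uses an invented instruction encoding: the opcodes `load`, `add`, `branch`, `call`, `store`, and `ret`, and the tuple format, are assumptions for illustration and are not the real DIF instruction set. Loads are accepted here because, per the slide, illegal loads are caught at run-time rather than at load time.

```python
# Subroutine names taken from slide 19; the set is an illustrative subset.
ALLOWED_SUBROUTINES = {"copyinstr", "strlen", "rand"}
PLAIN_OPCODES = {"load", "add", "ret"}  # hypothetical opcodes needing no load-time check


def validate(program):
    """Return load-time errors for a toy DIF-like program.

    Each instruction is an (opcode, operand) tuple; a branch operand is the
    index of the target instruction.
    """
    errors = []
    for pc, (op, arg) in enumerate(program):
        if op == "branch":
            # Only forward branches: the target must be a later instruction.
            if not (pc < arg < len(program)):
                errors.append(f"{pc}: branch to {arg} is not a valid forward branch")
        elif op == "call":
            if arg not in ALLOWED_SUBROUTINES:
                errors.append(f"{pc}: call to undefined subroutine {arg!r}")
        elif op == "store":
            errors.append(f"{pc}: arbitrary stores are not allowed")
        elif op not in PLAIN_OPCODES:
            errors.append(f"{pc}: unknown opcode {op!r}")
    return errors


good = [("load", "arg0"), ("branch", 3), ("add", 1), ("ret", None)]
bad = [("load", "arg0"), ("branch", 0), ("call", "panic"), ("store", "0xdead"), ("ret", None)]
print(validate(good))       # []
print(len(validate(bad)))   # 3
```

With backward branches rejected, every instruction runs at most once, so a program of n instructions finishes in at most n steps.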

  15. Instrumentation Providers • General properties • No disabled-probe effect • Mostly use dynamic code modification • Some examples • syscall: traces all communication from userland to kernel • fbt: entry and return points of kernel functions • sched: which threads run on which CPU, and for how long • io: disk I/O requests • mib: counters for IP, IPv6 etc. • profile: time-triggered probing at specified intervals • lockstat: kernel synchronization behaviour

  16. Function Boundary Tracing implementation in SPARC (diagram) • Production software: call x • Instrumented software: the instruction is modified dynamically to ba y • y: prepare probeID etc., then call DTrace, probeFired(probeID, …) • On return, call x is executed in y

  17. D Language • C-like, supports ANSI C operators • Strings exist • No if, no loops. • Only integer arithmetic • No need to declare variables • Scalar variables • Associative arrays • Collection of data elements • No predefined number of elements • Like hashes • name[key] = expression

  18. D Language • Thread-local variables: • Variables for OS threads • Referred to with self->variable-name • Clause-local variables: • Their storage is reused for each program clause. • Referred to with this->variable-name • Built-in variables (execname, pid, timestamp, curthread) • External variables • Used in kernel modules (kmem_flags)

  19. D Language • General template • probe descriptions • /predicate/ • { • action statements • } • Probe description: • ProviderName:ModuleName:FunctionName:SemanticName • Predicate is a D expression. • Actions: • Recording actions (printf(), printa(), trace()) • Destructive actions (disabled by default) • Special actions (copyinstr(), strlen(), rand() etc.)

  20. Aggregations (Cherry on the cake) • Aggregate data and look for trends, generate reports • General form • @name[keys] = aggfunc(args); • Aggregation function: • f(f(x0) U f(x1) U ... U f(xn)) = f(x0 U x1 U ... U xn) • e.g. • count() • min() • max() • sum() • avg() • quantize()
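
The aggregation property on slide 20 is what lets data be reduced per CPU and merged later instead of storing every sample. The following Python sketch checks this on made-up per-CPU samples (the numbers and variable names are hypothetical). max, min, and sum satisfy the property literally; for count the partial counts are added, and for avg the sketch assumes the intermediate state is a (sum, count) pair, since averaging partial averages would give the wrong answer.

```python
# Hypothetical samples recorded independently on three CPUs.
per_cpu = [[4, 8, 8], [16, 32], [8]]
everything = [v for cpu in per_cpu for v in cpu]

# Each CPU reduces its own data first; the partial results are merged afterwards.
max_merged = max(max(cpu) for cpu in per_cpu)
min_merged = min(min(cpu) for cpu in per_cpu)
sum_merged = sum(sum(cpu) for cpu in per_cpu)
count_merged = sum(len(cpu) for cpu in per_cpu)   # partial counts are summed
avg_merged = sum_merged / count_merged            # avg keeps (sum, count), not partial averages

print(max_merged == max(everything))              # True
print(min_merged == min(everything))              # True
print(sum_merged == sum(everything))              # True
print(count_merged == len(everything))            # True
print(avg_merged == sum(everything) / len(everything))  # True
```

Only a constant amount of state per key and per CPU is kept, no matter how many times the probe fires.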

  21. Speculative Tracing • Trace data tentatively and decide later whether to commit it to a buffer or not • Useful when a predicate cannot decide at probe time, because you don't yet know whether the event will turn out to be of interest • Useful when you have an error event and would like to know the history behind it and why that error occurred • Functions: • speculation() • speculate() • commit() • discard()
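
The life cycle of a speculation (create, record tentatively, then commit or discard) can be modeled in a few lines. This is a simplified Python sketch, not the D API: the class `Speculations` is hypothetical, and `speculate()` here takes the record directly, whereas in D `speculate()` takes only the speculation ID and redirects the rest of the clause's tracing into the speculative buffer.

```python
class Speculations:
    """Toy model of speculative buffers (hypothetical class)."""

    def __init__(self) -> None:
        self._next_id = 1
        self._pending = {}    # speculation ID -> tentatively traced records
        self.committed = []   # stands in for the principal trace buffer

    def speculation(self) -> int:
        sid = self._next_id
        self._next_id += 1
        self._pending[sid] = []
        return sid

    def speculate(self, sid: int, record) -> None:
        self._pending[sid].append(record)

    def commit(self, sid: int) -> None:
        self.committed.extend(self._pending.pop(sid))

    def discard(self, sid: int) -> None:
        self._pending.pop(sid)


spec = Speculations()

# First call: traced speculatively, succeeds, so its history is thrown away.
ok = spec.speculation()
spec.speculate(ok, "open entry")
spec.speculate(ok, "open return 3")
spec.discard(ok)

# Second call: fails, so the history leading up to the error is kept.
failed = spec.speculation()
spec.speculate(failed, "open entry")
spec.speculate(failed, "open return -1")
spec.commit(failed)

print(spec.committed)  # ['open entry', 'open return -1']
```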

  22. Example D Programs

    BEGIN
    {
        trace("Hello world");
        exit(0);
    }

    # dtrace -s helloworld.d
    dtrace: script 'helloworld.d' matched 1 probe
    CPU     ID                    FUNCTION:NAME
      0      1                           :BEGIN   Hello world

    syscall::read:entry
    {
        printf("Process %d", pid);
    }

  23. Example D Programs

    syscall::read:entry
    {
        printf("Process %d", pid);
    }

    # dtrace -s d2.d
    dtrace: script 'd2.d' matched 1 probe
    CPU     ID                    FUNCTION:NAME
      0  44129                       read:entry Process 2680
      0  44129                       read:entry Process 2680
      0  44129                       read:entry Process 2827
      0  44129                       read:entry Process 2680
      0  44129                       read:entry Process 2680
      0  44129                       read:entry Process 2827
    …

  24. Example D Programs

    syscall::write:entry
    /execname=="sshd"/
    {
        @[arg0] = quantize(arg2);
    }

  RESULT:

                    4
           value  ------------- Distribution ------------- count
               8 |                                         0
              16 |@                                        1
              32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@             24
              64 |@@@@@@                                   5
             128 |@                                        1
             256 |@@@@                                     3
             512 |                                         0
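
The histogram on slide 24 groups write sizes into power-of-two buckets, where each row's value is the lower bound of its bucket. The Python sketch below reproduces that bucketing for non-negative integers only. The function names mirror the D aggregation but are local helpers, and the sample sizes are made up rather than taken from the sshd run.

```python
from collections import Counter


def bucket(value: int) -> int:
    """Lower bound of the power-of-two bucket holding value (0 stays 0)."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    return 0 if value == 0 else 1 << (value.bit_length() - 1)


def quantize(values) -> Counter:
    return Counter(bucket(v) for v in values)


sizes = [20, 36, 40, 52, 63, 64, 100, 300]   # hypothetical write sizes in bytes
hist = quantize(sizes)
for lower in sorted(hist):
    print(f"{lower:>6} | {'@' * hist[lower]} {hist[lower]}")
```

A value of 63 lands in the row labeled 32 and a value of 64 in the row labeled 64, which is how the rows in the slide's output should be read.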

  25. Example D Programs

    syscall::write:entry
    /execname=="sshd" && arg0==5/
    {
        @[ustack()] = quantize(arg2);
    }

  RESULT: next slide

  26.

    # dtrace -s d4.d
    dtrace: script 'd4.d' matched 1 probe
    ^C

                  libc.so.1`_write+0x15
                  sshd`altprivsep_start_monitor+0x220
                  sshd`main+0xe57
                  sshd`0x805bad2

               value  ------------- Distribution ------------- count
                   2 |                                         0
                   4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
                   8 |                                         0

  27.

                  libc.so.1`_write+0x15
                  pkcs11_softtoken.so.1`looping_write+0x32
                  pkcs11_softtoken.so.1`C_SeedRandom+0xfd
                  libpkcs11.so.1`C_SeedRandom+0xed
                  mech_krb5.so.1`krb5_c_random_seed+0x3d
                  mech_krb5.so.1`init_common+0x121
                  mech_krb5.so.1`krb5_init_context+0xd
                  mech_krb5.so.1`krb5_gss_get_context+0x3d
                  mech_krb5.so.1`_C0095D0A+0x49
                  libgss.so.1`__gss_get_mechanism+0xad
                  libgss.so.1`gss_add_cred+0x79
                  libgss.so.1`gss_acquire_cred+0xfb
                  sshd`ssh_gssapi_server_mechs+0x7c
                  sshd`ssh_gssapi_server_kex_hook+0x22
                  sshd`0x807cc12
                  sshd`kex_send_kexinit+0x2a
                  sshd`kex_setup+0x74
                  sshd`0x805e90f
                  sshd`main+0xe05
                  sshd`0x805bad2

               value  ------------- Distribution ------------- count
                   4 |                                         0
                   8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
                  16 |                                         0

  28. Example D Programs

    syscall::open:entry
    {
        @files[copyinstr(arg0)] = count();
    }

  RESULT:

    # dtrace -s d5.d
    dtrace: script 'd5.d' matched 1 probe
    ^C

      /etc/resolv.conf                                                  1

  29. RESULT when copyinstr is removed:

    # dtrace -s d5.d
    dtrace: script 'd5.d' matched 1 probe
    dtrace: error on enabled probe ID 1 (ID 44133: syscall::open:entry): invalid address (0x80fbdaf) in action #2 at DIF offset 28
    dtrace: error on enabled probe ID 1 (ID 44133: syscall::open:entry): invalid address (0x80fbdaf) in action #2 at DIF offset 28
    ^C

      /lib/libc.so.1                                                    1
      /proc/2647/psinfo                                                 1
      /proc/2723/psinfo                                                 1
      /proc/2874/psinfo                                                 1
      /proc/4680/psinfo                                                 1
      /proc/4691/psinfo                                                 1
      /proc/4740/psinfo                                                 1
      /var/ld/ld.config                                                 1
      /dev/null                                                         2
      /etc/resolv.conf                                                  2
      /var/adm/utmpx                                                    2

  30. Future Work • Performance counter provider • Helper actions: Embracing high-level languages and their environments. • User lock analysis: lock contention analysis of user-level multi-threaded processes. • Fine-grained user-level providers • Software visualization

  31. Further Readings to be Googled • DTrace Guide • Hidden In Plain Sight, Cantrill B. • DTrace Toolkit as a repository of D scripts classified in terms of application domains like CPU, Disk, Mem, Kernel, Net etc. • DTrace & DTraceToolkit, Stefan Parvu

  32. Discussion • The paper does not discuss how much overhead is introduced when probes are enabled. • Safety is considered only in the sense of not crashing or halting the system. What about guaranteeing that other requirements of the system, such as real-time properties, are not violated?
