
Extensible Distributed Tracing from Kernels to Clusters

Fay: Extensible Distributed Tracing from Kernels to Clusters. Úlfar Erlingsson, Google Inc.; Marcus Peinado, Microsoft Research; Simon Peter, Systems Group, ETH Zurich; Mihai Budiu, Microsoft Research.



  1. Fay: Extensible Distributed Tracing from Kernels to Clusters Úlfar Erlingsson, Google Inc. Marcus Peinado, Microsoft Research Simon Peter, Systems Group, ETH Zurich Mihai Budiu, Microsoft Research

  2. Wouldn’t it be nice if… • We could know what our clusters were doing? • We could ask any question, … easily, using one simple-to-use system. • We could collect answers extremely efficiently … so cheaply we may even ask continuously.

  3. Let’s imagine... • Applying data-mining to cluster tracing • Bag of words technique • Compare documents w/o structural knowledge • N-dimensional feature vectors • K-means clustering • Can apply to clusters, too!
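As a sketch of that bag-of-words idea — in Python rather than Fay's LINQ, with hypothetical machine names and call samples — each machine's trace becomes a "document", each called function a "word", and counting yields an N-dimensional feature vector per machine:

```python
from collections import Counter

# Hypothetical per-machine traces: each trace is a list of called functions.
traces = {
    "m1": ["read", "write", "read", "open"],
    "m2": ["read", "read", "read", "close"],
}

# Fix a shared vocabulary so every machine maps to the same dimensions.
vocab = sorted({fn for calls in traces.values() for fn in calls})

def feature_vector(calls):
    # Count occurrences of each vocabulary word in this "document".
    counts = Counter(calls)
    return [counts[fn] for fn in vocab]

vectors = {m: feature_vector(calls) for m, calls in traces.items()}
```

These vectors are what k-means then clusters, exactly as one would cluster documents without any structural knowledge of their contents.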

  4. Cluster-mining with Fay • Automatically categorize cluster behavior, based on system call activity

  5. Cluster-mining with Fay • Automatically categorize cluster behavior, based on system call activity • Without measurable overhead on the execution • Without any special Fay data-mining support

  6. Fay K-Means Behavior-Analysis Code

    var kernelFunctionFrequencyVectors =
        cluster.Function(kernel, "syscalls!*")
               .Where(evt => evt.time < Now.AddMinutes(3))
               .Select(evt => new { Machine  = fay.MachineID(),
                                    Interval = evt.Cycles / CPS,
                                    Function = evt.CallerAddr })
               .GroupBy(evt => evt, (k,g) => new { key = k, count = g.Count() });

    Vector Nearest(Vector pt, Vectors centers) {
        var near = centers.First();
        foreach (var c in centers)
            if (Norm(pt - c) < Norm(pt - near)) near = c;
        return near;
    }

    Vectors OneKMeansStep(Vectors vs, Vectors cs) {
        return vs.GroupBy(v => Nearest(v, cs))
                 .Select(g => g.Aggregate((x,y) => x+y) / g.Count());
    }

    Vectors KMeans(Vectors vs, Vectors cs, int K) {
        for (int i = 0; i < K; ++i)
            cs = OneKMeansStep(vs, cs);
        return cs;
    }
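The Nearest/OneKMeansStep/KMeans pseudocode above can be restated as a small runnable Python sketch. This is illustrative only: it mirrors the slide's structure (including running K refinement steps in KMeans), not a production k-means implementation.

```python
import math

def nearest(pt, centers):
    # Index of the center closest to pt by Euclidean distance, as in Nearest().
    return min(range(len(centers)),
               key=lambda i: math.dist(pt, centers[i]))

def one_kmeans_step(vs, cs):
    # Group points by their nearest center, then average each group,
    # as in OneKMeansStep()'s GroupBy/Aggregate.
    groups = {}
    for v in vs:
        groups.setdefault(nearest(v, cs), []).append(v)
    return [[sum(dim) / len(g) for dim in zip(*g)] for g in groups.values()]

def kmeans(vs, cs, k):
    # The slide runs K refinement steps; we mirror that here.
    for _ in range(k):
        cs = one_kmeans_step(vs, cs)
    return cs
```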

  7. Fay K-Means Behavior-Analysis Code

    var kernelFunctionFrequencyVectors =
        cluster.Function(kernel, "syscalls!*")
               .Where(evt => evt.time < Now.AddMinutes(3))
               .Select(evt => new { Machine  = fay.MachineID(),
                                    Interval = evt.Cycles / CPS,
                                    Function = evt.CallerAddr })
               .GroupBy(evt => evt, (k,g) => new { key = k, count = g.Count() });

  8. Fay vs. Specialized Tracing • Could’ve built a specialized tool for this • Automatic categorization of behavior (Fmeter) • Fay is general, but can efficiently do • Tracing across abstractions, systems (Magpie) • Predicated and windowed tracing (Streams) • Probabilistic tracing (Chopstix) • Flight recorders, performance counters, …

  9. Key Takeaways Fay: Flexible monitoring of distributed executions • Can be applied to existing, live Windows servers • Single query specifies both tracing & analysis • Easy to write & enables automatic optimizations • Pervasively data-parallel, scalable processing • Same model within machines & across clusters • Inline, safe machine-code at tracepoints • Allows us to do computation right at the data source

  10. K-Means: Single, Unified Fay Query

    var kernelFunctionFrequencyVectors =
        cluster.Function(kernel, "*")
               .Where(evt => evt.time < Now.AddMinutes(3))
               .Select(evt => new { Machine  = fay.MachineID(),
                                    Interval = evt.Cycles / CPS,
                                    Function = evt.CallerAddr })
               .GroupBy(evt => evt, (k,g) => new { key = k, count = g.Count() });

    Vector Nearest(Vector pt, Vectors centers) {
        var near = centers.First();
        foreach (var c in centers)
            if (Norm(pt - c) < Norm(pt - near)) near = c;
        return near;
    }

    Vectors OneKMeansStep(Vectors vs, Vectors cs) {
        return vs.GroupBy(v => Nearest(v, cs))
                 .Select(g => g.Aggregate((x,y) => x+y) / g.Count());
    }

    Vectors KMeans(Vectors vs, Vectors cs, int K) {
        for (int i = 0; i < K; ++i)
            cs = OneKMeansStep(vs, cs);
        return cs;
    }

  11. Fay is Data-Parallel on Cluster • View trace query as distributed computation • Use cluster for analysis

  12. Fay is Data-Parallel on Cluster • System call trace events • Fay does early aggregation & data reduction • Fay knows what’s needed for later analysis

  13. Fay is Data-Parallel on Cluster • System call trace events • Fay does early aggregation & data reduction • K-Means analysis • Fay builds an efficient processing plan from query

  14. Fay is Data-Parallel within Machines • Early aggregation • Inline, in OS kernel • Reduce dataflow & kernel/user transitions • Data-parallel per each core/thread
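The early-aggregation idea on this slide can be sketched as follows (hypothetical event tuples and function names; Fay's actual probes aggregate inside kernel buffers): each core counts its own events locally, and only the small count tables cross the kernel/user boundary before being merged.

```python
from collections import Counter

def aggregate_core(events):
    # Per-core early aggregation: count (interval, function) pairs locally.
    return Counter(events)

def merge(partials):
    # Only these compact count tables flow onward for later analysis.
    total = Counter()
    for p in partials:
        total += p
    return total

# Hypothetical traced events on two cores during interval "t0".
core0 = aggregate_core([("t0", "NtReadFile"), ("t0", "NtReadFile")])
core1 = aggregate_core([("t0", "NtReadFile"), ("t0", "NtClose")])
summary = merge([core0, core1])
```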

  15. Processing w/o Fay Optimizations [Diagram: K-Means system-call tracing feeding K-Means clustering] • Collect data first (on disk) • Reduce later • Inefficient, can suffer data overload

  16. Traditional Trace Processing [Diagram: K-Means system calls logged first, K-Means clustering done centrally later] • First log all data (a deluge) • Process later (centrally) • Compose tools via scripting

  17. Takeaways so far Fay: Flexible monitoring of distributed executions • Single query specifies both tracing & analysis • Pervasively data-parallel, scalable processing

  18. Safety of Fay Tracing Probes • A variant of XFI used for safety [OSDI’06] • Works well in the kernel or any address space • Can safely use existing stacks, etc. • Instead of a language interpreter (DTrace) • Arbitrary, efficient, stateful computation • Probes can access thread-local/global state • Probes can try to read any address • I/O registers are protected

  19. Key Takeaways, Again Fay: Flexible monitoring of distributed executions • Single query specifies both tracing & analysis • Pervasively data-parallel, scalable processing • Inline, safe machine-code at tracepoints

  20. Installing and Executing Fay Tracing • Fay runtime on each machine • Fay module in each traced address space • Tracepoints at hotpatched function boundaries [Diagram: the query flows to the per-machine tracing runtime, which creates probes; hotpatching installs tracepoints (~200 cycles); XFI-sandboxed Fay probes, in kernel or user-space targets, emit events via ETW]

  21. Low-level Code Instrumentation • Replace the 1st opcode of each traced function

    Module with a traced function Foo:

    Caller:
        e8 ab 62 ff ff       call Foo
        ...
        ff 15 08 e7 06 00    call [Dispatcher]   ; at Foo-6, in the hotpatch area
    Foo:
        eb f8                jmp Foo-6
        cc cc cc
    Foo2:
        57                   push rdi
        ...
        c3                   ret

  22. Low-level Code Instrumentation • Replace the 1st opcode of functions • Fay dispatcher called via trampoline

    Fay platform module:

    Dispatcher:
        t = lookup(return_addr)
        ...
        call t.entry_probes
        ...
        call t.Foo2_trampoline
        ...
        call t.return_probes
        ...
        return  /* to after call Foo */

  23. Low-level Code Instrumentation • Replace the 1st opcode of functions • Fay dispatcher called via trampoline • Fay calls the function, and the entry & exit probes, with each Fay probe (PF3, PF4, PF5) sandboxed by XFI
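In spirit, the dispatcher's lookup-and-dispatch on these slides can be mimicked in Python (the table layout and probe/trampoline names here are invented for illustration; the real mechanism is inline machine code, as shown above):

```python
# Map from a patched call site (keyed by return address, as in
# lookup(return_addr)) to that tracepoint's handlers.
tracepoints = {}

def install(addr, entry_probes, original_fn, return_probes):
    tracepoints[addr] = (entry_probes, original_fn, return_probes)

def dispatcher(return_addr, *args):
    entry_probes, original_fn, return_probes = tracepoints[return_addr]
    for p in entry_probes:        # call t.entry_probes
        p(args)
    result = original_fn(*args)   # call t.Foo2_trampoline (the original body)
    for p in return_probes:       # call t.return_probes
        p(result)
    return result                 # return to after "call Foo"

# Hypothetical tracepoint: trace a function that increments its argument.
log = []
install(0x1234,
        [lambda a: log.append(("entry", a))],
        lambda x: x + 1,
        [lambda r: log.append(("return", r))])
```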

  24. What’s Fay’s Performance & Scalability? • Fay adds 220 to 430 cycles per traced function • Fay adds 180% CPU to trace all kernel functions • Both approx. 10x faster than DTrace, SystemTap [Chart: null-probe overhead in cycles, and slowdown (x)]

  25. Fay Scalability on a Cluster • Fay tracing memory allocations, in a loop: • Ran workload on a 128-node, 1024-core cluster • Spread work over 128 to 1,280,000 threads • 100% CPU utilization • Fay overhead was 1% to 11% (mean 7.8%)

  26. More Fay Implementation Details • Details of query-plan optimizations • Case studies of different tracing strategies • Examples of using Fay for performance analysis • Fay is based on LINQ and Windows specifics • Could build on Linux using Ftrace, Hadoop, etc. • Some restrictions apply currently • E.g., skew towards batch processing due to Dryad

  27. Conclusion • Fay: Flexible tracing of distributed executions • Both expressive and efficient • Unified trace queries • Pervasive data-parallelism • Safe machine-code probe processing • Often equally efficient as purpose-built tools

  28. Backup

  29. A Fay Trace Query

    from io in cluster.Function("iolib!Read")
    where io.time < Now.AddMinutes(5)
    let size = io.Arg(2)   // request size in bytes
    group io by size/1024 into g
    select new { sizeInKilobytes = g.Key,
                 countOfReadIOs  = g.Count() };

  • Aggregates read activity in the iolib module • Across the cluster, both user-mode & kernel • Over 5 minutes

  30. A Fay Trace Query

    from io in cluster.Function("iolib!Read")
    where io.time < Now.AddMinutes(5)
    let size = io.Arg(2)   // request size in bytes
    group io by size/1024 into g
    select new { sizeInKilobytes = g.Key,
                 countOfReadIOs  = g.Count() };

  • Specifies what to trace • 2nd argument of the read function in iolib • And how to aggregate • Group into KB-size buckets and count
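The same grouping can be sketched in Python over hypothetical read-size samples — bucket each read by integer division into KB-size bins, then count per bin, like the query's "group io by size/1024":

```python
from collections import Counter

# Hypothetical sampled 2nd arguments (request sizes in bytes) of iolib!Read.
read_sizes = [512, 800, 2048, 3000, 70000]

# KB-size bucket -> count of read I/Os in that bucket.
histogram = Counter(size // 1024 for size in read_sizes)
```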
