Magpie: Distributed request tracking for realistic performance modelling Rebecca Isaacs Paul Barham Richard Mortier Dushyanth Narayanan Microsoft Research Cambridge James Bulpin University of Cambridge
Performance in distributed systems • Faults in distributed systems are notoriously hard to diagnose • Performance problems are even more subtle to debug • Often transient or affect only a subset of requests / users • Frequently involve complex interactions between multiple machines • Aggregate statistics (e.g. utilization) may look perfectly normal
Magpie Approach • Track individual requests end to end • Observe control flow (causality) • Monitor resource consumption: CPU, bandwidth, disk • Debug performance “in the small” • Build a probabilistic workload model from the aggregate requests • Cluster similar requests according to their observed behaviour • Debug performance “in the large”
How do we use this information? • Performance debugging • Why did this request take much longer than that request? • Fault detection • Configuration and management • Performance prediction • Realistic workload models for capacity planning • Obtain automatically on a “live” system
Magpie components • Instrumentation • System activity recorded to logs • Generic request parser • Extract individual requests from logs according to an event schema • Model construction • Behavioural clusters • Probabilistic state machine
Outline • Introduction • What is a request? • Instrumentation • Request extraction • Modelling • Current status
What is a request? • System activity which takes place in response to an action initiated by the application being traced • HTTP request • Database query • File open request • We describe a request as • The sequence of application components involved in its processing • The resources consumed at each stage • CPU, bandwidth, disk transfer size, (latency)
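One way to picture this description of a request is as a small record type: an ordered list of stages, each naming a component and the resources it consumed. The sketch below is illustrative only. The class names, field names and numbers are assumptions made for this example and are not taken from Magpie.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """One component's share of the work (field names are hypothetical)."""
    component: str
    cpu_cycles: int = 0
    net_bytes: int = 0
    disk_bytes: int = 0


@dataclass
class Request:
    """A request: the ordered components that processed it, plus their costs."""
    kind: str
    stages: list = field(default_factory=list)

    def total(self, resource: str) -> int:
        # Sum one resource field over every stage of the request.
        return sum(getattr(stage, resource) for stage in self.stages)


# Made-up numbers, purely for illustration.
req = Request("HTTP request", [
    Stage("IIS", cpu_cycles=1200, net_bytes=512),
    Stage("ASP.NET", cpu_cycles=8000),
    Stage("SQL Server", cpu_cycles=3000, disk_bytes=4096),
])
components = [stage.component for stage in req.stages]
total_cpu = req.total("cpu_cycles")
```

Keeping the stages in order preserves the control flow, while the per-stage fields carry the resource annotations.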
A typical e-commerce site (1) • [Diagram: Internet, Web Front Ends, SQL Servers, Storage]
A typical e-commerce site (2) • [Diagram, Web Server stack: http.sys, IIS, Filter, Static Content, ASP.NET, CLR, Application Logic, ADO.NET, WinSock2 API, Kernel] • [Diagram, SQL Server stack: Stored procedures, Data, WinSock2 API, Kernel]
HTTP request: detailed view • [Timeline figure: rows WEB.eec, WEB.398 and SQL.9c4, each with Disk, Net RX and Net TX activity, time axis marked at 10.051s, 10.100s and 10.155s] • Callouts: HTTP request packets • IIS worker thread picks up request from http.sys • ASP.NET worker thread takes over • ASP.NET thread blocks after sync WinSock send to SQL Server (RPC to database) • TDS request and reply packets sent and received • SQL thread unblocks • HTTP response packets sent back to client • IIS worker thread wakes up to write log • Key: Blocked, IIS, ASP.NET, SQL, Disk, Other
Why is request tracking hard? • Many components, multiple machines • Must track control flow across machines • No globally unique request ID • Components are developed independently • Multiple thread pools • Many threads participate in processing a request • Asynchronous communication • Must match sends and receives between threads/machines • Hand-rolled synchronization primitives • SQL Server has a user-mode scheduler
Event Tracing for Windows • Low-overhead event mechanism • Events timestamped with cycle counter • Global ordering on events on a single machine • Can enable/disable sets of events at runtime • Using ETW in Magpie • Each instrumentation point posts an event • Events are logged to disk • Logs are post-processed to extract requests • Can also consume events in real time
Instrumentation points • Existing ETW event providers • IIS, kernel • App-specific hooks • IIS, ASP.NET, SQL Server • Detours • Wrap dlls to trap Win32 and WinSock2 calls • WinPcap • Capture packets on the wire
CPU usage from kernel events • The ETW kernel logger records every context switch • How do we know which cycles are used for which request? • We can attribute cycles to a request by • An application-specific event which occurs within a delimited sector of CPU time, or • The current context of execution, e.g. thread id
Example: protocol processing in a DPC • [Figure: events (cswitch, DPC start, pkt recv, DPC end) on a time axis, with separate cycle counts for Request 1 and Request 2]
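The thread-id rule for attributing cycles can be sketched as follows. This is a minimal illustration, not Magpie's implementation: it assumes a single CPU, a list of context-switch events given as (cycle counter, incoming thread id), and a fixed, hypothetical thread-to-request mapping. In a real trace the mapping would change over time, and DPC time would be attributed from the packet event seen inside the DPC rather than from the thread id.

```python
from collections import defaultdict


def attribute_cycles(cswitches, thread_to_request, end_cycle):
    """Charge the cycles between consecutive context switches to a request.

    cswitches: list of (cycle_counter, incoming_tid) for one CPU, sorted by
    cycle counter. thread_to_request: hypothetical static mapping tid -> request.
    """
    usage = defaultdict(int)
    # Pair each switch with the next one; the last interval ends at end_cycle.
    boundaries = cswitches[1:] + [(end_cycle, None)]
    for (start, tid), (stop, _) in zip(cswitches, boundaries):
        request = thread_to_request.get(tid)
        if request is not None:  # cycles of unrelated threads are not charged
            usage[request] += stop - start
    return dict(usage)


# Made-up trace: thread 0xa works for request 1, thread 0xb for request 2.
cswitches = [(100, 0xA), (250, 0xB), (400, 0xA), (450, 0x0)]
owners = {0xA: "request-1", 0xB: "request-2"}
usage = attribute_cycles(cswitches, owners, end_cycle=500)
```

Because the events carry cycle-counter timestamps, the difference between two consecutive context switches is directly the number of cycles the incoming thread consumed.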
Application and middleware events • Cover points where flow of control moves between components • Cover points where resources are multiplexed and demultiplexed • E.g. user-level scheduling primitives • Propagation of a global request id is not required! • Magpie used to do this, but no longer does
Instrumenting a web service • [Diagram: Web Server stack (http.sys, IIS, Filter, Static Content, ASP.NET, CLR, Application Logic, ADO.NET, WinSock2 API, Kernel) and SQL Server stack (Stored procedures, Data, WinSock2 API, Kernel), annotated with instrumentation points] • Instrumentation labels: Wrappers, ISAPI Filter, HTTPModule, CLR profiler, Extended SPs, Intercept (at the WinSock2 API), Event Tracing for Windows (at the kernel), Packet capture
Generic request extraction • No inbuilt assumptions about the system or the application • No common unique identifier • Schema specifies semantics of events • Easy to add new event types • Parser stitches events into requests based on event semantics
Terminology • Namespace • Event parameter which references an entity in the system, e.g. thread id • Timeline • Instantiation of a namespace with a unique value, e.g. thread id = 0xa • Events bind or unbind requests to timelines • Bindings capture the semantics of each event for a particular request type
Example: connecting events • [Figure: events (TCP pkt, DPC start, DPC end, cswitch, Enter Recv, Recv returns) placed on the timelines Cpuid=0, Tid=0xa, Tid=0xb and Connid=0xd, and attributed to Request 1 or Request 2]
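The bind/unbind idea can be sketched with a toy parser. Everything here is an assumption made for illustration: the schema format, the event names and the rule that the first bound timeline decides ownership are hypothetical and much simpler than a real event schema. The point is only that requests are stitched together through shared timelines such as a thread id or connection id, with no request id carried in the events.

```python
def extract_requests(events, schema):
    """Group events into requests using timeline bindings.

    events: list of dicts with a 'type' key plus namespace values (tid, connid).
    schema: {event_type: {"bind": [ns, ...], "unbind": [ns, ...], "start": bool}}
    (a hypothetical format invented for this sketch).
    """
    bound = {}     # timeline (namespace, value) -> request id
    requests = {}  # request id -> list of event types, in log order
    next_id = 0
    for ev in events:
        rule = schema[ev["type"]]
        timelines = [(ns, ev[ns]) for ns in rule["bind"] if ns in ev]
        # The event belongs to whichever request already owns one of its timelines.
        owner = next((bound[t] for t in timelines if t in bound), None)
        if owner is None:
            if not rule.get("start"):
                continue  # not attributable to any request
            owner, next_id = next_id, next_id + 1
            requests[owner] = []
        requests[owner].append(ev["type"])
        for t in timelines:  # bind every timeline the event mentions
            bound[t] = owner
        for ns in rule.get("unbind", ()):  # release timelines the event ends
            bound.pop((ns, ev.get(ns)), None)
    return requests


schema = {
    "tcp_pkt": {"bind": ["connid"], "start": True},
    "recv_return": {"bind": ["tid", "connid"]},
    "work": {"bind": ["tid"]},
    "send_reply": {"bind": ["tid", "connid"], "unbind": ["tid", "connid"]},
}
events = [
    {"type": "tcp_pkt", "connid": 0xD},
    {"type": "tcp_pkt", "connid": 0xE},
    {"type": "recv_return", "tid": 0xA, "connid": 0xD},
    {"type": "work", "tid": 0xA},
    {"type": "send_reply", "tid": 0xA, "connid": 0xD},
    {"type": "recv_return", "tid": 0xA, "connid": 0xE},
    {"type": "work", "tid": 0xA},
    {"type": "send_reply", "tid": 0xA, "connid": 0xE},
]
requests = extract_requests(events, schema)
```

In this made-up log the same worker thread (tid 0xa) serves two connections in turn. The unbind on `send_reply` is what lets the second `work` event be attributed to the second request rather than the first.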
End-to-end request extraction • An instance of the request parser runs on each machine in the distributed system • Online or offline mode • Offline post-processing connects request fragments from each node according to a globally unique namespace, e.g. packet IP identifier
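The offline stitching step can be illustrated as a grouping problem: fragments from different machines belong to the same request if they share a value in the global namespace. The sketch below uses a small union-find for that grouping. The fragment names and packet identifiers are invented, and the sketch assumes identifiers are unique within the trace being processed.

```python
def stitch(fragments):
    """Group per-machine request fragments that share a global identifier.

    fragments: {fragment_name: set of global ids seen by that fragment}
    (names and ids are hypothetical). Returns sorted lists of fragment names.
    """
    parent = {name: name for name in fragments}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # global id -> first fragment that mentioned it
    for name, ids in fragments.items():
        for gid in ids:
            if gid in first_seen:
                parent[find(name)] = find(first_seen[gid])  # merge the two groups
            else:
                first_seen[gid] = name

    groups = {}
    for name in fragments:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


fragments = {
    "WEB/req-1": {101, 102},
    "SQL/req-7": {102},
    "WEB/req-2": {201},
    "SQL/req-8": {201, 202},
}
stitched = stitch(fragments)
```

Each resulting group is one end-to-end request, made of the fragment seen on the web server and the fragment seen on the database server.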
Clustering for workload generation • Target the Indy performance modelling tool • Calculates throughput, bottlenecks • Needs transaction mix, resource consumption • Previously: microbenchmark approach • Run 10000 of each “transaction type” (URL) • Divide aggregate resource usage by 10000 • Aim: provide realistic workload models • From real, mixed workloads • Derive transaction “types” automatically
Single request: cartoon view • Partial ordering of events • Annotated with resource usage • [Diagram rows: IIS CPU, ASP.NET CPU, SQL Server CPU, Disk, Network]
Behavioural clustering of requests • Represent requests as event strings • “Flatten” out any concurrency • Use Levenshtein string edit distance • Modified to factor in resource usage vectors • Cluster requests based on this distance • Linear-time algorithm • Each cluster is a request “type” • Select representative from near centroid
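A rough sketch of this kind of clustering follows. The slide does not say how resource usage enters the distance or which linear-time algorithm is used, so both choices here are assumptions: substituting two events of the same type costs their normalised resource difference (between 0 and 1), and clustering is a single-pass "leader" scheme that compares each request only with existing cluster leaders. The leader is simply the first request seen, not a representative chosen near the centroid.

```python
def request_distance(a, b):
    """Levenshtein distance over (event, resource_vector) pairs.

    Insertions and deletions cost 1. Substituting events of the same type
    costs the normalised resource difference (an assumed weighting).
    """
    def sub_cost(x, y):
        if x[0] != y[0]:
            return 1.0
        diff = sum(abs(p - q) for p, q in zip(x[1], y[1]))
        total = sum(x[1]) + sum(y[1])
        return diff / total if total else 0.0

    prev = [float(j) for j in range(len(b) + 1)]
    for i, x in enumerate(a, 1):
        cur = [float(i)]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                     # delete x
                           cur[j - 1] + 1,                  # insert y
                           prev[j - 1] + sub_cost(x, y)))   # substitute
        prev = cur
    return prev[-1]


def leader_cluster(requests, threshold):
    """Single pass: join the first cluster whose leader is close enough."""
    clusters = []  # list of (leader, members)
    for req in requests:
        for leader, members in clusters:
            if request_distance(leader, req) <= threshold:
                members.append(req)
                break
        else:
            clusters.append((req, [req]))
    return clusters


# Made-up event strings; each resource vector holds one number (CPU, say).
static_a = [("iis", (2.0,)), ("net_tx", (1.0,))]
static_b = [("iis", (3.0,)), ("net_tx", (1.0,))]
dynamic = [("iis", (2.0,)), ("aspnet", (10.0,)), ("sql", (5.0,)), ("net_tx", (1.0,))]
clusters = leader_cluster([static_a, dynamic, static_b], threshold=0.5)
sizes = [len(members) for _, members in clusters]
```

With a bounded number of clusters, each request is compared against a bounded number of leaders, so the pass is linear in the number of requests.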
Build a workload model by clustering similar requests • Requests in the same cluster often have different URLs, and one URL may appear in many clusters • [Diagram: five clusters labelled A to E, with shares of 7%, 10%, 63%, 15% and 5%]
Taking it further: work in progress • Online and incremental modelling: • Detect component failure • Detect sudden shifts in workload • More sophisticated models • Learn the probabilistic state machine for each request • cf. flowcharts annotated with performance information • “Bayesian watchdogs” • Compute the likelihood of a request’s behaviour as it moves through the system • Deal with “unlikely” requests appropriately
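To make the likelihood idea concrete, here is a minimal sketch that stands in a first-order Markov chain for the probabilistic state machine. That model choice, the add-one smoothing and the event names are all assumptions made for illustration, since the slides describe this only as work in progress. Transition counts are learned from observed event strings, and a new request is scored by the log-likelihood of its path.

```python
import math
from collections import Counter, defaultdict

START, END = "<start>", "<end>"


def learn_transitions(requests):
    """Count event-to-event transitions over a set of observed requests."""
    counts = defaultdict(Counter)
    for events in requests:
        path = [START, *events, END]
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    return counts


def log_likelihood(counts, events, n_states, alpha=1.0):
    """Log-probability of a path, with add-alpha smoothing for unseen moves."""
    path = [START, *events, END]
    total = 0.0
    for src, dst in zip(path, path[1:]):
        seen = counts.get(src, Counter())
        p = (seen[dst] + alpha) / (sum(seen.values()) + alpha * n_states)
        total += math.log(p)
    return total


# Made-up training set: mostly dynamic pages, one static page.
common = ["iis", "aspnet", "sql", "aspnet", "net_tx"]
training = [common] * 9 + [["iis", "net_tx"]]
model = learn_transitions(training)

n_states = 5  # iis, aspnet, sql, net_tx and the end marker
typical = log_likelihood(model, common, n_states)
unusual = log_likelihood(model, ["iis", "sql", "sql", "sql"], n_states)
```

A watchdog built on this would flag a request whose running log-likelihood falls well below that of typical requests. The threshold is a policy choice and is not shown here.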
Current status • Recent focus has been on developing a generic request extraction scheme • Prototype for 2-machine e-commerce site • TPC-W style workload • Prototype for single-machine SQL Server 2000 • Challenge is the user-mode scheduler • TPC-C workload • Other applications on the way • Large-scale • “Real” systems with “real” performance problems
Conclusion • Magpie is a tool for performance analysis in a distributed system • Bottom up, per-request approach • Complementary to existing techniques: • Performance counters • Program profiling • Feeds into performance debugging and prediction tools