Debugging in the (Very) Large: Ten Years of Implementation and Experience
Kirk Glerum, Kinshuman Kinshumann, Steve Greenberg, Gabriel Aul, Vince Orgovan, Greg Nichols, David Grant, Gretchen Loihle, and Galen Hunt
In SOSP '09: Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, pages 103–116. ACM, 2009.
Presented by: Choi Yong suk
WER Paper …
• Bucketing details (Sec. 3)
• Statistics-based debugging (Sec. 4)
• Progressive data collection (Secs. 2.2 & 5.4)
• Service implementation (Sec. 5)
• WER experiences (Sec. 6)
• OS changes (Sec. 7)
• Related work (Sec. 8)
Two Definitions
Bug: a flaw in program logic
• #define MYVAR *(int*)random() ... MYVAR = 5;
Error: a failure in execution caused by a bug
• Run it 5,000 times and you'll get ~5,000 errors.
• One bug may cause many errors.
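A minimal sketch of the slide's one-bug/many-errors example. Instead of executing the wild store (which would crash), it prints the address each run would corrupt; the loop and the printing are our additions for illustration.

```c
/* Sketch of the slide's example: one bug, many errors.
 * The bug is a store through a random pointer. We print the address
 * the store would hit rather than performing it, so the sketch runs
 * safely; a different address each run means one bug yields many
 * distinct errors (and many distinct crash reports). */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define MYVAR *(int *)(intptr_t)rand()   /* the bug: a wild pointer */

int main(void) {
    srand((unsigned)time(NULL));
    for (int i = 0; i < 5; i++) {
        int *target = &MYVAR;            /* &MYVAR == (int *)rand() */
        printf("error %d: would write 5 to %p\n", i, (void *)target);
        /* MYVAR = 5;   <- the actual error: usually an access
         *                violation at a different address each run */
    }
    return 0;
}
```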
The Challenge
Microsoft ships software to 1 billion users.
• How do we find out when things go wrong?
We want to:
• fix bugs regardless of source (application or OS; software, hardware, or malware)
• prioritize bugs that affect the most users
• generalize the solution so any programmer can use it
The Human Bottleneck
• Can't hire enough technicians
• Data is inaccurate
• Hard to get additional data
• No "global" baseline
• Useless for heisenbugs
Need to remove humans from the loop.
Goal: Fix the Data Collection Problem
Allow one service to record:
• every error (application, OS, and hardware)
• on every Windows system
• worldwide
Corollary: that which we can measure, we can fix…
!analyze
• engine for the WER bucketing heuristics
• an extension to the Debugging Tools for Windows
• input is a minidump; output is a bucket ID
• runs on the WER servers (and on programmers' desktops)
• http://www.microsoft.com/whdc/devtools
• ~500 heuristics, growing by ~1 heuristic per week
To Recap and Elaborate… (Sec. 3)
What I told you:
• the client automatically collects a minidump
• and sends the minidump to the WER servers
• !analyze buckets the error with similar reports
• and increments the bucket count
• programmers prioritize the buckets with the highest counts
Actually…
• only the first few hits on a bucket are uploaded in full; later hits just increment the count (sketched below)
• programmers request additional data as needed
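A minimal sketch of that count-then-collect policy, assuming a hypothetical threshold and data layout (the slide gives neither):

```c
/* Hedged sketch (not WER's real implementation): the server asks for
 * a full minidump only for the first few reports in a bucket and
 * afterwards just increments the counter. */
#include <stdio.h>

#define FULL_DUMP_THRESHOLD 3   /* assumed cutoff, for illustration */

typedef struct {
    char label[128];   /* client-side bucket label */
    long count;        /* number of reports seen for this bucket */
} Bucket;

/* Returns 1 if the client should upload a full minidump,
 * 0 if the server only records the hit. */
int record_report(Bucket *b) {
    b->count++;
    if (b->count <= FULL_DUMP_THRESHOLD) {
        printf("bucket '%s': hit %ld, requesting minidump\n",
               b->label, b->count);
        return 1;
    }
    printf("bucket '%s': hit %ld, count only\n", b->label, b->count);
    return 0;
}

int main(void) {
    Bucket b = { "outlook.exe, plugin.dll, v1.0.2305, 0x23f5", 0 };
    for (int i = 0; i < 5; i++)
        record_report(&b);
    return 0;
}
```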
2-Phase Bucketing Strategy (Sec. 3)
Labeling (on the client): bucketed by failure point
• outlook.exe, plugin.dll, v1.0.2305, 0x23f5
• i.e., {program name}, {binary}, {version}, {pc offset}
Classifying (on the servers): re-bucketed toward root cause by !analyze
• consolidate versions and replace the offset with a symbol: outlook.exe, plugin.dll, memcpy
• find the caller of memcpy (because memcpy itself isn't buggy): outlook.exe, plugin.dll, foo
• etc.
The paper contains much more detail on bucketing…
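A toy sketch of the two bucket forms above; the snprintf formatting is our own assumption, since the slide doesn't specify how labels are encoded:

```c
/* Hedged sketch of the two bucket forms: the client's labeling bucket
 * {program, binary, version, pc offset} and the server's classified
 * bucket after !analyze drops the version, symbolizes the offset, and
 * reassigns blame from known-good helpers to their callers. */
#include <stdio.h>

int main(void) {
    char label[128], classified[128];

    /* Phase 1, on the client: bucket by failure point. */
    snprintf(label, sizeof label, "%s, %s, %s, 0x%x",
             "outlook.exe", "plugin.dll", "v1.0.2305", 0x23f5);

    /* Phase 2, on the server: consolidate versions, symbolize the
     * offset, and blame the caller of memcpy rather than memcpy. */
    snprintf(classified, sizeof classified, "%s, %s, %s",
             "outlook.exe", "plugin.dll", "foo" /* caller of memcpy */);

    printf("label:      %s\n", label);
    printf("classified: %s\n", classified);
    return 0;
}
```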
Bucketing Mostly Works (Sec. 3)
Expanding heuristics vs. condensing heuristics
One bug can hit multiple buckets
• up to 40% of error reports
• example: memcpy(&MYVAR, j, 4);
• one bucket when &MYVAR isn't mapped; many others when &MYVAR is in a data section
• costs: extra server load; duplicate buckets must be hand-triaged
Multiple bugs can hit one bucket
• up to 4% of error reports
• makes it harder to isolate each bug
Solution: scale is our friend
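For concreteness, a comment-only sketch of the slide's memcpy example; the call stays commented out so the sketch runs harmlessly:

```c
/* Sketch of the slide's memcpy example: the same bug lands in one
 * bucket or many depending on where the wild destination pointer
 * happens to point. */
#include <stdio.h>
#include <string.h>

int main(void) {
    int j[1] = { 42 };
    (void)j;

    /* memcpy(&MYVAR, j, 4);   <- &MYVAR is the wild pointer from
     *                            the earlier slide
     *
     * Case 1: &MYVAR is unmapped -> immediate access violation
     * inside memcpy, same faulting frame every time -> ONE bucket.
     *
     * Case 2: &MYVAR lands in a mapped data section -> the copy
     * "succeeds" but corrupts state; the crash happens later, at
     * whichever code touches the corrupted data -> MANY buckets,
     * which !analyze must condense back toward one bug. */
    puts("see comments: same bug, one-or-many buckets");
    return 0;
}
```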
Top Heuristics for Labeling (Sec. 3)
Run on the client:
• unhandled program exception (the default label)
• labeling by assert tags compiled into the binary
• label parameters: faulting program, module, and offset of the PC (program counter)
Run on the server:
• heuristics that filter out frames that shouldn't take the blame
[Slide diagram: a stack dump of frames f0…fn within modules, with heuristics applied at priorities 0~5.]
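A hedged sketch of prioritized labeling as reconstructed from this slide; the CrashInfo fields, the priority order, and make_label are illustrative assumptions, not WER's actual heuristics:

```c
/* Illustrative sketch: try specific labeling heuristics first (an
 * assert tag, then the unhandled-exception record), falling back to
 * the generic program/module/PC-offset label. */
#include <stdio.h>

typedef struct {
    const char *program;    /* faulting program */
    const char *module;     /* faulting module */
    unsigned pc_offset;     /* offset of the PC within the module */
    int assert_tag;         /* 0 if the crash site has no assert tag */
    int has_exception;      /* 1 if an unhandled-exception record exists */
} CrashInfo;

void make_label(const CrashInfo *c, char *out, size_t n) {
    if (c->assert_tag)                 /* highest-priority heuristic */
        snprintf(out, n, "%s, assert:%d", c->program, c->assert_tag);
    else if (c->has_exception)         /* next priority */
        snprintf(out, n, "%s, %s, exception@0x%x",
                 c->program, c->module, c->pc_offset);
    else                               /* generic fallback label */
        snprintf(out, n, "%s, %s, 0x%x",
                 c->program, c->module, c->pc_offset);
}

int main(void) {
    CrashInfo c = { "outlook.exe", "plugin.dll", 0x23f5, 0, 1 };
    char label[128];
    make_label(&c, label, sizeof label);
    printf("%s\n", label);
    return 0;
}
```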
Statistics-based Debugging (Sec. 4)
Before WER, the feedback loop was human:
• user calls technical support
• support technician tries to diagnose the error
• technicians report the "top ten" issues to programmers
Statistics-based Debugging (Sec. 4)
[Slide diagram: !analyze walks the stack in a minidump (Func 3 → Func 2 → Func 1 → Func 0) to locate the buggy frame.]
WER improves the effectiveness of debugging by:
• prioritizing debugging effort
• finding hidden causes
• testing programmer hypotheses (about the root causes of errors)
• measuring deployment of solutions, and watching for regressions
The WER database…
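A small sketch of the blame walk the diagram above depicts: skip known-good helper frames and blame the first remaining caller. The frame names and the known-good list are illustrative assumptions:

```c
/* Hedged sketch of blame assignment: walk stack frames from the
 * fault outward and blame the first frame that is not a known-good
 * helper (e.g. memcpy), i.e. attribute the bug to the caller. */
#include <stdio.h>
#include <string.h>

static const char *known_good[] = { "memcpy", "memset", "strcpy" };

static int is_known_good(const char *fn) {
    for (size_t i = 0; i < sizeof known_good / sizeof *known_good; i++)
        if (strcmp(fn, known_good[i]) == 0)
            return 1;
    return 0;
}

int main(void) {
    /* Frames from the minidump, innermost (faulting) first. */
    const char *stack[] = { "memcpy", "foo", "bar", "main" };
    size_t nframes = sizeof stack / sizeof *stack;

    for (size_t i = 0; i < nframes; i++) {
        if (!is_known_good(stack[i])) {
            printf("blame: %s (frame %zu)\n", stack[i], i);
            break;
        }
    }
    return 0;
}
```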
Service Implementation (Sec. 5)
Four-stage protocol (sketched below):
1. Send the bucket label (unencrypted).
2. Upload the minidump and any data desired by the WER service (encrypted).
3. Push the data in a CAB file (encrypted).
4. Finish and request error solutions (encrypted); the server runs !analyze on the uploaded dump.
Second dump added for GPU data: Vista 32KB → 128KB → 1MB [SP2]; Windows 7: 1MB.
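A hedged sketch of the four-stage exchange from the client's point of view; the stub policies and printf "transport" are stand-ins, not the real WER wire protocol:

```c
/* Illustrative client-side flow for the four-stage protocol above.
 * Stage names come from the slide; everything else is assumed. */
#include <stdio.h>

/* Stubs standing in for server policy: per the earlier slide, full
 * data is requested only for the first few hits on a bucket. */
static int server_wants_minidump(void) { return 1; }
static int server_wants_cab(void)      { return 0; }

int main(void) {
    printf("stage 1: send bucket label (unencrypted)\n");
    if (server_wants_minidump())
        printf("stage 2: upload minidump / desired data (encrypted)\n");
    if (server_wants_cab())
        printf("stage 3: push additional data as a CAB file (encrypted)\n");
    printf("stage 4: finish; request error solutions (encrypted)\n");
    return 0;
}
```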
Service Implementation (Sec. 5)
WER servers:
• about 60 servers; can process 100 million error reports per day
• error-report database of 65TB
• CAB file storage of 120TB (6 months of data)
• stage 1 served by plain HTTP GET; the other stages by ASP.NET
• store bucket parameters and bucket counts for data collection and data mining (!analyze)
• programmers can request additional data for a bucket
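Back-of-envelope arithmetic (ours, not the slide's): 100 million reports/day is about 1,160 reports/second, or roughly 19 reports/second per server across ~60 servers, which is consistent with stage 1 being cheap enough to serve as a plain HTTP GET.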
WER Experiences (Sec. 6) — figures from the paper:
• Growth of report load over 6 years
• Renos malware: number of error reports per day
• Daily report load as % of average for Feb. 2008
• Percentage of buckets per quartile of reports
• Relative number of reports per bucket and CDF for top 20 buckets from Office 2010 ITP
• CDFs of error reports for the top 500 buckets for Windows Vista and Vista SP1
• Crashes by driver class, normalized to hardware failures for the same period
• Percentage of all reports for Office 2010 ITP in second- or one-hit buckets
• Ranking of labeling heuristics by % of buckets
• Crashes/day for a firmware bug (patch released via WU on day 10)
• Percentage of kernel crashes later re-bucketed
• Third-party use of WER
• Drivers and hardware encountered by WER
Conclusion
Windows Error Reporting (WER):
• the first post-mortem reporting system with automatic diagnosis
• the largest client-server system in the world (measured by installs)
• helped 700 companies fix 1000s of bugs and billions of errors
• fundamentally changed software development at Microsoft
• "WER forced us to stop making [things] up!!!!"
WER works because bucketing mostly works.