A Randomized Scheduler with Probabilistic Guarantees of Finding Bugs
Sebastian Burckhardt (Microsoft Research), Madanlal Musuvathi (Microsoft Research), Pravesh Kothari (Indian Institute of Technology, Kanpur), Santosh Nagarakatte (University of Pennsylvania)
What is Concurrency Testing?
• Whether a test finds a bug depends on
  • the configuration
  • the inputs
  • the schedule
• Concurrency bugs are bugs that surface only under some schedules
• The concurrency testing problem
  • How can we cover the buggy schedules as well as possible?
  • Testing all schedules is infeasible!
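To see why testing all schedules is infeasible, the Python sketch below counts the interleavings of threads whose steps stay in program order, which is the multinomial coefficient (k_1 + ... + k_n)! / (k_1! ... k_n!). `count_interleavings` is a hypothetical helper for illustration; it ignores synchronization, which can rule some of these schedules out.

```python
from math import comb


def count_interleavings(steps_per_thread):
    """Count the schedules of threads whose steps run in program order.

    Equals (k_1 + ... + k_n)! / (k_1! * ... * k_n!), computed as a product of
    binomial coefficients so every intermediate value is an exact integer.
    """
    total = 0
    result = 1
    for k in steps_per_thread:
        total += k
        result *= comb(total, k)  # choose the slots taken by this thread's steps
    return result


if __name__ == "__main__":
    print(count_interleavings([2, 2]))    # 6
    print(count_interleavings([10, 10]))  # 184756
    # Two threads with 1,000 steps each: the count has hundreds of digits.
    print(len(str(count_interleavings([1000, 1000]))))
```

Even two short threads produce an astronomical number of schedules, so exhaustive enumeration does not scale to real tests.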
Idea: Randomize the Schedule
• Instrument the code with calls that insert random delays:
  Parent: void* p = 0; RandDelay(); CreateThd(child); RandDelay(); p = malloc(…);
  Child: Init(); RandDelay(); DoMoreWork(); RandDelay(); p->f ++;
• If we are lucky, a delay exposes the bug
• But: how long should we delay? Where should we not delay?
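A minimal Python simulation of this example, assuming that a random delay before each step can be modeled as a fair random choice between the threads that are ready to run (a simplification, not how Cuzz injects delays). The step lists, `run_once`, and `estimate` are hypothetical. After `CreateThd(child)`, the child must be picked three times in a row before the parent's `malloc`, so the bug should show up in roughly 1/8 of runs.

```python
import random

# Hypothetical model of the example: each entry is one step of a thread.
PARENT = ["void* p = 0", "CreateThd(child)", "p = malloc(...)"]
CHILD = ["Init()", "DoMoreWork()", "p->f ++"]


def run_once(rng):
    """Simulate one run; return True if the child dereferences p while it is still null."""
    steps = {"parent": PARENT, "child": CHILD}
    pc = {"parent": 0, "child": 0}
    child_created = False
    allocated = False
    while True:
        enabled = [t for t in ("parent", "child")
                   if pc[t] < len(steps[t]) and (t == "parent" or child_created)]
        if not enabled:
            return False
        # Assumption: a random delay before each step acts like a fair
        # random choice among the threads that are ready to run.
        tid = rng.choice(enabled)
        step = steps[tid][pc[tid]]
        pc[tid] += 1
        if step == "CreateThd(child)":
            child_created = True
        elif step == "p = malloc(...)":
            allocated = True
        elif step == "p->f ++" and not allocated:
            return True  # null dereference: the bug is exposed


def estimate(runs=20000, seed=1):
    """Fraction of simulated runs that expose the bug."""
    rng = random.Random(seed)
    return sum(run_once(rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    print(estimate())  # close to 1/8 = 0.125
```

Every extra child step before `p->f ++` would halve this probability in the model, which is why the questions of how long and where to delay matter.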
What is a Randomized Algorithm?
• A randomized algorithm:
  • “An algorithm that makes nondeterministic choices”
  • An algorithm using a random source with a precisely defined distribution
• A probabilistic guarantee:
  • “A guarantee that doesn’t always hold”
  • A lower bound on the probability of success
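A lower bound p on the per-run probability of success compounds over independent runs: the chance of missing the bug in all N runs is at most (1 − p)^N. The Python sketch below turns this into the number of runs needed for a target confidence; `runs_needed` is a hypothetical helper, not part of Cuzz.

```python
import math


def runs_needed(p, confidence):
    """Smallest N with 1 - (1 - p)**N >= confidence.

    Assumes independent runs that each find the bug with probability at least p.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    if not 0 <= confidence < 1:
        raise ValueError("confidence must be in [0, 1)")
    if p == 1:
        return 1
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


if __name__ == "__main__":
    print(runs_needed(0.5, 0.99))  # 7
    print(runs_needed(0.1, 0.9))   # 22
```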
What We Did / Talk Outline
• Define bug depth so that common bugs have low depth
• Develop PCT (probabilistic concurrency testing), a randomized scheduling algorithm with a good probabilistic guarantee of finding bugs of low depth
• Build it into Cuzz, a concurrency fuzzing tool that improves the efficiency of stress testing
Part I: Bug Depth
Bug Depth
• Bug depth = the number of ordering constraints a schedule has to satisfy to find the bug
• More constraints means more things have to go “just right” to find the bug
• Conjecture: many typical bugs have low depth
• Let’s look at three examples
Ordering Violation Example: A Bug of Depth 1
Parent thread: … start(child); p = malloc(); …
Child thread: … do_init(); p->f ++; …
• Bug depth = the number of ordering constraints sufficient to find the bug
• Here one constraint suffices: the child’s p->f ++ executes before the parent’s p = malloc(). All schedules that satisfy it find the bug.
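The definition can be checked mechanically on a toy model of this example. The Python sketch below (all names hypothetical) enumerates every feasible interleaving, replays each one, and confirms that every schedule meeting the single ordering constraint exposes the null dereference.

```python
from itertools import combinations

PARENT = [("parent", "start(child)"), ("parent", "p = malloc()")]
CHILD = [("child", "do_init()"), ("child", "p->f ++")]

# Depth 1: a single constraint, child's p->f ++ before parent's p = malloc().
ORDERING_BUG = [(("child", "p->f ++"), ("parent", "p = malloc()"))]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each thread's program order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        it_a, it_b = iter(a), iter(b)
        yield [next(it_a) if i in slots else next(it_b) for i in range(n)]


def feasible(schedule):
    """The child can only run after the parent has started it."""
    return schedule.index(PARENT[0]) < schedule.index(CHILD[0])


def exposes_bug(schedule):
    """Replay the schedule; the bug fires if p->f ++ runs while p is still null."""
    allocated = False
    for _, op in schedule:
        if op == "p = malloc()":
            allocated = True
        elif op == "p->f ++" and not allocated:
            return True
    return False


def satisfies(schedule, constraints):
    """True if every (before, after) pair appears in that order in the schedule."""
    return all(schedule.index(x) < schedule.index(y) for x, y in constraints)


def check():
    schedules = [s for s in interleavings(PARENT, CHILD) if feasible(s)]
    matching = [s for s in schedules if satisfies(s, ORDERING_BUG)]
    # Sufficiency: every schedule that meets the constraint exposes the bug.
    assert all(exposes_bug(s) for s in matching)
    return len(schedules), len(matching)


if __name__ == "__main__":
    print(check())  # (3, 1): three feasible schedules, one meets the constraint
```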
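Atomicity Violation Example: A Bug of Depth 2
Parent thread: p = malloc(); start(child); … if (p != null) p->f++ …
Child thread: … p = null; …
• Bug depth = the number of ordering constraints sufficient to find the bug
• Here two constraints suffice: the parent’s check (p != null) executes before the child’s p = null, and the child’s p = null executes before the parent’s p->f++. All schedules that satisfy both find the bug.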
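The same kind of check works here, assuming (for illustration) that the check and the increment are two separate steps, since that is what makes the code non-atomic. The Python sketch below, with hypothetical names, confirms that both constraints together are sufficient while either one alone is not, which is what makes this a depth-2 bug.

```python
from itertools import combinations

PARENT = [("parent", "p = malloc()"), ("parent", "start(child)"),
          ("parent", "if (p != null)"), ("parent", "p->f++")]
CHILD = [("child", "p = null")]

CHECK, USE, NULLIFY = PARENT[2], PARENT[3], CHILD[0]
# Depth 2: the check before p = null, and p = null before p->f++.
ATOMICITY_BUG = [(CHECK, NULLIFY), (NULLIFY, USE)]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each thread's program order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        it_a, it_b = iter(a), iter(b)
        yield [next(it_a) if i in slots else next(it_b) for i in range(n)]


def feasible(schedule):
    """The child can only run after start(child)."""
    return schedule.index(PARENT[1]) < schedule.index(CHILD[0])


def exposes_bug(schedule):
    """Replay: the bug fires if p->f++ runs on a null p after the check passed."""
    p_valid = False
    check_passed = False
    for _, op in schedule:
        if op == "p = malloc()":
            p_valid = True
        elif op == "p = null":
            p_valid = False
        elif op == "if (p != null)":
            check_passed = p_valid
        elif op == "p->f++" and check_passed and not p_valid:
            return True
    return False


def satisfies(schedule, constraints):
    """True if every (before, after) pair appears in that order in the schedule."""
    return all(schedule.index(x) < schedule.index(y) for x, y in constraints)


def check():
    schedules = [s for s in interleavings(PARENT, CHILD) if feasible(s)]
    both = [s for s in schedules if satisfies(s, ATOMICITY_BUG)]
    assert all(exposes_bug(s) for s in both)  # both constraints suffice
    for single in (ATOMICITY_BUG[:1], ATOMICITY_BUG[1:]):
        matching = [s for s in schedules if satisfies(s, single)]
        assert not all(exposes_bug(s) for s in matching)  # one alone does not
    return len(schedules), len(both)


if __name__ == "__main__":
    print(check())  # (3, 1)
```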
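Deadlock Example: A Bug of Depth 2
Parent thread: … Lock(A); … Lock(B); …
Child thread: … Lock(B); … Lock(A); …
• Bug depth = the number of ordering constraints sufficient to find the bug
• Here two constraints suffice: the parent’s Lock(A) is called before the child’s Lock(A), and the child’s Lock(B) is called before the parent’s Lock(B). All schedules that satisfy both find the bug.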
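Because a deadlocked run never finishes, this case is easier to check by exploring executions than by enumerating complete interleavings. The Python sketch below explores every schedule of a toy version of the example and confirms that a deadlock occurs exactly when both constraints hold. Assumptions (hypothetical, not from the slide): the elided code takes no other locks, each thread releases its locks afterwards, and a lock call that blocks forever counts as happening at time infinity.

```python
import math

# Hypothetical model: each thread takes both locks, then releases them.
PROGRAMS = {
    "parent": [("lock", "A"), ("lock", "B"), ("unlock", "B"), ("unlock", "A")],
    "child": [("lock", "B"), ("lock", "A"), ("unlock", "A"), ("unlock", "B")],
}
# Depth 2: parent's Lock(A) before child's Lock(A), child's Lock(B) before parent's Lock(B).
DEADLOCK_BUG = [
    (("parent", "lock", "A"), ("child", "lock", "A")),
    (("child", "lock", "B"), ("parent", "lock", "B")),
]


def executions(pcs, holders, trace=()):
    """Explore every schedule; yield (trace, deadlocked) for each maximal execution."""
    enabled = []
    for tid, program in PROGRAMS.items():
        if pcs[tid] < len(program):
            kind, lock = program[pcs[tid]]
            if kind == "unlock" or holders.get(lock) is None:
                enabled.append(tid)
    if not enabled:
        done = all(pcs[t] == len(p) for t, p in PROGRAMS.items())
        yield trace, not done  # stuck before finishing means deadlock
        return
    for tid in enabled:
        kind, lock = PROGRAMS[tid][pcs[tid]]
        next_holders = {**holders, lock: tid if kind == "lock" else None}
        next_pcs = {**pcs, tid: pcs[tid] + 1}
        yield from executions(next_pcs, next_holders, trace + ((tid, kind, lock),))


def position(trace, event):
    """Index of event in trace, or infinity if it never executed (blocked forever)."""
    return trace.index(event) if event in trace else math.inf


def satisfies(trace, constraints):
    return all(position(trace, x) < position(trace, y) for x, y in constraints)


def check():
    runs = list(executions({"parent": 0, "child": 0}, {}))
    for trace, deadlocked in runs:
        # Deadlock happens exactly when both constraints hold.
        assert deadlocked == satisfies(trace, DEADLOCK_BUG)
    # Either constraint alone is not enough.
    assert any(satisfies(t, DEADLOCK_BUG[:1]) and not dl for t, dl in runs)
    assert any(satisfies(t, DEADLOCK_BUG[1:]) and not dl for t, dl in runs)
    return sum(dl for _, dl in runs)


if __name__ == "__main__":
    print(check())  # 2 deadlocking executions
```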
Part II: The PCT Algorithm
PCT Algorithm: Randomly Assign & Change Thread Priorities
Input:
  int k;          // no. of steps, guessed from previous runs
  int d;          // target bug depth, randomly chosen
State:
  int pri[];      // thread priorities
  int change[];   // when to change priorities
  int stepCnt;    // current step count
PCT::Init() {
  stepCnt = 0;
  // distinct initial priorities d, d+1, …, d+n-1 in random order (n = no. of threads)
  foreach tid pri[tid] = d + perm[tid];   // perm: a random permutation of 0..n-1
  for (i = 0; i < d-1; i++)
    change[i] = 1 + rand() % k;           // change points fall in 1..k
}
PCT::RandDelay(tid) {
  stepCnt++;
  if (stepCnt == change[i] for some i)
    pri[tid] = i;                         // below every initial priority (all >= d)
  if (tid is not the highest-priority enabled thread)
    spin;
}
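The pseudocode can be exercised with a small Python simulation. This is a sketch under stated assumptions, not Cuzz's implementation: programs are hypothetical lists of steps, every step counts as one RandDelay() call, k is taken as the exact step count of the model, and a thread demoted at a change point yields before executing that step. Run on the atomicity violation from Part I with d = 2, it should find the bug in about 10% of runs.

```python
import random


def pct_run(threads, spawns, d, k, rng):
    """Run one PCT-style schedule over a toy program model and return the trace.

    threads: tid -> list of step labels; each step is preceded by RandDelay().
    spawns:  (tid, label) -> tid of a thread that becomes runnable after that step.
    d: target bug depth; k: estimated number of steps in a run.
    """
    tids = list(threads)
    # Distinct initial priorities d .. d+n-1 in random order; higher runs first.
    prios = list(range(d, d + len(tids)))
    rng.shuffle(prios)
    pri = dict(zip(tids, prios))
    # d-1 priority change points, each drawn from 1..k.
    change = [rng.randint(1, k) for _ in range(d - 1)]

    started = set(tids) - set(spawns.values())
    pc = {t: 0 for t in tids}
    trace, step_cnt = [], 0
    while True:
        enabled = [t for t in tids if t in started and pc[t] < len(threads[t])]
        if not enabled:
            return trace
        step_cnt += 1
        running = max(enabled, key=pri.get)
        for i, c in enumerate(change):
            if c == step_cnt:
                pri[running] = i  # demote below every initial priority (all >= d)
        running = max(enabled, key=pri.get)  # the demoted thread may be preempted
        label = threads[running][pc[running]]
        pc[running] += 1
        trace.append((running, label))
        if (running, label) in spawns:
            started.add(spawns[(running, label)])


# Hypothetical model of the atomicity violation, with check and use as separate steps.
ATOMICITY = {
    "parent": ["p = malloc()", "start(child)", "if (p != null)", "p->f++"],
    "child": ["p = null"],
}
SPAWNS = {("parent", "start(child)"): "child"}


def atomicity_bug(trace):
    """True if the child's p = null lands between the parent's check and its use."""
    order = [label for _, label in trace]
    return order.index("if (p != null)") < order.index("p = null") < order.index("p->f++")


def estimate(runs=20000, d=2, seed=0):
    rng = random.Random(seed)
    k = sum(len(steps) for steps in ATOMICITY.values())  # 5 steps in this model
    hits = sum(atomicity_bug(pct_run(ATOMICITY, SPAWNS, d, k, rng)) for _ in range(runs))
    return hits / runs


if __name__ == "__main__":
    # Worst-case bound 1/(n * k**(d-1)) = 1/(2 * 5) = 0.1 for this model.
    print(estimate())
```

In this tiny model the bug is found only when the parent gets the higher priority (1/2) and the change point lands on its p->f++ step (1/5), so the measured rate sits at the bound.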
The PCT Guarantee
• Given a program with
  • n threads (~tens)
  • k steps (~millions)
  • a bug of depth d (1, 2)
• Each run of PCT finds the bug with probability at least $\frac{1}{n k^{d-1}}$ (this is a worst-case guarantee)
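Plugging illustrative numbers into the bound shows its shape: for d = 1 it is 1/n regardless of k, and each extra unit of depth costs a factor of k. Since runs are independent, the expected number of runs until the first hit is at most the reciprocal of the bound. The Python helpers below are hypothetical and use exact fractions.

```python
from fractions import Fraction


def pct_bound(n, k, d):
    """Worst-case per-run probability 1 / (n * k**(d-1)) of finding a depth-d bug."""
    return Fraction(1, n * k ** (d - 1))


def expected_runs(n, k, d):
    """Upper bound on the expected number of independent runs until the first hit."""
    return 1 / pct_bound(n, k, d)


if __name__ == "__main__":
    n, k = 10, 10**6  # illustrative values: tens of threads, millions of steps
    for d in (1, 2):
        print(d, pct_bound(n, k, d), expected_runs(n, k, d))
    # 1 1/10 10
    # 2 1/10000000 10000000
```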
Part III: The Cuzz Tool & Results
How It Works
• Intercept at synchronization points
• Detour Win32 synchronization calls
• Optionally instrument data accesses
• No manual instrumentation required
[Figure: architecture stack, top to bottom: program binary (optional instrumentation for data accesses); Cuzz (randomized algorithm); Win32 API; kernel scheduler]
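Cuzz detours Win32 synchronization calls in the binary. As a rough analogy only, the Python sketch below wraps a lock so that every acquire and release first hands control to a hook where a randomized scheduler could delay the calling thread. `InterceptedLock` and its hook protocol are hypothetical and say nothing about how Detours or Cuzz are actually implemented; the point is that the program keeps using the lock unchanged while the hook sees every synchronization point.

```python
import threading


class InterceptedLock:
    """A lock whose acquire and release first call a scheduling hook.

    The hook receives (thread id, event name); a RandDelay-style scheduler
    could decide there whether the calling thread should wait.
    """

    def __init__(self, hook):
        self._lock = threading.Lock()
        self._hook = hook

    def acquire(self, blocking=True, timeout=-1):
        self._hook((threading.get_ident(), "acquire"))
        return self._lock.acquire(blocking, timeout)

    def release(self):
        self._hook((threading.get_ident(), "release"))
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()
        return False


def demo(num_threads=4, iterations=100):
    """Run a locked counter and record every intercepted synchronization call."""
    events = []  # list.append is thread-safe in CPython, so the hook can record safely
    lock = InterceptedLock(events.append)
    counter = 0

    def worker():
        nonlocal counter
        for _ in range(iterations):
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter, len(events)


if __name__ == "__main__":
    print(demo())  # (400, 800): 400 increments, one acquire and one release each
```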
Practice Beats Worst-Case
• The measured probability is often significantly better than the worst-case guaranteed probability
Why Does Practice Beat Worst-Case?
• The worst-case guarantee applies to the hardest-to-find bug of a given depth
• If a bug can be found in multiple ways, the probabilities add up!
• Example: increasing the number of threads helps
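A toy model (an assumption for illustration, not one of the measured programs) shows the effect: one parent allocates p while n − 1 children each dereference it, all threads start enabled, and d = 1, so PCT only draws a random priority order. Any child scheduled ahead of the parent finds the bug, so each extra child is one more way to hit it. The Python sketch below enumerates all priority orders and gets exactly (n − 1)/n, well above the worst-case 1/n and growing with n.

```python
from fractions import Fraction
from itertools import permutations


def bug_probability(n):
    """Exact chance that a random priority order runs some child's p->f++ before p = malloc()."""
    if n < 2:
        raise ValueError("need a parent and at least one child")
    threads = ["parent"] + [f"child{i}" for i in range(1, n)]
    hits = total = 0
    # With d = 1 there are no change points: threads run one after another,
    # highest priority first, so each permutation is one equally likely schedule.
    for order in permutations(threads):
        total += 1
        hits += order[0] != "parent"  # some child precedes the parent
    return Fraction(hits, total)


if __name__ == "__main__":
    for n in range(2, 6):
        print(n, bug_probability(n), "worst-case bound:", Fraction(1, n))
```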
Internal Tool Status
• The Cuzz tool is available internally at Microsoft
• We are working with several product groups that actively use Cuzz to improve their stress testing
Demo Conclusion
• Measure probabilities on a cluster
• Without Cuzz: 1 fail in 238,820 runs, ratio = 0.000004817
• With Cuzz: 12 fails in 320 runs, ratio = 0.0375
• Resource savings: factor 7,800
  • 1 day of stress testing = 11 seconds of Cuzz testing
Conclusions
• Bug depth is a useful metric to focus testing efforts
• Systematic randomization improves concurrency testing
• No reason not to use Cuzz
Thank you for your attention.