Concurrency Analysis Platform And Tools For Finding Concurrency Bugs

TL58 Concurrency Analysis Platform And Tools For Finding Concurrency Bugs  Thomas Ball Principal Researcher Microsoft Corporation  Madan Musuvathi Researcher Microsoft Corporation  Shaz Qadeer Senior Researcher Microsoft Corporation  Sebastian Burckhardt Researcher Microsoft Corporation

Concurrency Is HARD! • Rare thread interleavings can result in bugs • These bugs are hard to find, reproduce, and debug • Heisenbugs: Observing the bug can “fix” it! • A huge productivity problem • Developers and testers can spend weeks chasing a single Heisenbug

Main Takeaways • You can find and reproduce Heisenbugs • new automatic tool called CHESS • for Win32 and .NET • CHESS used extensively inside Microsoft • Parallel Computing Platform (PCP) • Singularity • Dryad/Cosmos • Releasing via DevLabs

Demo: Why Is Concurrency Hard?  Madan Musuvathi Researcher Microsoft Corporation

Concurrency Analysis Platform (CAP) • Goal: Drive a program along an interleaving of choice • Interleaving decided by user or by a program/tool • Today: Controlling/observing concurrency is difficult • Manual and intrusive process • Enables lots of concurrency tools: • Test a program along a set of interleavings • Reproduce Heisenbugs • Program understanding / debugging • ...

Demo: Taming Concurrency  Madan Musuvathi Researcher Microsoft Corporation

CAP Architecture • Record the interleaving executed • Drive the program along an interleaving
[Diagram labels: Unmanaged Program / CAP / Windows; Managed Program / CAP / .NET CLR. Tools shown around CAP: Coverage, Repro, Monitors, Memory Model bugs, Visualization, Debugging, Testing, Data races]

CAP Specifics • Ability to explore all interleavings • Need to understand complex concurrency APIs (Win32 and System.Threading) • Threads, threadpools, locks, semaphores, async I/O, APCs, timers, … • Does not introduce false behaviors • Any interleaving produced by CAP is possible on the real scheduler

Overview • Concurrency Analysis Platform (CAP) • CHESS: Find/reproduce Heisenbugs • Integration with Visual Studio • Demo on CCR Heisenbug • Future CAP tools • FeatherLite: Data-race detection • Sober: Memory-model bugs

CHESS: Find And Reproduce Heisenbugs • CHESS runs the scenario in a loop: While(not done) { TestScenario() } • Every run takes a different interleaving • Every run is repeatable • Uses the CAP scheduler • To control and direct interleavings • Detect • Assertion violations • Deadlocks • Dataraces • Livelocks
[Diagram labels, top to bottom: Program (TestScenario() { … }); CHESS; CAP; Win32/.NET; Kernel: Threads, Scheduler, Synchronization Objects]
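
The loop on this slide can be illustrated with a toy model. The sketch below is not CHESS or the CAP scheduler; it only assumes that a test scenario can be modeled as per-thread lists of atomic steps, and every name in it (`schedules`, `run`, `increment`, `explore`) is hypothetical. It runs the scenario once per interleaving and shows that a failing schedule can be replayed with the same result.

```python
def schedules(counts):
    """Yield every interleaving as a tuple of thread ids.

    counts[i] is the number of steps thread i still has to run.
    """
    if not any(counts):
        yield ()
        return
    for tid, left in enumerate(counts):
        if left:
            rest = counts[:tid] + (left - 1,) + counts[tid + 1:]
            for tail in schedules(rest):
                yield (tid,) + tail


def run(threads, schedule):
    """Replay one schedule from a fresh state; same schedule, same result."""
    state = {"x": 0, "tmp": {}}
    pc = [0] * len(threads)  # next step index per thread
    for tid in schedule:
        threads[tid][pc[tid]](state)
        pc[tid] += 1
    return state


def increment(tid):
    """A non-atomic x++ split into its two steps: read, then write."""
    def read(s):
        s["tmp"][tid] = s["x"]

    def write(s):
        s["x"] = s["tmp"][tid] + 1

    return [read, write]


def explore(threads, check):
    """Run the scenario once per interleaving; collect schedules that fail."""
    counts = tuple(len(t) for t in threads)
    return [s for s in schedules(counts) if not check(run(threads, s))]


threads = [increment(0), increment(1)]
failing = explore(threads, lambda s: s["x"] == 2)
print(len(failing), "failing schedules; first one:", failing[0])
```

Because a schedule is just a tuple of thread ids, handing `failing[0]` back to `run` reproduces the lost update every time, which is the repeatability property the slide describes.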

CHESS challenge: Programs have LOTS of interleavings • n threads (Thread 1 … Thread n), k steps each (x = 1; … x = k;) • Number of executions: n^(nk) • Exponential in both n and k • For n=2, k = 100 > # of atoms in the universe • Limits scalability to large programs • Goal: Scale CHESS to large programs (large k)

Preemption Bounding • Focus on executions with a small number of preemptions • Unexpected preemptions cause bugs
    Non-preemption: Thread 1 runs x = 1; if (p != 0) { x = p->f; } without being interrupted
    Preemption: Thread 1 runs x = 1; if (p != 0) { and is preempted; Thread 2 runs p = 0; Thread 1 resumes with x = p->f; }
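
To make the idea concrete, here is a small sketch that models the slide's two threads as lists of atomic steps, counts the preemptions in each schedule, and explores only schedules within a bound. It is an illustration under simplifying assumptions, not the CHESS algorithm: a "preemption" is taken to mean a switch away from a thread that still has steps left, and all function names are hypothetical.

```python
def schedules(counts):
    """Yield every interleaving as a tuple of thread ids."""
    if not any(counts):
        yield ()
        return
    for tid, left in enumerate(counts):
        if left:
            rest = counts[:tid] + (left - 1,) + counts[tid + 1:]
            for tail in schedules(rest):
                yield (tid,) + tail


def preemptions(schedule, counts):
    """Count switches away from a thread that still has steps left to run."""
    left = list(counts)
    total = 0
    for prev, nxt in zip(schedule, schedule[1:]):
        left[prev] -= 1
        if nxt != prev and left[prev] > 0:
            total += 1
    return total


def crashes(schedule):
    """Replay the two threads along one schedule; True if x = p->f sees a null p."""
    state = {"p": {"f": 7}, "x": 0, "taken": False}

    def assign(s):   # x = 1;
        s["x"] = 1

    def check(s):    # if (p != 0) {
        s["taken"] = s["p"] is not None

    def deref(s):    # x = p->f; }
        if s["taken"]:
            s["x"] = s["p"]["f"]  # raises TypeError when p is None

    def null_p(s):   # p = 0;
        s["p"] = None

    threads = [[assign, check, deref], [null_p]]
    pc = [0, 0]
    try:
        for tid in schedule:
            threads[tid][pc[tid]](state)
            pc[tid] += 1
    except TypeError:
        return True
    return False


COUNTS = (3, 1)  # Thread 1 has three steps, Thread 2 has one


def buggy_schedules(bound):
    """Crashing schedules among those with at most `bound` preemptions."""
    return [s for s in schedules(COUNTS)
            if preemptions(s, COUNTS) <= bound and crashes(s)]


print("bound 0:", buggy_schedules(0))
print("bound 1:", buggy_schedules(1))
```

In this model the crash needs exactly one preemption, placed between the null check and the dereference, so a bound of 1 is already enough to expose it.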

Managing Astronomical Number Of Interleavings • n threads (Thread 1 … Thread n), k steps each (x = 1; … x = k;) • Number of interleavings: n^(nk) • Interleavings with c preemptions: (n^2 k)^c · n^n • For n=2, k=100, c=2 < 1 million interleavings • Analysis techniques reduce this further
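
The two estimates on this slide can be checked with a few lines of arithmetic. The formulas below follow a reading of the slide's flattened superscripts as n^(nk) and (n^2 k)^c · n^n; that reading is an assumption, and the function names are made up for this sketch.

```python
def unbounded_estimate(n, k):
    """Estimate for all interleavings of n threads with k steps each: n^(nk)."""
    return n ** (n * k)


def bounded_estimate(n, k, c):
    """Estimate for interleavings with c preemptions: (n^2 * k)^c * n^n."""
    return (n * n * k) ** c * n ** n


n, k, c = 2, 100, 2
print("digits in n^(nk):", len(str(unbounded_estimate(n, k))))
print("with c preemptions:", bounded_estimate(n, k, c))
```

For n=2, k=100, c=2 the bounded estimate works out to (4 · 100)^2 · 4 = 640,000, consistent with the slide's "< 1 million" figure.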

Demo: CHESS In VSTS  Thomas Ball Principal Researcher Microsoft Corporation

Demo: Real Scenario: Heisenbug In CCR

George Chrysanthakopoulos’ Challenge

CHESS Internal Customers • Parallel Computing Platform • PLINQ: Parallel LINQ • CDS: Concurrent Data Structures • STM: Software Transactional Memory • TPL: Task Parallel Library • ConcRT: Concurrency RunTime • CCR: Concurrency Coordination Runtime • Dryad/Cosmos • Singularity (Research OS from MSR) • CHESS can systematically test the boot and shutdown process

Announcing CHESS at http://msdn.microsoft.com/devlabs/

Overview • Concurrency Analysis Platform (CAP) • CHESS: Find/reproduce Heisenbugs • Integration with Visual Studio • Demo on CCR Heisenbug • Future CAP tools • FeatherLite: Data-race detection • Sober: Memory-model bugs

FeatherLite: Lightweight data-race detection • Data-races: Access to data with insufficient synchronization • Data-races are a common source of concurrency errors
    Thread 1: EnterCS(cs) if (p != 0) { x = p->f; } LeaveCS(cs)
    Thread 2: p = 0;
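
One textbook way to flag the race in this example is a lockset check: a variable is suspicious if two or more threads access it, at least one access is a write, and no single lock is held across all accesses. The sketch below uses that swapped-in technique purely for illustration; the slides do not say how FeatherLite detects races, and the trace format and `find_races` name are assumptions.

```python
def find_races(trace):
    """Return variables touched by 2+ threads, with a write, and no common lock.

    trace is a list of (thread, op, variable, locks_held) tuples,
    where op is "read" or "write".
    """
    seen = {}
    for thread, op, var, held in trace:
        info = seen.setdefault(var, {"threads": set(), "write": False, "locks": None})
        info["threads"].add(thread)
        info["write"] = info["write"] or op == "write"
        # Intersect the locks held on every access to this variable.
        info["locks"] = set(held) if info["locks"] is None else info["locks"] & set(held)
    return sorted(
        var for var, info in seen.items()
        if len(info["threads"]) > 1 and info["write"] and not info["locks"]
    )


# The slide's example: Thread 1 works inside the critical section, Thread 2 does not.
slide_trace = [
    ("T1", "read", "p", {"cs"}),   # if (p != 0)
    ("T1", "read", "p", {"cs"}),   # p->f
    ("T1", "write", "x", {"cs"}),  # x = ...
    ("T2", "write", "p", set()),   # p = 0; with no lock held
]

# Same accesses, but Thread 2 also takes the critical section.
fixed_trace = slide_trace[:3] + [("T2", "write", "p", {"cs"})]

print("slide example:", find_races(slide_trace))
print("with lock in Thread 2:", find_races(fixed_trace))
```

The check reports `p` for the slide's trace because Thread 2 writes it without `cs`, and reports nothing once both threads hold the same lock.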

Sampling To Reduce Overhead • Existing data-race detection tools have a large runtime overhead • More than 10X slowdown • Process every memory access • Intelligent sampling algorithms • Process < 5% of the memory accesses • Less than 30% runtime overhead • Existing tools > 1000% overhead

FeatherLite + CAP: “Active” data-race detection • Programs have “benign” data-races • Do not result in program crashes • Example: Updating statistical counters without holding a lock • FeatherLite drives the program along the two outcomes of the data-race

Sober: Tool for finding memory-model errors • Expert programmers use “lock-free” techniques • Use low-level synchronizations, volatile variables, … • For performance • Such programs are exposed to memory-model issues • Compiler can reorder instructions • Hardware can reorder/delay memory accesses • Result: Hard-to-find bugs that are hard-to-understand

Sober Quiz • Can both threads DoWork() at the same time?
    // initial state
    volatile bool flag1 = false; volatile bool flag2 = false;
    Thread 1: flag1 = true; if( !flag2 ) DoWork();
    Thread 2: flag2 = true; if( !flag1 ) DoWork();
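
The quiz can be explored with a toy model of delayed stores. The sketch below is not how Sober works; it assumes a simplified machine in which each thread's store may sit in a private buffer until a separate flush step makes it visible, and it assumes the platform's volatile semantics allow a store to be delayed past a later load of a different location. It enumerates every ordering of the steps and collects which combinations of DoWork() calls are possible.

```python
from itertools import permutations


def outcomes(store_buffers):
    """Return the set of (thread1_works, thread2_works) over all orderings.

    Steps per thread: "st" writes its own flag, "ld" reads the other flag,
    and, if store_buffers is True, "fl" makes the buffered store visible.
    """
    steps = [(0, "st"), (0, "ld"), (1, "st"), (1, "ld")]
    if store_buffers:
        steps += [(0, "fl"), (1, "fl")]
    results = set()
    for order in permutations(steps):
        pos = {step: i for i, step in enumerate(order)}
        # Program order: each thread stores before it loads.
        if any(pos[(t, "st")] > pos[(t, "ld")] for t in (0, 1)):
            continue
        # A store can only be flushed after it was issued.
        if store_buffers and any(pos[(t, "st")] > pos[(t, "fl")] for t in (0, 1)):
            continue
        memory = [False, False]  # flag1, flag2 as seen by the other thread
        works = [False, False]
        for t, op in order:
            if op == "st" and not store_buffers:
                memory[t] = True      # store is visible immediately
            elif op == "fl":
                memory[t] = True      # buffered store becomes visible
            elif op == "ld":
                works[t] = not memory[1 - t]
        results.add(tuple(works))
    return results


print("immediate stores:", sorted(outcomes(False)))
print("buffered stores: ", sorted(outcomes(True)))
```

With immediately visible stores, at most one thread enters DoWork(). Once stores can be delayed, the outcome where both threads see the other's flag as false appears, so in this model the answer to the quiz is yes.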

Conclusion • Concurrency Analysis Platform (CAP) for controlling thread interleavings • Enables lots of concurrency tools • CHESS – Find and reproduce Heisenbugs • Don’t stress, use CHESS • Look for download on http://msdn.microsoft.com/devlabs/ • Future CAP tools • FeatherLite: Lightweight data-race detection • Sober: Tool for finding memory-model bugs

Evals & Recordings Please fill out your evaluation for this session at: This session will be available as a recording at: www.microsoftpdc.com

Q&A Please use the microphones provided

© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
