Correcting Threading Errors with Intel® Parallel Inspector
Objectives
After successful completion of this module you will be able to:
• Use Parallel Inspector to detect and identify a variety of threading correctness issues in threaded applications
• Determine if library functions are thread-safe
Intel® Parallel Inspector

Agenda
• What is Intel® Parallel Inspector?
• Detecting race conditions
• Detecting potential for deadlock
• Checking library thread-safety

Motivation
• Developing threaded applications can be a complex task
• A new class of problems is caused by the interaction between concurrent threads
  • Data races or storage conflicts
    • More than one thread accesses the same memory location without synchronization, and at least one access is a write
  • Deadlocks
    • A thread waits for an event that will never happen

Intel® Parallel Inspector
• Debugging tool for threaded software
  • Plug-in to Microsoft* Visual Studio*
• Finds threading bugs in OpenMP*, Intel® Threading Building Blocks, and Win32* threaded software
• Quickly locates bugs that can take days to find using traditional methods and tools
  • Isolates problems, not the symptoms
  • The bug does not have to occur for the tool to find it!

Intel® Parallel Inspector Features
• Integrated into Microsoft Visual Studio .NET* IDE
  • 2005 & 2008 Editions
• Supports different compilers
  • Microsoft* Visual* C++ .NET*
  • Intel Parallel Composer
• View (drill-down to) source code for diagnostics
• One-click help for diagnostics
  • Possible causes and solution suggestions

Parallel Inspector: Analysis
• Dynamic as software runs
  • Data (workload)-driven execution
• Includes monitoring of:
  • Thread and Sync APIs used
  • Thread execution order
    • Scheduler impacts results
  • Memory accesses between threads
Code path must be executed to be analyzed

Parallel Inspector: Before You Start
• Instrumentation: background
  • Adds calls to library to record information
    • Thread and Sync APIs
    • Memory accesses
  • Increases execution time and size
• Use small data sets (workloads)
  • Execution time and space are expanded
  • Multiple runs over different paths yield best results
Workload selection is important!

Workload Guidelines
• Execute the problem code at least once per thread so that it can be identified
• Use smallest possible working data set
  • Minimize data set size
    • Smaller image sizes
  • Minimize loop iterations or time steps
    • Simulate minutes rather than days
  • Minimize update rates
    • Lower frames per second
Finds threading errors faster!

Building for Parallel Inspector
• Compile
  • Use dynamically linked thread-safe runtime libraries (/MDd)
  • Generate symbolic information (/ZI)
  • Disable optimization (/Od)
• Link
  • Preserve symbolic information (/DEBUG)
  • Specify relocatable code sections (/FIXED:NO)

Binary Instrumentation
• Build with supported compiler
• Running the application
  • Must be run from within Parallel Inspector
  • Application is instrumented when executed
  • External DLLs are instrumented as used

Starting Parallel Inspector
• Build the Debug version of the application with appropriate flags set

Starting Parallel Inspector
• Select Parallel Inspector from the Tools menu
You can choose to look for
  • Memory Errors
  • Threading Errors

Starting Parallel Inspector
• The Configure Analysis window pops up
Select the level of analysis to be carried out by Parallel Inspector
  • The deeper the analysis, the more thorough the results and the longer the execution time
Click Run Analysis

Starting Parallel Inspector
• The initial (raw) results come up after analysis
Click the Interpret Results button to filter the raw data into a more readable form

Starting Parallel Inspector
• The analysis results are gathered together in related categories
Double-click a line from the Problem Sets pane to see the source code that generated the diagnostic

Starting Parallel Inspector
• The source lines involved in a data race can be shown

Activity 1a - Potential Energy
• Build and run serial version
• Build threaded version
• Run application in Parallel Inspector to identify threading problems

Race Conditions
• Execution order is assumed but cannot be guaranteed
• Concurrent access of the same variable by multiple threads
• Most common error in multithreaded programs
• May not show up on every run

Solving Race Conditions
• Solution: Scope variables to be local to threads
• When to use
  • Value computed is not used outside parallel region
  • Temporary or "work" variables
• How to implement
  • OpenMP scoping clauses (private, shared)
  • Declare variables within threaded functions
  • Allocate variables on thread stack
  • TLS (Thread Local Storage) API

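As a rough illustration of the scoping fix (not code from this module), the sketch below uses portable C++ `std::thread` in place of OpenMP or Win32 threads. The names `sum_of_squares` and `run_private_work` are hypothetical. Each thread keeps its work variable on its own stack and writes only to its own result slot, so nothing is shared.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical work routine: 'acc' is declared inside the function, so every
// thread that calls it gets a private copy on its own stack.
long sum_of_squares(int lo, int hi) {
    long acc = 0;  // private "work" variable
    for (int i = lo; i < hi; ++i) acc += static_cast<long>(i) * i;
    return acc;
}

// Runs 'nthreads' workers. Each worker writes only to its own slot in
// 'results', so no two threads touch the same memory location.
std::vector<long> run_private_work(int nthreads, int chunk) {
    assert(nthreads > 0 && chunk > 0);
    std::vector<long> results(static_cast<std::size_t>(nthreads), 0L);
    std::vector<std::thread> pool;
    for (int t = 0; t < nthreads; ++t) {
        pool.emplace_back([&results, t, chunk] {
            results[static_cast<std::size_t>(t)] =
                sum_of_squares(t * chunk, (t + 1) * chunk);
        });
    }
    for (auto& th : pool) th.join();  // results are read only after all joins
    return results;
}
```

Calling `run_private_work(4, 10)` would give each of four threads ten integers to square and sum; the per-thread results can then be combined serially after the joins.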
Solving Race Conditions
• Solution: Control shared access with critical regions
• When to use
  • Value computed is used outside parallel region
  • Shared value is required by each thread
• How to implement
  • Mutual exclusion and synchronization
  • Lock, semaphore, event, critical section, atomic…
  • Rule of thumb: Use one lock per data element

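A minimal sketch of the critical-region fix, assuming portable C++ (`std::mutex`) rather than a Win32 or OpenMP primitive. `SharedTotal` and `locked_sum` are hypothetical names. Each thread accumulates privately and enters the critical region only once, to update the one shared value that is needed afterwards.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Shared value that is needed after the parallel region, paired with the one
// lock that guards it (one lock per data element).
struct SharedTotal {
    std::mutex m;
    long value = 0;
};

// Each thread sums 1..per_thread privately, then adds its partial sum to the
// shared total inside a short critical region.
long locked_sum(int nthreads, int per_thread) {
    assert(nthreads > 0);
    SharedTotal total;
    std::vector<std::thread> pool;
    for (int t = 0; t < nthreads; ++t) {
        pool.emplace_back([&total, per_thread] {
            long mine = 0;  // private partial sum, no lock needed
            for (int i = 1; i <= per_thread; ++i) mine += i;
            std::lock_guard<std::mutex> guard(total.m);  // critical region
            total.value += mine;
        });
    }
    for (auto& th : pool) th.join();
    return total.value;
}
```

Keeping the computation outside the lock and only the update inside it limits how long threads wait on each other.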
Activity 1b - Potential Energy
• Fix errors found by Parallel Inspector

Deadlock
• Caused by thread waiting on some event that will never happen
• Most common cause is locking hierarchies
  • Always lock and un-lock in the same order
  • Avoid hierarchies if possible

ThreadA: L1, then L2
    DWORD WINAPI threadA(LPVOID arg)
    {
        EnterCriticalSection(&L1);
        EnterCriticalSection(&L2);
        processA(data1, data2);
        LeaveCriticalSection(&L2);
        LeaveCriticalSection(&L1);
        return(0);
    }

ThreadB: L2, then L1
    DWORD WINAPI threadB(LPVOID arg)
    {
        EnterCriticalSection(&L2);
        EnterCriticalSection(&L1);
        processB(data2, data1);
        LeaveCriticalSection(&L1);
        LeaveCriticalSection(&L2);
        return(0);
    }

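One possible way to remove the hierarchy problem is sketched below, assuming portable C++ rather than Win32 critical sections. It swaps in `std::lock`, which acquires several mutexes using a deadlock-avoidance algorithm, so the order in which a caller names them no longer matters. The functions `process_a`, `process_b`, and `run_both` are hypothetical stand-ins for the two thread routines.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

std::mutex L1, L2;         // the two locks from the slide, as std::mutex
int data1 = 0, data2 = 0;  // data guarded by L1 and L2 together

// Names L1 first, then L2.
void process_a() {
    std::lock(L1, L2);  // takes both locks without risk of deadlock
    std::lock_guard<std::mutex> g1(L1, std::adopt_lock);
    std::lock_guard<std::mutex> g2(L2, std::adopt_lock);
    data1 += 1;
    data2 += 1;
}

// Names L2 first, then L1. With plain nested locking this order could
// deadlock against process_a; std::lock makes it safe.
void process_b() {
    std::lock(L2, L1);
    std::lock_guard<std::mutex> g2(L2, std::adopt_lock);
    std::lock_guard<std::mutex> g1(L1, std::adopt_lock);
    data1 += 2;
    data2 += 2;
}

// Runs both routines concurrently 'rounds' times each.
void run_both(int rounds) {
    assert(rounds >= 0);
    std::thread a([rounds] { for (int r = 0; r < rounds; ++r) process_a(); });
    std::thread b([rounds] { for (int r = 0; r < rounds; ++r) process_b(); });
    a.join();
    b.join();
}
```

The simpler fix, which needs no library support, is to make both routines acquire L1 before L2.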
Deadlock
• Add lock per element
• Lock only elements, not whole array of elements

    typedef struct {
        // some data things
        SomeLockType mutex;
    } shape_t;

    shape_t Q[1024];

    // Pointers are used so that the locks taken are those stored in the
    // array elements, not copies made when the arguments are passed.
    void swap(shape_t *A, shape_t *B)
    {
        lock(&A->mutex);
        lock(&B->mutex);
        // Swap data between A & B
        unlock(&B->mutex);
        unlock(&A->mutex);
    }

Thread 1: swap(&Q[34], &Q[986]);   grabs mutex 34
Thread 4: swap(&Q[986], &Q[34]);   grabs mutex 986
Each thread then waits for the mutex that the other one holds.

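The per-element locking scheme can be made deadlock-free by always acquiring the two element locks in a fixed order. The sketch below is an assumption-laden illustration in portable C++: `Shape`, `swap_shapes`, and `stress_swaps` are hypothetical names, `std::mutex` stands in for `SomeLockType`, and elements are identified by index so that the lower index is always locked first.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>

struct Shape {
    int data = 0;      // stands in for "some data things"
    std::mutex mutex;  // one lock per element
};

Shape Q[1024];

// Swaps Q[i] and Q[j]. The lower index is always locked first, so
// swap_shapes(34, 986) and swap_shapes(986, 34) take the locks in the
// same order and cannot block each other forever.
void swap_shapes(int i, int j) {
    assert(i >= 0 && i < 1024 && j >= 0 && j < 1024);
    if (i == j) return;  // nothing to swap, and avoids locking one mutex twice
    const int first = (i < j) ? i : j;
    const int second = (i < j) ? j : i;
    std::lock_guard<std::mutex> g1(Q[first].mutex);
    std::lock_guard<std::mutex> g2(Q[second].mutex);
    std::swap(Q[i].data, Q[j].data);
}

// Two threads swap the same pair, naming the elements in opposite order.
void stress_swaps(int rounds) {
    std::thread a([rounds] { for (int r = 0; r < rounds; ++r) swap_shapes(34, 986); });
    std::thread b([rounds] { for (int r = 0; r < rounds; ++r) swap_shapes(986, 34); });
    a.join();
    b.join();
}
```

Ordering by index works here because all elements live in one array; for unrelated objects, ordering by address or using `std::lock` are common alternatives.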
Windows* Critical Section
• Lightweight, intra-process only mutex
• Most useful and most used
• New type
  • CRITICAL_SECTION cs;
• Create and destroy operations
  • InitializeCriticalSection(&cs);
  • DeleteCriticalSection(&cs);

Windows* Critical Section
• CRITICAL_SECTION cs;
• Attempt to enter protected code
  • EnterCriticalSection(&cs);
  • Blocks if another thread is in critical section
  • Returns when no thread is in critical section
• Upon exit of critical section
  • LeaveCriticalSection(&cs);
  • Must be called from the thread that entered

Example: Critical Section

    #include <windows.h>

    #define NUMTHREADS 4

    int bigComputation(void);   // defined elsewhere

    CRITICAL_SECTION g_cs;      // why does this have to be global?
    int g_sum = 0;

    DWORD WINAPI threadFunc(LPVOID arg)
    {
        int mySum = bigComputation();   // private work, no lock needed
        EnterCriticalSection(&g_cs);
        g_sum += mySum;                 // threads access one at a time
        LeaveCriticalSection(&g_cs);
        return 0;
    }

    int main(void)
    {
        HANDLE hThread[NUMTHREADS];
        InitializeCriticalSection(&g_cs);
        for (int i = 0; i < NUMTHREADS; i++)
            hThread[i] = CreateThread(NULL, 0, threadFunc, NULL, 0, NULL);
        WaitForMultipleObjects(NUMTHREADS, hThread, TRUE, INFINITE);
        DeleteCriticalSection(&g_cs);
        return 0;
    }

Activity 2 - Deadlock
• Use Intel® Parallel Inspector to find and correct the potential deadlock problem.

Thread Safe Routines
• All routines called concurrently from multiple threads must be thread safe
• How to test for thread safety?
  • Use OpenMP and Parallel Inspector for analysis
  • Use sections to create concurrent execution

Thread Safety Example
Check for safety issues between
• Multiple instances of routine1()
• Instances of routine1() and routine2()
Set up sections to test all permutations
Still need to provide data sets that exercise relevant portions of code

    #pragma omp parallel sections
    {
    #pragma omp section
        routine1(&data1);
    #pragma omp section
        routine1(&data2);
    #pragma omp section
        routine2(&data3);
    }

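Where OpenMP is not available, a similar harness can be approximated with standard C++ threads. The sketch below is only an assumption about how such a harness might look: `run_concurrently` is a hypothetical helper, and the bodies of `routine1` and `routine2` are placeholders for the library routines under test. As with the sections version, the run would still need to happen under the analysis tool for conflicts to be reported.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Runs every task in its own thread and waits for all of them, which plays
// the role of one "parallel sections" region.
void run_concurrently(const std::vector<std::function<void()>>& tasks) {
    assert(!tasks.empty());
    std::vector<std::thread> pool;
    for (const auto& task : tasks) pool.emplace_back(task);
    for (auto& th : pool) th.join();
}

// Placeholder routines under test; real library calls would go here.
void routine1(int* data) { *data += 1; }
void routine2(int* data) { *data *= 2; }
```

A call such as `run_concurrently({[&] { routine1(&data1); }, [&] { routine1(&data2); }, [&] { routine2(&data3); }})` would mirror the three sections, with each task given its own data.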
Two Ways to Ensure Thread Safety
• Routines can be written to be reentrant
  • Any variables changed by the routine must be local to each invocation
  • Don't modify globally shared variables
• Routines can use mutual exclusion to avoid conflicts with other threads
  • If accessing shared variables cannot be avoided
• What if third-party libraries are not thread safe?
  • Will likely need to control threads' access to the library
It is better to make a routine reentrant than to add synchronization
• Avoids potential overhead

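To make the reentrancy idea concrete, here is a small sketch with hypothetical names (`next_random_shared`, `next_random`, `nth_random`, `threads_agree`). The first routine hides its state in a static variable and so is not reentrant; the second changes only state owned by its caller. The generator itself is a simple linear congruential step chosen for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// NOT reentrant: the static 'state' is shared by every caller, so concurrent
// calls from several threads would race on it.
std::uint32_t next_random_shared() {
    static std::uint32_t state = 1;
    state = state * 1664525u + 1013904223u;
    return state;
}

// Reentrant: the only variable changed belongs to the caller.
std::uint32_t next_random(std::uint32_t& state) {
    state = state * 1664525u + 1013904223u;
    return state;
}

// Returns the n-th value of the sequence started from 'seed', using only
// local state.
std::uint32_t nth_random(std::uint32_t seed, int n) {
    assert(n > 0);
    std::uint32_t state = seed;
    std::uint32_t value = 0;
    for (int i = 0; i < n; ++i) value = next_random(state);
    return value;
}

// Two threads run the reentrant routine at the same time; because nothing is
// shared, both must reproduce the serial result.
bool threads_agree(std::uint32_t seed, int n) {
    const std::uint32_t serial = nth_random(seed, n);
    std::uint32_t a = 0, b = 0;
    std::thread ta([&a, seed, n] { a = nth_random(seed, n); });
    std::thread tb([&b, seed, n] { b = nth_random(seed, n); });
    ta.join();
    tb.join();
    return a == serial && b == serial;
}
```

The reentrant version needs no lock at all, which is the overhead saving the slide refers to.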
Activity 3 - Thread Safety
• Use OpenMP framework to call library routines concurrently
• Three library calls = 6 combinations to test
  • A:A, B:B, C:C, A:B, A:C, B:C

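The count of six follows from pairing each routine with itself and with every routine after it, which gives n(n+1)/2 pairings for n routines. The sketch below, with the hypothetical helper name `pairings`, enumerates them so that a test harness could loop over the list.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Returns every unordered pairing of routines, including each routine paired
// with itself. For n routines this yields n * (n + 1) / 2 entries.
std::vector<std::pair<std::string, std::string>>
pairings(const std::vector<std::string>& routines) {
    std::vector<std::pair<std::string, std::string>> out;
    for (std::size_t i = 0; i < routines.size(); ++i)
        for (std::size_t j = i; j < routines.size(); ++j)
            out.emplace_back(routines[i], routines[j]);
    assert(out.size() == routines.size() * (routines.size() + 1) / 2);
    return out;
}
```

For the routines A, B, and C this produces the same six pairings as listed above, in the order A:A, A:B, A:C, B:B, B:C, C:C.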
What's Been Covered
• Threading errors are easy to introduce
• Debugging these errors by traditional techniques is hard
• Intel® Parallel Inspector catches these errors
  • Errors do not have to occur to be detected
  • Greatly reduces debugging time
  • Improves robustness of the application