Overview of a Google Tool: ThreadSanitizer v2

Introduction
• Race Detector based on Shadow Memory
• Faster than
  • Valgrind
  • Intel Parallel Inspector (PIN)
• Fully parallel
• No expensive synchronization (atomics/locks) on fast path
• Scales to huge apps
• Predictable memory footprint
• Informative reports

Data Race Example

script.sh:

    #!/bin/bash
    for i in {0..10..1}
    do
      ./tsan_example
      sleep 1
    done

C/C++ Program:

    #include <stdio.h>
    #include <pthread.h>

    int Global[4];

    void *Thread1(void *x) {
      Global[0] = -1;  /* unsynchronized write */
      return NULL;
    }

    void *Thread2(void *x) {
      printf("Global[2] = %d\n", Global[2]);
      printf("Global[3] = %d\n", Global[3]);
      return NULL;
    }

    void *Thread3(void *x) {
      printf("Global[0] = %d\n", Global[0]);  /* unsynchronized read of the same element */
      printf("Global[1] = %d\n", Global[1]);
      return NULL;
    }

    int main() {
      for (int i = 0; i < 4; i++)
        Global[i] = i;
      pthread_t t[3];
      pthread_create(&t[0], NULL, Thread1, NULL);
      pthread_create(&t[1], NULL, Thread2, NULL);
      pthread_create(&t[2], NULL, Thread3, NULL);
      pthread_join(t[0], NULL);
      pthread_join(t[1], NULL);
      pthread_join(t[2], NULL);
      return 0;
    }

There is a data race on the global array: depending on the thread scheduling, T1 can write Global[0] before T3 reads it or vice versa, so different runs print different values.

Data Race Example

    Global[2] = 2
    Global[3] = 3
    Global[0] = -1
    Global[1] = 1
    ----------
    Global[0] = -1
    Global[1] = 1
    Global[2] = 2
    Global[3] = 3
    ----------
    Global[0] = 0
    Global[1] = 1
    Global[2] = 2
    Global[3] = 3
    ----------
    Global[2] = 2
    Global[3] = 3
    Global[0] = -1
    Global[1] = 1
    ----------
    Global[2] = 2
    Global[3] = 3
    Global[0] = -1
    Global[1] = 1
    ----------
    Global[2] = 2
    Global[3] = 3
    Global[0] = -1
    Global[1] = 1
    ----------
    Global[2] = 2
    Global[3] = 3
    Global[0] = -1
    Global[1] = 1
    ----------
    Global[0] = 0
    Global[1] = 1
    Global[2] = 2
    Global[3] = 3
    ----------
    Global[2] = 2
    Global[3] = 3
    Global[0] = -1
    Global[1] = 1
    ----------
    Global[2] = 2
    Global[3] = 3
    Global[0] = -1
    Global[1] = 1
    ----------

TSan: Data Race Example

    Global[2] = 2
    Global[3] = 3
    ==================
    WARNING: ThreadSanitizer: data race (pid=25893)
      Read of size 4 at 0x7f5ab4c16cc0 by thread T3:
        #0 Thread3(void*) /home/simone/works/projects/thread-sanitizer/examples/tsan_example.cc:18 (exe+0x0000000657b9)

      Previous write of size 4 at 0x7f5ab4c16cc0 by thread T1:
        #0 Thread1(void*) /home/simone/works/projects/thread-sanitizer/examples/tsan_example.cc:7 (exe+0x000000065739)

      Thread T3 (tid=25901, running) created by main thread at:
        #0 pthread_create /home/simone/works/projects/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:820 (exe+0x0000000248e3)
        #1 main /home/simone/works/projects/thread-sanitizer/examples/tsan_example.cc:29 (exe+0x00000006586e)

      Thread T1 (tid=25899, finished) created by main thread at:
        #0 pthread_create /home/simone/works/projects/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:820 (exe+0x0000000248e3)
        #1 main /home/simone/works/projects/thread-sanitizer/examples/tsan_example.cc:27 (exe+0x00000006583e)

    SUMMARY: ThreadSanitizer: data race /home/simone/works/projects/thread-sanitizer/examples/tsan_example.cc:18 Thread3(void*)
    ==================
    Global[0] = -1
    Global[1] = 1
    ----------
    ThreadSanitizer: reported 1 warnings

How ThreadSanitizer works: Compile Time
• Instrument every memory access in the program by prepending a function call:
  • __tsan_read4(addr)
• Atomic memory accesses use:
  • the __tsan_atomic* family of callbacks
• Vtable pointer updates:
  • __tsan_vptr_update
• Function entry and exit:
  • __tsan_func_entry(caller_pc)
  • __tsan_func_exit
• Initialization:
  • __tsan_init

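
To make the compile-time step concrete, here is a minimal sketch of what a tiny function conceptually looks like once the callbacks are prepended. The `demo_*` functions and the `g_log` vector are hypothetical stand-ins invented for this illustration; the real `__tsan_*` callbacks are emitted by the compiler and implemented in the run-time library, and they do far more than log.

```cpp
#include <cassert>
#include <vector>

static_assert(sizeof(int) == 4, "the sketch assumes 4-byte int accesses");

// Hypothetical stand-ins for the run-time callbacks (not the real __tsan_* API).
enum class Event { FuncEntry, Read4, Write4, FuncExit };
std::vector<Event> g_log;  // records callbacks in the order they fire

void demo_func_entry(const void* /*caller_pc*/) { g_log.push_back(Event::FuncEntry); }
void demo_func_exit() { g_log.push_back(Event::FuncExit); }
void demo_read4(const void* /*addr*/) { g_log.push_back(Event::Read4); }
void demo_write4(const void* /*addr*/) { g_log.push_back(Event::Write4); }

// Source as written by the programmer:   *dst = *src + 1;
// Below: the same function with one callback prepended to each access.
void CopyPlusOne(int* dst, const int* src) {
  assert(dst != nullptr && src != nullptr);
  demo_func_entry(nullptr);  // function entry
  demo_read4(src);           // prepended to the 4-byte read
  int v = *src;
  demo_write4(dst);          // prepended to the 4-byte write
  *dst = v + 1;
  demo_func_exit();          // function exit
}
```

Calling `CopyPlusOne(&b, &a)` leaves four entries in `g_log` (entry, read, write, exit), which is the event stream the run-time library would consume.
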
ThreadSanitizer: Algorithm
• Direct Shadow Mapping (64-bit Linux)
• Shadow = 4 * (Addr & kMask);

    Region        Start             End
    Application   0x7f0000000000    0x7fffffffffff
    Protected     0x200000000000    0x7effffffffff
    Shadow        0x180000000000    0x1fffffffffff
    Protected     0x000000000000    0x17ffffffffff

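
The mapping formula can be tried out with a short sketch. The slides do not give the value of `kMask`; the constant below is an assumption chosen for illustration so that application addresses land inside the Shadow region of the layout, and so that all bytes of one 8-byte application word share the same shadow location. The real constant may differ.

```cpp
#include <cassert>
#include <cstdint>

// Region bounds copied from the layout above.
constexpr uint64_t kAppBeg = 0x7f0000000000ull;
constexpr uint64_t kAppEnd = 0x7fffffffffffull;
constexpr uint64_t kShadowBeg = 0x180000000000ull;
constexpr uint64_t kShadowEnd = 0x1fffffffffffull;

// ASSUMPTION (not from the slides): drop the high address bits and the low
// 3 bits, so every byte of an 8-byte word maps to the same shadow address.
constexpr uint64_t kMask = 0x07fffffffff8ull;

// Shadow = 4 * (Addr & kMask)
inline uint64_t MemToShadow(uint64_t addr) {
  assert(addr >= kAppBeg && addr <= kAppEnd);  // only application memory is mapped
  return 4 * (addr & kMask);
}
```

With this mask, each 8-byte application word owns 32 bytes of shadow (four 64-bit words), and the 1 TB application region maps to the upper 4 TB of the Shadow region.
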
How ThreadSanitizer works: Run-Time Library
• Shadow Cell
  • A 64-bit word that represents a single memory access (one that has already happened) to a subset of the bytes within an 8-byte word of application memory
• Shadow State
  • N Shadow Words (N = 2, 4, or 8: the number of accesses to the corresponding application memory region, by any thread, that are remembered)

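
One way to picture a single 64-bit shadow word is as a packed record holding the thread id, the epoch, a write flag, and the position (offset and size) of the access within the 8-byte word. The sketch below is illustrative only: the struct name, the accessor names, and the bit widths (16 + 42 + 1 + 2 + 3 = 64) are assumptions made for this example, not something stated in the slides.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMED layout, high to low bits:
//   tid: 16 | epoch: 42 | is_write: 1 | log2(size): 2 | offset: 3
struct ShadowWord {
  uint64_t raw = 0;  // raw == 0 is treated as "empty", so epochs start at 1

  static ShadowWord Make(unsigned tid, uint64_t epoch, bool is_write,
                         unsigned size_log, unsigned offset) {
    assert(tid < (1u << 16) && epoch < (1ull << 42));
    assert(size_log < 4 && offset < 8);
    ShadowWord w;
    w.raw = (static_cast<uint64_t>(tid) << 48) | (epoch << 6) |
            (static_cast<uint64_t>(is_write) << 5) |
            (static_cast<uint64_t>(size_log) << 3) | offset;
    return w;
  }

  unsigned tid() const { return static_cast<unsigned>(raw >> 48); }
  uint64_t epoch() const { return (raw >> 6) & ((1ull << 42) - 1); }
  bool is_write() const { return ((raw >> 5) & 1) != 0; }
  unsigned size() const { return 1u << ((raw >> 3) & 3); }  // 1, 2, 4 or 8 bytes
  unsigned offset() const { return static_cast<unsigned>(raw & 7); }
};

static_assert(sizeof(ShadowWord) == 8, "one shadow word must fit in 64 bits");
```

For example, `ShadowWord::Make(1, 1, true, 1, 0)` encodes a 2-byte write at offset 0 by thread 1 at epoch 1. Because the whole record fits in one machine word, it can be read and written with a single load or store.
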
ThreadSanitizer: Algorithm
• State Machine
  • Core of the algorithm; it updates the Shadow State on every memory access
  • Steps:
    • The thread's clock is incremented and a new Shadow Word (corresponding to the current memory access) is created
    • The State Machine iterates over all Shadow Words in the Shadow State: if one of them constitutes a race with the new Shadow Word, a warning is reported
    • The new Shadow Word is inserted in place of an empty Shadow Word or in place of a Shadow Word that happened-before the new one (if there is no such slot, a random Shadow Word is evicted)

ThreadSanitizer Algorithm: Example
• 4 Shadow Cells per 8 application bytes (the Shadow State tracks 4 memory accesses)
• Program with 3 threads
• Each Shadow Cell holds the fields TID, Epoch, Pos, IsW

First access: write in thread T1. T1 writes 2 bytes to a memory location.

    Cell   TID   Epoch   Pos   IsW
    1      T1    E1      0:2   W

Second access: read in thread T2. T2 reads 4 bytes from another memory location.

    Cell   TID   Epoch   Pos   IsW
    1      T1    E1      0:2   W
    2      T2    E2      4:8   R

Third access: read in thread T3. T3 reads 4 bytes from a memory location, part of which (2 bytes) was previously written by T1.

    Cell   TID   Epoch   Pos   IsW
    1      T1    E1      0:2   W
    2      T2    E2      4:8   R
    3      T3    E3      0:4   R

There is a RACE because there is no happens-before relation between E1 and E3: E1 || E3.

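
The region comparison in this example can be sketched in a few lines. The code below only finds the candidate pairs: accesses by different threads that touch overlapping bytes, with at least one of them a write (the usual definition of conflicting accesses). Whether a candidate is reported still depends on the happens-before check. The type and function names are hypothetical, and the epochs E1, E2, E3 are represented by the plain integers 1, 2, 3.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One remembered access inside an 8-byte application word.
// It covers the half-open byte range [begin, end), e.g. Pos 0:2 or 4:8.
struct Access {
  int tid;
  int epoch;
  int begin;
  int end;
  bool is_write;
};

// True if the two byte ranges share at least one byte.
bool Intersects(const Access& a, const Access& b) {
  return a.begin < b.end && b.begin < a.end;
}

// Different threads, overlapping bytes, and at least one write.
bool IsRaceCandidate(const Access& old_access, const Access& new_access) {
  return old_access.tid != new_access.tid &&
         Intersects(old_access, new_access) &&
         (old_access.is_write || new_access.is_write);
}

// Indices of the remembered accesses that conflict with the new one.
std::vector<std::size_t> RaceCandidates(const std::vector<Access>& state,
                                        const Access& new_access) {
  assert(state.size() <= 4);  // 4 Shadow Cells per 8 application bytes
  std::vector<std::size_t> out;
  for (std::size_t i = 0; i < state.size(); ++i) {
    if (IsRaceCandidate(state[i], new_access)) out.push_back(i);
  }
  return out;
}
```

With the state `{T1, E1, 0:2, W}` and `{T2, E2, 4:8, R}`, the new access `{T3, E3, 0:4, R}` yields exactly one candidate, the write by T1; the read by T2 touches disjoint bytes and is ignored.
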
ThreadSanitizer: Algorithm

    def HandleMemoryAccess(addr, tid, is_write, size, pc):
        shadow_address = MapApplicationToShadow(addr)
        IncrementThreadClock(tid)
        LogEvent(tid, pc)
        new_shadow_word = {tid, CurrentClock(tid), is_write, size, addr & 7}
        store_word = new_shadow_word
        for i in 1..N:
            UpdateOneShadowState(shadow_address, i, new_shadow_word, store_word)
        if store_word:
            # Evict a random Shadow Word
            shadow_address[Random(N)] = store_word  # Atomic

ThreadSanitizer: Algorithm

    def UpdateOneShadowState(shadow_address, i, new_shadow_word, store_word):
        idx = (i + new_shadow_word.offset) % N
        old_shadow_word = shadow_address[idx]  # Atomic
        if old_shadow_word == 0:  # The old state is empty
            if store_word:
                StoreIfNotYetStored(shadow_address[idx], store_word)
            return
        if AccessedSameRegion(old_shadow_word, new_shadow_word):
            if SameThreads(old_shadow_word, new_shadow_word):
                StoreIfNotYetStored(shadow_address[idx], store_word)
                return
            else:  # Different threads
                if not HappensBefore(old_shadow_word, new_shadow_word):
                    ReportRace(old_shadow_word, new_shadow_word)
        elif AccessedIntersectingRegions(old_shadow_word, new_shadow_word):
            if not SameThreads(old_shadow_word, new_shadow_word):
                if not HappensBefore(old_shadow_word, new_shadow_word):
                    ReportRace(old_shadow_word, new_shadow_word)
        else:  # regions did not intersect
            pass  # do nothing

ThreadSanitizer: Algorithm
• Constant-time operation
  • Get TID and Epoch from the Shadow Cell
  • 1 load from thread-local storage
  • 1 comparison
• Similar idea to FastTrack

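
Reading this slide as a description of the happens-before check (an interpretation, since the slide does not name the operation), the constant-time test can be sketched with a per-thread vector clock kept in thread-local storage. The names `tls_clock` and `kMaxThreads` are hypothetical, and the code that would update the clock on synchronization events is left out.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kMaxThreads = 8;  // hypothetical bound, for the sketch only

// Vector clock of the current thread: tls_clock[t] is the latest epoch of
// thread t that is known to have happened before the current point.
// A real detector would update it on synchronization (locks, joins, ...).
thread_local std::array<uint64_t, kMaxThreads> tls_clock{};

// An old access (old_tid, old_epoch) happened before the current access
// if the current thread has already synchronized with that epoch.
bool HappensBefore(int old_tid, uint64_t old_epoch) {
  assert(old_tid >= 0 && old_tid < kMaxThreads);
  return old_epoch <= tls_clock[old_tid];  // one TLS load, one comparison
}
```

If the current thread has never synchronized with thread 1, `tls_clock[1]` is 0 and `HappensBefore(1, 1)` is false, so the two accesses are concurrent (E1 || E3 in the earlier example). After setting `tls_clock[1] = 1`, the same call returns true and no race would be reported.
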
ThreadSanitizer: Algorithm
• Stack trace for the previous access
  • Per-thread cyclic buffer of events
  • 64 bits per event (type + PC)
  • Events: memory access (read/write), function entry/exit
  • Information will be lost after some time (cyclic buffer)
  • Buffer size is configurable
• Function interceptors
  • malloc, free, …
  • pthread_mutex, lock, …
  • strlen, memcmp, …
  • read, write, …

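
The per-thread cyclic buffer of events can be sketched as a fixed-size ring of 64-bit entries, where old events are overwritten once the buffer wraps around. The class name, the event encoding (3 bits of type, 61 bits of PC) and the method names are assumptions made for this illustration; the slides only state that an event is 64 bits holding a type and a PC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class EventType : uint64_t { Read = 0, Write = 1, FuncEntry = 2, FuncExit = 3 };

// ASSUMED encoding: top 3 bits hold the type, low 61 bits hold the PC.
constexpr uint64_t kPcMask = (1ull << 61) - 1;

class EventTrace {
 public:
  explicit EventTrace(std::size_t capacity) : buf_(capacity), count_(0) {
    assert(capacity > 0);  // the buffer size is configurable but not zero
  }

  // Appends an event, overwriting the oldest one when the buffer is full.
  void Add(EventType type, uint64_t pc) {
    buf_[count_ % buf_.size()] =
        (static_cast<uint64_t>(type) << 61) | (pc & kPcMask);
    ++count_;
  }

  // True if event number n (0-based, in insertion order) is still stored.
  bool Has(uint64_t n) const { return n < count_ && count_ - n <= buf_.size(); }

  uint64_t PcOf(uint64_t n) const {
    assert(Has(n));
    return buf_[n % buf_.size()] & kPcMask;
  }

  EventType TypeOf(uint64_t n) const {
    assert(Has(n));
    return static_cast<EventType>(buf_[n % buf_.size()] >> 61);
  }

 private:
  std::vector<uint64_t> buf_;  // one 64-bit word per event
  uint64_t count_;             // total number of events ever added
};
```

With a capacity of 2, adding three events makes `Has(0)` false: the first event has been overwritten, which is the "information will be lost after some time" effect noted in the slide.
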
Pros and Cons
• Pros
  • Speed: more than 10x faster than other tools
  • Native support for atomic operations
  • Numbers: 200+ races found in Google server-side apps, 80+ in Go programs and libraries, several races in SSL
• Cons
  • Only 64-bit Linux
  • Hard to port to 32-bit platforms (small address spaces)