LIRA: Linux Inter-process Race Analyzer • Tipp Moseley • Intel Corporation • University of Colorado
Introduction • Motivation for Software Quality • Related work • Why Lira? • Race & Deadlock Detection Algorithms • Design • Pitfalls • Results
Motivation for Software Quality • Various tools, vendors, targets • Many gaps in availability (X-Scale, IPF) • Suite of tools adds value to underlying hardware • Quality + manageability + security => buy more silicon • Platform enabling • IBM’s Power, Motorola both have quality tools • These tools are fundamental for supporting platform
Related Work • Memory tools • Purify • Valgrind • Etnus • Bitraker • 3rd • Race/deadlock tools • Eraser • Ladybug • Intel Thread Checker • Multirace • Lira
What is Lira? • Linux Inter-process Race Analyzer • Dynamically detects race conditions for • Memory read/write • Generic resources (file, socket, etc.) • Dynamically detects potential deadlock for • Any combination of inter-process synchronization primitives • User must insert callbacks in locking code
Why Lira? • Many enterprise systems depend on shared memory and concurrency • No tool exists for debugging across processes • Java w/ native threads • XF86, Gnome, KDE • Baan • Samba • Sybase • Oracle • MySQL • PostgreSQL • Apache • SAP
Race Detection • Eraser memory states • Shamelessly borrowed from Savage, et al. SOSP 1997
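The state diagram on this slide did not survive as text. A minimal C++ sketch of the per-location states described in the Eraser paper (Virgin, Exclusive, Shared, Shared-Modified) follows. The names `Shadow`, `Action`, and `on_access` are hypothetical, and leaving a location in Virgin when it is read before any write is an assumption made here, not something the slides confirm.

```cpp
#include <cassert>

// Per-location states from the Eraser paper (Savage et al., SOSP 1997).
enum class State { Virgin, Exclusive, Shared, SharedModified };

// What the caller should do with the candidate lockset for this access.
enum class Action { None, Refine, RefineAndReport };

// Hypothetical shadow record kept for every monitored location.
struct Shadow {
    State state = State::Virgin;
    int owner = -1;  // first thread (or process) that wrote the location
};

// Advance the state machine for one access by `tid`.
Action on_access(Shadow& s, int tid, bool is_write) {
    switch (s.state) {
    case State::Virgin:
        // Assumption: a read of never-written data leaves the state unchanged.
        if (is_write) { s.state = State::Exclusive; s.owner = tid; }
        return Action::None;
    case State::Exclusive:
        if (tid == s.owner) return Action::None;  // still private to one thread
        s.state = is_write ? State::SharedModified : State::Shared;
        return is_write ? Action::RefineAndReport : Action::Refine;
    case State::Shared:
        // Read-shared data: refine the lockset, but report nothing yet.
        if (is_write) { s.state = State::SharedModified; return Action::RefineAndReport; }
        return Action::Refine;
    case State::SharedModified:
        return Action::RefineAndReport;
    }
    return Action::None;
}
```

Delaying reports until Shared-Modified is what keeps initialization by a single thread and read-only sharing from producing warnings.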
Race Detection • Eraser Lockset Algorithm: Let locks_held(t) be the set of locks held in any mode by thread t. Let write_locks_held(t) be the set of locks held in write mode by thread t. For each v, initialize C(v) to the set of all locks. On each read of v by thread t, set C(v) := C(v) ∩ locks_held(t); if C(v) = { }, then issue a warning. On each write of v by thread t, set C(v) := C(v) ∩ write_locks_held(t); if C(v) = { }, then issue a warning. • Also shamelessly borrowed from Savage, et al. SOSP 1997
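The lockset refinement above can be sketched in a few lines of C++. This is an illustrative sketch only, not Lira's RaceAnalyzer: the class name `LocksetChecker`, the use of integer lock ids, and the string location key are all assumptions. The "set of all locks" is represented by the absence of an entry, so the first access simply stores the locks held at that point.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <utility>

using LockSet = std::set<int>;  // hypothetical: locks identified by integer ids

class LocksetChecker {
public:
    // Refine C(v) for one access; returns true when C(v) becomes empty,
    // i.e. when a warning should be issued.
    bool on_access(const std::string& v, const LockSet& locks_held,
                   const LockSet& write_locks_held, bool is_write) {
        const LockSet& held = is_write ? write_locks_held : locks_held;
        auto it = candidates_.find(v);
        if (it == candidates_.end()) {
            // (all locks) ∩ held == held
            it = candidates_.emplace(v, held).first;
        } else {
            LockSet refined;
            std::set_intersection(it->second.begin(), it->second.end(),
                                  held.begin(), held.end(),
                                  std::inserter(refined, refined.begin()));
            it->second = std::move(refined);
        }
        return it->second.empty();
    }

private:
    std::map<std::string, LockSet> candidates_;  // C(v); no entry means "all locks"
};
```

A full checker would consult the memory-state machine first, so that warnings are only issued for locations in the Shared-Modified state.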
Deadlock Detection • Only checks full ordering of lock hierarchy • Does not recognize that a->b->c and a->c->b are both OK (though bad practice) • Data structures: • For each lock, maintain before and after sets • Each time a lock l is acquired:
    before(l) = before(l) ∪ locks_held
    if before(l) ∩ after(l) != {} then ERROR
    for l2 in parents(l) do
        after(l2) = after(l2) ∪ {l}
        if contains(before(l2), l) then ERROR
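A minimal C++ sketch of the before/after bookkeeping follows. It assumes that parents(l) is the set of locks already held when l is acquired, which the slide does not state explicitly; the class name `LockOrderChecker` and the integer lock ids are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>

class LockOrderChecker {
public:
    // Record that lock `l` is acquired while `held` are already held.
    // Returns true when the acquisition contradicts an earlier ordering.
    bool on_acquire(int l, const std::set<int>& held) {
        bool error = false;
        std::set<int>& before_l = before_[l];
        before_l.insert(held.begin(), held.end());
        // A lock seen both before and after l means an inconsistent order.
        const std::set<int>& after_l = after_[l];
        for (int b : before_l) {
            if (after_l.count(b)) error = true;
        }
        // Assumption: parents(l) is the set of locks held right now.
        for (int l2 : held) {
            after_[l2].insert(l);
            if (before_[l2].count(l)) error = true;
        }
        return error;
    }

private:
    std::map<int, std::set<int>> before_;  // locks seen held before each lock
    std::map<int, std::set<int>> after_;   // locks acquired while each lock was held
};
```

With locks a=0, b=1, c=2, recording a->b->c and then a->c->b is flagged at the last acquisition, which matches the limitation noted on the slide.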
Design – Issues • Must follow exec() and communicate via shared memory as well • Different address spaces • Shared memory binds to different addresses • Files bind to different file descriptors • No common synchronization API or model • OS semaphores • flock(), fcntl() • lock; xchgb
Design – Front End • Initialize Pin • Instrument memory refs, system calls, and user locking callbacks • e.g. LIRA_LockEx_HW(&my_lock) • Patch execve() system call with pin -t lira -- <original cmd> • Children get Pin'd as well • Feature unique to Lira • Other tools require user to modify scripts by hand • $ pin -t lira -- make test
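The effect of the execve() patch on a child's command line can be illustrated with a small C++ helper. The slides do not show how Lira performs the patch, so `wrap_with_pin` is a hypothetical name and this sketch only models the argument rewriting, not the system call interception itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: prepend the Pin launcher and tool name to a child's
// argument vector, so that `make test` becomes `pin -t lira -- make test`.
std::vector<std::string> wrap_with_pin(const std::vector<std::string>& argv,
                                       const std::string& tool = "lira") {
    std::vector<std::string> wrapped{"pin", "-t", tool, "--"};
    wrapped.insert(wrapped.end(), argv.begin(), argv.end());
    return wrapped;
}
```

Because the rewrite happens at every execve(), build scripts and test harnesses that spawn further processes need no manual changes.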
Design – Front End • Filter irrelevant information • Maintain information about shared memory, file descriptors • Client needs <shared region name> + offset because effective address differs across address spaces • Client needs entire file path instead of fd • Send shared memory refs, lock ops, other callbacks to log buffer
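The translation from a process-local effective address to a <shared region name> + offset pair might look like the following C++ sketch. `SegmentTable`, `SharedRef`, and the `attach`/`translate` methods are hypothetical names, and the way Lira learns about attachments (system calls, /proc) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Address-space independent name for a shared memory reference.
struct SharedRef {
    std::string segment;
    std::size_t offset;
};

// Hypothetical per-process table of attached shared memory regions.
class SegmentTable {
public:
    void attach(std::uintptr_t base, std::size_t size, const std::string& name) {
        by_base_[base] = Mapping{size, name};
    }

    // Returns false for addresses outside every shared region, which also
    // serves as the filter for references the client does not care about.
    bool translate(std::uintptr_t addr, SharedRef& out) const {
        auto it = by_base_.upper_bound(addr);  // first region starting above addr
        if (it == by_base_.begin()) return false;
        --it;                                  // region with the largest base <= addr
        std::size_t offset = addr - it->first;
        if (offset >= it->second.size) return false;
        out = SharedRef{it->second.name, offset};
        return true;
    }

private:
    struct Mapping {
        std::size_t size;
        std::string name;
    };
    std::map<std::uintptr_t, Mapping> by_base_;
};
```

Two processes that attach the same segment at different base addresses then report identical (segment, offset) pairs for the same shared word.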
Design – Back End • LiraClient: • Parse ASCII data from ShmLogReader • Maintain state tables for each shared memory address, file descriptor • Drive RaceAnalyzer • Report race conditions • RaceAnalyzer: • Generic implementation of Eraser algorithm • Check lock ordering • LockModel: • Generic representation of various locking primitives • LockModelSemaphore, LockModelHardware, etc
Design – IPC • How do we communicate data from multiple processes to the client process? • ShmLogWriter -> ShmLogReader • Maintain a synchronized log file, protected by shared memory lock • If file becomes too big, begin new file • Online client deletes files when done processing (data may take up gigabytes of space in minutes) • Offline client processes files after execution completes
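The rotate-and-delete policy can be modeled in memory with a short C++ sketch. This is not the ShmLogWriter/ShmLogReader implementation: `RotatingLog` is a hypothetical class, log files are stood in for by strings, and the shared memory lock that would guard `append` across processes is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// In-memory stand-in for a sequence of size-limited log files.
class RotatingLog {
public:
    explicit RotatingLog(std::size_t max_bytes) : max_bytes_(max_bytes), files_(1) {}

    // Writer side: start a new "file" when the current one would grow too big.
    void append(const std::string& record) {
        const std::string& current = files_.back();
        if (!current.empty() && current.size() + record.size() + 1 > max_bytes_) {
            files_.emplace_back();
        }
        files_.back() += record;
        files_.back() += '\n';
    }

    // Online reader side: hand over the oldest completed file and delete it.
    bool take_completed(std::string& out) {
        if (files_.size() < 2) return false;  // only the file being written exists
        out = std::move(files_.front());
        files_.pop_front();
        return true;
    }

    std::size_t file_count() const { return files_.size(); }

private:
    std::size_t max_bytes_;
    std::deque<std::string> files_;
};
```

An offline client would instead leave every file in place and read them in order after the run, which is where the multi-gigabyte logs come from.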
Pitfalls • Maintaining state of sem/shm/fds from syscalls was painful • Solution: cache information from /proc • Offline processing can lead to enormous logs • Solution: Online processing and delete processed info
Pitfalls • Inferring meaning of lock operations was faulty at best • Solution: Offer user callbacks to capture intended meaning of synchronization operations. • Unacceptably slow • Solution: -O6, cache frequently used data, do some work at instrumentation time, inline frequent calls
Sample Program
// INITIALIZATION
int *shmem = getShmem(sizeof(int));
sharedlock_t lock0 = getSharedLock();
sharedlock_t lock1 = getSharedLock();
lock_init(&lock0);
lock_init(&lock1);
fork(); // make 2 processes
Sample Program
int i = 0;
while (i < 100000) {
    lock(&lock0);
    lock(&lock1);
    (*shmem)++; // ERROR: UNINITIALIZED READ!
    unlock(&lock1);
    unlock(&lock0);
    i++;
}
// ERROR: wrong lock hierarchy, potential deadlock!
lock(&lock1);
lock(&lock0);
unlock(&lock0);
unlock(&lock1);
(*shmem)++; // ERROR: no locks held!
exit(0);
Results • Uninitialized LOAD • WARNING: possible uninitialized LOAD for segment=/SYSV000004d3@0 offset=0 opsize=4 at pc=0x8049208 tid=0 pid=32576 srcfile=tests/locktest0.C srcline=42 • No locks held for stdout • ERROR: no locks held for FWR to file=/dev/pts/3 at pc=0x420d18bc tid=0 pid=32584 srcfile=tests/locktest0.C srcline=40
Results • Inconsistent locks held • ERROR: inconsistent locks held for LOAD to segment=/SYSV000004d3@0 offset=0 opsize=4 at pc=0x804953b tid=0 pid=32576 srcfile=tests/locktest0.C srcline=86 • Bad lock hierarchy • ERROR: inconsistent lock order at pc=0x8048ebc tid=0 pid=32576 srcfile=tests/../LiraCallbacks.h srcline=64
Future Work • Lira: • Further optimization • Still at ~500x slowdown (improved from >100000x) • Work with Pin team to only instrument shared memory segments • Code that does not touch shared memory is not instrumented • Find some bugs! • LIRA can find potential errors; the user must verify them • Lots of work to figure out if a LIRA report really is an error in a large program (e.g. PostgreSQL, Oracle) • Potential analysis integration with Intel Thread Checker
References • http://www-2.cs.cmu.edu/afs/cs/academic/class/15740-f03/public/doc/atom-user.pdf • http://www.eecs.harvard.edu/~jonathan/papers/1997/eraser-sosp97.ps.gz