
Concurrency and Race Conditions


Presentation Transcript


  1. Concurrency and Race Conditions Linux Kernel Programming CIS 4930/COP 5641

  2. Motivation: Example Pitfall in Scull

  3. Pitfalls in scull • Race condition: result of uncontrolled access to shared data

      if (!dptr->data[s_pos]) {
          dptr->data[s_pos] = kzalloc(quantum, GFP_KERNEL);
          if (!dptr->data[s_pos]) {
              goto out;
          }
      }



  6. Pitfalls in scull • Race condition: two processes can both find dptr->data[s_pos] == NULL, both allocate a quantum, and the second assignment overwrites the first pointer, which is never freed

      if (!dptr->data[s_pos]) {
          dptr->data[s_pos] = kzalloc(quantum, GFP_KERNEL);
          if (!dptr->data[s_pos]) {
              goto out;
          }
      }

      • Memory leak
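A minimal sketch of the fix previewed here, using the per-device mutex that scull adds later (slides 15-17): the test and the allocation become one critical section, so a second writer can no longer observe a stale NULL pointer. The surrounding scull_write() logic (retval, the data copy, the rest of the out: path) is elided.

      if (mutex_lock_interruptible(&dev->mutex))
          return -ERESTARTSYS;
      if (!dptr->data[s_pos]) {
          dptr->data[s_pos] = kzalloc(quantum, GFP_KERNEL);
          if (!dptr->data[s_pos]) {
              retval = -ENOMEM;
              goto out;                /* the out: label releases dev->mutex */
          }
      }
      /* ... copy the data ... */
  out:
      mutex_unlock(&dev->mutex);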

  7. Managing Concurrency

  8. Concurrency and Its Management • Sources of concurrency • Multiple user-space processes • Multiple CPUs • Device interrupts • Timers

  9. Some guiding principles • Try to avoid concurrent access entirely • Global variables • Apply locking and mutual exclusion principles • Implications for device drivers • Use sufficient concurrency mechanisms (depending on context) • No object can be made available to the kernel until it can function properly • References to such objects must be tracked for proper removal • Avoid “roll your own” solutions

  10. Managing Concurrency • Atomic operation: all or nothing from the perspective of other threads • Critical section: code executed by only one thread at a time • Not all critical sections are the same • Access from interrupt handlers • Latency constraints

  11. Lock Design Considerations • Context • Can another thread be scheduled on the current processor? • Assumptions of kernel operation • Breaking assumptions will break code that relies on them • Time expected to wait for lock • Considerations • Amount of time lock is expected to be held • Amount of expected contention • Long • Other threads can make better use of the processor • Short • Time to switch to another thread will be longer than just waiting a short amount of time

  12. Kernel Locking Implementations • mutex • Sleep if lock cannot be acquired immediately • Allow other threads to use the processor • spinlock • Continuously try to grab the lock • Generally do not allow sleeping • Why?

  13. Mutex

  14. Mutex Implementation • Architecture-dependent code • Optimizations • Initialization • DEFINE_MUTEX(name) • void mutex_init(struct mutex *lock); • Various routines • void mutex_lock(struct mutex *lock); • int mutex_lock_interruptible(struct mutex *lock); • int mutex_lock_killable(struct mutex *lock); • void mutex_unlock(struct mutex *lock);
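A minimal, self-contained sketch of this API; the lock, counter, and function names are illustrative rather than taken from scull:

      #include <linux/mutex.h>

      static DEFINE_MUTEX(my_lock);            /* statically defined and initialized */
      static int shared_counter;

      static int bump_counter(void)
      {
          /* sleep until the mutex is free; give up if a signal arrives */
          if (mutex_lock_interruptible(&my_lock))
              return -ERESTARTSYS;
          shared_counter++;                    /* critical section */
          mutex_unlock(&my_lock);
          return 0;
      }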

  15. Using mutexes in scull • scull_dev structure revisited

      struct scull_dev {
          struct scull_qset *data;   /* Pointer to first quantum set */
          int quantum;               /* the current quantum size */
          int qset;                  /* the current array size */
          unsigned long size;        /* amount of data stored here */
          unsigned int access_key;   /* used by sculluid & scullpriv */
          struct mutex mutex;        /* mutual exclusion */
          struct cdev cdev;          /* Char device structure */
      };

  16. Using mutexes in scull • scull_dev initialization

      for (i = 0; i < scull_nr_devs; i++) {
          scull_devices[i].quantum = scull_quantum;
          scull_devices[i].qset = scull_qset;
          mutex_init(&scull_devices[i].mutex);  /* before cdev_add */
          scull_setup_cdev(&scull_devices[i], i);
      }

  17. Using mutexes in scull • scull_write() begins with

      if (mutex_lock_interruptible(&dev->mutex))
          return -ERESTARTSYS;

      • scull_write() ends with

      out:
          mutex_unlock(&dev->mutex);
          return retval;

  18. mutex_lock_interruptible() returns nonzero • The wait was interrupted by a signal • If the system call can be resubmitted: undo any visible changes and return -ERESTARTSYS so it is restarted • Otherwise return -EINTR • E.g., when the changes could not be undone


  20. Restartable system call • Automatic restarting of certain interrupted system calls • Retry with the same argument values • Simplifies user-space programming for dealing with "interrupted system call" errors • POSIX permits an implementation to restart system calls, but does not require it • SUS defines the SA_RESTART flag as a means by which an application can request that interrupted system calls be restarted • http://pubs.opengroup.org/onlinepubs/009604499/functions/sigaction.html • The driver requests the restart by returning -ERESTARTSYS

  21. Restartable system call • Arguments may need to be modified before the retry • return -ERESTART_RESTARTBLOCK • Specify a callback function to modify the arguments • http://lwn.net/Articles/17744/

  22. Userspace write() and kernel-space *_interruptible() • From the POSIX man page • If write() is interrupted by a signal before it writes any data, it shall return -1 with errno set to [EINTR]. • If write() is interrupted by a signal after it successfully writes some data, it shall return the number of bytes written. • http://pubs.opengroup.org/onlinepubs/009604499/functions/sigaction.html

  23. mutex_lock_killable() • With mutex_lock(), the process assumes the wait cannot be interrupted by a signal • Breaking that assumption would break the user/kernel-space interface • But if the process receives a fatal signal and mutex_lock() never returns, the result is an immortal (unkillable) process • mutex_lock_killable(): a fatal signal ends the wait • The assumption is not broken, because the process never returns to user space to continue the system call • http://lwn.net/Articles/288056/
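A sketch of the killable variant in a system-call path (dev and its mutex are illustrative). Only a fatal signal aborts the wait; since the process is about to die, the error value never reaches user space:

      if (mutex_lock_killable(&dev->mutex))
          return -EINTR;      /* reached only when the process is being killed */
      /* ... critical section ... */
      mutex_unlock(&dev->mutex);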

  24. Mutex Usage as Completion (Error) • https://lkml.org/lkml/2013/12/2/997

  25. General Pattern • refcount variable for deciding which thread performs cleanup • Usage • Initialize shared object • Set refcount to number of concurrent threads • Start multiple threads • Last thread cleans up

      <do stuff>
      mutex_lock(obj->lock);
      dead = !--obj->refcount;
      mutex_unlock(obj->lock);
      if (dead)
          free(obj);

  26. fs/pipe.c

      __pipe_lock(pipe);
      …
      spin_lock(&inode->i_lock);
      if (!--pipe->files) {
          inode->i_pipe = NULL;
          kill = 1;
      }
      spin_unlock(&inode->i_lock);
      __pipe_unlock(pipe);
      if (kill)
          free_pipe_info(pipe);

  27. CPU 1 vs. CPU 2: the refcount pattern racing inside mutex_unlock()

      CPU 1:
          mutex_lock(obj->lock);
          dead = !--obj->refcount;
          // refcount was 2, is now 1, dead = 0

      CPU 2:
          mutex_lock(obj->lock);
          // blocks on obj->lock, goes to slowpath
          // mutex count is negative; CPU 2 is in optimistic
          // spinning mode in __mutex_lock_common

      CPU 1:
          mutex_unlock(obj->lock);
          __mutex_fastpath_unlock(): fastpath fails (because the mutex count is nonpositive)
          __mutex_unlock_slowpath:
              if (__mutex_slowpath_needs_to_unlock())
                  atomic_set(&lock->count, 1);

      CPU 2:
          if ((atomic_read(&lock->count) == 1) &&
              (atomic_cmpxchg(&lock->count, 1, 0) == 1)) {
          // ... and now CPU 2 owns the mutex, and goes on:
          dead = !--obj->refcount;
          // refcount was 1, is now 0, dead = 1
          mutex_unlock(obj->lock);
          if (dead)
              free(obj);

      CPU 1 (meanwhile, still inside its unlock, touching the now-freed mutex):
          if (!list_empty(&lock->wait_list)) {

  28. Conclusion • A mutex serializes what is inside the mutex, but not necessarily the lock ITSELF • Use spinlocks and/or atomic ref counts for the last-reference decision • "don't use mutexes to implement completions"
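A sketch of that advice with an illustrative obj structure: the reference count lives in its own atomic variable, so the thread that drops the last reference can free the object without depending on the mutex implementation having finished its own unlock path (the in-kernel kref helpers follow the same shape):

      struct obj {
          struct mutex lock;       /* protects the payload only      */
          atomic_t     refcount;   /* lifetime is managed separately */
      };

      static void obj_put(struct obj *obj)
      {
          if (atomic_dec_and_test(&obj->refcount))
              kfree(obj);          /* we held the last reference     */
      }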

  29. Completions

  30. Completions • Start and wait for operation to complete (outside current thread) • Common pattern in kernel programming • E.g., wait for initialization to complete • Reasons to use instead of mutexes • Wake up multiple threads • More efficient • More meaningful syntax • Subtle races with mutex implementation code • Cleanup of mutex itself • http://lkml.iu.edu//hypermail/linux/kernel/0107.3/0674.html • https://lkml.org/lkml/2008/4/11/323 • completions • #include <linux/completion.h>

  31. Completions • To create a completion

      DECLARE_COMPLETION(my_completion);

      • Or

      struct completion my_completion;
      init_completion(&my_completion);

      • To wait for the completion, call

      void wait_for_completion(struct completion *c);
      int wait_for_completion_interruptible(struct completion *c);
      unsigned long wait_for_completion_timeout(struct completion *c, unsigned long timeout);

  32. Completions • To signal a completion event, call one of the following

      /* wake up one waiting thread */
      void complete(struct completion *c);

      /* wake up multiple waiting threads */
      /* need to call INIT_COMPLETION(struct completion c) to reuse the completion structure */
      void complete_all(struct completion *c);
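A sketch of the reuse pattern implied above; setup_done is illustrative, and the reinitialization macro is INIT_COMPLETION() on older kernels or reinit_completion() on current ones:

      static DECLARE_COMPLETION(setup_done);

      /* any number of waiting threads: */
      wait_for_completion(&setup_done);

      /* the thread that finished the work wakes all of them at once: */
      complete_all(&setup_done);

      /* before the next round, reset the structure:
         INIT_COMPLETION(setup_done);   -- or --   reinit_completion(&setup_done); */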

  33. Completions • Example: misc-modules/complete.c

      DECLARE_COMPLETION(comp);

      ssize_t complete_read(struct file *filp, char __user *buf,
                            size_t count, loff_t *pos)
      {
          printk(KERN_DEBUG "process %i (%s) going to sleep\n",
                 current->pid, current->comm);
          wait_for_completion(&comp);
          printk(KERN_DEBUG "awoken %i (%s)\n", current->pid, current->comm);
          return 0; /* EOF */
      }

  34. Completions • Example

      ssize_t complete_write(struct file *filp, const char __user *buf,
                             size_t count, loff_t *pos)
      {
          printk(KERN_DEBUG "process %i (%s) awakening the readers...\n",
                 current->pid, current->comm);
          complete(&comp);
          return count; /* succeed, to avoid retrial */
      }

  35. Spinlocks

  36. Spinlocks • Generally used in code that should not sleep (e.g., interrupt handlers) • Usually implemented as a single bit • If the lock is available, the bit is set and the code continues • If the lock is taken, the code enters a tight loop • Repeatedly checks the lock until it becomes available

  37. Spinlocks • Actual implementation varies for different architectures • Protect a process from other CPUs and interrupts • Usually does nothing on uniprocessor machines • Exception: changing the IRQ masking status

  38. Introduction to Spinlock API • #include <linux/spinlock.h> • To initialize, declare spinlock_t my_lock = SPIN_LOCK_UNLOCKED; • Or call void spin_lock_init(spinlock_t *lock); • To acquire a lock, call void spin_lock(spinlock_t *lock); • Spinlock waits are uninterruptible • To release a lock, call void spin_unlock(spinlock_t *lock);
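A minimal sketch of this API; DEFINE_SPINLOCK() is the modern replacement for the SPIN_LOCK_UNLOCKED initializer, and the counter is illustrative:

      #include <linux/spinlock.h>

      static DEFINE_SPINLOCK(my_lock);
      static unsigned long shared_counter;

      static void bump_counter(void)
      {
          spin_lock(&my_lock);      /* busy-waits; never sleep while held */
          shared_counter++;         /* keep the critical section short    */
          spin_unlock(&my_lock);
      }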

  39. Spinlocks and Atomic Context • While holding a spinlock, be atomic • Do not sleep or relinquish the processor • Examples of calls that can sleep • Copying data to or from user space • User-space page may need to be on disk… • Memory allocation • Memory might not be available • Disable interrupts (on the local CPU) as needed • Hold spinlocks for the minimum time possible

  40. The Spinlock Functions • Four functions to acquire a spinlock

      void spin_lock(spinlock_t *lock);

      /* also disables interrupts on the local CPU */
      void spin_lock_irqsave(spinlock_t *lock, unsigned long flags);

      /* only if no other code disabled interrupts */
      void spin_lock_irq(spinlock_t *lock);

      /* disables software interrupts (e.g., tasklets); leaves hardware interrupts enabled */
      void spin_lock_bh(spinlock_t *lock);

  41. The Spinlock Functions • Four functions to release a spinlock

      void spin_unlock(spinlock_t *lock);

      /* need to use the same flags variable used for locking */
      /* need to call spin_lock_irqsave and spin_unlock_irqrestore in the same function,
         or your code may break on some architectures */
      void spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags);

      void spin_unlock_irq(spinlock_t *lock);
      void spin_unlock_bh(spinlock_t *lock);
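A sketch of the usual pairing in process-context code that shares data with an interrupt handler; dev, dev->lock, and dev->pending are illustrative, and the flags variable travels from the save call to the restore call in the same function:

      unsigned long flags;

      spin_lock_irqsave(&dev->lock, flags);     /* save IRQ state, disable, lock  */
      dev->pending++;                           /* state also touched in the ISR  */
      spin_unlock_irqrestore(&dev->lock, flags);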

  42. Locking Traps • It is very hard to manage concurrency • What can possibly go wrong?

  43. Ambiguous Rules • Shared data structure D, protected by lock L

      function A() {
          lock(&L);
          /* call function B() that accesses D */
          unlock(&L);
      }

      • If function B() calls lock(&L), we have a deadlock

  44. Ambiguous Rules • Solution • Have clear entry points to access data structures • Document assumptions about locking
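One way to make such an assumption explicit and enforceable is an accessor that asserts it; lockdep_assert_held() turns a violation into a warning when lock debugging is enabled (struct my_data and the helper name are illustrative):

      static void update_D(struct my_data *d)
      {
          lockdep_assert_held(&d->lock);   /* caller must already hold d->lock */
          /* ... modify the shared structure D ... */
      }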

  45. Lock Ordering Rules • Multiple locks should always be acquired in the same order • Easier said than done

      function A() {
          lock(&L1);
          lock(&L2);
          /* access D */
          unlock(&L2);
          unlock(&L1);
      }

      function B() {
          lock(&L2);
          lock(&L1);
          /* access D */
          unlock(&L1);
          unlock(&L2);
      }

  46. Lock Ordering Rules • The same inconsistent ordering can hide behind helper functions: A() takes L1 then (via X) L2, while B() takes L2 then (via Y) L1

      function A() {
          lock(&L1);
          X();
          unlock(&L1);
      }

      function X() {
          lock(&L2);
          /* access D */
          unlock(&L2);
      }

      function B() {
          lock(&L2);
          Y();
          unlock(&L2);
      }

      function Y() {
          lock(&L1);
          /* access D */
          unlock(&L1);
      }

  47. Lock Ordering Rules of Thumb • Take locks that are local to your code before taking a lock belonging to a more central part of the kernel • A lock in central kernel code likely has more users (more contention) • Obtain the mutex before taking the spinlock • Grabbing a mutex (which can sleep) while holding a spinlock can lead to deadlocks
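A sketch of the mutex-before-spinlock rule; the device fields are illustrative. The mutex, which may sleep, is taken first, and the spinlock section nested inside it stays short and never sleeps:

      unsigned long flags;

      mutex_lock(&dev->config_mutex);            /* may sleep: take it first   */
      spin_lock_irqsave(&dev->hw_lock, flags);   /* then the non-sleeping lock */
      /* ... touch state shared with the interrupt handler ... */
      spin_unlock_irqrestore(&dev->hw_lock, flags);
      mutex_unlock(&dev->config_mutex);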

  48. Fine- Versus Coarse-Grained Locking • Coarse-grained locking • Poor concurrency • Fine-grained locking • Need to know which one to acquire • And which order to acquire • At the device driver level • Start with coarse-grained locking • Refine the granularity as contention arises • Can enable lockstat to check lock holding time

  49. BKL • Kernel used to have “big kernel lock” • Giant spinlock introduced in Linux 2.0 • Only one CPU could be executing locked kernel code at any time • BKL has been removed • https://lwn.net/Articles/384855/ • https://www.linux.com/learn/tutorials/447301:whats-new-in-linux-2639-ding-dong-the-big-kernel-lock-is-dead

  50. Alternatives to Locking • Lock-free algorithms • Atomic variables • Bit operations • seqlocks • Read-copy-update (RCU)
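A small sketch of the first two alternatives; the names are illustrative. Atomic variables and bit operations allow safe updates from any context without taking a lock:

      #include <linux/atomic.h>
      #include <linux/bitops.h>

      static atomic_t packets_seen = ATOMIC_INIT(0);
      static unsigned long dev_flags;

      atomic_inc(&packets_seen);                /* lock-free counter update */
      set_bit(0, &dev_flags);                   /* atomic single-bit set    */
      if (test_and_clear_bit(0, &dev_flags))    /* atomic test-and-modify   */
          atomic_set(&packets_seen, 0);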
