Read-Copy Update • Paul E. McKenney, Linux Technology Center, IBM Beaverton, pmckenne@us.ibm.com, http://www.rdrop.com/users/paulmck • Jonathan Appavoo, Department of Electrical and Computer Engineering, University of Toronto, jonathan@eecg.toronto.edu • Andi Kleen, SuSE Labs, ak@suse.de • Orran Krieger, IBM T. J. Watson Research Center, okrieg@us.ibm.com, http://www.eecg.toronto.edu/~okrieg • Rusty Russell, RustCorp, rusty@rustcorp.com.au • Dipankar Sarma, Linux Technology Center, IBM India Software Lab, dipankar.sarma@in.ibm.com • Maneesh Soni, Linux Technology Center, IBM India Software Lab, smaneesh@in.ibm.com • Liao, Hsiao-Win
Outline • Introduction • Toy Example • Simple Infrastructure to Support RCU • Application
Traditional OS locking designs • are very complex • give poor concurrency • fail to take advantage of the event-driven nature of operating systems
Race Between Teardown and Use of Service • code executed • interrupts taken • memory error-correction events
Read-Copy Update Handling Race When quiescent state
Read-copy update works best when • an update can be divided into two phases • common-case operations can proceed on stale data (e.g., continuing to handle operations by a module being unloaded) • destructive updates are very infrequent
Implementations of Quiescent State • DYNIX/ptx 2.1 (1993) and Rusty Russell's first wait_for_rcu() patch [Russell01a] simply execute onto each CPU in turn • DYNIX/ptx 4.0 (1994) and Dipankar Sarma's RCU patch for Linux use context switch, execution in the idle loop, execution in user mode, system call entry, trap from user mode, and CPU offline (this last for DYNIX/ptx only) as the quiescent states
Implementations of Quiescent State • Rusty Russell's second wait_for_rcu() patch [Russell01b] uses voluntary context switch as the sole quiescent state • Tornado's and K42's "generation" facility tracks beginnings and ends of operations
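The counting style of quiescent-state detection listed above can be illustrated with a toy, single-threaded Python model. This is only a sketch under stated assumptions: the class `QuiescentTracker` and its method names are hypothetical and do not come from any of the patches cited; a real kernel would bump the per-CPU counters from the scheduler, the idle loop, and similar places.

```python
class QuiescentTracker:
    """Toy model: each CPU bumps a counter whenever it passes a quiescent state."""

    def __init__(self, n_cpus):
        self.counts = [0] * n_cpus

    def quiescent_state(self, cpu):
        # Stands in for a context switch, idle-loop pass, user-mode execution, etc.
        self.counts[cpu] += 1

    def start_grace_period(self):
        # Snapshot the counters at the moment the update is made.
        return list(self.counts)

    def grace_period_elapsed(self, snapshot):
        # The grace period is over once every CPU has passed at least one
        # quiescent state since the snapshot was taken.
        return all(now > then for now, then in zip(self.counts, snapshot))


tracker = QuiescentTracker(n_cpus=3)
snap = tracker.start_grace_period()
tracker.quiescent_state(0)
tracker.quiescent_state(1)
early = tracker.grace_period_elapsed(snap)  # CPU 2 has not been quiescent yet
tracker.quiescent_state(2)
late = tracker.grace_period_elapsed(snap)
print(early, late)
```

In this model the choice of which events call `quiescent_state()` is exactly what distinguishes the implementations on these slides.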
Outline • Introduction • Toy Example • Simple Infrastructure to Support RCU • Application
Reference-Count vs. Read-Copy • search() and delete() • read-copy functions avoid all cacheline bouncing for reading tasks • read-copy functions can return references to deleted elements • read-copy functions cannot hold a reference to elements across a voluntary context switch
Typical RCU update sequence • Remove pointers to a data structure • Wait for all previous readers to complete their RCU read-side critical sections • At this point no reader can hold a reference to the data structure, so it may now safely be reclaimed
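The three steps can be sketched on a singly linked list. What follows is a toy, single-threaded Python model, not kernel code: `RcuList`, `read_lock()`, `read_unlock()`, and the `freed` flag (which stands in for returning memory to the allocator) are all hypothetical names chosen for illustration, and readers are tracked explicitly in a set, which a real RCU implementation avoids.

```python
class Node:
    def __init__(self, key, nxt=None):
        self.key = key
        self.next = nxt
        self.freed = False  # stands in for kfree(); a freed node must never be read


class RcuList:
    def __init__(self):
        self.head = None
        self.active_readers = set()  # readers inside a read-side critical section
        self.pending = []            # (unlinked node, readers it still waits for)

    def insert(self, key):
        self.head = Node(key, self.head)

    def read_lock(self, rid):
        self.active_readers.add(rid)

    def read_unlock(self, rid):
        self.active_readers.discard(rid)
        self._reclaim()

    def search(self, key):
        node = self.head
        while node is not None:
            assert not node.freed, "reader touched reclaimed memory"
            if node.key == key:
                return node
            node = node.next
        return None

    def delete(self, key):
        # Step 1: remove pointers to the element. Its own next pointer is left
        # intact so a reader already standing on it can keep traversing.
        prev, node = None, self.head
        while node is not None and node.key != key:
            prev, node = node, node.next
        if node is None:
            return False
        if prev is None:
            self.head = node.next
        else:
            prev.next = node.next
        # Step 2: record the readers that started before the removal.
        self.pending.append((node, set(self.active_readers)))
        self._reclaim()
        return True

    def _reclaim(self):
        # Step 3: reclaim once none of the pre-existing readers is still active.
        still_waiting = []
        for node, waiting_for in self.pending:
            waiting_for &= self.active_readers
            if waiting_for:
                still_waiting.append((node, waiting_for))
            else:
                node.freed = True
        self.pending = still_waiting


lst = RcuList()
for k in (3, 2, 1):
    lst.insert(k)                    # list is now 1 -> 2 -> 3
lst.read_lock("A")
stale = lst.search(2)                # reader A holds a reference to element 2
lst.delete(2)
freed_during_read = stale.freed      # element is unlinked but not yet reclaimed
found_after_delete = lst.search(2)   # new searches no longer see it
lst.read_unlock("A")
freed_after_read = stale.freed
print(freed_during_read, found_after_delete, freed_after_read)
```

Reader A keeps a usable reference to the deleted element until it leaves its critical section, which is the stale-data behaviour these slides describe.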
Read-Copy Deletion first 18
Read-Copy Search The Task See Table data
Read-Copy Deletion Second 18
Read-Copy Deletion When
Assumptions • Read-intensive workload: the update fraction f < 1/|CPU| • Grace period: reading tasks can see stale data • The modification must be compatible with lock-free access • Linked-list insertion, deletion, and replacement are compatible
Outline • Introduction • Toy Example • Simple Infrastructure to Support RCU • Application
Simple Implementation • wait_for_rcu() • waits for a grace period to expire • kfree_rcu() • waits for a grace period before freeing a specified block of memory
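A minimal Python model of these two primitives, assuming the "execute onto each CPU in turn" style of grace-period detection in a non-preemptible kernel. The `Cpu` class, `run_here()`, and the `free` parameter are hypothetical stand-ins; the point is only the shape of the calls, where kfree_rcu() is wait_for_rcu() followed by a free.

```python
class Cpu:
    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.context_switches = 0

    def run_here(self, fn):
        # Running the caller on this CPU implies that whatever ran here before
        # was switched out. In a non-preemptible kernel that means it had left
        # any read-side critical section.
        self.context_switches += 1
        return fn()


def wait_for_rcu(cpus):
    # Visit every CPU in turn; once all have been visited, a grace period
    # has elapsed in this model.
    for cpu in cpus:
        cpu.run_here(lambda: None)


def kfree_rcu(cpus, block, free):
    # Wait for a grace period, then free the block.
    wait_for_rcu(cpus)
    free(block)


cpus = [Cpu(i) for i in range(4)]
freed = []
kfree_rcu(cpus, "old_element", freed.append)
print([c.context_switches for c in cpus], freed)
```

Because each call visits every CPU, the cost grows with the number of CPUs and the caller must be able to block.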
Read-Copy Update Grace Period • non-preemptible kernel execution • quiescent-state execution
Shortcomings • (1) Does not work in a preemptible kernel unless preemption is suppressed in all read-side critical sections • (2) Cannot be called from an interrupt handler • (3) Cannot be called while holding a spinlock or with interrupts disabled • (4) Relatively slow
Addressing the Shortcomings • The K42 and Tornado implementations of RCU allow read-side critical sections to block as well as be preempted: solves 1 • call_rcu(): solves 2, 3 • kfree_rcu(): solves 2, 3 • High-performance design for RCU: solves 2, 3, 4
K42 and Tornado implementations of RCU • maintain two generation counters • current generation • non-current generation • Operations (next page)
Operation • When an operation begins • increment the current generation's counter • store a pointer to that counter in the task • When the operation ends • decrement the counter the task points to • Periodically, the non-current generation's counter is checked to see if it is zero • if so, swap the current and non-current generations • A token is handed from one CPU to the next • When the token returns to a given CPU • all operations across the entire system have terminated
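The two-counter scheme for a single CPU can be sketched as a toy Python model. The class `Generations` and its method names are hypothetical, and the token passing between CPUs is deliberately left out, so this shows only how one CPU's generations drain and swap.

```python
class Generations:
    def __init__(self):
        self.counts = [0, 0]  # one counter per generation
        self.current = 0      # index of the current generation

    def begin(self):
        # Operation begins: count it in the current generation and remember
        # which counter was used (the "pointer stored in the task").
        gen = self.current
        self.counts[gen] += 1
        return gen

    def end(self, gen):
        # Operation ends: decrement the counter it was counted in.
        self.counts[gen] -= 1

    def try_swap(self):
        # Periodic check: swap only when the non-current generation has drained.
        non_current = 1 - self.current
        if self.counts[non_current] != 0:
            return False
        self.current = non_current
        return True


g = Generations()
op_a = g.begin()          # counted in generation 0
first = g.try_swap()      # non-current generation is empty, so the swap succeeds
op_b = g.begin()          # counted in generation 1
second = g.try_swap()     # op_a is still running in the non-current generation
g.end(op_a)
third = g.try_swap()      # generation 0 has drained, so the swap succeeds
print(first, second, third)
```

In this model, two successful swaps after an update imply that every operation that began before the update has ended, because each swap requires the older generation to be empty.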
Non-Blocking Grace-Period Detection • queues callbacks onto a list • invokes all the pending callbacks after forcing a grace period
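The callback-list idea can be modelled in a few lines of Python. This is a sketch under assumptions: `CallbackQueue`, `process()`, and the injected `wait_for_grace_period` function are hypothetical names, and only `call_rcu()` is taken from the slides. The queueing call never waits, so it is the part that would be usable from an interrupt handler or with a spinlock held.

```python
class CallbackQueue:
    def __init__(self, wait_for_grace_period):
        self._wait = wait_for_grace_period  # assumed to return after a grace period
        self._pending = []

    def call_rcu(self, func, arg):
        # Only appends to a list; it never blocks.
        self._pending.append((func, arg))

    def process(self):
        # Run later from a context that is allowed to wait: force one grace
        # period, then invoke every callback that was queued before it.
        batch, self._pending = self._pending, []
        if not batch:
            return 0
        self._wait()
        for func, arg in batch:
            func(arg)
        return len(batch)


grace_periods = []
freed = []
queue = CallbackQueue(wait_for_grace_period=lambda: grace_periods.append("gp"))
queue.call_rcu(freed.append, "a")
queue.call_rcu(freed.append, "b")
before = list(freed)      # nothing is freed at queueing time
ran = queue.process()
print(before, ran, freed, len(grace_periods))
```

One grace period covers the whole batch here, so the per-request cost falls as more callbacks are queued between calls to `process()`.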
High-Performance Design • defers frees of kmem_cache_alloc() memory • detects and identifies overly long lock-hold durations • "batches" grace-period-measurement requests • maintains per-CPU request lists • provides a less costly algorithm for measuring grace-period duration
Simple Deferred Free • a simple implementation of a deferred-free function named kfree_rcu() • low performance • kfree_rcu() → wait_for_rcu()
Outline • Introduction • Toy Example • Simple Infrastructure to Support RCU • Application
Application • Distributed lock manager • TCP/IP • Storage-area network (SAN) • Application regions manager (which is a workload-management subsystem) • Process management • LAN drivers