Kubiatowicz, Chaiken and Agarwal, "Closing the Window of Vulnerability in Multiphase Memory Transactions", MIT Computer Science Dept. (a.k.a. "Kubi's baby"). CS258 Lecture by: Dan Bonachea
Outline • Intro & Scope • What architectural features create a WOV • Window of Vulnerability - what is it? • Multiphase memory access • Potential for livelocks with WOV • Empirical measurements of severity • Deadlocks that can arise • Good & Bad Solutions for Closing the Window • Alewife implementation & Conclusions
Scope • Hardware cache-coherent distributed shared-memory multiprocessors, with: - multiphase shared-memory transactions (request/reply) • long delays for accessing remote memory - polling-based completion (CPU retries until success) • as opposed to a signaling-based approach - and one or more of: • hardware context-switching, possibly with context-switch disable capabilities • high-availability interrupts (HAI) • prefetching or weak ordering. Key property: the hardware might not immediately consume the reply to its shared-memory transaction and commit the load/store instruction
Anatomy of a Multiphase Memory Access • If response data is lost during the WOV due to invalidation or cache conflict, requestor cannot make forward progress
Architectural Features that lead to WOV problem • Prefetching or Weak ordering • allow processor to have multiple outstanding memory transactions (from same or different context) • some of the data addresses may conflict in the cache • with unified caches, response data may even conflict with instruction that initiated the transaction • Hardware context-switching • Hardware keeps several threads ready to run and quickly switches between them when one stalls • Often also have a mechanism to disable context switching (to support fast atomic operations & critical sections) • High-availability interrupts • any time we interrupt a load/store in progress to process network messages • used to implement software-assisted cache coherence, optimistic network deadlock recovery, etc. • has essentially the same effect as hardware context-switching
Livelocks that can occur with WOV • Invalidation thrashing • external protocol invalidation during the WOV • Intercontext thrashing • different local contexts with outstanding data transactions that conflict in cache • High-availability interrupt thrashing • a cache conflict during the interrupt handler replaces a data response • Instruction-data thrashing • response data conflicts with the initiating instruction in the cache
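Three of the four scenarios above depend on two addresses "conflicting in the cache". A minimal sketch of that test follows, assuming a direct-mapped cache; the line size, line count, and function names are illustrative assumptions, not values taken from the slides.

```python
# Assumed geometry for illustration only (not stated on the slides).
LINE_SIZE = 16     # bytes per cache line
NUM_LINES = 4096   # lines in a direct-mapped cache


def cache_index(addr, line_size=LINE_SIZE, num_lines=NUM_LINES):
    """Index of the cache line that a byte address maps to."""
    return (addr // line_size) % num_lines


def conflicts(addr_a, addr_b, line_size=LINE_SIZE, num_lines=NUM_LINES):
    """True if the two addresses are in different memory blocks that
    compete for the same line of a direct-mapped cache."""
    block_a = addr_a // line_size
    block_b = addr_b // line_size
    return block_a != block_b and block_a % num_lines == block_b % num_lines
```

With this geometry, addresses exactly `LINE_SIZE * NUM_LINES` bytes apart conflict, so a response for one can evict a not-yet-consumed response for the other.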
Empirical measurements of WOV Alewife simulator: 64 processors, 4 contexts per processor, 1.5M cycles of a numerical integration app.
Broken Solution #1: Simple Locking • One simple idea for closing the WOV: • Add a "lock" bit to the cache line that delays invalidation and prevents conflict replacement on response data (set on arrival, clear on access) • Also need a bit to save the fact that an external invalidate is pending for the cache line • Also need a "transaction-in-progress" cache line state to prevent new transactions during request phase that would conflict in the cache • Not a perfect solution • Different context accessing same data could touch & unlock the line (fixable by adding more state) • Otherwise, fixes the WOV livelock problems, but...
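The lock bit and the pending-invalidate bit can be sketched as a tiny state machine. This is a simplified model under assumptions: it covers a single cache line, omits the transaction-in-progress state, and uses a hypothetical class and method names.

```python
class LockedLine:
    """One cache line with the extra bits of the simple-locking idea.
    Hypothetical model; the transaction-in-progress state is omitted."""

    def __init__(self):
        self.tag = None             # address currently held, or None
        self.locked = False         # set on reply arrival, cleared on access
        self.inval_pending = False  # remembers a deferred external invalidate

    def fill(self, tag):
        """Reply data arrives. Returns False if a locked line blocks it."""
        if self.locked:
            return False            # conflict replacement is refused
        self.tag = tag
        self.locked = True
        return True

    def invalidate(self):
        """External invalidate. Returns True if applied, False if deferred."""
        if self.locked:
            self.inval_pending = True
            return False
        self.tag = None
        return True

    def access(self, tag):
        """Processor retries its load/store. Returns True if it commits."""
        if self.tag != tag:
            return False
        self.locked = False         # window closed: the data was consumed
        if self.inval_pending:      # now honor the deferred invalidate
            self.inval_pending = False
            self.tag = None
        return True
```

Note that `fill` and `invalidate` can both be made to wait on a locked line; waiting of this kind is what opens the door to deadlock.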
Deadlocks Caused by Simple Locking Waits-for dependency arcs: • Congruence • cache conflicts • Protocol • external read req on data locked for write • Execution • program order on instruction completion • Disable • context switching has been disabled Legend: D = Data, I = Instruction, P = Primary, S = Secondary; 1, 2 = node #; A, B, C, D = context #. X and Y variables conflict in cache, Z does not
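A deadlock corresponds to a cycle in the waits-for graph. The sketch below finds such a cycle with a depth-first search. The example arcs are hypothetical and only illustrate the arc types listed above; they are not the scenarios from the original figure, which did not survive in this transcript.

```python
def find_cycle(arcs):
    """Return one cycle in a waits-for graph as a list of nodes
    (first node repeated at the end), or None if there is none.
    `arcs` is an iterable of (waiter, waited_for, arc_type) tuples."""
    graph = {}
    for src, dst, _arc_type in arcs:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on current path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:    # back edge: a cycle closes here
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical example: context A (switching disabled) waits for X, whose
# cache line is held by B's locked Y; B cannot run until A re-enables switching.
example_arcs = [
    ("A:X", "B:Y", "congruence"),
    ("B:Y", "A:X", "disable"),
]
```

Dropping any one arc from a cycle breaks that deadlock, which is why each arc type is a separate target for a fix.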
Solution #1: Associative Locking • Basic Idea: • Add a small, fully associative transaction buffer • Include address, state bits and space for data • Perform all locking on the transaction buffer entries • Defer invalidates on locked data (need address associativity to handle invalidates) • Optimization: merge references to the same data from different contexts to reduce the number of messages • Avoids the cache conflicts caused by limited cache associativity, which are what lead to some of the deadlocks • Removes all the "congruence" dependency arcs • Also solves all the livelock scenarios • Can still deadlock if we allow context-switch disable
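A rough sketch of such a transaction buffer is shown below. It is a software model under assumptions, not the hardware design: the class, its method names, and the rule that an entry is freed once every merged context has consumed the data are all hypothetical. The default capacity of 16 borrows the buffer count quoted for Alewife later in the lecture.

```python
class TransactionBuffer:
    """Small fully associative buffer keyed by address (hypothetical model)."""

    def __init__(self, capacity=16):
        self.capacity = capacity
        self.entries = {}   # address -> entry; a dict gives associative lookup

    def request(self, addr, ctx):
        """Context starts a transaction. Returns True if a new network
        request is needed, False if merged with an existing entry."""
        entry = self.entries.get(addr)
        if entry is not None:
            entry["waiters"].add(ctx)          # merge: no extra message
            return False
        if len(self.entries) >= self.capacity:
            raise RuntimeError("transaction buffer full")
        self.entries[addr] = {"data": None, "locked": False,
                              "inval_pending": False, "waiters": {ctx}}
        return True

    def reply(self, addr, data):
        """Reply arrives: store the data in the entry and lock it."""
        entry = self.entries[addr]
        entry["data"] = data
        entry["locked"] = True

    def invalidate(self, addr):
        """Returns False if the invalidate is deferred on locked data,
        True if no locked data for this address is held here."""
        entry = self.entries.get(addr)
        if entry is not None and entry["locked"]:
            entry["inval_pending"] = True
            return False
        return True

    def access(self, addr, ctx):
        """Context retries its access. Returns the data, or None if the
        reply has not arrived. Frees the entry after the last waiter."""
        entry = self.entries.get(addr)
        if entry is None or not entry["locked"]:
            return None
        data = entry["data"]
        entry["waiters"].discard(ctx)
        if not entry["waiters"]:
            del self.entries[addr]   # unlock; a deferred invalidate may proceed
        return data
```

Because lookup is by full address, two addresses that would collide in the cache occupy separate entries here.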
Solution #2: Thrashwait • Observation: • locking is pessimistic: locks data to prevent vulnerability during WOV, thereby ensuring progress (prevention) • optimistic option: allow vulnerability, but detect livelock/thrashing when it happens and take steps to correct it (detection and recovery) • Basic idea: • dynamically detect when data got lost during WOV • tried-once bit on context says we attempted an access • a cleared transaction-in-progress state says the transaction has completed; if the data is nevertheless missing from the cache, it was lost during the WOV • when we detect a loss, retry access and spin-wait for result (with context-switching disabled) • without HAI, this ensures WOV is length zero • Can still livelock in the presence of HAI
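The detection rule can be written as a small decision function. This is a sketch reconstructed from the bullets above, not a transcription of the hardware; the function name, the action labels, and the return structure are illustrative assumptions.

```python
from typing import NamedTuple


class Decision(NamedTuple):
    action: str         # "READY", "SWITCH" or "WAIT" (illustrative labels)
    send_request: bool  # whether a request message goes out
    tried_once: bool    # new value of the context's tried-once bit


def thrashwait_step(data_in_cache, transaction_in_progress, tried_once):
    """Decide what one context does when it attempts a shared-memory access."""
    if data_in_cache:
        # Hit: the access commits, and the tried-once bit is cleared.
        return Decision("READY", False, False)
    if transaction_in_progress:
        # Reply still outstanding: let another context run meanwhile.
        return Decision("SWITCH", False, tried_once)
    if tried_once:
        # We already asked, the transaction finished, yet the data is gone:
        # it was lost in the window. Retry and spin-wait without switching.
        return Decision("WAIT", True, True)
    # First miss: send the request, remember the attempt, switch away.
    return Decision("SWITCH", True, True)
```

Spinning in the `WAIT` case means the context is still polling when the reply arrives, which is how the window shrinks to zero when no HAI intervenes.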
Broken Solution #2: Associative Thrashwait • Want to fix livelock problems of thrashwait in the presence of HAI • One possibility is to add associativity • add a transaction buffer similar to the one in associative locking • This is only a partial solution • Removes problems caused by cache conflicts • Prevents 3 of the 4 livelock scenarios • those involving cache conflicts • Still have invalidation thrashing • doesn't prevent external invalidations on the data while HAI is running • so WOV is still open during recovery and we can still livelock
Solution #3: Associative Thrashlock • Hybrid approach - combines benefits of: • Thrashwait, Associativity and Locking • Idea: • Augment Associative Thrashwait partial solution with a lock that defers all invalidations (one lock bit per CPU) • lock is turned on while spin-waiting in thrashing recovery • can run HAI handlers without danger of an invalidation • This solves the final livelock in Associative Thrashwait • Need a discipline for HAI handler code to prevent introducing new dependencies due to invalidation deferment • handlers can't reference global memory • must always return to interrupted context
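The extra lock can be pictured as a single flag that queues invalidations during recovery. The sketch below is a minimal model under assumptions: the class and method names are hypothetical, and the transaction buffer and thrashing detection it would sit beside are left out.

```python
class ThrashLock:
    """Per-CPU lock that defers all invalidations while the processor
    spin-waits in thrashing recovery (hypothetical model)."""

    def __init__(self):
        self.locked = False
        self.deferred = []   # addresses whose invalidation is postponed

    def begin_spin_wait(self):
        """Thrashing detected: turn the lock on before spinning."""
        self.locked = True

    def invalidate(self, addr):
        """External invalidation. Returns True if it may be applied now,
        False if it was deferred because the lock is on."""
        if self.locked:
            self.deferred.append(addr)
            return False
        return True

    def end_spin_wait(self):
        """Access committed: turn the lock off and hand back the
        invalidations that must now be applied, in arrival order."""
        self.locked = False
        pending, self.deferred = self.deferred, []
        return pending
```

An HAI handler that runs between `begin_spin_wait` and `end_spin_wait` cannot cause the awaited data to be invalidated, but any remote node waiting on a deferred invalidation is stalled for that time, hence the handler discipline above.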
Alewife Implementation • Hardware: • Distributed shared-memory cache-coherent multiprocessor • 33 MHz SPARC-like CPUs • 4 hardware contexts with register windows • Uses Associative Thrashlock to close WOV • Hardware Reqts: • 16 transaction buffers • 8 tried-once bits and 2 lock bits • Provides: • HAI, context-switch w/disable, non-binding prefetch • 2 simultaneous transactions/context • Access merging between contexts
Conclusions • Window of Vulnerability is a problem for systems which have: • polling-based cache-coherent distributed shared-memory • and one or more of: • Multiple hardware contexts, possibly with context-switch disable • High-availability interrupts • Prefetching/weak ordering • Paper presents 3 solutions: • (correct choice based on architectural features)