Safe and Efficient Supervised Memory Systems
1) Out-of-band metadata per data block
2) Monitor, control (supervise) data accesses
3) Run handlers on specific metadata states
Jayaram Bobba†, Marc Lupon‡, Mark D. Hill, and David A. Wood
Department of Computer Sciences, University of Wisconsin-Madison
†Intel Corporation
‡Universitat Politècnica de Catalunya
†‡Work done while at University of Wisconsin-Madison
Why Supervised Memory Systems?
• SW more complex
• Productivity Wall
• HW more powerful
Hardware Support to Improve Productivity
• Empty/full-bits
• Hardware TM
• Supervised (Memory) Systems
• MemTracker, SafeMem, iWatcher
• Deterministic Shared Memory
• Hardware-assisted GC
• Information Flow Tracking
Wisconsin Multifacet Project
Executive Summary
• Many supervised memory systems
  • Assume SC, but few systems do SC
  • Moving to TSO (x86 & SPARC) non-trivial
• Supervised Memory for TSO
  • TSOall: TSO for data & metadata, slow
  • TSOdata: TSO for data only, tricky
• Safe Supervision
  • Metadata for X only controls data at X
  • Fast & Simple
Formal Foundation for Current/Future Supervised Systems
Outline
• Introduction
• Move To TSO non-trivial
  • Case Study: Deterministic Multiprocessor (DMP)
• Supervised Memory for TSO
• Safe Supervision
• Evaluation
Reordering can be incorrect: A TSO-compliant system
[Figure: processors P1 and P2, each with a PC and registers r1-r3, a store buffer, and shared memory. Instructions shown: ST 1, [A]; LD [B], r1; ST 2, [C]; LD [C], r3. Labels include ST A, LD B, and the buffered stores ST 0x01, A and ST 0x10, C.]
Reordering can be incorrect: DMP-ShTab [Devietti et al., ASPLOS 09]
[Figure: processors P1 and P2 running threads T1 and T2, each with a PC and registers r1-r3, above memory. Instructions shown: LD [Y], r2; ST r2, [Y]; ST 2, [A]; LD [B], r3; LD [X], r1; LD [B], r2; ST 1, [B]. Blocks are marked Private or Shared, with ownership labels Owned@T1, Owned@T2, and Shared@T1,T2. Both threads are shown as STALL.]
Is reordering safe? A Case Study: DMP-ShTab on TSO
[Figure: processors P1 and P2 running threads T1 and T2, each with a PC, registers r1-r3, and a store buffer, above memory. Instructions shown: LD [Y], r2; ST r2, [Y]; ST 2, [A]; LD [B], r3; LD [X], r1; LD [B], r2; ST 1, [B]. The store buffer holds ST 0x10, A. Ownership labels: Shared@T1,T2 and Owned@T2. Both threads are shown as STALL.]
• Case 1: LD B does not pass ST A; r3 gets 0x01
• Case 2: LD B passes ST A; r3 gets 0x00
Outline
• Introduction
• Move To TSO non-trivial
• Supervised Memory for TSO
  • Define Supervised Memory
  • TSOall: Simple but Slow
  • TSOdata: Fast but tricky
• Safe Supervision
• Evaluation
Supervised Memory for TSO: Define Supervised Memory
• Each memory location A has
  • data (A.d)
  • metadata (A.m)
• New operations
  • Supervised Load (sLD A)
  • Supervised Store (sST A)
• Jump on reading special metadata (optional)
  • Hardware exception
Supervised Memory for TSO: Supervised Operations
sLD A =>
Start:
  atomic {
    curm = Val[R A.m]                       // Read metadata
    nextm = NEXT(Load, curm)                // Check software-specified FSM
    If nextm == EXCEPTION then Jump to Handler
    If (nextm != curm) then W A.m, nextm    // Update metadata
    R A.d                                   // Read data
  }
Handler: …
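The sLD pseudocode can be sketched in Python as below. This is a minimal, single-threaded illustration, not the hardware mechanism: the `Location` class, the `SupervisionException` type (standing in for "Jump to Handler"), and the particular empty/full-bit transition table in `NEXT` are all assumptions made for the example. The atomic block is modeled simply by running the steps without interleaving.

```python
from dataclasses import dataclass

EXCEPTION = "EXCEPTION"


class SupervisionException(Exception):
    """Hypothetical stand-in for the jump to the software handler."""


@dataclass
class Location:
    d: int  # data (A.d)
    m: str  # metadata (A.m)


# Assumed software-specified FSM for empty/full bits:
# (operation, current metadata) -> next metadata.
NEXT = {
    ("Load", "Full"): "Empty",
    ("Load", "Empty"): EXCEPTION,
    ("Store", "Empty"): "Full",
    ("Store", "Full"): EXCEPTION,
}


def sLD(loc: Location) -> int:
    """Supervised load: read metadata, consult the FSM, update metadata, read data."""
    curm = loc.m                     # Read metadata
    nextm = NEXT[("Load", curm)]     # Check software-specified FSM
    if nextm == EXCEPTION:
        raise SupervisionException(f"load with metadata {curm}")
    if nextm != curm:
        loc.m = nextm                # Update metadata
    return loc.d                     # Read data


a = Location(d=7, m="Full")
value = sLD(a)  # succeeds and flips A.m from Full to Empty
```

A second `sLD(a)` would raise `SupervisionException` under this assumed FSM, because the location is now Empty.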
Supervised Memory for TSO: TSO Axioms [Hangal et al., ISCA 2004]
Supervised Memory for TSO: TSO Axioms [Hangal et al., ISCA 2004]
Reordering Axioms
• Rd A -> Rd B: order preserved
• Rd A -> Wr B: order preserved
• Wr A -> Wr B: order preserved
• Wr A -> Rd B: order not enforced (allows store buffers)
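The relaxed write-to-read order is what a FIFO store buffer exploits. The sketch below is a toy model, not the formal axioms of Hangal et al.: the `Core` class and its method names are hypothetical. Each core buffers its stores, forwards from its own buffer on a load, and drains to shared memory in FIFO order, so Wr -> Wr order is kept while a later load may complete before an earlier store becomes visible.

```python
from collections import deque


class Core:
    """Toy core with a FIFO store buffer in front of shared memory."""

    def __init__(self, memory):
        self.memory = memory
        self.store_buffer = deque()  # FIFO of (address, value)

    def store(self, addr, value):
        # Stores retire into the buffer, not directly into memory.
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Forward from the youngest matching store in this core's own buffer.
        for a, v in reversed(self.store_buffer):
            if a == addr:
                return v
        return self.memory.get(addr, 0)

    def drain_all(self):
        # Oldest store first, which preserves Wr -> Wr order.
        while self.store_buffer:
            addr, value = self.store_buffer.popleft()
            self.memory[addr] = value


memory = {}
p1, p2 = Core(memory), Core(memory)

p1.store("A", 1)   # P1: ST 1, [A]
p2.store("B", 1)   # P2: ST 1, [B]
r1 = p1.load("B")  # P1: LD [B] passes its own buffered ST A
r2 = p2.load("A")  # P2: LD [A] passes its own buffered ST B
p1.drain_all()
p2.drain_all()
# Both loads returned 0, an outcome TSO permits and SC forbids.
```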
Supervised Memory for TSO: TSOall, a Consistency Model for Supervised Memory
TSO axioms applied to all accesses, data and metadata
+ (Simple) Like TSO
- (Slow) Prohibits optimizations
Thread:
  sST A -> [Rd A.m, Wr A.d, Wr A.m]
  sLD B -> [Rd B.m, Rd B.d]
=> Store buffers ineffective
Supervised Memory for TSO: TSOdata, Fast Yet Simple
Reordering Axioms
Thread:
  sST A -> [Rd A.m, Wr A.d, Wr A.m]
  sLD B -> [Rd B.m, Rd B.d]
• Store buffers can be used
Outline
• Introduction
• Move To TSO non-trivial
• Supervised Memory for TSO
• Safe Supervision
• Evaluation
Safe Supervision: Motivation
No Reordering, Easy to Reason (TSOall) vs Reorder, Performance (TSOdata)
Safe Supervision: Blast from the Past [Adve and Hill, ISCA 1990]
No Reordering, Easy to Reason (SC) vs Reorder, Performance (RC)
• Observation:
  • Simple programs rely only on certain SC orders
  • Ignore non-essential orders. Still appears as SC
• Challenge: Simple? Non-essential orders?
• Solution: Data-race-freedom
  • For data-race-free programs, RC = SC
Safe Supervision: Motivation
No Reordering, Easy to Reason (TSOall) vs Reorder, Performance (TSOdata)
• Observation:
  • Simple supervised programs rely only on certain orders
  • Ignore non-essential orders. Still appears as TSOall
• Challenge: Simple? Non-essential orders?
• Solution: Safe Supervision
  • For safely-supervised programs, TSOdata = TSOall
Safe Supervision
• A location’s metadata is only used to control access to that location’s data
• Most uses of supervision are safely supervised. E.g.,
  • Heap Checker: Initialized/Uninitialized values
  • Transactional Memory: Conflict Detection information
• DMP is NOT safely-supervised
Initially, A.mdata = Empty, B.data = 0
Thread 1: B.data = 1; A.mdata = Full
Thread 2: While (A.mdata == Empty); Read B.data
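The two-thread example above uses A's metadata to guard B's data, which is the pattern safe supervision rules out. The toy schedule below illustrates why this matters. It assumes, purely for illustration, that under a TSOdata-like model a metadata write may become visible without waiting behind buffered data stores, while under a TSOall-like model it queues in the same FIFO store buffer. The `run` function and its flag are hypothetical names, not part of the talk.

```python
from collections import deque


def run(metadata_bypasses_store_buffer: bool) -> int:
    """Return the value Thread 2 reads from B.data in one fixed schedule."""
    data = {"B": 0}
    mdata = {"A": "Empty"}
    store_buffer = deque()  # Thread 1's FIFO store buffer

    # Thread 1: B.data = 1 (buffered data store)
    store_buffer.append(("data", "B", 1))

    # Thread 1: A.mdata = Full
    if metadata_bypasses_store_buffer:
        mdata["A"] = "Full"  # assumed TSOdata-like behavior
    else:
        store_buffer.append(("mdata", "A", "Full"))  # TSOall-like behavior

    # Thread 2: While (A.mdata == Empty); Read B.data
    # Thread 1's buffer drains one entry per failed poll in this schedule.
    while mdata["A"] == "Empty":
        kind, addr, value = store_buffer.popleft()
        if kind == "data":
            data[addr] = value
        else:
            mdata[addr] = value
    return data["B"]


relaxed_result = run(metadata_bypasses_store_buffer=True)   # stale read of B.data
ordered_result = run(metadata_bypasses_store_buffer=False)  # sees Thread 1's store
```

In this schedule the relaxed run reads 0 and the ordered run reads 1. A safely-supervised program avoids the difference by never using A's metadata to order accesses to B's data.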
Outline
• Introduction
• Move To TSO non-trivial
• Supervised Memory for TSO
• Safe Supervision
• Evaluation
  • Is reordering useful?
Reordering is useful: Supervised Systems
• TokenTM [Bobba et al., ISCA 2008]
  • Transactional Memory
  • Metadata for tracking read/write-sets
• HARD [Zhou et al., HPCA 2007]
  • Race Detection
  • Metadata for tracking sharing state and locksets
• Both safely-supervised
Reordering is useful: Evaluation Setup
• Systems
  • TokenTM on in-order
    • TokenTMall on TSOall, TokenTMdata on TSOdata
  • HARD on OOO superscalar
    • HARDall on TSOall, HARDdata on TSOdata
  • Simulation built on Multifacet GEMS
• Workloads
  • TokenTM: STAMP
  • HARD: Wisconsin Commercial Workload Suite
Reordering is useful: Results
TokenTM speedups: 3% in Kmeans to 22% in Labyrinth
Reordering is useful: Results
HARD speedups: 3% in JBB to 5% in Apache
In the paper…
• Formal models
  • Formal Definition of Safe Supervision
  • Proofs (in thesis): http://www.cs.wisc.edu/multifacet/theses/jayaram_bobba_phd.pdf
• OpenSPARC case study
  • How to handle reordering issues?
  • Metadata overhead
Executive Summary
• Many supervised memory systems
  • Assume SC, but few systems do SC
  • Moving to TSO (x86 & SPARC) non-trivial
• Supervised Memory for TSO
  • TSOall: TSO for data & metadata, slow
  • TSOdata: TSO for data only, tricky
• Safe Supervision
  • Metadata for X only controls data at X
  • Fast & Simple
Formal Foundation for Current/Future Supervised Systems
Explore relaxed supervised systems: Deterministic Shared Memory (DMP) [Devietti et al., ASPLOS 2009]
“depending upon the consistency model of the underlying hardware, threads must perform a memory fence at the edge of a quantum”
• Insert a fence after the last operation in the quantum
• Insert a fence before the first shared operation in the quantum
I3: Reordered metabit-reads
Explore relaxed supervised systems: Is reordering trivial? Empty/full-bits
[Figure: a processor with a PC and registers r1-r3, a store buffer, and memory. Instructions shown: ST 1, [A]; LD [B], r1; ST 2, [C]; LD [C], r3. Store buffer entries: ST 0x01, A and ST 0x10, C. An empty/full-bit state diagram has states Full, Empty, and None, with ST and LD transitions and an Exception outcome.]
I2: NO LOAD BYPASS
I3: LATE EXCEPTIONS
TSOdata on OpenSPARC T2
• Goal: Explore low-level issues on a real design
• Late Exceptions with deferred handlers
• Dump store buffer entries on exception
• Enhance store buffer to carry Virtual Address (VA)
• ~200 cycles to read out 4 entries
• Disable store buffer bypassing for supervised loads
• Low space overhead for adding metabits (~4%)
Explore relaxed memory systems: Existing proposals assume SC
• Assume SC or don’t deal with multiprocessors
Non-TSOall Executions
TSOdata is Complex: Empty/full-bits
[Figure: empty/full-bit state diagram with states Full and Empty, sST and sLD transitions, and an Exception outcome.]
Initial State: A.d = 0, A.m = None; B.d = 0, B.m = Empty
T0: dST 1, A; sLD B
T1: sST B, 1; dLD A
Can dLD A return 0?
Safe Supervision