Colorama: Architectural Support for Data-Centric Synchronization. Luis Ceze, Pablo Montesinos, Christoph von Praun, and Josep Torrellas, HPCA 2007. Shimin Chen, LBA Reading Group Presentation.
Motivation • Synchronization is a challenging step in parallel programming • Transactional Memory is helpful but still complicated • Programmers have to reason non-locally • Code-centric approach • Data-Centric Synchronization (DCS) desirable • Associate synchronization constraints with data structures • Which data items should be in the same critical section • System automatically inserts sync operations into code • Reason locally
What’s New? • Existing DCS proposals are SW-only (S-DCS) • Cannot handle C/C++ pointer aliasing • Unrealistic • New proposal: hardware DCS (H-DCS) • Colorama • HW primitives to start and exit critical sections • Independent of the underlying sync mechanisms
Outline • Introduction • Data-Centric Synchronization (DCS) • Architectures of Colorama • Programming with Colorama • Evaluation • Conclusion
Data-Centric Synchronization (DCS) • Data consistency domain • Two threads cannot access the same domain at the same time • For example: X and Y are in the same domain • If a thread is accessing X, no other thread can access X or Y • System needs to automatically infer entry and exit points of critical sections: • Entry: access to data in a domain • Exit: define a simple, clear exit policy and let programmers write code to conform to this policy
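The domain rule above can be illustrated with a small software sketch. This is not how Colorama or any S-DCS system is implemented: the `Domain` class, the `domain_of` table, and the explicit `try_access` calls are hypothetical stand-ins for what the system would insert automatically. The sketch only shows the rule that items sharing a domain exclude each other.

```python
import threading

class Domain:
    """Hypothetical data consistency domain: one lock guards all its items."""
    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()

    def try_enter(self):
        # Non-blocking, so a refused entry is visible instead of hanging.
        return self._lock.acquire(blocking=False)

    def leave(self):
        self._lock.release()

shared = Domain("shared")
other = Domain("other")
# Assumed assignment of data items to domains: X and Y share one domain.
domain_of = {"X": shared, "Y": shared, "Z": other}

def try_access(item):
    """Entry point: touching an item means entering its domain."""
    return domain_of[item].try_enter()

def second_thread(results):
    results["Y"] = try_access("Y")  # same domain as X, currently held
    results["Z"] = try_access("Z")  # different domain, free

first_thread_entered = try_access("X")  # main thread enters via X
results = {}
worker = threading.Thread(target=second_thread, args=(results,))
worker.start()
worker.join()
shared.leave()  # main thread exits the domain
```

While the first thread holds the domain through X, the second thread is refused on Y but admitted on Z, which is in a different domain.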
Software DCS (S-DCS) • Vaziri et al.’s Atomic Sets • Compiler and language extensions to Java • Data consistency domain: atomic set, a subset of the fields of a Java class • Entry point: compiler analysis • Exit policy: insert exit point • In the same method as the entry point and • Right before method return
Colorama: Hardware DCS • Data consistency domain: color • Data item belongs to a domain: colored • Entry point: detected by HW • Exit policy: driven by compiler • Examples:
Structures Overview • Every colored data item has an entry in Palette (details next) • Per-thread: all 3 structures have the same number of entries • Owned color array: current critical sections • CAB, CRB: used for exit policy
Palette • Palette based on Mondrian Memory Protection system (Witchel et al., ASPLOS’02): the white part of the figure • Extend with color ID (the gray part of the figure) • Figure labels: SW managed, HW
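As a rough illustration of what the Palette provides, the sketch below maps address ranges to color IDs. It is an assumption-heavy stand-in: a flat sorted list replaces the Mondrian-style tables, regions are assumed not to overlap, and the names `Palette`, `color_region`, and `lookup` are hypothetical.

```python
import bisect

class Palette:
    """Hypothetical flat palette: non-overlapping regions tagged with a color ID."""
    def __init__(self):
        self._starts = []  # region start addresses, kept sorted
        self._rows = []    # (start, end_exclusive, color_id), same order

    def color_region(self, start, size, color_id):
        i = bisect.bisect_left(self._starts, start)
        self._starts.insert(i, start)
        self._rows.insert(i, (start, start + size, color_id))

    def lookup(self, addr):
        """Return the color ID covering addr, or None if addr is uncolored."""
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0:
            start, end, color_id = self._rows[i]
            if start <= addr < end:
                return color_id
        return None

palette = Palette()
palette.color_region(0x2000, 16, 2)
palette.color_region(0x1000, 64, 1)
```

A lookup inside a colored region returns its color ID; an address just past the end of a region, or below every region, returns `None`.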
Entry Point • HW monitors each load and store • Check cached Palette for the mem op • Check owned colors array • Trigger a user-level SW handler if accessing a colored region not owned • Handler for entry point: • Add color ID into owned colors array • Start critical section (e.g. begin transaction)
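The entry-point check can be sketched in software as follows. This is only a model of the steps listed on the slide: the dictionary standing in for the cached Palette, the `ThreadState` class, and the log entries are assumptions made for illustration, and "begin" stands for whatever the underlying mechanism does (for example, begin a transaction).

```python
class ThreadState:
    """Hypothetical per-thread view of the entry-point logic."""
    def __init__(self, palette):
        self.palette = palette      # assumed stand-in: address -> color ID
        self.owned_colors = set()   # models the owned colors array
        self.log = []               # records critical sections started

    def entry_handler(self, color):
        # User-level handler: record ownership, then start the critical section.
        self.owned_colors.add(color)
        self.log.append(("begin", color))

    def access(self, addr):
        # Modeled hardware check performed on every load and store.
        color = self.palette.get(addr)
        if color is not None and color not in self.owned_colors:
            self.entry_handler(color)

thread = ThreadState({0x10: 1, 0x18: 1, 0x20: 2})
for addr in (0x10, 0x18, 0x30, 0x20):
    thread.access(addr)
```

Only the first access to each color triggers the handler; the second access to color 1 and the access to the uncolored address `0x30` pass through without any handler call.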
Exit Policy • Exit a critical section when the thread returns from the subroutine where the critical section was entered
Implementing Exit Policy • Color acquire bitmap register (CAB) and color release bitmap register (CRB) • CAB automatically set by HW at entry points • Compiler generates the following code: • Subroutine prologue: Push CAB; CAB ← 0 • Subroutine epilogue: CRB ← CAB; Pop CAB • Upon write to CRB: HW triggers user-level handler • Handler: remove Color ID from owned color array, exit critical section
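A small simulation may help show how the prologue and epilogue interact across nested calls. It is a sketch under assumptions: the `Core` class, the integer bitmap encoding (bit i for color i), and the log are hypothetical, and the handler is modeled as a plain loop over the bits written to CRB.

```python
class Core:
    """Hypothetical model of CAB/CRB handling across subroutine calls."""
    def __init__(self):
        self.cab = 0        # color acquire bitmap for the current subroutine
        self.stack = []     # saved CAB values of the callers
        self.owned = set()  # models the owned colors array
        self.log = []

    def prologue(self):
        self.stack.append(self.cab)  # Push CAB
        self.cab = 0                 # CAB <- 0

    def enter(self, color):
        # Entry point: only a color not yet owned starts a critical section.
        if color not in self.owned:
            self.owned.add(color)
            self.cab |= 1 << color   # modeled "set by HW at entry points"
            self.log.append(("enter", color))

    def epilogue(self):
        crb = self.cab               # CRB <- CAB; the write triggers the handler
        color = 0
        while crb:
            if crb & 1:
                self.owned.discard(color)
                self.log.append(("exit", color))
            crb >>= 1
            color += 1
        self.cab = self.stack.pop()  # Pop CAB

core = Core()
core.prologue()   # outer subroutine
core.enter(1)
core.prologue()   # inner subroutine
core.enter(2)
core.enter(1)     # already owned: no new critical section
core.epilogue()   # inner returns: releases only color 2
core.epilogue()   # outer returns: releases color 1
```

Because the inner subroutine starts with a cleared CAB, its epilogue releases only the color it acquired itself; color 1 stays owned until the outer subroutine returns.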
Handling Pointers as Subroutine Arguments • Perform multiple operations on a structure together • Propose “colorcheck” instruction
Using Locks as Sync Mechanisms • Colorama can also use locks • Two potential problems: • Longer critical sections, thus possibly more contention • May deadlock • See evaluations
Correctness • Critical sections of the same color are serialized • Correctly colored programs are data-race free • Possible programming errors: • Fail to color shared data structures • Use different colors for data that should be protected together
Compatibility Issues • Legacy libraries that do not use Colorama • OK if they explicitly protect lib data using locks, etc. • Colorama protects application data outside of lib • Cases that require extensions to Colorama • Worker thread executes an infinite loop that processes incoming requests • Needs to release lock, wait, acquire lock in the same loop • Colorama extensions: getcolorid etc.
Setup • Evaluation is based on analyzing applications by using a Pin-based tool
Is the Exit Policy Suitable? • Matched: lock acquire & release in same subroutine • Almost all dynamic and 95% static critical sections • Answer: Yes
How often are multiple independent critical sections in the same subroutine? • Potential deadlocks • 1% dynamic and 4% static • Detailed analysis shows that the resulting lock order is always the same, thus no deadlocks
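The lock-order argument can be illustrated with a pairwise consistency check. This is not the analysis used in the paper, whose details the slides do not give; it is a simplified sketch that flags two locks acquired in opposite orders by different subroutines. The function name and the input format (one list of integer lock IDs per subroutine, in acquisition order) are assumptions, and a full deadlock analysis would need cycle detection over longer chains.

```python
from itertools import combinations

def inconsistent_pairs(acquisition_orders):
    """Return lock pairs that appear in both orders across subroutines."""
    seen = set()
    for order in acquisition_orders:
        # combinations() keeps input order, so (a, b) means a was taken before b.
        for a, b in combinations(order, 2):
            seen.add((a, b))
    return sorted((a, b) for (a, b) in seen if a < b and (b, a) in seen)

consistent = inconsistent_pairs([[1, 2], [1, 2, 3], [2, 3]])
conflicting = inconsistent_pairs([[1, 2], [2, 1]])
```

An empty result corresponds to the situation described on the slide: every subroutine takes the locks in the same relative order, so this simple check finds no ordering conflict.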
Structure Sizes • # of palette rows: # of allocated regions + # of static data objects • # of colors: # of lock addresses • # of Owned Colors Array entries: max # of active locks held by a thread
Colorama Instruction Overheads • Per-routine: • Prologue & epilogue: 6 insn/routine • 1 colorcheck insn per pointer argument • Estimate 7 insn/routine • On avg, 1.6 routines per 100 dynamic insns: so ~11% insns • Entry and exit handlers: low freq of critical section entry and exit, so low overhead • Coloring overheads ~ memory allocation calls • # of insns between allocations: firefox, gaim, gftp – 2-4K • Memory allocators can keep pools of colored memory (??)
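The ~11% figure follows directly from the two numbers on the slide. The short calculation below just reproduces that arithmetic; the variable names are illustrative, and reading the 7 insn/routine estimate as 6 plus roughly one colorcheck per routine is an interpretation of the slide.

```python
# Figures taken from the slide.
prologue_epilogue_insns = 6      # per routine
insns_per_routine = 7            # slide's estimate, including colorcheck
routines_per_100_insns = 1.6     # average, dynamic

# Extra instructions added per 100 original dynamic instructions.
overhead_pct = insns_per_routine * routines_per_100_insns
implied_colorchecks = insns_per_routine - prologue_epilogue_insns
```

The product is 11.2 added instructions per 100 dynamic instructions, which the slide rounds to ~11%.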
Memory Overhead • MMP: Mondrian Memory Protection • Palette adds 1-2.5% more space over app footprint
Conclusions • Colorama: Hardware Data-Centric Synchronization • HW support for entry and exit points • Evaluation suggests: • Exit policy is suitable • Low impact on critical section lengths • Modest additional overhead over MMP • This paper does not even do simulation!
Related Work • monitors